How spark-submit Works Internally in Spark

Over the past couple of days I have been looking at how a task actually gets launched after spark-submit, and this post records my understanding of the flow:

SparkSubmit → SparkDeploySchedulerBackend → AppClient → tryRegisterAllMasters

1: The client starts and initializes the relevant environment, including the submission of the Application code.

2: The Driver is registered with the Master. Note that the Master and the Workers are already running at this point; in other words, the Spark cluster is already up.

3: SparkDeploySchedulerBackend starts the Application and registers it with the Master.

def registerWithMaster() {
  tryRegisterAllMasters()
  import context.dispatcher
  var retries = 0
  registrationRetryTimer = Some {
    context.system.scheduler.schedule(REGISTRATION_TIMEOUT, REGISTRATION_TIMEOUT) {
      Utils.tryOrExit {
        retries += 1
        if (registered) {
          registrationRetryTimer.foreach(_.cancel())
        } else if (retries >= REGISTRATION_RETRIES) {
          markDead("All masters are unresponsive! Giving up.")
        } else {
          tryRegisterAllMasters()
        }
      }
    }
  }
}

def tryRegisterAllMasters() {
  for (masterAkkaUrl <- masterAkkaUrls) {
    logInfo("Connecting to master " + masterAkkaUrl + "...")
    val actor = context.actorSelection(masterAkkaUrl)
    actor ! RegisterApplication(appDescription)
  }
}

These two methods are the key pieces. SparkDeploySchedulerBackend has a start() function for the Application; once execution enters AppClient, the first step is to register the Application with the Master, which is the tryRegisterAllMasters() shown above: it sends a RegisterApplication(appDescription) message to each known Master. On the Master side, the message arrives over Akka and is handled as follows:

case RegisterApplication(description) => {
  if (state == RecoveryState.STANDBY) {
    // ignore, don't send response
  } else {
    logInfo("Registering app " + description.name)
    val app = createApplication(description, sender)
    registerApplication(app)
    logInfo("Registered app " + description.name + " with ID " + app.id)
    persistenceEngine.addApplication(app)
    sender ! RegisteredApplication(app.id, masterUrl)
    schedule()
  }
}

The Master registers the Application, persists it, and replies to the client with a RegisteredApplication message; it then calls schedule(). The schedule() function is documented as follows:

/**
 * Schedule the currently available resources among waiting apps. This method will be called
 * every time a new app joins or resource availability changes.
 */
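To make the registration-with-retry protocol above more concrete, here is a minimal, self-contained Scala sketch of the same pattern, independent of Spark: the client fires its first registration request immediately, and a fixed-rate timer either cancels itself once an acknowledgement is recorded or gives up after a bounded number of retries. Every name in it (RegistrationRetryDemo, FakeMaster, the two constants) is a hypothetical stand-in, not a Spark API.

import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

object RegistrationRetryDemo {
  val REGISTRATION_TIMEOUT_SECONDS = 2   // retry interval, analogous to REGISTRATION_TIMEOUT
  val REGISTRATION_RETRIES = 3           // retry budget, analogous to REGISTRATION_RETRIES

  // Stand-in for the Master: it only acknowledges the app on the second attempt,
  // as if the first RegisterApplication message had been lost.
  class FakeMaster {
    private var attempts = 0
    def registerApplication(appName: String): Boolean = {
      attempts += 1
      attempts >= 2
    }
  }

  def main(args: Array[String]): Unit = {
    val master = new FakeMaster
    val registered = new AtomicBoolean(false)
    val scheduler = Executors.newSingleThreadScheduledExecutor()
    var retries = 0

    // Analogous to tryRegisterAllMasters(): send the request, record any acknowledgement.
    def tryRegister(): Unit =
      if (master.registerApplication("demo-app")) registered.set(true)

    tryRegister() // first attempt, before the retry timer is armed

    val task = new Runnable {
      override def run(): Unit = {
        retries += 1
        if (registered.get()) {
          println("Registered with master, cancelling retry timer")
          scheduler.shutdown()          // analogous to registrationRetryTimer cancellation
        } else if (retries >= REGISTRATION_RETRIES) {
          println("All masters are unresponsive! Giving up.")
          scheduler.shutdown()
        } else {
          println(s"Retry #$retries: re-sending registration request")
          tryRegister()
        }
      }
    }
    scheduler.scheduleAtFixedRate(task, REGISTRATION_TIMEOUT_SECONDS,
      REGISTRATION_TIMEOUT_SECONDS, TimeUnit.SECONDS)
  }
}

The point of the design in the real code is the same as in this sketch: the RegisterApplication message is an asynchronous, fire-and-forget Akka send, so the client cannot assume it was delivered; it keeps re-sending on a timer until the RegisteredApplication reply flips the registered flag, or it marks itself dead after the retry budget is exhausted.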
