Master HA Thoroughly Demystified
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 16:40
Video source: DT-大数据梦工厂, IMF传奇行动 video series (contact details for instructor Wang Jialin (王家林) are attached at the end)
In this session:
1. Overview of Master HA
2. The four Master HA modes
3. The internal working mechanism of Master HA
4. Source code walkthrough of Master HA
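When a program is actually submitted, it is submitted to the Master that currently acts as Leader.
Spark uses a coarse-grained resource allocation model in which resources are granted before the program runs, so in general the failure of one Master does not affect the running of the cluster.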
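I. Overview of Master HA
1. In production, ZooKeeper is generally used for HA, and three Masters are recommended; ZooKeeper manages the switchover between Masters automatically.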
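2. With ZooKeeper-based HA, ZooKeeper is responsible for storing the metadata of the running Spark cluster: Workers, Drivers, and Applications (the three kinds of records that readPersistedData restores).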
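3. When the current Active Master fails, ZooKeeper elects one of the Standby Masters to become the Active Master. Note that between being elected and actually becoming the Active Master, the new Master must fetch the metadata describing the cluster's current running state from ZooKeeper and recover from it.
4. During a Master switchover, all programs that are already running keep running normally. A Spark application obtains its compute resources through the Cluster Manager before it runs, so at run time the scheduling and processing of its Jobs do not involve the Master at all.
5. The only impact of a Master switchover is that new work cannot be submitted. On the one hand, new applications cannot be submitted to the cluster, because only the Active Master accepts submission requests for new programs. On the other hand, programs that are already running cannot trigger submission requests for new Jobs through Action operations.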
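The switchover behaviour described in items 3 to 5 can be pictured with a small model. This is only a sketch under stated assumptions: the state names follow Spark's RecoveryState enumeration, but MasterModel, submit, and the string-based metadata are hypothetical simplifications, not Spark's implementation.

```scala
// Hypothetical model of a Master during failover; not Spark's actual classes.
object FailoverSketch {
  // State names mirror Spark's RecoveryState values (STANDBY, RECOVERING, ALIVE).
  sealed trait State
  case object Standby extends State
  case object Recovering extends State
  case object Alive extends State

  final class MasterModel(var state: State) {
    // Only a Master in the Alive state accepts new submissions.
    def submit(app: String): Either[String, String] =
      if (state == Alive) Right(s"accepted $app")
      else Left(s"rejected $app: master is $state")

    // Called when this Standby wins the election. If persisted metadata
    // exists, it has to be recovered before the Master becomes Alive.
    def electedLeader(persisted: Seq[String]): Unit = {
      state = if (persisted.isEmpty) Alive else Recovering
    }

    // Recovery finishes only from the Recovering state.
    def completeRecovery(): Unit =
      if (state == Recovering) state = Alive
  }

  def main(args: Array[String]): Unit = {
    val standby = new MasterModel(Standby)
    println(standby.submit("app-1"))       // rejected while Standby
    standby.electedLeader(Seq("worker_1")) // metadata found, so recovery starts
    println(standby.submit("app-1"))       // still rejected while Recovering
    standby.completeRecovery()
    println(standby.submit("app-1"))       // accepted once Alive
  }
}
```

The window between electedLeader and completeRecovery is the period in which new submissions are refused while running applications are left alone.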
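II. The four Master HA modes
1. The four Master HA modes are ZOOKEEPER, FILESYSTEM (suitable when real-time and latency requirements are not very strict), CUSTOM, and NONE.
2. Some notes:
a) ZOOKEEPER manages the Masters automatically;
b) With FILESYSTEM, the machine must be restarted manually after a Master failure; once it is back up, it immediately becomes the Active Master and serves requests (accepting application submissions and requests to run new Jobs);
c) CUSTOM allows users to plug in their own Master HA implementation, which is especially useful for advanced users;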
d) NONE is the default: a Spark cluster that has just been downloaded and installed uses this mode.
  val (persistenceEngine_, leaderElectionAgent_) = RECOVERY_MODE match {
    case "ZOOKEEPER" =>
      logInfo("Persisting recovery state to ZooKeeper")
      val zkFactory =
        new ZooKeeperRecoveryModeFactory(conf, serializer)
      (zkFactory.createPersistenceEngine(), zkFactory.createLeaderElectionAgent(this))
    case "FILESYSTEM" =>
      val fsFactory =
        new FileSystemRecoveryModeFactory(conf, serializer)
      (fsFactory.createPersistenceEngine(), fsFactory.createLeaderElectionAgent(this))
    case "CUSTOM" =>
      val clazz = Utils.classForName(conf.get("spark.deploy.recoveryMode.factory"))
      val factory = clazz.getConstructor(classOf[SparkConf], classOf[Serializer])
        .newInstance(conf, serializer)
        .asInstanceOf[StandaloneRecoveryModeFactory]
      (factory.createPersistenceEngine(), factory.createLeaderElectionAgent(this))
    case _ =>
      (new BlackHolePersistenceEngine(), new MonarchyLeaderAgent(this))
  }
  persistenceEngine = persistenceEngine_
  leaderElectionAgent = leaderElectionAgent_
} // closes the enclosing method, whose opening is not part of this excerpt
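Each branch of the match above depends on its own configuration properties. The sketch below lists them and checks a settings map for missing entries. The property names are the ones Spark's standalone Master reads, but the check itself (requiredKeys, missingKeys) is a hypothetical helper written for illustration; Spark performs no such validation, and treating these keys as mandatory is an assumption of this sketch.

```scala
// Hypothetical pre-flight check for Master recovery settings; not part of Spark.
object RecoveryModeSketch {
  val ModeKey = "spark.deploy.recoveryMode"

  // Properties this sketch treats as mandatory for each mode (an assumption).
  def requiredKeys(mode: String): Seq[String] = mode match {
    case "ZOOKEEPER"  => Seq("spark.deploy.zookeeper.url")
    case "FILESYSTEM" => Seq("spark.deploy.recoveryDirectory")
    case "CUSTOM"     => Seq("spark.deploy.recoveryMode.factory")
    case _            => Seq.empty // NONE or unrecognised: nothing is persisted
  }

  // Returns the mandatory keys that are absent; the mode falls back to NONE.
  def missingKeys(settings: Map[String, String]): Seq[String] = {
    val mode = settings.getOrElse(ModeKey, "NONE")
    requiredKeys(mode).filterNot(settings.contains)
  }

  def main(args: Array[String]): Unit = {
    // Host names below are placeholders.
    val zk = Map(
      ModeKey -> "ZOOKEEPER",
      "spark.deploy.zookeeper.url" -> "zk1:2181,zk2:2181,zk3:2181",
      "spark.deploy.zookeeper.dir" -> "/spark")
    println(missingKeys(zk))
    println(missingKeys(Map(ModeKey -> "FILESYSTEM")))
  }
}
```

On a real cluster these properties are usually passed to the Master daemons as -D system properties through SPARK_DAEMON_JAVA_OPTS in spark-env.sh.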
4. PersistenceEngine has a crucial method, persist, which persists the data, and readPersistedData, which reads it back for recovery:
  /**
   * Returns the persisted data sorted by their respective ids (which implies that they're
   * sorted by time of creation).
   */
  final def readPersistedData(
      rpcEnv: RpcEnv): (Seq[ApplicationInfo], Seq[DriverInfo], Seq[WorkerInfo]) = {
    rpcEnv.deserialize { () =>
      (read[ApplicationInfo]("app_"), read[DriverInfo]("driver_"), read[WorkerInfo]("worker_"))
    }
  }

  /**
   * Defines how the object is serialized and persisted. Implementation will
   * depend on the store used.
   */
  def persist(name: String, obj: Object)

  /**
   * Defines how the object referred by its name is removed from the store.
   */
  def unpersist(name: String)

5. Both FILESYSTEM and NONE use MonarchyLeaderAgent to carry out Leader election:
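Items 4 and 5 can be sketched together without any Spark dependency. In the sketch, InMemoryEngine is a hypothetical stand-in for a PersistenceEngine that stores plain strings in a map, and MonarchyAgent imitates the idea behind MonarchyLeaderAgent: with a single candidate there is nothing to vote on, so the Master is told it is the leader as soon as the agent is constructed. The real classes work on serialized ApplicationInfo, DriverInfo, and WorkerInfo objects; the simplified types here are assumptions.

```scala
import scala.collection.mutable

// Simplified, hypothetical stand-ins for PersistenceEngine and MonarchyLeaderAgent.
object PersistenceSketch {
  final class InMemoryEngine {
    private val store = mutable.LinkedHashMap.empty[String, String]

    def persist(name: String, value: String): Unit = store(name) = value

    def unpersist(name: String): Unit = { store.remove(name); () }

    // Returns every value whose name starts with the prefix, sorted by name.
    def read(prefix: String): Seq[String] =
      store.toSeq.filter(_._1.startsWith(prefix)).sortBy(_._1).map(_._2)

    // Same grouping by prefix as the source excerpt: apps, drivers, workers.
    def readPersistedData(): (Seq[String], Seq[String], Seq[String]) =
      (read("app_"), read("driver_"), read("worker_"))
  }

  trait LeaderElectable { def electedLeader(): Unit }

  // Single-candidate "election": the Master is notified during construction.
  final class MonarchyAgent(master: LeaderElectable) {
    master.electedLeader()
  }

  def main(args: Array[String]): Unit = {
    val engine = new InMemoryEngine
    engine.persist("worker_1", "w1")
    engine.persist("app_1", "a1")
    println(engine.readPersistedData())
    new MonarchyAgent(new LeaderElectable {
      def electedLeader(): Unit = println("elected immediately")
    })
  }
}
```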
  private def completeRecovery() {
    // Ensure "only-once" recovery semantics using a short synchronization period.
    if (state != RecoveryState.RECOVERING) { return }
    state = RecoveryState.COMPLETING_RECOVERY

    // Kill off any workers and apps that didn't respond to us.
    workers.filter(_.state == WorkerState.UNKNOWN).foreach(removeWorker)
    apps.filter(_.state == ApplicationState.UNKNOWN).foreach(finishApplication)

    // Reschedule drivers which were not claimed by any workers
    drivers.filter(_.worker.isEmpty).foreach { d =>
      logWarning(s"Driver ${d.id} was not found after master recovery")
      if (d.desc.supervise) {
        logWarning(s"Re-launching ${d.id}")
        relaunchDriver(d)
      } else {
        removeDriver(d.id, DriverState.ERROR, None)
        logWarning(s"Did not re-launch ${d.id} because it was not supervised")
      }
    }
    // ... (the excerpt ends here; the rest of the method is not shown)
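The pruning step in completeRecovery is easier to follow with a toy model. The sketch below assumes that every restored worker starts out as UNKNOWN and is flipped to ALIVE once it answers the new Master; whatever is still UNKNOWN when recovery completes gets removed. Worker, beginRecovery, and acknowledge are hypothetical, simplified names chosen for this sketch.

```scala
// Hypothetical model of the "drop everything still UNKNOWN" step of recovery.
object RecoveryPruneSketch {
  sealed trait WorkerState
  case object Unknown extends WorkerState
  case object Alive extends WorkerState

  final case class Worker(id: String, state: WorkerState)

  // Every worker restored from persisted metadata starts as Unknown.
  def beginRecovery(restoredIds: Seq[String]): Seq[Worker] =
    restoredIds.map(Worker(_, Unknown))

  // A worker that answers the new Master is marked Alive.
  def acknowledge(workers: Seq[Worker], id: String): Seq[Worker] =
    workers.map(w => if (w.id == id) w.copy(state = Alive) else w)

  // Workers that never answered are removed.
  def completeRecovery(workers: Seq[Worker]): Seq[Worker] =
    workers.filterNot(_.state == Unknown)

  def main(args: Array[String]): Unit = {
    val restored = beginRecovery(Seq("w1", "w2"))
    val survivors = completeRecovery(acknowledge(restored, "w1"))
    println(survivors)
  }
}
```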
DT大数据梦工厂
Sina Weibo: www.weibo.com/ilovepains/
WeChat official account: DT_Spark
Blog: http://.blog.sina.com.cn/ilovepains
TEL:18610086859
Email:18610086859@vip.126.com