JStorm Source Code Analysis: Task Pickup
Source: Internet | Published by: 分身软件 | Editor: 程序博客网 | Time: 2024/04/28 23:18
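Task pickup
Every JStorm worker machine periodically scans the assignment directory in ZooKeeper to check whether any tasks (topology assignments) belong to it. If there are, it writes the corresponding information to a designated directory on the local machine. This work is done mainly by the run method of the SyncSupervisorEvent thread, which is the function analyzed here. Before that, let's look at the members of this class to make the later analysis easier to follow: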
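```java
// Unique id of this supervisor. A machine runs only one supervisor,
// so the id is also used to identify the machine.
private String supervisorId;
private EventManager processEventManager;
private EventManager syncSupEventManager;
// State of the Storm cluster (interface for operating on the data in ZK)
private StormClusterState stormClusterState;
// Interface to the local state
private LocalState localState;
```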
The run method:
```java
@Override
public void run() {
    LOG.debug("Synchronizing supervisor, interval seconds:" + TimeUtils.time_delta(lastTime));
    lastTime = TimeUtils.current_time_secs();

    try {
        RunnableCallback syncCallback = new EventManagerZkPusher(this, syncSupEventManager);

        /**
         * Step 1: get all assignments and register /ZK-dir/assignment and every assignment watch
         */
        // Read all assignments in the cluster from the ZK directory: toplogy_id --> assignment
        Map<String, Assignment> assignments = Cluster.get_all_assignment(stormClusterState, syncCallback);
        LOG.debug("Get all assignments " + assignments);

        /**
         * Step 2: get topologyIds list from STORM-LOCAL-DIR/supervisor/stormdist/
         */
        // Read the local directory to find all topologies already on this machine
        List<String> downloadedTopologyIds = StormConfig.get_supervisor_toplogy_list(conf);
        LOG.debug("Downloaded storm ids: " + downloadedTopologyIds);

        /**
         * Step 3: get <port,LocalAssignments> from ZK local node's assignment
         */
        // From the ZK data, collect every worker assigned to this machine (walk all workers
        // of all topologies and check whether the worker's nodeId equals supervisorId)
        Map<Integer, LocalAssignment> zkAssignment = getLocalAssign(stormClusterState, supervisorId, assignments);
        Map<Integer, LocalAssignment> localAssignment;
        Set<String> updateTopologys;

        /**
         * Step 4: writer local assignment to LocalState
         */
        try {
            LOG.debug("Writing local assignment " + zkAssignment);
            localAssignment = (Map<Integer, LocalAssignment>) localState.get(Common.LS_LOCAL_ASSIGNMENTS);
            if (localAssignment == null) {
                localAssignment = new HashMap<Integer, LocalAssignment>();
            }
            // Update the stored state
            localState.put(Common.LS_LOCAL_ASSIGNMENTS, zkAssignment);

            // Compare old and new state to find the topologies that need updating (based on timestamps)
            updateTopologys = getUpdateTopologys(localAssignment, zkAssignment, assignments);
            Set<String> reDownloadTopologys = getNeedReDownloadTopologys(localAssignment);
            // Topologies that need re-downloading are added to the update set as well
            if (reDownloadTopologys != null) {
                updateTopologys.addAll(reDownloadTopologys);
            }
        } catch (IOException e) {
            LOG.error("put LS_LOCAL_ASSIGNMENTS " + zkAssignment + " of localState failed");
            throw e;
        }

        /**
         * Step 5: download code from ZK
         */
        Map<String, String> topologyCodes = getTopologyCodeLocations(assignments, supervisorId);
        // downloadFailedTopologyIds which can't finished download binary from nimbus
        Set<String> downloadFailedTopologyIds = new HashSet<String>();
        downloadTopology(topologyCodes, downloadedTopologyIds, updateTopologys, assignments, downloadFailedTopologyIds);

        /**
         * Step 6: remove any downloaded useless topology
         */
        // Remove useless topologies (still present in the local path, but gone from the code locations)
        removeUselessTopology(topologyCodes, downloadedTopologyIds);

        /**
         * Step 7: push syncProcesses Event
         */
        // processEventManager.add(syncProcesses);
        syncProcesses.run(zkAssignment, downloadFailedTopologyIds);

        // If everything is OK, set the trigger to update heartbeat of
        // supervisor
        heartbeat.updateHbTrigger(true);
    } catch (Exception e) {
        LOG.error("Failed to Sync Supervisor", e);
        // throw new RuntimeException(e);
    }
}
```
Fetching all assignments from ZooKeeper
First, the code:
```java
public static Map<String, Assignment> get_all_assignment(StormClusterState stormClusterState, RunnableCallback callback) throws Exception {
    Map<String, Assignment> ret = new HashMap<String, Assignment>();

    // get /assignments {topology_id}
    // List every topology under the assignments directory in ZooKeeper
    List<String> assignments = stormClusterState.assignments(callback);
    if (assignments == null) {
        LOG.debug("No assignment of ZK");
        return ret;
    }

    // For each topology, fetch its detailed assignment
    for (String topology_id : assignments) {
        Assignment assignment = stormClusterState.assignment_info(topology_id, callback);
        if (assignment == null) {
            LOG.error("Failed to get Assignment of " + topology_id + " from ZK");
            continue;
        }
        ret.put(topology_id, assignment);
    }
    return ret;
}
```
The first step scans the assignment directory on ZK to get the names of all topologies. The implementation is in the following code:
```java
@Override
public List<String> assignments(RunnableCallback callback) throws Exception {
    if (callback != null) {
        assignments_callback.set(callback);
    }
    return cluster_state.get_children(Cluster.ASSIGNMENTS_SUBTREE, callback != null);
}
```

```java
@Override
public List<String> get_children(String path, boolean watch) throws Exception {
    return zkobj.getChildren(zk, path, watch);
}

public List<String> getChildren(CuratorFramework zk, String path, boolean watch) throws Exception {
    String npath = PathUtils.normalize_path(path);
    if (watch) {
        return zk.getChildren().watched().forPath(npath);
    } else {
        return zk.getChildren().forPath(npath);
    }
}
```
The second step fetches the details of each topology by its name:

```java
@Override
public Assignment assignment_info(String topologyId, RunnableCallback callback) throws Exception {
    if (callback != null) {
        assignment_info_callback.put(topologyId, callback);
    }
    String assgnmentPath = Cluster.assignment_path(topologyId);
    return (Assignment) getObject(assgnmentPath, callback != null);
}
```
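The path of the assignment data is derived from the topology name. The data at that path is then read and deserialized into an Assignment object.
Note that a callback argument, syncCallback, is passed in here. It is invoked when the assignment directory on ZK changes. The details still deserve a closer look (TODO).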
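As a starting point for that TODO: ZooKeeper watches set through getChildren are one-shot, so whoever wants continuous notifications has to register the watch again after each trigger. That is consistent with run passing syncCallback on every pass. The following is a minimal in-memory sketch of that pattern only. OneShotWatchDemo and its methods are hypothetical stand-ins, not JStorm or ZooKeeper classes, and the real callback may well enqueue an event instead of running the sync directly.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy stand-in for a ZooKeeper tree with one-shot child watches (hypothetical class).
class OneShotWatchDemo {
    private final Map<String, List<String>> children = new HashMap<>();
    private final Map<String, Runnable> watches = new HashMap<>();

    // Read the children of a path; a non-null watch is stored and fires at most once.
    List<String> getChildren(String path, Runnable watch) {
        if (watch != null) {
            watches.put(path, watch);
        }
        return new ArrayList<>(children.getOrDefault(path, new ArrayList<>()));
    }

    // Change the node: the pending watch is removed first, then invoked.
    void addChild(String path, String child) {
        children.computeIfAbsent(path, k -> new ArrayList<>()).add(child);
        Runnable pending = watches.remove(path);
        if (pending != null) {
            pending.run();
        }
    }

    // Runs one initial sync pass and two changes; returns how many passes happened.
    static int simulate() {
        OneShotWatchDemo zk = new OneShotWatchDemo();
        int[] syncRuns = {0};
        Runnable[] sync = new Runnable[1];
        sync[0] = () -> {
            syncRuns[0]++;
            // Each pass reads the directory and re-registers itself as the watch.
            zk.getChildren("/assignments", sync[0]);
        };
        sync[0].run();
        zk.addChild("/assignments", "topo-1");
        zk.addChild("/assignments", "topo-2");
        return syncRuns[0];
    }

    public static void main(String[] args) {
        System.out.println(simulate()); // prints 3
    }
}
```

If the sync pass did not pass the watch again, the second change would go unnoticed and the count would stay at 2.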
Getting local topology information
The topologies already present on this machine are read from local files:
```java
@SuppressWarnings("rawtypes")
public static List<String> get_supervisor_toplogy_list(Map conf) throws IOException {
    // get the path: STORM-LOCAL-DIR/supervisor/stormdist/
    String path = StormConfig.supervisor_stormdist_root(conf);
    List<String> topologyids = PathUtils.read_dir_contents(path);
    return topologyids;
}
```
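The local topology path is STORM-LOCAL-DIR/supervisor/stormdist/, that is, the value of Config.STORM_LOCAL_DIR + FILE_SEPERATEOR + "supervisor", followed by the stormdist subdirectory.
The names of all entries under this directory (one subdirectory per topology) are then read: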
```java
public static List<String> read_dir_contents(String dir) {
    ArrayList<String> rtn = new ArrayList<String>();
    if (exists_file(dir)) {
        File[] list = (new File(dir)).listFiles();
        for (File f : list) {
            rtn.add(f.getName());
        }
    }
    return rtn;
}
```
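One thing worth noting about the listing above: File.listFiles() returns null when the path is not a directory or an I/O error occurs, so an existence check alone does not rule out a NullPointerException in the loop. Below is a minimal null-safe sketch of the same idea. DirListingSketch and listNames are hypothetical names for illustration, not JStorm code.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical helper, not part of JStorm.
class DirListingSketch {
    // Names of the entries directly under dir, sorted.
    // Returns an empty list when dir is missing, is not a directory, or cannot be read.
    static List<String> listNames(String dir) {
        String[] names = new File(dir).list(); // null if not a directory or on I/O error
        if (names == null) {
            return new ArrayList<>();
        }
        List<String> result = new ArrayList<>(Arrays.asList(names));
        Collections.sort(result);
        return result;
    }
}
```

Sorting is only there to make the result deterministic; the order returned by the file system is not guaranteed.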
Getting all workers assigned to this machine
```java
private Map<Integer, LocalAssignment> getLocalAssign(StormClusterState stormClusterState, String supervisorId, Map<String, Assignment> assignments) throws Exception {
    Map<Integer, LocalAssignment> portLA = new HashMap<Integer, LocalAssignment>();

    // Walk through all topologies
    for (Entry<String, Assignment> assignEntry : assignments.entrySet()) {
        String topologyId = assignEntry.getKey();
        Assignment assignment = assignEntry.getValue();

        // Walk through all workers of one topology and keep those on this machine
        // (worker nodeId == supervisorId)
        Map<Integer, LocalAssignment> portTasks = readMyTasks(stormClusterState, topologyId, supervisorId, assignment);
        if (portTasks == null) {
            continue;
        }

        // a port must be assigned one storm
        for (Entry<Integer, LocalAssignment> entry : portTasks.entrySet()) {
            Integer port = entry.getKey();
            LocalAssignment la = entry.getValue();
            if (!portLA.containsKey(port)) {
                portLA.put(port, la);
            } else {
                throw new RuntimeException("Should not have multiple topologys assigned to one port");
            }
        }
    }
    return portLA;
}
```
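This walks through all the assignments fetched from ZK in the first step (the assignments of the whole cluster) and checks, for every worker of every topology, whether it belongs to this machine by comparing the worker's nodeId with the supervisor id. The result is the set of all workers assigned to this machine, keyed by port.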
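The filtering can be shown in isolation with simplified types. This is a sketch under assumptions: Slot stands in for the real worker slot class, the result maps a port to a topology id instead of a LocalAssignment, and LocalAssignSketch is a hypothetical name.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of the port lookup; not the JStorm classes.
class LocalAssignSketch {
    // Simplified worker slot: which supervisor it lives on and which port it uses.
    static final class Slot {
        final String nodeId;
        final int port;

        Slot(String nodeId, int port) {
            this.nodeId = nodeId;
            this.port = port;
        }
    }

    // Reduces topologyId -> slots (whole cluster) to port -> topologyId for one supervisor.
    static Map<Integer, String> localPorts(Map<String, List<Slot>> cluster, String supervisorId) {
        Map<Integer, String> portToTopology = new HashMap<>();
        for (Map.Entry<String, List<Slot>> entry : cluster.entrySet()) {
            for (Slot slot : entry.getValue()) {
                if (!supervisorId.equals(slot.nodeId)) {
                    continue; // worker belongs to another machine
                }
                String previous = portToTopology.putIfAbsent(slot.port, entry.getKey());
                if (previous != null) {
                    // Same rule as in the original: one port serves one topology.
                    throw new IllegalStateException("port " + slot.port + " is assigned twice");
                }
            }
        }
        return portToTopology;
    }
}
```

With topo-a on sup-1:6800 and sup-2:6800, and topo-b on sup-1:6801, the call for sup-1 yields ports 6800 and 6801; the slot on sup-2 is ignored even though the port number is the same.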
Updating the local worker information
```java
try {
    LOG.debug("Writing local assignment " + zkAssignment);
    localAssignment = (Map<Integer, LocalAssignment>) localState.get(Common.LS_LOCAL_ASSIGNMENTS);
    if (localAssignment == null) {
        localAssignment = new HashMap<Integer, LocalAssignment>();
    }
    // Update the stored state
    localState.put(Common.LS_LOCAL_ASSIGNMENTS, zkAssignment);

    // Compare old and new state to find the topologies that need updating (based on timestamps)
    updateTopologys = getUpdateTopologys(localAssignment, zkAssignment, assignments);
    Set<String> reDownloadTopologys = getNeedReDownloadTopologys(localAssignment);
    // Topologies that need re-downloading are added to the update set as well
    if (reDownloadTopologys != null) {
        updateTopologys.addAll(reDownloadTopologys);
    }
} catch (IOException e) {
    LOG.error("put LS_LOCAL_ASSIGNMENTS " + zkAssignment + " of localState failed");
    throw e;
}
```
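Three main things happen here:
1. The local worker information is updated.
2. By comparison, the topologies that need updating are found.
3. By comparison, the topologies that need re-downloading are found.
The topologies found in 2 and 3 all end up in the set of topologies to update.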
How do we tell that a topology has been updated?
```java
private Set<String> getUpdateTopologys(Map<Integer, LocalAssignment> localAssignments, Map<Integer, LocalAssignment> zkAssignments,
        Map<String, Assignment> assignments) {
    Set<String> ret = new HashSet<String>();
    if (localAssignments != null && zkAssignments != null) {
        for (Entry<Integer, LocalAssignment> entry : localAssignments.entrySet()) {
            Integer port = entry.getKey();
            LocalAssignment localAssignment = entry.getValue();
            LocalAssignment zkAssignment = zkAssignments.get(port);
            if (localAssignment == null || zkAssignment == null)
                continue;

            Assignment assignment = assignments.get(localAssignment.getTopologyId());
            if (localAssignment.getTopologyId().equals(zkAssignment.getTopologyId()) && assignment != null
                    && assignment.isTopologyChange(localAssignment.getTimeStamp()))
                if (ret.add(localAssignment.getTopologyId())) {
                    LOG.info("Topology-" + localAssignment.getTopologyId() + " has been updated. LocalTs=" + localAssignment.getTimeStamp() + ", ZkTs="
                            + zkAssignment.getTimeStamp());
                }
        }
    }
    return ret;
}
```
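Judging from the code, a topology counts as updated when the same topology occupies the same port both locally and on ZK and assignment.isTopologyChange returns true for the local timestamp. That check (not listed here) requires the assignment to be of the update type or the scale-out type, and the local timestamp to be earlier than the timestamp on ZK.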
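Since isTopologyChange itself is not listed, here is a minimal sketch of the rule exactly as the sentence above states it. It is an assumption built from that description, not the real Assignment implementation; the enum values and the name needsRefresh are hypothetical.

```java
// Hypothetical model of the update check described in the text; not JStorm code.
class UpdateCheckSketch {
    // Assumed kinds of assignment: a fresh assignment, a code update, a scale-out.
    enum Kind { ASSIGN, UPDATE, SCALE }

    // True when the assignment is update-like and the local copy is older than the one on ZK.
    static boolean needsRefresh(Kind kind, long zkTimestamp, long localTimestamp) {
        boolean updateLike = kind == Kind.UPDATE || kind == Kind.SCALE;
        return updateLike && localTimestamp < zkTimestamp;
    }

    public static void main(String[] args) {
        System.out.println(needsRefresh(Kind.UPDATE, 200L, 100L)); // prints true
        System.out.println(needsRefresh(Kind.ASSIGN, 200L, 100L)); // prints false
        System.out.println(needsRefresh(Kind.SCALE, 100L, 100L));  // prints false
    }
}
```

Under this reading a plain reassignment with a newer timestamp does not trigger a code refresh, and neither does an update whose timestamp the supervisor has already recorded.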
Similarly, how are the topologies that need to be downloaded again obtained?
```java
private Set<String> getNeedReDownloadTopologys(Map<Integer, LocalAssignment> localAssignment) {
    Set<String> reDownloadTopologys = syncProcesses.getTopologyIdNeedDownload().getAndSet(null);
    if (reDownloadTopologys == null || reDownloadTopologys.size() == 0)
        return null;

    Set<String> needRemoveTopologys = new HashSet<String>();
    Map<Integer, String> portToStartWorkerId = syncProcesses.getPortToWorkerId();
    for (Entry<Integer, LocalAssignment> entry : localAssignment.entrySet()) {
        if (portToStartWorkerId.containsKey(entry.getKey()))
            needRemoveTopologys.add(entry.getValue().getTopologyId());
    }
    LOG.debug("worker is starting on these topology, so delay download topology binary: " + needRemoveTopologys);
    reDownloadTopologys.removeAll(needRemoveTopologys);
    if (reDownloadTopologys.size() > 0)
        LOG.info("Following topologys is going to re-download the jars, " + reDownloadTopologys);
    return reDownloadTopologys;
}
```
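From the set of topologies requested for re-download, those that have a worker currently starting on this machine are excluded. Whatever remains still needs to be downloaded again.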
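The exclusion is a plain set difference keyed by port. A minimal sketch with simplified types follows; ReDownloadSketch and stillNeeded are hypothetical names, and a port-to-topology-id map stands in for the LocalAssignment map.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical, simplified model of the re-download filter; not JStorm code.
class ReDownloadSketch {
    // requested: topologies flagged for re-download
    // portToTopology: local assignment, port -> topology id
    // startingPorts: ports on which a worker is currently being started
    static Set<String> stillNeeded(Set<String> requested, Map<Integer, String> portToTopology, Set<Integer> startingPorts) {
        Set<String> result = new HashSet<>(requested);
        for (Map.Entry<Integer, String> entry : portToTopology.entrySet()) {
            if (startingPorts.contains(entry.getKey())) {
                // A worker of this topology is starting, so its download is postponed.
                result.remove(entry.getValue());
            }
        }
        return result;
    }
}
```

Unlike the original, this sketch copies the requested set instead of modifying it in place and always returns a set, which keeps the example free of null handling.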
Code download
```java
Map<String, String> topologyCodes = getTopologyCodeLocations(assignments, supervisorId);

// downloadFailedTopologyIds which can't finished download binary from nimbus
Set<String> downloadFailedTopologyIds = new HashSet<String>();
downloadTopology(topologyCodes, downloadedTopologyIds, updateTopologys, assignments, downloadFailedTopologyIds);
```
The first step finds the topologies that have workers assigned to the current machine:
```java
public static Map<String, String> getTopologyCodeLocations(Map<String, Assignment> assignments, String supervisorId) throws Exception {
    Map<String, String> rtn = new HashMap<String, String>();
    for (Entry<String, Assignment> entry : assignments.entrySet()) {
        String topologyid = entry.getKey();
        Assignment assignmenInfo = entry.getValue();

        Set<ResourceWorkerSlot> workers = assignmenInfo.getWorkers();
        for (ResourceWorkerSlot worker : workers) {
            String node = worker.getNodeId();
            if (supervisorId.equals(node)) {
                rtn.put(topologyid, assignmenInfo.getMasterCodeDir());
                break;
            }
        }
    }
    return rtn;
}
```
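The process is similar to before: for every topology, check whether it has a worker on the current machine, and if so put the topology and its master code directory into the result.
The second part is the download itself: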
```java
public void downloadTopology(Map<String, String> topologyCodes, List<String> downloadedTopologyIds, Set<String> updateTopologys,
        Map<String, Assignment> assignments, Set<String> downloadFailedTopologyIds) throws Exception {
    Set<String> downloadTopologys = new HashSet<String>();

    // Process every topology
    for (Entry<String, String> entry : topologyCodes.entrySet()) {
        String topologyId = entry.getKey();
        String masterCodeDir = entry.getValue();

        // Not downloaded yet, or needs updating
        if (!downloadedTopologyIds.contains(topologyId) || updateTopologys.contains(topologyId)) {
            LOG.info("Downloading code for storm id " + topologyId + " from " + masterCodeDir);

            int retry = 0;
            while (retry < 3) {
                try {
                    downloadStormCode(conf, topologyId, masterCodeDir);
                    // Update assignment timeStamp
                    StormConfig.write_supervisor_topology_timestamp(conf, topologyId, assignments.get(topologyId).getTimeStamp());
                    break;
                } catch (IOException e) {
                    LOG.error(e + " downloadStormCode failed " + "topologyId:" + topologyId + "masterCodeDir:" + masterCodeDir);
                } catch (TException e) {
                    LOG.error(e + " downloadStormCode failed " + "topologyId:" + topologyId + "masterCodeDir:" + masterCodeDir);
                }
                retry++;
            }
            if (retry < 3) {
                LOG.info("Finished downloading code for storm id " + topologyId + " from " + masterCodeDir);
                downloadTopologys.add(topologyId);
            } else {
                LOG.error("Cann't download code for storm id " + topologyId + " from " + masterCodeDir);
                downloadFailedTopologyIds.add(topologyId);
            }
        }
    }

    // clear directory of topologyId is dangerous , so it only clear the topologyId which
    // isn't contained by downloadedTopologyIds
    for (String topologyId : downloadFailedTopologyIds) {
        if (!downloadedTopologyIds.contains(topologyId)) {
            try {
                String stormroot = StormConfig.supervisor_stormdist_root(conf, topologyId);
                File destDir = new File(stormroot);
                FileUtils.deleteQuietly(destDir);
            } catch (Exception e) {
                LOG.error("Cann't clear directory about storm id " + topologyId + " on supervisor ");
            }
        }
    }

    updateTaskCleanupTimeout(downloadTopologys);
}
```
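From the code, two kinds of topologies need downloading: those that have not been downloaded yet, and those that need updating (computed above). The download itself fetches the code from masterCodeDir and writes it to local files, then records the assignment timestamp in a local file as well. Note that the step comment says "download code from ZK", while the comment next to downloadFailedTopologyIds and the TException handling suggest the binary actually comes from nimbus. Each topology gets up to three attempts. On success it is added to downloadTopologys; on failure it is recorded in downloadFailedTopologyIds.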
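For topologies whose download failed and that are not in the list of previously downloaded topologies, the local directory is deleted.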
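The retry loop in downloadTopology follows a common bounded-retry shape, which can be isolated as a small helper. This is a sketch only: RetrySketch, Download and tryWithRetries are hypothetical names, and it catches only IOException where the original also handles TException.

```java
import java.io.IOException;

// Hypothetical bounded-retry helper; not part of JStorm.
class RetrySketch {
    // An action that may fail with an I/O error, such as a download.
    interface Download {
        void run() throws IOException;
    }

    // Runs the action up to maxAttempts times and reports whether one attempt succeeded.
    static boolean tryWithRetries(Download action, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                action.run();
                return true;
            } catch (IOException e) {
                System.err.println("attempt " + attempt + " failed: " + e.getMessage());
            }
        }
        return false;
    }
}
```

Returning a boolean plays the role of the `retry < 3` check after the loop in the original: the caller adds the topology either to the downloaded set or to the failed set.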
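Finally, the cleanup timeout of every downloaded topology is updated. The timeout comes from the topology's own configuration if it is set there; otherwise the system-wide setting is used. The result is written to the local state.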
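updateTaskCleanupTimeout is not listed, so the following is only a sketch of the fallback order described above. The configuration key, the class name and the hard default are all assumptions made for illustration; the real key and behavior may differ.

```java
import java.util.Map;

// Hypothetical model of the timeout lookup; not JStorm code.
class CleanupTimeoutSketch {
    // Assumed key name, chosen for this example only.
    static final String KEY = "task.cleanup.timeout.sec";

    // Topology-level value wins, then the cluster-wide value, then the given default.
    static int resolve(Map<String, Object> topologyConf, Map<String, Object> clusterConf, int hardDefault) {
        Object value = topologyConf.get(KEY);
        if (value == null) {
            value = clusterConf.get(KEY);
        }
        return (value instanceof Number) ? ((Number) value).intValue() : hardDefault;
    }
}
```

The third fallback is an addition of this sketch so that the method always returns a number; the text above only mentions the first two levels.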
Removing useless topologies
```java
public void removeUselessTopology(Map<String, String> topologyCodes, List<String> downloadedTopologyIds) {
    for (String topologyId : downloadedTopologyIds) {
        if (!topologyCodes.containsKey(topologyId)) {
            LOG.info("Removing code for storm id " + topologyId);
            String path = null;
            try {
                path = StormConfig.supervisor_stormdist_root(conf, topologyId);
                PathUtils.rmr(path);
            } catch (IOException e) {
                String errMsg = "rmr the path:" + path + "failed\n";
                LOG.error(errMsg, e);
            }
        }
    }
}
```
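If a topology exists among the locally downloaded topologies but no longer appears in the code locations derived from ZK, it is considered invalid and is removed locally (its directory is deleted).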
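The same cleanup can be tried out against a scratch directory. The sketch below uses java.nio instead of the PathUtils.rmr helper; StaleDirSketch, removeStale and deleteRecursively are hypothetical names, and the directory layout (one subdirectory per topology id) is taken from the listings above.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical cleanup helper; not part of JStorm.
class StaleDirSketch {
    // Deletes every entry under root whose name is not an assigned topology id.
    // Returns the sorted names of the removed entries.
    static List<String> removeStale(Path root, Set<String> assigned) throws IOException {
        List<Path> stale = new ArrayList<>();
        try (DirectoryStream<Path> entries = Files.newDirectoryStream(root)) {
            for (Path entry : entries) {
                if (!assigned.contains(entry.getFileName().toString())) {
                    stale.add(entry);
                }
            }
        }
        List<String> removed = new ArrayList<>();
        for (Path entry : stale) {
            deleteRecursively(entry);
            removed.add(entry.getFileName().toString());
        }
        Collections.sort(removed);
        return removed;
    }

    // Deletes a file tree bottom-up: reverse path order puts children before parents.
    static void deleteRecursively(Path path) throws IOException {
        List<Path> toDelete;
        try (Stream<Path> walk = Files.walk(path)) {
            toDelete = walk.sorted(Comparator.reverseOrder()).collect(Collectors.toList());
        }
        for (Path p : toDelete) {
            Files.delete(p);
        }
    }
}
```

Collecting the stale entries before deleting anything avoids modifying the directory while it is still being iterated.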