h2o.ai Source Code Analysis (2): Startup Flow


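The previous post gave an overview of h2o.ai and walked through the source tree of its core project, h2o.ai/h2o-3. This post analyzes the source code behind H2O's startup flow. The sequence diagram of the startup process is as follows:
[Figure: sequence diagram of the H2O startup flow]
The key calls in the sequence diagram are described in detail below: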

1. [Step 3] registerCoreExtensions() loads the extension classes
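This step uses the Java ServiceLoader mechanism to load the service classes listed in every water.AbstractH2OExtension file found under a /resources/META-INF/services/ directory in the project. All of these service classes extend AbstractH2OExtension. The extension classes loaded this way are summarized below: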

| Project | File | Content |
|---|---|---|
| h2o-core | water.AbstractH2OExtension | water.FailedNodeWatchdogExtension |
| h2o-ext-krbstandalone | water.AbstractH2OExtension | hex.security.KerberosExtension |
| h2o-ext-xgboost | water.AbstractH2OExtension | hex.tree.xgboost.XGBoostExtension |
| h2o-grpc | water.AbstractH2OExtension | ai.h2o.api.GrpcExtension |
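
The lookup described above can be sketched with the standard java.util.ServiceLoader API. This is a minimal illustration and not H2O's actual code: DemoExtension and ExtensionLoaderSketch are hypothetical stand-ins for water.AbstractH2OExtension and the loading logic inside registerCoreExtensions().

```java
import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Hypothetical stand-in for water.AbstractH2OExtension (not the real H2O class).
abstract class DemoExtension {
    abstract String getExtensionName();

    // Hook that a loader could call once the extension has been discovered.
    void init() {}
}

final class ExtensionLoaderSketch {
    // Collects every provider that ServiceLoader finds for the given service type.
    // Providers are declared in a classpath file named
    //   META-INF/services/<fully qualified name of the service type>
    // which lists one implementation class name per line.
    static <T> List<T> loadAll(Class<T> serviceType) {
        List<T> found = new ArrayList<>();
        for (T provider : ServiceLoader.load(serviceType)) {
            found.add(provider);
        }
        return found;
    }

    public static void main(String[] args) {
        List<DemoExtension> extensions = loadAll(DemoExtension.class);
        for (DemoExtension ext : extensions) {
            ext.init();
            System.out.println("Registered extension: " + ext.getExtensionName());
        }
        System.out.println(extensions.size() + " extension(s) found");
    }
}
```

Adding a new extension then only requires shipping an implementation class together with a provider file on the classpath; the loader itself does not change.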

2. [Steps 5, 6] startLocalNode() starts the local node and the local cloud, with the local node as the cloud's only member

    /** Initializes the local node and the local cloud with itself as the only member. */
    private static void startLocalNode() {
      // Figure self out; this is surprisingly hard
      NetworkInit.initializeNetworkSockets();

      // Do not forget to put SELF into the static configuration (to simulate
      // proper multicast behavior)
      if( !ARGS.client && STATIC_H2OS != null && !STATIC_H2OS.contains(SELF)) {
        Log.warn("Flatfile configuration does not include self: " + SELF+ " but contains " + STATIC_H2OS);
        STATIC_H2OS.add(SELF);
      }
      ......
    }

Inside it, the call to initializeNetworkSockets() in [Step 6] creates and starts a Jetty server that serves the Web API (the default ip:port is localhost:54321):

    public static void initializeNetworkSockets( ) {
      // Assign initial ports
      H2O.API_PORT = H2O.ARGS.port == 0 ? H2O.ARGS.baseport : H2O.ARGS.port;

      // Late instantiation of Jetty object, if needed.
      if (H2O.getJetty() == null && !H2O.ARGS.disable_web) {
        H2O.setJetty(new JettyHTTPD());
      }

      // API socket is only used to find opened port on given ip.
      ServerSocket apiSocket = null;

      // At this point we would like to allocate 2 consecutive ports
      while (true) {
        H2O.H2O_PORT = H2O.API_PORT + 1;
        try {
          if (!H2O.ARGS.disable_web) {
            apiSocket = H2O.ARGS.web_ip == null // Listen to any interface
                        ? new ServerSocket(H2O.API_PORT)
                        : new ServerSocket(H2O.API_PORT, -1, getInetAddress(H2O.ARGS.web_ip));
            apiSocket.setReuseAddress(true);
          }
          // Bind to the UDP socket
          _udpSocket = DatagramChannel.open();
          _udpSocket.socket().setReuseAddress(true);
          InetSocketAddress isa = new InetSocketAddress(H2O.SELF_ADDRESS, H2O.H2O_PORT);
          _udpSocket.socket().bind(isa);
          // Bind to the TCP socket also
          _tcpSocket = ServerSocketChannel.open();
          _tcpSocket.socket().setReceiveBufferSize(water.AutoBuffer.TCP_BUF_SIZ);
          _tcpSocket.socket().bind(isa);
          // Warning: There is a ip:port race between socket close and starting Jetty
          if (!H2O.ARGS.disable_web) {
            apiSocket.close();
            H2O.getJetty().start(H2O.ARGS.web_ip, H2O.API_PORT);
          }
          break;
        } catch (Exception e) {
          ...
        }
        // Try next available port to bound
        H2O.API_PORT += 2;
        ...
    }
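
The port search in the loop above (take an API port plus the next port for node-to-node traffic, and step forward by 2 when either is taken) can be isolated into a small standalone sketch. PortPairProbe, findApiPort, and the maxAttempts limit are illustrative assumptions and not part of H2O; the sketch only probes with plain ServerSocket instances and does not start Jetty or open UDP channels.

```java
import java.io.IOException;
import java.net.ServerSocket;

final class PortPairProbe {
    // Returns the first port p >= basePort (stepping by 2) such that both
    // p and p + 1 can currently be bound on this machine.
    // maxAttempts is a hypothetical safety limit added for this sketch.
    static int findApiPort(int basePort, int maxAttempts) {
        int apiPort = basePort;
        for (int attempt = 0; attempt < maxAttempts; attempt++, apiPort += 2) {
            if (apiPort + 1 > 65535) {
                break; // ran out of valid port numbers
            }
            // Both sockets are closed again when the try block exits, so another
            // process could still grab the ports afterwards; this is the same
            // race that the comment in the original code warns about.
            try (ServerSocket api = new ServerSocket(apiPort);
                 ServerSocket internal = new ServerSocket(apiPort + 1)) {
                return apiPort;
            } catch (IOException e) {
                // At least one port of the pair is in use; try the next pair.
            }
        }
        throw new IllegalStateException("No free consecutive port pair found from " + basePort);
    }

    public static void main(String[] args) {
        int apiPort = findApiPort(54321, 100);
        System.out.println("API port: " + apiPort + ", internal port: " + (apiPort + 1));
    }
}
```

Stepping by 2 keeps the pairs aligned, so several nodes started on one machine with the same base port end up on non-overlapping pairs.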

3. [Steps 8, 9] initializePersistence() initializes the persistence layer, which currently supports the following four storage backends

| Key | Description |
|---|---|
| ICE | Distributed local disk storage |
| HDFS | Connects to a backend Hadoop HDFS cluster |
| S3 | Amazon S3 object storage |
| NFS | Standard file system |

    static void initializePersistence() {
      _PM = new PersistManager(ICE_ROOT);
    }

    public PersistManager(URI iceRoot) {
      I = new Persist[MAX_BACKENDS];
      stats = new PersistStatsEntry[MAX_BACKENDS];
      for (int i = 0; i < stats.length; i++) {
        stats[i] = new PersistStatsEntry();
      }
      ...
      ...
      I[Value.ICE ] = ice;
      I[Value.NFS ] = new PersistNFS();

      try {
        Class klass = Class.forName("water.persist.PersistHdfs");
        java.lang.reflect.Constructor constructor = klass.getConstructor();
        I[Value.HDFS] = (Persist) constructor.newInstance();
        Log.info("HDFS subsystem successfully initialized");
      }
      catch (Throwable ignore) {
        Log.info("HDFS subsystem not available");
      }

      try {
        Class klass = Class.forName("water.persist.PersistS3");
        java.lang.reflect.Constructor constructor = klass.getConstructor();
        I[Value.S3] = (Persist) constructor.newInstance();
        Log.info("S3 subsystem successfully initialized");
      } catch (Throwable ignore) {
        Log.info("S3 subsystem not available");
      }
    }
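
The HDFS and S3 backends are looked up by class name so that the core can still start when those modules are not on the classpath. A minimal sketch of that optional-loading pattern follows; OptionalBackendLoader and tryLoad are hypothetical names, and the sketch returns an Optional where the original code writes into an array and logs.

```java
import java.util.Optional;

final class OptionalBackendLoader {
    // Tries to instantiate className through its public no-arg constructor.
    // Returns Optional.empty() when the class is missing, cannot be linked,
    // cannot be constructed, or is not of the expected type.
    static <T> Optional<T> tryLoad(String className, Class<T> expectedType) {
        try {
            Class<?> klass = Class.forName(className);
            Object instance = klass.getConstructor().newInstance();
            return Optional.of(expectedType.cast(instance));
        } catch (ReflectiveOperationException | LinkageError | ClassCastException e) {
            return Optional.empty();
        }
    }

    public static void main(String[] args) {
        // The class name comes from the code above; it is only present when the
        // corresponding H2O module is on the classpath.
        Optional<Object> hdfs = tryLoad("water.persist.PersistHdfs", Object.class);
        System.out.println(hdfs.isPresent()
                ? "HDFS subsystem successfully initialized"
                : "HDFS subsystem not available");
    }
}
```

Catching only reflection and linkage failures is narrower than the catch (Throwable ignore) in the original; that is a deliberate choice in this sketch, not a description of what H2O does.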

4. [Step 11] startNetworkServices() initializes the network services by starting the service threads: UDPReceiver, TCPReceiver, heartbeat, Cleaner (which flushes K/V store data to persistent storage), and others.

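5. [Step 14] getAllProviderNames(true) loads the data source parsers
This step uses ServiceLoader to load the data source parsers listed in every water.parser.ParserProvider file found under a /resources/META-INF/services/ directory in the project. The following data source types are supported: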

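| Project | File | Content | Source |
|---|---|---|---|
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$ArffParserProvider | ARFF |
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$XlsParserProvider | XLS |
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$SVMLightParserProvider | SVMLight |
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$CsvParserProvider | CSV |
| h2o-core | water.parser.ParserProvider | water.parser.DefaultParserProviders$GuessParserProvider | (GUESS is not a data format) |
| h2o-orc-parser | water.parser.ParserProvider | water.parser.orc.OrcParserProvider | Apache ORC |
| h2o-parquet-parser | water.parser.ParserProvider | water.parser.parquet.ParquetParserProvider | Apache Parquet |
| h2o-avro-parser | water.parser.ParserProvider | water.parser.avro.AvroParserProvider | Apache Avro |

The data formats supported by default in h2o-core are therefore ARFF, XLS, CSV, and SVMLight.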



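Notes:
(1) GuessParserProvider does not parse source data in a "GUESS" format. When the format of the data source is unknown, this parser tries the other parsers one after another, in order of their priority, until one of them can parse the data.
(2) OrcParserProvider is not loaded by default, because the gradle build does not package the h2o-orc-parser module by default (see the build and packaging logic in build.gradle).
[Figure: packaging logic in build.gradle]
The gradle.properties file contains the following setting:
[Figure: corresponding setting in gradle.properties]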
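
The behavior in note (1) can be sketched as a priority-ordered list of format probes. Everything here is an assumption made for illustration: FormatProbe, the probe heuristics, the priority values, and the rule that a lower value is tried first are not taken from H2O's GuessParserProvider.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

final class GuessParserSketch {
    // Hypothetical probe: a format name, a priority, and a cheap check on a sample.
    static final class FormatProbe {
        final String name;
        final int priority; // assumption: lower value is tried first
        final Predicate<String> accepts;

        FormatProbe(String name, int priority, Predicate<String> accepts) {
            this.name = name;
            this.priority = priority;
            this.accepts = accepts;
        }
    }

    static String firstLine(String sample) {
        int newline = sample.indexOf('\n');
        return (newline < 0 ? sample : sample.substring(0, newline)).trim();
    }

    // Deliberately naive heuristics, registered out of order to show the sorting.
    static List<FormatProbe> defaultProbes() {
        List<FormatProbe> probes = new ArrayList<>();
        probes.add(new FormatProbe("CSV", 2, s -> firstLine(s).contains(",")));
        probes.add(new FormatProbe("ARFF", 0,
                s -> firstLine(s).regionMatches(true, 0, "@relation", 0, 9)));
        probes.add(new FormatProbe("SVMLight", 1,
                s -> firstLine(s).matches("[-+]?\\d+(\\.\\d+)?(\\s+\\d+:\\S+)+")));
        return probes;
    }

    // Tries each probe in priority order and returns the first format that accepts.
    static Optional<String> guess(List<FormatProbe> probes, String sample) {
        List<FormatProbe> ordered = new ArrayList<>(probes);
        ordered.sort(Comparator.comparingInt(p -> p.priority));
        for (FormatProbe probe : ordered) {
            if (probe.accepts.test(sample)) {
                return Optional.of(probe.name);
            }
        }
        return Optional.empty();
    }
}
```

Calling guess(defaultProbes(), sample) on a sample that no probe accepts returns an empty Optional, which leaves the decision about a fallback to the caller.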

6. [Step 17] registerResourceRoot() loads the static web resources
This step loads the static web resources under the h2o-web/src/main/resources/www and h2o-core/src/main/resources/www directories.

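7. [Step 18] registerRestApiExtensions() registers the REST API resources
(1) This step uses the Java ServiceLoader mechanism to load the REST API registration classes listed in every water.api.RestApiExtension file found under a /resources/META-INF/services/ directory in the project. These classes all extend AbstractRegister and override the registerEndPoints method to register a series of REST API endpoints, each consisting mainly of an HTTP method and URI, a handler class, and a handler method.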

| Project | File | Content |
|---|---|---|
| h2o-ext-xgboost | water.api.RestApiExtension | hex.api.xgboost.RegisterRestApi |
| h2o-core | water.api.RestApiExtension | water.api.RegisterV3Api |
| h2o-core | water.api.RestApiExtension | water.api.RegisterV4Api |
| h2o-automl | water.api.RestApiExtension | water.automl.RegisterRestApi |
| h2o-algos | water.api.RestApiExtension | hex.api.RegisterAlgos |
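
The "HTTP method and URI, handler class, handler method" triple can be pictured as a small routing table that is filled at registration time and resolved by reflection at request time. The sketch below is a simplified illustration under that assumption; RouteRegistrySketch, PingHandler, and the /3/Ping route are hypothetical and do not reproduce the signatures of H2O's registration API.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.NoSuchElementException;

final class RouteRegistrySketch {
    // One registered endpoint: which class and which method handle it.
    static final class Route {
        final Class<?> handlerClass;
        final String handlerMethod;

        Route(Class<?> handlerClass, String handlerMethod) {
            this.handlerClass = handlerClass;
            this.handlerMethod = handlerMethod;
        }
    }

    // Hypothetical handler used only for this example.
    static final class PingHandler {
        public String ping() {
            return "pong";
        }
    }

    // Keyed by "<HTTP method> <URI>", for example "GET /3/Ping".
    private final Map<String, Route> routes = new LinkedHashMap<>();

    void register(String methodAndUri, Class<?> handlerClass, String handlerMethod) {
        routes.put(methodAndUri, new Route(handlerClass, handlerMethod));
    }

    // Creates a fresh handler instance and invokes its no-arg handler method.
    Object dispatch(String httpMethod, String uri) {
        Route route = routes.get(httpMethod + " " + uri);
        if (route == null) {
            throw new NoSuchElementException("No route for " + httpMethod + " " + uri);
        }
        try {
            Object handler = route.handlerClass.getDeclaredConstructor().newInstance();
            Method method = route.handlerClass.getDeclaredMethod(route.handlerMethod);
            return method.invoke(handler);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Handler invocation failed", e);
        }
    }

    public static void main(String[] args) {
        RouteRegistrySketch registry = new RouteRegistrySketch();
        registry.register("GET /3/Ping", PingHandler.class, "ping");
        System.out.println(registry.dispatch("GET", "/3/Ping"));
    }
}
```

Storing the handler as a class plus a method name keeps registration declarative, which is what lets each module contribute its own endpoints from a separate registration class.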

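(2) This step also uses the Java ServiceLoader mechanism to load all Schema classes listed in the water.api.Schema files found under /resources/META-INF/services/ directories in the project (Schemas are the POJOs that the REST API endpoints exchange). The mechanism is the same as in (1), so it is not repeated here.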

8. [Steps 20, 21] startServingRestApi() starts the H2O web service