Husky Chinese Documentation: Deployment

Source: Internet. Editor: 程序博客网. Time: 2024/05/10 00:43

Deployment

Dependencies

Husky depends on the following packages:

CMake
ZeroMQ (libzmq and cppzmq)
Boost
A C++ compiler (clang/gcc/icc/MSVC)
TCMalloc
PSSH

Optional dependencies:

Hadoop
libhdfs3
HBase
Kafka
MongoDB

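Husky already integrates with HDFS, MongoDB, HBase, and Kafka, and support for other systems is being added. This document uses Linux as the example operating system to show how to deploy Husky.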

Building

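Download the latest Husky source archive and extract it. The extracted directory is the Husky root directory, referred to below as HUSKY_ROOT. Change into $HUSKY_ROOT and compile the source with CMake: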

Create a release directory under $HUSKY_ROOT and enter it:
```
mkdir release
cd release
```
Compile with CMake:
```
cmake -DCMAKE_BUILD_TYPE=Release ..
make help           # list the available build targets
make -j8 Master     # build Master
```
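The Master executable schedules the execution of Husky applications. Master must be started each time a Husky application is run.
Build a Husky application. For example, we can build PageRankWorkflow under examples/. PageRankWorkflow computes the PageRank value of every vertex in a graph, then runs TopK and kNN to analyze the PageRank results.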
```
make PageRankWorkflow
```
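Users can also develop their own applications under examples/ and declare each application and its dependencies in examples/CMakeLists.txt. For example, if a pi.cpp application that computes the value of pi is added under examples/, the following lines can be added to examples/CMakeLists.txt: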
```
# PI
add_executable(PI pi.cpp)
target_link_libraries(PI ${husky})
target_link_libraries(PI ${EXTERNAL_LIB})
set_property(TARGET PI PROPERTY CXX_STANDARD 14)
```
After that, the application can be built with make PI.

Configuration

Configuration file

A configuration file can be generated quickly with python scripts/gen_config.py. Husky can also run on a single machine: just use that machine's hostname as both the master and the worker.

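In a distributed environment, $HUSKY_ROOT/exec.sh has to be copied and adapted to the actual setup. First, create a conf/ folder under $HUSKY_ROOT/. The following example generates a single-machine configuration file.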

python scripts/gen_config.py
This script helps you generate a Husky config file
please input the hostname of the master node:master    # hostname of the master
please input the hdfs namenode:master                  # HDFS namenode
please input the hdfs namenode port:9000               # namenode port
Do you have a file that lists all your worker nodes(one hostname per line)? (y/n):n
Please input the number of worker nodes:1              # use only one machine
Please input the hostname of worker 1:master           # this machine is also the worker
Please input the number of Husky threads you wish to have on each machine:2      # use two threads
Please provide the Husky root directory:/path/to/HUSKY_ROOT                      # absolute path of $HUSKY_ROOT
Please enter a prefix of the config files. You will have <prefix>.conf, <prefix>-socket.txt, and generated in <husky-root>/conf:test     # name for the config files
Done!
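To make the effect of this session concrete, here is a minimal Python sketch of the two files it writes. `render_config` and `write_config` are hypothetical helpers, not the code of scripts/gen_config.py; the layout is copied from the conf/test.conf and conf/test-socket.txt files shown next, and missing ports are drawn at random from 10000-19999 because the example values are arbitrary.

```python
import random
from pathlib import Path


def render_config(master_host, namenode, namenode_port, workers, threads, prefix,
                  master_port=None, comm_port=None):
    """Return (conf_text, socket_text) in the layout of conf/test.conf and conf/test-socket.txt.

    Hypothetical helper that mimics the files produced by the session above;
    it is not the code of scripts/gen_config.py.
    """
    # Draw any missing port at random; the two ports are only guaranteed to
    # differ from each other, not to be free on the master host.
    taken = {p for p in (master_port, comm_port) if p is not None}
    candidates = [p for p in range(10000, 20000) if p not in taken]
    if master_port is None:
        master_port = random.choice(candidates)
        candidates.remove(master_port)
    if comm_port is None:
        comm_port = random.choice(candidates)

    conf_lines = [
        f"master_host:{master_host}",
        f"master_port:{master_port}",
        f"comm_port:{comm_port}",
        f"hdfs_namenode:{namenode}",
        f"hdfs_namenode_port:{namenode_port}",
        f"socket_file:{prefix}-socket.txt",
        "# list your own parameter here:",
    ]
    # One "hostname:threads" line per worker, same thread count on every machine.
    socket_lines = [f"{host}:{threads}" for host in workers]
    return "\n".join(conf_lines) + "\n", "\n".join(socket_lines) + "\n"


def write_config(husky_root, prefix, conf_text, socket_text):
    """Write <prefix>.conf and <prefix>-socket.txt into <husky_root>/conf/."""
    conf_dir = Path(husky_root) / "conf"
    conf_dir.mkdir(parents=True, exist_ok=True)
    (conf_dir / f"{prefix}.conf").write_text(conf_text)
    (conf_dir / f"{prefix}-socket.txt").write_text(socket_text)
    return conf_dir
```

For the single-machine session above, `conf_text, socket_text = render_config("master", "master", 9000, ["master"], 2, "test")` followed by `write_config("/path/to/HUSKY_ROOT", "test", conf_text, socket_text)` would create conf/test.conf and conf/test-socket.txt.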

This produces a conf/test.conf file:

master_host:master
master_port:14366                # an arbitrarily chosen master port
comm_port:14162                  # an arbitrarily chosen communication port
hdfs_namenode:master
hdfs_namenode_port:9000
socket_file:test-socket.txt      # test-socket.txt is the socket file just generated
# list your own parameter here:
input:/tmp/toy                   # custom parameters can be added here
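The configuration file is plain `key:value` text. A minimal Python sketch of reading it into a dictionary follows. The comment handling is an assumption of this sketch (the `#` annotations above are explanatory and may not be accepted by Husky's own parser), and only the first `:` separates key from value so that values such as `/tmp/toy` stay intact; a value that itself contains `#` would be truncated here.

```python
def parse_conf(text):
    """Parse Husky-style `key:value` lines into a dict of strings.

    Assumptions (not taken from Husky's source): '#' starts a comment, blank
    lines are ignored, and only the first ':' splits key from value.
    """
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding spaces
        if not line:
            continue
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed config line: {raw!r}")
        params[key.strip()] = value.strip()
    return params
```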

Here is the generated conf/test-socket.txt file:

master:2     # one worker, which uses 2 threads

Similarly, to use two workers, worker1 and worker2, with 4 threads each, edit the socket file as follows:

worker1:4
worker2:4
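Each socket-file line pairs a hostname with a thread count, so the total number of Husky threads is the sum of the counts (8 for worker1:4 and worker2:4). A small Python sketch that parses this format; rejecting duplicate hosts and non-positive counts is this sketch's own choice, not documented Husky behavior.

```python
def parse_socket_file(text):
    """Map each hostname in a socket file to its thread count, in file order.

    Assumed format: one `hostname:threads` entry per line, as in the examples
    above; '#' starts a comment.
    """
    workers = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, sep, count = line.partition(":")
        host = host.strip()
        if not sep or not host:
            raise ValueError(f"malformed socket line: {raw!r}")
        threads = int(count)  # raises ValueError for a non-numeric count
        if threads <= 0:
            raise ValueError(f"thread count must be positive: {raw!r}")
        if host in workers:
            raise ValueError(f"duplicate host: {host!r}")
        workers[host] = threads
    return workers


def total_threads(workers):
    """Total number of Husky threads across all machines."""
    return sum(workers.values())
```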

Specifying parameters

More parameters can be specified in the configuration file, for example:

num_iters:10

The corresponding value can then be retrieved inside a Husky application:

Husky::Context::get_params("num_iters")
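Because the configuration file is text, a numeric parameter such as num_iters has to be converted before use. The Python sketch below shows that step on a dictionary of parsed `key:value` pairs; `get_int_param` is a hypothetical helper, not part of Husky, and it assumes the lookup yields the raw string stored in the file.

```python
def get_int_param(params, key, default=None):
    """Read an integer parameter from a dict of string values.

    Hypothetical helper: a missing key returns `default` if one is given,
    otherwise raises KeyError; a non-integer value raises ValueError.
    """
    if key not in params:
        if default is None:
            raise KeyError(f"missing required parameter {key!r}")
        return default
    value = params[key]
    try:
        return int(value)
    except ValueError:
        raise ValueError(f"parameter {key!r} is not an integer: {value!r}") from None
```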

Running a Husky application

First, start Master:

./Master /path/to/your/conf

Single-machine mode

If the socket file lists only one machine, the following command starts a single-machine (optionally multi-threaded) program:

./<executable> /path/to/your/conf

Distributed mode

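We use pssh to run an application in distributed mode. pssh must be told which machines to run the program on. Suppose we want to use worker1 and worker2 with 4 threads each.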

Create a machine.cfg.2 file listing the hostnames of all the machines:
```
worker1
worker2
```
Edit the socket file:
```
worker1:4
worker2:4
```
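pssh starts the program only on the hosts listed in machine.cfg.2, while the socket file tells Husky which workers to expect, so the two lists should name the same machines. A Python sketch of that consistency check, assuming plain hostnames one per line in the pssh host file and `hostname:threads` lines in the socket file; `compare_hosts` and its helpers are hypothetical.

```python
def read_hosts(machine_cfg_text):
    """Hostnames listed in a pssh host file such as machine.cfg.2, one per line."""
    return [line.strip() for line in machine_cfg_text.splitlines() if line.strip()]


def socket_hosts(socket_text):
    """Hostnames named in a socket file (`hostname:threads` per line)."""
    hosts = []
    for raw in socket_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if line:
            hosts.append(line.partition(":")[0].strip())
    return hosts


def compare_hosts(machine_cfg_text, socket_text):
    """Return (not_launched, not_in_socket_file).

    not_launched: socket-file hosts on which pssh would never start the program.
    not_in_socket_file: pssh hosts that the socket file does not mention.
    """
    machines = read_hosts(machine_cfg_text)
    workers = socket_hosts(socket_text)
    not_launched = [h for h in workers if h not in machines]
    not_in_socket_file = [h for h in machines if h not in workers]
    return not_launched, not_in_socket_file
```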

Edit exec.sh:

MACHINE_CFG=machine.cfg.2
time pssh -t 0 -P -h ${MACHINE_CFG} -x "-t -t" "ulimit -c unlimited && cd $HUSKY_ROOT && ./$1 $2"
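The pssh line in exec.sh can be reproduced as an argument list, which makes the quoting explicit: because the remote command is in double quotes, `$HUSKY_ROOT`, `$1` and `$2` are expanded by the local shell before pssh sends the command to each host. The Python sketch below builds that list with the same flags as exec.sh; `build_pssh_argv` is a hypothetical helper, and the `time` prefix is left out.

```python
import shlex


def build_pssh_argv(machine_cfg, husky_root, executable, conf_path):
    """Argument list equivalent to the pssh call in exec.sh (without `time`)."""
    # The remote shell runs this string; shlex.quote keeps paths with spaces intact.
    remote = (
        "ulimit -c unlimited"
        f" && cd {shlex.quote(husky_root)}"
        f" && ./{shlex.quote(executable)} {shlex.quote(conf_path)}"
    )
    return [
        "pssh",
        "-t", "0",          # no timeout
        "-P",               # print each host's output as it arrives
        "-h", machine_cfg,  # file listing the hosts to run on
        "-x", "-t -t",      # extra ssh arguments: force a pseudo-terminal
        remote,
    ]
```

Passing the result to `subprocess.run(..., check=True)` would launch the job much like `./exec.sh <executable> /path/to/your/conf`. Quoting each piece with `shlex.quote` is the one deliberate difference from exec.sh, which inserts `$1` and `$2` unquoted.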

Copy the application's executable and the conf/ folder to the /tmp directory on every machine (worker1 and worker2).
Run the distributed program with the following command:
```
./exec.sh <executable> /path/to/your/conf
```
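The copy step can be scripted as well. A Python sketch, assuming `scp` and SSH access to each host are available (pssh needs SSH access anyway); `build_copy_commands` and `copy_to_workers` are hypothetical helpers. The destination defaults to /tmp as in the text, but since exec.sh runs `cd $HUSKY_ROOT` before `./$1`, `dest` should be whichever directory the remote command actually runs from.

```python
import subprocess


def build_copy_commands(hosts, executable, conf_dir="conf", dest="/tmp"):
    """One `scp -r` argument list per host, copying the executable and conf/ into dest/."""
    return [["scp", "-r", executable, conf_dir, f"{host}:{dest}/"] for host in hosts]


def copy_to_workers(hosts, executable, conf_dir="conf", dest="/tmp"):
    """Run the copies one host at a time; stops at the first failed scp."""
    for argv in build_copy_commands(hosts, executable, conf_dir, dest):
        subprocess.run(argv, check=True)
```

For the example above, `copy_to_workers(["worker1", "worker2"], "PageRankWorkflow")` copies PageRankWorkflow and conf/ to worker1:/tmp/ and worker2:/tmp/.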

Developer documentation

Generate the developer documentation with the following command:

doxygen doxygen.config

The HTML documentation is then in the html/ directory, and the LaTeX documentation in the latex/ directory.

4 0