Using HiBench


HiBench is a big data benchmark suite used to evaluate various big data frameworks in terms of speed, throughput, and system resource utilization.
It supports the following framework-specific suites: hadoopbench, sparkbench, stormbench, flinkbench, and gearpumpbench.

References:

https://github.com/intel-hadoop/HiBench
https://github.com/intel-hadoop/HiBench/blob/master/docs/build-hibench.md
https://github.com/intel-hadoop/HiBench/blob/master/docs/run-hadoopbench.md

Problem: the test run does not generate hibench.report
Cause: the bc utility is not installed on the test node.
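A hedged fix, assuming the test node uses apt or yum as its package manager: install bc, then re-run the workload so the report can be generated.

# Debian/Ubuntu
sudo apt-get install -y bc

# RHEL/CentOS
sudo yum install -y bc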

Problem: org.apache.hadoop.dfs.SafeModeException: Cannot delete xxxxxx. Name node is in safe mode
When HDFS starts up, it first enters safe mode. While the file system is in safe mode, its contents may be neither modified nor deleted, until safe mode ends. Safe mode exists mainly so that, at startup, the NameNode can check the validity of the data blocks on each DataNode and, according to policy, replicate or delete blocks as needed. Safe mode can also be entered at runtime via a command. In practice, trying to modify or delete files while the system is still starting up produces this "not allowed in safe mode" error, and simply waiting a short while is usually enough.
Now that the cause is clear, can Hadoop be taken out of safe mode immediately instead of waiting?
Yes. In the Hadoop installation directory, run:

bin/hadoop dfsadmin -safemode leave 

This turns off Hadoop's safe mode, and the problem is solved.
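If you prefer to check the state first, the newer hdfs entry point provides the same dfsadmin subcommands on Hadoop 2.x:

# Query the current safe mode state
bin/hdfs dfsadmin -safemode get

# Leave safe mode explicitly
bin/hdfs dfsadmin -safemode leave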

Download

$ git clone https://github.com/intel-hadoop/HiBench.git

Build

Building requires Maven installed and network access (the build downloads dependencies).

Build all frameworks and modules

mvn clean package

Build specific frameworks

mvn -Phadoopbench -Psparkbench  clean package

Build a single module

mvn -Phadoopbench -Dmodules -Psql -Dspark=2.1 -Dscala=2.11 clean package

Supported modules: micro, ml (machine learning), sql, websearch, graph, streaming, structuredStreaming (Spark 2.0 or 2.1).
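For example, building only the ml module of sparkbench should follow the same pattern (a sketch; adjust -Dspark and -Dscala to the versions your cluster uses):

mvn -Psparkbench -Dmodules -Pml -Dspark=2.1 -Dscala=2.11 clean package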

Build Structured Streaming
Structured Streaming is not built by default and is only supported on Spark 2.0 and Spark 2.1:

mvn -Psparkbench -Dmodules -PstructuredStreaming clean package

Running hadoopbench

Prerequisites
The node that runs the tests requires the following (a quick verification sketch follows this list):

  • Python 2.x (>= 2.6).

  • bc, which is required to generate the HiBench report.

  • A supported Hadoop version: Apache Hadoop 2.x, CDH5.x, or HDP.

  • HiBench built as described in the build section above.

  • HDFS and YARN started in the cluster.
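A minimal way to verify these prerequisites on the test node (assuming the Hadoop binaries are on the PATH):

python --version          # expect a 2.x release, 2.6 or later
which bc                  # must resolve, otherwise hibench.report is not generated
hadoop version            # confirm a supported Hadoop 2.x / CDH5 / HDP release
hdfs dfsadmin -report     # HDFS is up and DataNodes are registered
yarn node -list           # YARN NodeManagers are running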

Configuration

Create and edit conf/hadoop.conf:

cp conf/hadoop.conf.template conf/hadoop.conf
Property                        Meaning
hibench.hadoop.home             The Hadoop installation location
hibench.hadoop.executable       The path of the hadoop executable. For Apache Hadoop, it is /YOUR/HADOOP/HOME/bin/hadoop
hibench.hadoop.configure.dir    Hadoop configuration directory. For Apache Hadoop, it is /YOUR/HADOOP/HOME/etc/hadoop
hibench.hdfs.master             The root HDFS path to store HiBench data, e.g. hdfs://localhost:8020/user/username
hibench.hadoop.release          Hadoop release provider. Supported values: apache, cdh5, hdp

My configuration is:

# Hadoop home
hibench.hadoop.home     /usr/local/hadoop

# The path of hadoop executable
hibench.hadoop.executable     ${hibench.hadoop.home}/bin/hadoop

# Hadoop configuration directory
hibench.hadoop.configure.dir  ${hibench.hadoop.home}/etc/hadoop

# The root HDFS path to store HiBench data
hibench.hdfs.master       hdfs://172.17.0.2:9000
#hibench.hdfs.master       hdfs://localhost:50070

# Hadoop release provider. Supported value: apache, cdh5, hdp
hibench.hadoop.release    apache
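Before running anything, it is worth confirming that the hibench.hdfs.master URI is reachable; a minimal sanity check against the address used above:

# Should list the HDFS root without errors if the NameNode RPC address is correct
hadoop fs -ls hdfs://172.17.0.2:9000/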

Run
Run a single workload:

bin/workloads/micro/wordcount/prepare/prepare.sh
bin/workloads/micro/wordcount/hadoop/run.sh

Run all workloads configured in conf/benchmarks.lst and conf/frameworks.lst:

bin/run_all.sh
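Both files are plain lists with one entry per line, and lines starting with # are skipped. A sketch of a trimmed-down setup that runs only two micro workloads on Hadoop (the entry names are assumptions based on the default templates shipped with HiBench):

# conf/benchmarks.lst
micro.wordcount
micro.sort

# conf/frameworks.lst
hadoop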

View the reports

<HiBench_Root>/report/hibench.report: summary report
<workload>/hadoop/bench.log: raw logs on the client side
<workload>/hadoop/monitor.html: system utilization monitor results
<workload>/hadoop/conf/<workload>.conf: generated environment variable configuration for this workload
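For a quick look after a run (assuming the default report layout, with wordcount as the example workload):

# One summary line per run: workload, date, input size, duration, throughput
cat report/hibench.report

# Raw client-side log for the wordcount workload
less report/wordcount/hadoop/bench.log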

Configure the input data scale
The hibench.scale.profile property in conf/hibench.conf
can be set to tiny, small, large, huge, gigantic, or bigdata.
The concrete sizes behind each profile are defined in the corresponding workload's configuration file; for wordcount, for example, see
conf/workloads/micro/wordcount.conf
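A minimal sketch of selecting a profile, edited in conf/hibench.conf (the value large is just an example):

# conf/hibench.conf
hibench.scale.profile    large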

Configure parallelism
conf/hibench.conf

Property                               Meaning
hibench.default.map.parallelism        Mapper number in Hadoop
hibench.default.shuffle.parallelism    Reducer number in Hadoop
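For example, in conf/hibench.conf (the values below are illustrative; pick them to match your cluster's available cores):

# conf/hibench.conf
hibench.default.map.parallelism        16
hibench.default.shuffle.parallelism    8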