Running a Hadoop C++ Pipes Implementation


It took a long time to squash every bug, but I finally got the Hadoop Pipes C++ implementation working. This post is a memorial to the hours that perished along the way. Amen.


Prerequisites:

1. Install Hadoop following the tutorial at http://blog.sina.com.cn/s/blog_9c43254d0101ngug.html, and confirm that jps shows all 6 Hadoop processes (a sketch of the expected list follows these prerequisites).

2. Basic familiarity with the terminal.
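For reference, a sketch of what jps typically prints on a working pseudo-distributed Hadoop 1.x node; the PIDs below are placeholders and will differ on your machine, and Jps itself counts as one of the 6 entries:

2481 NameNode
2673 DataNode
2862 SecondaryNameNode
2941 JobTracker
3135 TaskTracker
3201 Jps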


Steps:

1. First, write the C++ program. I forget where I copied this one from, but it works, which is what matters. Mine is named main.cpp:

#include <algorithm>
#include <limits>
#include <string>
#include <vector>    // needed for the vector<string> used in map()

#include "stdint.h"  // <--- to prevent uint64_t errors!

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
  }

  // map function: receives a line, outputs (word,"1") to reducer
  void map( HadoopPipes::MapContext& context ) {
    //--- get line of text ---
    string line = context.getInputValue();

    //--- split it into words ---
    vector< string > words = HadoopUtils::splitString( line, " " );

    //--- emit each word tuple (word, "1") ---
    for ( unsigned int i = 0; i < words.size(); i++ ) {
      context.emit( words[i], HadoopUtils::toString( 1 ) );
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer( HadoopPipes::TaskContext& context ) {
  }

  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    int count = 0;

    //--- get all tuples with the same key, and count them ---
    while ( context.nextValue() ) {
      count += HadoopUtils::toInt( context.getInputValue() );
    }

    //--- emit (word, count) ---
    context.emit( context.getInputKey(), HadoopUtils::toString( count ) );
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask( HadoopPipes::TemplateFactory<
                                 WordCountMapper,
                                 WordCountReducer >() );
}

2. In the same directory as main.cpp, create a file named makefile with the following contents:

CC = g++
HADOOP_INSTALL = /usr/local/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(HADOOP_INSTALL)/src/c++/install/include -L$(HADOOP_INSTALL)/src/c++/install/lib -lhadooputils -lhadooppipes -lcrypto -lssl -lpthread

wordcount: main.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lcrypto -lpthread -g -O2 -o $@

Note that my HADOOP_INSTALL directory may differ from yours, and the same goes for PLATFORM; look up how to determine these two values for your machine, or see the sketch below.
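A quick way to check both values, assuming the Hadoop 1.x-style layout used above (the exact directory names are an assumption about your tarball):

ls /usr/local/hadoop/c++/    # each subdirectory name here is a valid PLATFORM value

uname -m                     # i686 means Linux-i386-32; x86_64 means Linux-amd64-64

On a 64-bit machine you would set PLATFORM = Linux-amd64-64 and drop -m32 from CPPFLAGS.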


Then run make to produce the executable wordcount.
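For example (the file command is just a sanity check and assumes the standard file utility is installed):

make

file wordcount    # should report a 32-bit ELF executable, matching the -m32 flag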


Also create a test input file, hello.txt, containing:

hello world
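One way to create it from the terminal (any editor works just as well):

echo "hello world" > hello.txt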



3. Set up the files on the remote Hadoop side (HDFS):

 A. Add /usr/local/hadoop/bin (the bin folder under my install directory) to the PATH in /etc/environment, which makes the commands below easier to run.
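For reference, a minimal sketch of the edited line, assuming a stock Ubuntu /etc/environment (your existing PATH entries may differ):

PATH="/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/local/hadoop/bin"

Log out and back in for the change to take effect.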

 B. Run ./start-all.sh to start the Hadoop daemons.

 C. Run the following commands:

   hadoop dfs -mkdir /home

   hadoop dfs -mkdir /bin

   This creates the two directories under the HDFS root.


  Then upload the executable wordcount and the input file hello.txt to the remote side:

  hadoop dfs -copyFromLocal ./wordcount /bin/

  hadoop dfs -copyFromLocal ./hello.txt /home

  On the remote side you can then list the uploaded files with:

  hadoop dfs -ls /home

  hadoop dfs -ls /bin

 D. With the preparation done, you can run the program. Execute:

hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /home/hello.txt -output /home/result -program /bin/wordcount

The output then appears in /home/result on HDFS; ls that directory, then cat the part files or copy them back to the local machine, as sketched below.
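A sketch of those retrieval commands (part-00000 is the conventional name of the first reducer's output file; your job may produce several part files):

hadoop dfs -ls /home/result

hadoop dfs -cat /home/result/part-00000

hadoop dfs -copyToLocal /home/result ./result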


