Running Hadoop C++ Pipes
It took a long time to chase down every bug, but the Hadoop Pipes C++ implementation finally works. This article records the process, in memory of the hours spent.
Prerequisites:
1. Install Hadoop following this tutorial: http://blog.sina.com.cn/s/blog_9c43254d0101ngug.html and confirm that jps shows all six daemon processes.
2. Basic familiarity with the terminal.
Steps:
1. Write the C++ program first. I no longer remember where I copied this word-count example from, but it works; I saved it as main.cpp:
#include <algorithm>
#include <limits>
#include <string>
#include <vector>
#include "stdint.h" // <--- to prevent uint64_t errors!

#include "hadoop/Pipes.hh"
#include "hadoop/TemplateFactory.hh"
#include "hadoop/StringUtils.hh"

using namespace std;

class WordCountMapper : public HadoopPipes::Mapper {
public:
  // constructor: does nothing
  WordCountMapper( HadoopPipes::TaskContext& context ) {
  }

  // map function: receives a line, outputs (word,"1") to the reducer
  void map( HadoopPipes::MapContext& context ) {
    //--- get line of text ---
    string line = context.getInputValue();

    //--- split it into words ---
    vector< string > words = HadoopUtils::splitString( line, " " );

    //--- emit each word tuple (word, "1") ---
    for ( unsigned int i = 0; i < words.size(); i++ ) {
      context.emit( words[i], HadoopUtils::toString( 1 ) );
    }
  }
};

class WordCountReducer : public HadoopPipes::Reducer {
public:
  // constructor: does nothing
  WordCountReducer( HadoopPipes::TaskContext& context ) {
  }

  // reduce function
  void reduce( HadoopPipes::ReduceContext& context ) {
    int count = 0;

    //--- get all tuples with the same key, and sum their counts ---
    while ( context.nextValue() ) {
      count += HadoopUtils::toInt( context.getInputValue() );
    }

    //--- emit (word, count) ---
    context.emit( context.getInputKey(), HadoopUtils::toString( count ) );
  }
};

int main(int argc, char *argv[]) {
  return HadoopPipes::runTask( HadoopPipes::TemplateFactory<
      WordCountMapper, WordCountReducer >() );
}
2. In the same directory as main.cpp, create a makefile with the following contents:
CC = g++
HADOOP_INSTALL = /usr/local/hadoop
PLATFORM = Linux-i386-32
CPPFLAGS = -m32 -I$(HADOOP_INSTALL)/c++/$(PLATFORM)/include -I$(HADOOP_INSTALL)/src/c++/install/include -L$(HADOOP_INSTALL)/src/c++/install/lib -lhadooputils -lhadooppipes -lcrypto -lssl -lpthread

wordcount: main.cpp
	$(CC) $(CPPFLAGS) $< -Wall -L$(HADOOP_INSTALL)/c++/$(PLATFORM)/lib -lhadooppipes \
	-lhadooputils -lcrypto -lpthread -g -O2 -o $@
Note that my HADOOP_INSTALL directory may differ from yours, and PLATFORM may differ as well; look up how to determine these two values for your system.
Then run make to produce the executable wordcount.
Also prepare an input file, hello.txt, containing:
hello world
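For this one-line input, the job should emit (hello, 1) and (world, 1). Before involving Hadoop at all, the expected output can be sanity-checked locally with standard shell tools:

```shell
# Simulate word count on the input line (no Hadoop needed):
# split into one word per line, sort, then count duplicates
printf 'hello world\n' | tr ' ' '\n' | sort | uniq -c
```

Each output line shows a count followed by a word; the Pipes job produces the same pairs as tab-separated key/value lines.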
3. Set up the remote (HDFS) side:
A. Add /usr/local/hadoop/bin (the bin folder under my install directory) to the PATH in /etc/environment, to simplify the commands that follow.
B. Run ./start-all.sh to start the Hadoop daemons.
C. Run the following commands:
hadoop dfs -mkdir /home
hadoop dfs -mkdir /bin
These create two directories under the HDFS root.
Then upload the executable wordcount and the input file hello.txt with:
hadoop dfs -copyFromLocal ./wordcount /bin/
hadoop dfs -copyFromLocal ./hello.txt /home
On the HDFS side, the commands
hadoop dfs -ls /home
hadoop dfs -ls /bin
list the files that have been uploaded.
D. With the preparation done, run the job:
hadoop pipes -D hadoop.pipes.java.recordreader=true -D hadoop.pipes.java.recordwriter=true -input /home/hello.txt -output /home/result -program /bin/wordcount
Afterwards the output appears under /home/result on HDFS: list it with hadoop dfs -ls /home/result, then cat the part file (e.g. hadoop dfs -cat /home/result/part-00000) or copy it back to the local machine.