Hadoop Streaming
Source: Internet  Editor: 程序博客网  Time: 2024/05/22 05:56
(From the book Hadoop in Action, Section 4.5)
1. Streaming with Unix commands
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -input input/input.txt -output output -mapper 'cut -f 2 -d ,' -reducer 'uniq'
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -D mapred.reduce.tasks=0 -input output -output output_a -mapper 'wc -l'
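The first job extracts the second comma-separated field of every record and removes duplicates; the second job counts the lines of that result. In the second job the mapper outputs the record count directly, with no reducer, so we set mapred.reduce.tasks to 0 and do not specify the -reducer option at all.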
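To make the data flow of the two jobs easier to follow, here is a minimal local sketch in Python of what they compute. It is not how Hadoop runs the commands; the function names and the sample records are made up for illustration, and the sort step stands in for the sorting Hadoop performs between the map and reduce phases.

```python
def cut_field(line, delimiter=",", field=2):
    """Rough equivalent of `cut -f 2 -d ,` applied to one line."""
    parts = line.rstrip("\n").split(delimiter)
    if len(parts) == 1:
        # cut prints a line unchanged when it contains no delimiter
        return parts[0]
    return parts[field - 1] if field <= len(parts) else ""


def uniq(sorted_lines):
    """Rough equivalent of `uniq`: drop adjacent duplicate lines."""
    previous = object()  # sentinel that never equals a string
    for line in sorted_lines:
        if line != previous:
            yield line
        previous = line


def distinct_field_count(lines):
    mapped = [cut_field(line) for line in lines]  # job 1 mapper
    shuffled = sorted(mapped)                     # sort before the reducer
    reduced = list(uniq(shuffled))                # job 1 reducer
    return reduced, len(reduced)                  # job 2 mapper: wc -l


# Hypothetical input records, not taken from the original data set
sample = ["1,100", "2,200", "3,100", "4,300"]
values, count = distinct_field_count(sample)
print(values, count)
```

The sketch shows why `uniq` is enough as a reducer: it only removes adjacent duplicates, which works because the reducer input arrives sorted.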
2. Streaming with scripts
For example, a Python script can be run in Hadoop to draw a smaller random sample of a data set. Below is RandomSample.py, which takes the sampling percentage as its first argument:
#!/usr/bin/env python
import sys, random

for line in sys.stdin:
    if (random.randint(1, 100) <= int(sys.argv[1])):
        print(line.strip())
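Before submitting a job, it can help to check the sampling logic locally. The sketch below wraps the same test in a function so it can be called on an in-memory list; the function name `sample_lines`, the `rng` parameter, and the generated records are assumptions added for illustration and are not part of RandomSample.py.

```python
import random


def sample_lines(lines, percent, rng=None):
    """Keep each line with probability percent/100, as RandomSample.py does."""
    if rng is None:
        rng = random.Random()
    for line in lines:
        if rng.randint(1, 100) <= percent:
            yield line.strip()


# Hypothetical input: 10,000 numbered records
records = ["record %d\n" % i for i in range(10000)]
kept = list(sample_lines(records, 10, random.Random(42)))

# The sample size is random; with percent=10 it should be close to 1,000
print(len(kept))
```

Because each line is kept independently, the output size is only approximately the requested percentage of the input.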
Then, execute with the following command (on Cygwin in Windows):
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -D mapred.reduce.tasks=1 -input workspace/data/cite75_99.txt -output workspace/outputa -mapper 'python2.7.exe workspace/RandomSample.py 10' -file workspace/RandomSample.py
Hadoop Streaming supports a -file option to package your executable file as part of the job submission.
Because no reducer is specified, the job uses the default IdentityReducer.
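The IdentityReducer passes every record it receives to the output unchanged. As a rough illustration of that behaviour, a streaming reducer script with the same effect could be sketched as follows; the function name `identity_reduce` and the sample records are hypothetical, and this is not Hadoop's own implementation.

```python
import io
import sys


def identity_reduce(lines, out=sys.stdout):
    """Write every input record to the output unchanged."""
    for line in lines:
        out.write(line)


# Local demonstration with made-up, already sorted key<TAB>value records
demo_input = ["a\t1\n", "b\t2\n"]
demo_output = io.StringIO()
identity_reduce(demo_input, demo_output)
print(demo_output.getvalue(), end="")
```

In an actual reducer script the call would be `identity_reduce(sys.stdin)`, since a streaming reducer reads its records from standard input and writes to standard output.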
A C++ program can be used as the mapper in the same way (compile the .cpp file to an .exe first):
$ bin/hadoop jar contrib/streaming/hadoop-streaming-1.0.4.jar -D mapred.reduce.tasks=1 -input workspace/data/cite75_99.txt -output workspace/outputc -mapper 'workspace/RandomSample.exe 10' -file workspace/RandomSample.exe