Trying out Hadoop with Docker
1. Pull the image
docker pull sequenceiq/hadoop-docker:2.7.0
2. Run the image
docker run -it -p 50070:50070 sequenceiq/hadoop-docker:2.7.0 /etc/bootstrap.sh -bash
This command drops you into a bash shell inside the container; the commands in the following steps are all run there.
Port 50070 is published so that Hadoop's status can be checked from the web UI (see step 5).
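To confirm that the Hadoop daemons came up, you can run jps inside the container; on a healthy single-node setup it typically lists NameNode, DataNode, SecondaryNameNode, ResourceManager, and NodeManager (the exact set depends on the image's bootstrap script):
jps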
3. Build the project (WordCount)
Create a working directory named test, change into it, and run the following commands there:
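A minimal way to do this inside the container (the location of test is an arbitrary choice):
mkdir test
cd test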
3.1 Write the program
vim WordCount.java
This is Hadoop's equivalent of a Hello World program.
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class WordCount {

  public static class TokenizerMapper
       extends Mapper<Object, Text, Text, IntWritable> {

    private final static IntWritable one = new IntWritable(1);
    private Text word = new Text();

    public void map(Object key, Text value, Context context
                    ) throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, one);
      }
    }
  }

  public static class IntSumReducer
       extends Reducer<Text, IntWritable, Text, IntWritable> {

    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context
                       ) throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
    if (otherArgs.length < 2) {
      System.err.println("Usage: wordcount <in> [<in>...] <out>");
      System.exit(2);
    }
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    for (int i = 0; i < otherArgs.length - 1; ++i) {
      FileInputFormat.addInputPath(job, new Path(otherArgs[i]));
    }
    FileOutputFormat.setOutputPath(job, new Path(otherArgs[otherArgs.length - 1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
3.2 Package into a jar
Change into the $HADOOP_PREFIX directory:
cd $HADOOP_PREFIX
The following four jars are required to compile WordCount.java:
/usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.0.jar
/usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.0.jar
/usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar
/usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar
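One possible way to copy them over (the destination ~/test is an assumption; point it at wherever you created the test directory):
cp /usr/local/hadoop/share/hadoop/common/hadoop-common-2.7.0.jar \
   /usr/local/hadoop/share/hadoop/common/lib/hadoop-annotations-2.7.0.jar \
   /usr/local/hadoop/share/hadoop/common/lib/commons-cli-1.2.jar \
   /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.7.0.jar \
   ~/test/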
With the dependency jars copied into the working directory test, run the following command from inside test to compile:
javac -classpath hadoop-common-2.7.0.jar:hadoop-annotations-2.7.0.jar:commons-cli-1.2.jar:hadoop-mapreduce-client-core-2.7.0.jar WordCount.java
After compilation, three class files appear in the test directory: WordCount.class, WordCount$TokenizerMapper.class, and WordCount$IntSumReducer.class. Package them into a jar:
jar -cvf wordcount.jar *.class
A wordcount.jar file now appears in the test directory.
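Optionally, list the archive to confirm all three class files were packaged:
jar -tf wordcount.jar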
4. Run the job
Create an input file:
mkdir input_files
vim input_files/hello
with the following three lines as its contents:
hello world
hello world
hello hi
Upload the input file into HDFS:
cd $HADOOP_PREFIX
bin/hadoop fs -put input_files/* input
(Adjust the input_files path if your test directory is not under $HADOOP_PREFIX.)
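A quick optional check that the file landed in HDFS:
bin/hadoop fs -ls input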
Run the jar:
bin/hadoop jar wordcount.jar WordCount input output
View the output:
bin/hadoop fs -cat output/*
The result:
hello 3
hi 1
world 2
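Note that MapReduce refuses to start if the output directory already exists, so remove it before re-running the job:
bin/hadoop fs -rm -r output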
5. Access the web UI
Enter localhost:50070 in a browser to see the NameNode's status page.
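If no browser is handy, a rough check from the host that the web UI is responding (assuming curl is installed):
curl -s http://localhost:50070/ | head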