Implementing MaxTemperature from "Hadoop: The Definitive Guide, 3rd Edition" in Java
Source: Internet | Editor: 程序博客网 | Posted: 2024/06/14 03:53
【Reference】http://www.cnblogs.com/shishanyuan/archive/2014/12/22/4177908.html
【0.1】The raw data, sample.txt
0067011990999991950051507004+68750+023550FM-12+038299999V0203301N00671220001CN9999999N9+00001+99999999999
0043011990999991950051512004+68750+023550FM-12+038299999V0203201N00671220001CN9999999N9+00221+99999999999
0043011990999991950051518004+68750+023550FM-12+038299999V0203201N00261220001CN9999999N9-00111+99999999999
0043012650999991949032412004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+01111+99999999999
0043012650999991949032418004+62300+010750FM-12+048599999V0202701N00461220001CN0500001N9+00781+99999999999
【0.2】MaxTemperature.java
// cc MaxTemperature Application to find the maximum temperature in the weather dataset
// vv MaxTemperature
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MaxTemperature {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperature <input path> <output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperature.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
// ^^ MaxTemperature
【0.3】MaxTemperatureMapper.java
// cc MaxTemperatureMapper Mapper for maximum temperature example
// vv MaxTemperatureMapper
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class MaxTemperatureMapper
    extends Mapper<LongWritable, Text, Text, IntWritable> {

  private static final int MISSING = 9999;

  @Override
  public void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {

    String line = value.toString();
    String year = line.substring(15, 19);
    int airTemperature;
    if (line.charAt(87) == '+') { // parseInt doesn't like leading plus signs
      airTemperature = Integer.parseInt(line.substring(88, 92));
    } else {
      airTemperature = Integer.parseInt(line.substring(87, 92));
    }
    String quality = line.substring(92, 93);
    if (airTemperature != MISSING && quality.matches("[01459]")) {
      context.write(new Text(year), new IntWritable(airTemperature));
    }
  }
}
// ^^ MaxTemperatureMapper
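The hard-coded offsets in the mapper can be checked against the first record of sample.txt in plain Java, with no Hadoop needed. This is only a sanity-check sketch: the year sits at columns 15-19, the signed temperature (in tenths of a degree) at 87-92, and the quality code at column 92:

```java
public class NcdcRecordDemo {
    public static void main(String[] args) {
        // First record of sample.txt, split only for readability
        String line = "0067011990999991950051507004+68750+023550FM-12+038299999"
                + "V0203301N00671220001CN9999999N9+00001+99999999999";

        String year = line.substring(15, 19);          // "1950"
        // Skip a leading '+' because older Integer.parseInt rejects it
        int airTemperature = (line.charAt(87) == '+')
                ? Integer.parseInt(line.substring(88, 92))
                : Integer.parseInt(line.substring(87, 92));
        String quality = line.substring(92, 93);       // "1" = passed QC

        System.out.println(year + " " + airTemperature + " " + quality);
    }
}
```

Running it prints `1950 0 1`: the first reading is 0.0°C for 1950 with an acceptable quality code.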
【0.4】MaxTemperatureReducer.java
// cc MaxTemperatureReducer Reducer for maximum temperature example
// vv MaxTemperatureReducer
import java.io.IOException;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class MaxTemperatureReducer
    extends Reducer<Text, IntWritable, Text, IntWritable> {

  @Override
  public void reduce(Text key, Iterable<IntWritable> values, Context context)
      throws IOException, InterruptedException {

    int maxValue = Integer.MIN_VALUE;
    for (IntWritable value : values) {
      maxValue = Math.max(maxValue, value.get());
    }
    context.write(key, new IntWritable(maxValue));
  }
}
// ^^ MaxTemperatureReducer
【0.5】MaxTemperatureWithCombiner.java
// cc MaxTemperatureWithCombiner Application to find the maximum temperature, using a combiner function for efficiency
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// vv MaxTemperatureWithCombiner
public class MaxTemperatureWithCombiner {

  public static void main(String[] args) throws Exception {
    if (args.length != 2) {
      System.err.println("Usage: MaxTemperatureWithCombiner <input path> " +
          "<output path>");
      System.exit(-1);
    }

    Job job = new Job();
    job.setJarByClass(MaxTemperatureWithCombiner.class);
    job.setJobName("Max temperature");

    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));

    job.setMapperClass(MaxTemperatureMapper.class);
    /*[*/job.setCombinerClass(MaxTemperatureReducer.class)/*]*/;
    job.setReducerClass(MaxTemperatureReducer.class);

    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);

    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}
// ^^ MaxTemperatureWithCombiner
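Reusing the reducer as the combiner works here only because taking a maximum is commutative and associative: pre-reducing each map task's output and then reducing the partial results gives the same answer as one pass over all values. A minimal plain-Java sketch of this property, using hypothetical readings split across two imaginary map tasks:

```java
import java.util.Arrays;
import java.util.stream.IntStream;

public class CombinerPropertyDemo {
    // The reducer's operation: the maximum over a list of readings
    static int max(int[] values) {
        return Arrays.stream(values).max().getAsInt();
    }

    public static void main(String[] args) {
        // Hypothetical 1950 readings split across two map tasks
        int[] mapper1 = {0, 20, 10};
        int[] mapper2 = {25, 15};

        // Without a combiner: the reducer sees every value
        int direct = max(IntStream.concat(
                Arrays.stream(mapper1), Arrays.stream(mapper2)).toArray());

        // With a combiner: each map side pre-reduces, and the reducer
        // takes the max of the two partial maxima
        int combined = max(new int[] {max(mapper1), max(mapper2)});

        System.out.println(direct + " == " + combined);  // both are 25
    }
}
```

An operation like mean would fail this test, which is why not every reducer can double as a combiner.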
With the raw data and code in place, the next steps follow.
【1】Create the HDFS input directory input
A22811459:/home/longhui/hadoop # hadoop dfs -mkdir input
A22811459:/home/longhui/hadoop # hadoop dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
A22811459:/home/longhui/hadoop # hadoop dfs -ls /
Found 2 items
drwxr-xr-x - root supergroup 0 2016-12-15 13:00 /tmp
drwxr-xr-x - root supergroup 0 2016-12-15 14:06 /user
A22811459:/home/longhui/hadoop # hadoop dfs -ls /user
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root
A22811459:/home/longhui/hadoop # hadoop dfs -ls /user/root
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
【2 (this step can be skipped)】Create the output directory output
A22811459:/home/longhui/hadoop # hadoop dfs -mkdir /user/root/output
A22811459:/home/longhui/hadoop # hadoop dfs -ls
Found 2 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:09 /user/root/input
drwxr-xr-x - root supergroup 0 2016-12-15 16:16 /user/root/output
Delete the directory:
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop dfs -rmr /user/root/output
Deleted hdfs://A22811459:9000/user/root/output
A22811459:/home/longhui/hadoop/hadoop-1.2.1 # hadoop dfs -ls
Found 1 items
drwxr-xr-x - root supergroup 0 2016-12-15 16:18 /user/root/input
The output directory is created automatically when the job runs and must not already exist, so the manually created one has to be deleted.
【3】Copy the weather data to the HDFS input directory
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -copyFromLocal sample.txt /user/root/input
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -ls /user/root/input/
Found 1 items
-rw-r--r-- 1 root supergroup 529 2016-12-15 16:18 /user/root/input/sample.txt
【4】Compile the .java files into .class files, then package them into a jar
The file max_temperature.sh contains the following four lines:
CLASSPATH=/home/longhui/hadoop/hadoop-1.2.1/hadoop-core-1.2.1.jar
rm -f *.class
javac -classpath $CLASSPATH *.java
jar cvf MaxTemperature.jar *.class
Run the script to generate the jar file:
A22811459:/home/longhui/hadoop/codes/1maxTemperature # sh max_temperature.sh
added manifest
adding: MaxTemperature.class(in = 1418) (out= 800)(deflated 43%)
adding: MaxTemperatureMapper.class(in = 1876) (out= 804)(deflated 57%)
adding: MaxTemperatureReducer.class(in = 1660) (out= 704)(deflated 57%)
adding: MaxTemperatureWithCombiner.class(in = 1494) (out= 829)(deflated 44%)
【5】Run the program
The arguments, in order, are the class name, the input file, and the output directory:
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop jar MaxTemperature.jar MaxTemperature /user/root/input/sample.txt /user/root/output
16/12/15 16:29:59 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/12/15 16:29:59 INFO input.FileInputFormat: Total input paths to process : 1
16/12/15 16:29:59 INFO util.NativeCodeLoader: Loaded the native-hadoop library
16/12/15 16:29:59 WARN snappy.LoadSnappy: Snappy native library not loaded
16/12/15 16:30:00 INFO mapred.JobClient: Running job: job_201612151254_0003
16/12/15 16:30:01 INFO mapred.JobClient: map 0% reduce 0%
16/12/15 16:30:05 INFO mapred.JobClient: map 100% reduce 0%
16/12/15 16:30:12 INFO mapred.JobClient: map 100% reduce 33%
16/12/15 16:30:14 INFO mapred.JobClient: map 100% reduce 100%
16/12/15 16:30:14 INFO mapred.JobClient: Job complete: job_201612151254_0003
16/12/15 16:30:14 INFO mapred.JobClient: Counters: 29
16/12/15 16:30:14 INFO mapred.JobClient: Job Counters
16/12/15 16:30:14 INFO mapred.JobClient: Launched reduce tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=4214
16/12/15 16:30:14 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
16/12/15 16:30:14 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
16/12/15 16:30:14 INFO mapred.JobClient: Launched map tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: Data-local map tasks=1
16/12/15 16:30:14 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=8510
16/12/15 16:30:14 INFO mapred.JobClient: File Output Format Counters
16/12/15 16:30:14 INFO mapred.JobClient: Bytes Written=17
16/12/15 16:30:14 INFO mapred.JobClient: FileSystemCounters
16/12/15 16:30:14 INFO mapred.JobClient: FILE_BYTES_READ=61
16/12/15 16:30:14 INFO mapred.JobClient: HDFS_BYTES_READ=642
16/12/15 16:30:14 INFO mapred.JobClient: FILE_BYTES_WRITTEN=106464
16/12/15 16:30:14 INFO mapred.JobClient: HDFS_BYTES_WRITTEN=17
16/12/15 16:30:14 INFO mapred.JobClient: File Input Format Counters
16/12/15 16:30:14 INFO mapred.JobClient: Bytes Read=529
16/12/15 16:30:14 INFO mapred.JobClient: Map-Reduce Framework
16/12/15 16:30:14 INFO mapred.JobClient: Map output materialized bytes=61
16/12/15 16:30:14 INFO mapred.JobClient: Map input records=5
16/12/15 16:30:14 INFO mapred.JobClient: Reduce shuffle bytes=61
16/12/15 16:30:14 INFO mapred.JobClient: Spilled Records=10
16/12/15 16:30:14 INFO mapred.JobClient: Map output bytes=45
16/12/15 16:30:14 INFO mapred.JobClient: CPU time spent (ms)=2500
16/12/15 16:30:14 INFO mapred.JobClient: Total committed heap usage (bytes)=218759168
16/12/15 16:30:14 INFO mapred.JobClient: Combine input records=0
16/12/15 16:30:14 INFO mapred.JobClient: SPLIT_RAW_BYTES=113
16/12/15 16:30:14 INFO mapred.JobClient: Reduce input records=5
16/12/15 16:30:14 INFO mapred.JobClient: Reduce input groups=2
16/12/15 16:30:14 INFO mapred.JobClient: Combine output records=0
16/12/15 16:30:14 INFO mapred.JobClient: Physical memory (bytes) snapshot=216408064
16/12/15 16:30:14 INFO mapred.JobClient: Reduce output records=2
16/12/15 16:30:14 INFO mapred.JobClient: Virtual memory (bytes) snapshot=752107520
16/12/15 16:30:14 INFO mapred.JobClient: Map output records=5
【6】View the results
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -ls /user/root/output
Found 3 items
-rw-r--r-- 1 root supergroup 0 2016-12-15 16:30 /user/root/output/_SUCCESS
drwxr-xr-x - root supergroup 0 2016-12-15 16:30 /user/root/output/_logs
-rw-r--r-- 1 root supergroup 17 2016-12-15 16:30 /user/root/output/part-r-00000
A22811459:/home/longhui/hadoop/codes/1maxTemperature # hadoop dfs -cat /user/root/output/part-r-00000
1949 111
1950 22
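As a sanity check, the whole job can be reproduced in a few lines of plain Java over the five sample records: group the mapper's (year, temperature) pairs by year and keep the maximum, exactly as the reducer does. The pairs below are the ones the mapper emits for sample.txt:

```java
import java.util.Map;
import java.util.TreeMap;

public class MaxTemperatureCheck {
    public static void main(String[] args) {
        // (year, temperature in tenths of a degree) pairs the mapper
        // emits for the five records in sample.txt
        String[][] records = {
            {"1950", "0"}, {"1950", "22"}, {"1950", "-11"},
            {"1949", "111"}, {"1949", "78"},
        };

        // Group by year and keep the maximum, as the reducer does
        Map<String, Integer> maxByYear = new TreeMap<>();
        for (String[] r : records) {
            maxByYear.merge(r[0], Integer.parseInt(r[1]), Math::max);
        }

        maxByYear.forEach((year, max) -> System.out.println(year + "\t" + max));
        // 1949    111
        // 1950    22
    }
}
```

The output matches part-r-00000 above: the warmest 1949 reading is 11.1°C and the warmest 1950 reading is 2.2°C.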
【7】View the job through the web interfaces
【7.1】JobTracker: http://10.17.35.110:50030/jobtracker.jsp
【7.2】NameNode: http://10.17.35.110:50070/