How Hadoop Works: A WordCount Example
Source: Internet · Editor: 程序博客网
Each input split is processed by exactly one Mapper.
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.util.StringUtils;

import java.io.IOException;

/**
 * author: test
 * date: 2015/1/25.
 */
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    /**
     * Input:
     *   key   - byte offset of the line within the file (LongWritable)
     *   value - content of the line (Text)
     *
     * Output:
     *   key:   Text
     *   value: IntWritable
     */
    // Called once for every line read from the file's split: the line's
    // offset arrives as the key and the line's content as the value.
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String[] words = StringUtils.split(value.toString(), ' ');
        for (String word : words) {
            context.write(new Text(word), new IntWritable(1));
        }
    }
}
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

/**
 * author: test
 * date: 2015/1/25.
 */
public class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {

    /**
     * Called once per group. A group is all values that share the same key:
     * one key, possibly many values.
     */
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable value : values) {
            sum += value.get();
        }
        context.write(key, new IntWritable(sum));
    }
}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

/**
 * author: test
 * date: 2015/1/25.
 */
public class RunJob {

    public static void main(String[] args) {
        // Loads every configuration file found in src / on the classpath
        Configuration conf = new Configuration();
        try {
            // Pass conf so the job actually uses the loaded configuration
            Job job = Job.getInstance(conf);
            job.setJarByClass(RunJob.class);
            job.setJobName("WordCount");
            job.setMapperClass(WordCountMapper.class);
            job.setReducerClass(WordCountReducer.class);
            job.setMapOutputKeyClass(Text.class);
            job.setMapOutputValueClass(IntWritable.class);

            FileSystem fs = FileSystem.get(conf);
            FileInputFormat.addInputPath(job, new Path("D:/hadoop/input/input"));
            Path output = new Path("D:/hadoop/output/wc");
            if (fs.exists(output)) {
                fs.delete(output, true); // recursive delete
            }
            FileOutputFormat.setOutputPath(job, output);

            if (job.waitForCompletion(true)) {
                System.out.println("Job Done!");
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }
}
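To make the map → shuffle → reduce data flow above concrete without a cluster, here is a minimal plain-Java sketch of the same pipeline. The class name `LocalWordCount` and its methods are hypothetical helpers for illustration only, not part of the Hadoop API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical local sketch of the WordCount data flow (no cluster needed):
// map emits (word, 1) pairs; the "shuffle" groups pairs by key; reduce sums each group.
public class LocalWordCount {

    // map phase: emit one (word, 1) pair per token, like WordCountMapper
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.split(" ")) {
            if (!word.isEmpty()) {
                pairs.add(Map.entry(word, 1));
            }
        }
        return pairs;
    }

    // shuffle + reduce: group pairs by key and sum each group's values,
    // like the framework's shuffle followed by WordCountReducer
    static Map<String, Integer> count(List<String> lines) {
        Map<String, Integer> result = new HashMap<>();
        for (String line : lines) {
            for (Map.Entry<String, Integer> pair : map(line)) {
                result.merge(pair.getKey(), pair.getValue(), Integer::sum);
            }
        }
        return result;
    }
}
```

The real framework differs mainly in scale: pairs are partitioned and sorted across machines before the reducers see them, whereas here a single `HashMap` plays the role of the shuffle.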
To run the job:
1. Package the classes into a jar named wc.jar.
2. hadoop jar wc.jar com.xxx.RunJob (the entry class)
How to kill a MapReduce job
Depending on the version, do:
version <2.3.0
Kill a hadoop job:
hadoop job -kill $jobId
You can list all jobIds with:
hadoop job -list
version >=2.3.0
Kill a hadoop job:
yarn application -kill $ApplicationId
You can list all ApplicationIds with:
yarn application -list
Hadoop job-related commands:
1. List job information:
hadoop job -list
2. Kill a job:
hadoop job -kill job_id
3. View the aggregated history logs under a given path:
hadoop job -history output-dir
4. Show more details about the job:
hadoop job -history all output-dir
5. Print the map and reduce completion percentages and all counters:
hadoop job -status job_id
6. Kill a task. Killed tasks do NOT count against failed attempts:
hadoop job -kill-task <task-id>
7. Fail a task. Failed tasks DO count against failed attempts:
hadoop job -fail-task <task-id>