mapreduce 多种输入
来源:互联网 发布:惠州干部网络大学堂 编辑:程序博客网 时间:2024/05/22 16:59
http://blog.csdn.net/july_2/article/details/8211577
1.多路径输入
1)FileInputFormat.addInputPath 多次调用加载不同路径
FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"));
FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path2"));
2)FileInputFormat.addInputPaths一次调用加载 多路径字符串用逗号隔开
FileInputFormat.addInputPaths(job, "hdfs://RS5-112:9000/cs/path1,hdfs://RS5-112:9000/cs/path2");
2.多种输入
MultipleInputs可以加载不同路径的输入文件,并且每个路径可用不同的maper
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
例子:
package example;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
* 多类型文件输入
* @author lijl
*
*/
public class MultiTypeFileInputMR {
static class MultiTypeFileInput1Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("\\|");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInput3Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInputReducer extends Reducer<Text, Text, Text, Text>{
public void reduce(Text key,Iterable<Text> values,Context context){
try {
for(Text value:values){
context.write(key,value);
}
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration conf = new Configuration();
conf.set("mapred.textoutputformat.separator", ",");
Job job = new Job(conf,"MultiPathFileInput");
job.setJarByClass(MultiTypeFileInputMR.class);
FileOutputFormat.setOutputPath(job, new Path("hdfs://RS5-112:9000/cs/path6"));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(MultiTypeFileInputReducer.class);
job.setNumReduceTasks(1);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
System.exit(job.waitForCompletion(true)?0:1);
}
}
- mapreduce 多种输入
- mapreduce 多种输入
- mapreduce 多种输入
- MapReduce多种输入格式
- 多种输入的MapReduce程序实例
- MapReduce多种输出格式
- MapReduce工作流多种实现方式
- 多种数据输入
- Mapreduce的输入格式
- Mapreduce的输入格式
- mapreduce自定义输入
- MapReduce输入路径
- MapReduce自定义输入格式
- MapReduce之多数据源输入
- 2.1MapReduce输入
- Hadoop MapReduce Job 提交的多种方案
- 只能输入数字多种方法
- hadoop mapreduce 多输入路径
- 用go开发的足球预测分析网站上线了
- Androidx学习笔记(42)--- 多线程下载(java项目)
- c++指针
- wsimport命令的使用
- gulp:入门简介(我是抄来滴,^_^)
- mapreduce 多种输入
- iOS开发之横道图
- WEB小结(1)——使用js设置ip地址对话框
- HDU2064 汉诺塔III
- 层级查询高级用法, 执行计划hash group by--工作备忘2016/02/02
- Google's BigTable 原理(Google三大论文之一)
- 输入/输出函数
- Swift中ViewController类与storyboard绑定报错
- 微软财报躲猫猫?想让人相信它是云计算老大却只有亚马逊AWS的五分之一!