hadoop mapreduce 多输入路径
来源:互联网 发布:win10 自动维护 知乎 编辑:程序博客网 时间:2024/05/21 16:33
1.多路径输入
1)FileInputFormat.addInputPath 多次调用加载不同路径
FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"));
FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path2"));
or
FileInputFormat.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/*"));
2)FileInputFormat.addInputPaths一次调用加载 多路径字符串用逗号隔开
FileInputFormat.addInputPaths(job, "hdfs://RS5-112:9000/cs/path1,hdfs://RS5-112:9000/cs/path2");
2.多种输入
MultipleInputs可以加载不同路径的输入文件,并且每个路径可用不同的maper
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
例子:
package example;
import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
* 多类型文件输入
* @author lijl
*
*/
public class MultiTypeFileInputMR {
static class MultiTypeFileInput1Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("\\|");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInput3Mapper extends Mapper<LongWritable, Text, Text, Text>{
public void map(LongWritable key,Text value,Context context){
try {
String[] str = value.toString().split("");
context.write(new Text(str[0]), new Text(str[1]));
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
static class MultiTypeFileInputReducer extends Reducer<Text, Text, Text, Text>{
public void reduce(Text key,Iterable<Text> values,Context context){
try {
for(Text value:values){
context.write(key,value);
}
} catch (IOException e) {
e.printStackTrace();
} catch (InterruptedException e) {
e.printStackTrace();
}
}
}
public static void main(String[] args) throws IOException, InterruptedException, ClassNotFoundException {
Configuration conf = new Configuration();
conf.set("mapred.textoutputformat.separator", ",");
Job job = new Job(conf,"MultiPathFileInput");
job.setJarByClass(MultiTypeFileInputMR.class);
FileOutputFormat.setOutputPath(job, new Path("hdfs://RS5-112:9000/cs/path6"));
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(Text.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
job.setReducerClass(MultiTypeFileInputReducer.class);
job.setNumReduceTasks(1);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path1"), TextInputFormat.class,MultiTypeFileInput1Mapper.class);
MultipleInputs.addInputPath(job, new Path("hdfs://RS5-112:9000/cs/path3"), TextInputFormat.class,MultiTypeFileInput3Mapper.class);
System.exit(job.waitForCompletion(true)?0:1);
}
}
- hadoop mapreduce 多输入路径
- [Hadoop]MapReduce多路径输入与多个输入
- Hadoop MapReduce多路径输入和多个类型输入
- Hadoop MapReduce多路径输入与多个输入 例子
- Hadoop streaming mapreduce多文件输入使用方法
- 多输入路径MapReduce完整代码详解
- mapreduce多路径输入单文件输出
- MapReduce输入路径
- Hadoop streaming 编写MapReduce程序-二次排序,多文件输入
- MapReduce多路径输入与多文件输出
- MapReduce多路径输入与多文件输出
- 自定义 hadoop MapReduce InputFormat 切分输入文件
- 自定义 hadoop MapReduce InputFormat 切分输入文件
- Hadoop源码分析1 MapReduce的输入
- 自定义 hadoop MapReduce InputFormat 切分输入文件
- 自定义 hadoop MapReduce InputFormat 切分输入文件
- hadoop之MapReduce输入(split)输出
- 自定义 hadoop MapReduce InputFormat 切分输入文件
- code review
- 关注C++细节——含有本类对象指针的类的构造函数、析构函数、拷贝构造函数、赋值运算符的例子
- svn http支持
- 给定单向链表的头指针和一个结点指针,定义一个函数在O(1)时间删除该结点
- 基于安卓的课程设计——加速度检测应用
- hadoop mapreduce 多输入路径
- SQL Server2008安装错误
- Java实现把一个数组中的某个数向后移动
- 离线升级/安装最新ADT插件
- 云安全三大趋势:纵深防御、软件定义安全、设备虚拟化
- Oracle数据恢复顾问(DRA)使用测试 (之二)
- crash by cache read data, object at index beyond
- 引用和const引用
- spring装配bean