Hadoop Learning 004


1. Notes on Hadoop in Action

  • Implemented PutMerge, which merges local files and writes the result into HDFS. Most of the classes the program uses come from Hadoop's own libraries:

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class PutMerge {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();      // resource configuration
        FileSystem hdfs = FileSystem.get(conf);        // handle to the HDFS file system
        FileSystem local = FileSystem.getLocal(conf);  // handle to the local file system
        Path inputDir = new Path(args[0]);             // local directory to merge
        Path hdfsFile = new Path(args[1]);             // output file path on HDFS
        try {
            FileStatus[] inputFiles = local.listStatus(inputDir);  // list every file in the directory
            FSDataOutputStream out = hdfs.create(hdfsFile);        // create the output file
            for (int i = 0; i < inputFiles.length; i++) {
                FSDataInputStream in = local.open(inputFiles[i].getPath());  // open one input file
                byte[] buffer = new byte[256];
                int bytesRead = 0;
                while ((bytesRead = in.read(buffer)) > 0) {
                    out.write(buffer, 0, bytesRead);   // copy the bytes into the merged file
                }
                in.close();
            }
            out.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
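The same merge can also be done with a single library call. A minimal sketch, assuming the FileUtil.copyMerge API shipped with Hadoop 1.x and 2.x (it was removed in Hadoop 3.x); the class name PutMergeUtil is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class PutMergeUtil {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem local = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(conf);
        // Merge every file under the local directory args[0] into the single
        // HDFS file args[1]; false = keep the source files after merging.
        FileUtil.copyMerge(local, new Path(args[0]),
                           hdfs, new Path(args[1]),
                           false, conf, null);
    }
}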
  • A MapReduce program processes data as key/value pairs. For key/value data to move around the cluster, the MapReduce framework provides a way to serialize the pairs, so only classes that support this serialization can act as keys or values.
  • Hadoop ships with a number of predefined key/value types. A program can define a custom value type by implementing the Writable interface, or a custom key type by implementing the WritableComparable interface, as in the following class:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.WritableComparable;

public class Edge implements WritableComparable<Edge> {
    private String departureNode;
    private String arrivalNode;

    public String getDepartureNode() { return departureNode; }

    @Override
    public void readFields(DataInput in) throws IOException {  // deserialize both fields
        departureNode = in.readUTF();
        arrivalNode = in.readUTF();
    }

    @Override
    public void write(DataOutput out) throws IOException {     // serialize both fields
        out.writeUTF(departureNode);
        out.writeUTF(arrivalNode);
    }

    @Override
    public int compareTo(Edge o) {  // order by departure node, then by arrival node
        return (departureNode.compareTo(o.departureNode) != 0)
                ? departureNode.compareTo(o.departureNode)
                : arrivalNode.compareTo(o.arrivalNode);
    }
}
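If Edge is used as a map output key with the default HashPartitioner, it should also get a hashCode() that is consistent with compareTo(), so equal keys land on the same reducer. The book's listing stops at compareTo(); the method below, which would go inside the Edge class, is an illustrative sketch (the multiplier 163 is an arbitrary odd constant):

@Override
public int hashCode() {
    // Combine both fields so that edges comparing as equal hash identically.
    return departureNode.hashCode() * 163 + arrivalNode.hashCode();
}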
  • By default, Hadoop decides which reducer a record goes to by hashing the key with the HashPartitioner class. The assignment can be customized by implementing the Partitioner interface:

import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.Partitioner;

public class EdgePartitioner implements Partitioner<Edge, Writable> {
    @Override
    public int getPartition(Edge key, Writable value, int numPartitions) {
        // Mask the sign bit so a negative hashCode() cannot yield a
        // negative (invalid) partition number.
        return (key.getDepartureNode().hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    @Override
    public void configure(JobConf conf) { }
}
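A custom partitioner only takes effect once it is registered with the job. A minimal sketch against the same old-style mapred API used above; the driver class EdgeJob and the Text value type are illustrative assumptions:

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.JobConf;

public class EdgeJob {
    public static JobConf configure() {
        JobConf job = new JobConf(EdgeJob.class);
        job.setJobName("edge-partitioning");             // illustrative job name
        job.setMapOutputKeyClass(Edge.class);            // map output keys are Edge instances
        job.setMapOutputValueClass(Text.class);          // illustrative value type
        job.setPartitionerClass(EdgePartitioner.class);  // route records by departure node
        return job;
    }
}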




