A Java Programmer's Road to Big Data (4): Calling HDFS Programmatically


Background

In an earlier post I shared how I set up a Hadoop project with Maven. One problem was left unresolved: a Hadoop job fails at submission time if its output directory already exists. In the previous post I wrote a FileUtil to work around this, but it only handles local files and does not support files on HDFS. So today I'm sharing how to call HDFS programmatically!

Getting started

Package structure

First, a look at the package structure:

  • HdfsDao: holds the HDFS connection settings and the shared Hadoop configuration
  • IHdfsService: the interface exposing HDFS commands (sketched below, since the post doesn't show it)
  • HdfsService: the concrete implementation of IHdfsService
  • TestHdfs: the test class

Implementation

  1. HdfsDao
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.mapred.JobConf;

    public class HdfsDao {

        private static final String HDFS = "hdfs://127.0.0.1:9000";

        // HDFS path
        private String hdfsPath;
        // Hadoop system configuration
        private Configuration config;

        public HdfsDao(Configuration config) {
            this(HDFS, config);
        }

        public HdfsDao(String hdfs, Configuration config) {
            this.hdfsPath = hdfs;
            this.config = config;
        }

        // Build a JobConf from the cluster config files on the classpath
        public static JobConf config() {
            JobConf conf = new JobConf(HdfsDao.class);
            conf.setJobName("HdfsDAO");
            conf.addResource("classpath:/hadoop/core-site.xml");
            conf.addResource("classpath:/hadoop/hdfs-site.xml");
            conf.addResource("classpath:/hadoop/mapred-site.xml");
            return conf;
        }

        public String getHdfsPath() {
            return hdfsPath;
        }

        public void setHdfsPath(String hdfsPath) {
            this.hdfsPath = hdfsPath;
        }

        public Configuration getConfig() {
            return config;
        }

        public void setConfig(Configuration config) {
            this.config = config;
        }
    }
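The post never shows the IHdfsService interface itself; reconstructed from the methods HdfsService implements below, a minimal version would be:

    import java.io.IOException;

    // Reconstructed from HdfsService; the original interface isn't shown in the post.
    public interface IHdfsService {
        void ls(String folder) throws IOException;
        void mkdirs(String folder) throws IOException;
        void rmr(String folder) throws IOException;
        void copyFromLocal(String local, String remote) throws IOException;
        void copyToLocal(String remote, String local) throws IOException;
        void cat(String remoteFile) throws IOException;
        void createFile(String file, String content) throws IOException;
    }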
  2. HdfsService
    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IOUtils;
    import org.apache.hadoop.mapred.JobConf;

    public class HdfsService implements IHdfsService {

        // List the entries under a directory, like `hadoop fs -ls`
        public void ls(String folder) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            Path path = new Path(folder);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            FileStatus[] list = fs.listStatus(path);
            System.out.println("ls " + folder);
            System.out.println("=====================");
            for (FileStatus f : list) {
                System.out.printf("name: %s, folder: %s, size: %d\n",
                        f.getPath(), f.isDir(), f.getLen());
            }
            System.out.println("=====================");
        }

        // Create a directory if it does not exist, like `hadoop fs -mkdir`
        public void mkdirs(String folder) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            Path path = new Path(folder);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            if (!fs.exists(path)) {
                fs.mkdirs(path);
                System.out.println("Create: " + folder);
            }
            fs.close();
        }

        // Remove a directory recursively, like `hadoop fs -rmr`.
        // deleteOnExit() only marks the path; the fs.close() below triggers the delete.
        public void rmr(String folder) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            Path path = new Path(folder);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            fs.deleteOnExit(path);
            System.out.println("Delete: " + folder);
            fs.close();
        }

        // Upload a local file to HDFS, like `hadoop fs -put`
        public void copyFromLocal(String local, String remote) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            fs.copyFromLocalFile(new Path(local), new Path(remote));
            System.out.println("upload from " + local + " to " + remote);
            fs.close();
        }

        // Print a file's contents to stdout, like `hadoop fs -cat`
        public void cat(String remoteFile) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            Path path = new Path(remoteFile);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            FSDataInputStream fsdis = null;
            try {
                fsdis = fs.open(path);
                IOUtils.copyBytes(fsdis, System.out, 4096, false);
            } finally {
                IOUtils.closeStream(fsdis);
                fs.close();
            }
        }

        // Download an HDFS file to the local filesystem, like `hadoop fs -get`
        public void copyToLocal(String remote, String local) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            Path path = new Path(remote);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            fs.copyToLocalFile(path, new Path(local));
            System.out.println("Download from " + remote + " to " + local);
            fs.close();
        }

        // Create a file on HDFS and write a string into it
        public void createFile(String file, String content) throws IOException {
            JobConf conf = HdfsDao.config();
            HdfsDao dao = new HdfsDao(conf);
            FileSystem fs = FileSystem.get(URI.create(dao.getHdfsPath()), conf);
            byte[] buff = content.getBytes();
            FSDataOutputStream os = null;
            try {
                os = fs.create(new Path(file));
                os.write(buff, 0, buff.length);
                System.out.println("Create file: " + file);
            } finally {
                if (os != null) {
                    os.close();
                }
            }
            fs.close();
        }
    }
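Note that every method here rebuilds the JobConf and reopens the FileSystem. On Hadoop 2.x and later (an assumption; the code above targets the old mapred API), the same kind of call can be written more compactly with a plain Configuration and try-with-resources. A minimal ls sketch, not from the original post:

    import java.io.IOException;
    import java.net.URI;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class LsExample {
        public static void main(String[] args) throws IOException {
            Configuration conf = new Configuration();
            // try-with-resources closes the FileSystem when done
            try (FileSystem fs = FileSystem.get(URI.create("hdfs://127.0.0.1:9000"), conf)) {
                for (FileStatus f : fs.listStatus(new Path("/user/jackeyzhe"))) {
                    System.out.printf("name: %s, folder: %s, size: %d%n",
                            f.getPath(), f.isDirectory(), f.getLen());
                }
            }
        }
    }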

Testing

First, test with the WordCount class run in earlier posts. This time, don't delete the previous output directory by hand; instead, add the following before the job-submission code:

    IHdfsService hdfs = new HdfsService();
    hdfs.rmr("/user/jackeyzhe/output");
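For reference, in an old-API WordCount driver this call would sit right before job submission. The sketch below is my reconstruction, not code from the post; the driver class name and the input path are assumptions:

    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapred.FileInputFormat;
    import org.apache.hadoop.mapred.FileOutputFormat;
    import org.apache.hadoop.mapred.JobClient;
    import org.apache.hadoop.mapred.JobConf;

    public class WordCountDriver {
        public static void main(String[] args) throws Exception {
            // Clear the previous output directory so resubmission doesn't fail
            IHdfsService hdfs = new HdfsService();
            hdfs.rmr("/user/jackeyzhe/output");

            JobConf conf = new JobConf(WordCountDriver.class);
            conf.setJobName("wordcount");
            conf.setOutputKeyClass(Text.class);
            conf.setOutputValueClass(IntWritable.class);
            // conf.setMapperClass(...) / conf.setReducerClass(...) with the
            // Map and Reduce classes from the earlier WordCount posts
            FileInputFormat.setInputPaths(conf, new Path("/user/jackeyzhe/input")); // assumed input path
            FileOutputFormat.setOutputPath(conf, new Path("/user/jackeyzhe/output"));
            JobClient.runJob(conf);
        }
    }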

After running, the console shows the "Delete: /user/jackeyzhe/output" message and the job completes successfully, which confirms that the directory-removal method works.

Next, testing the cat and ls methods, the console prints the following:

    haha        3
    hehe        1
    jackey      1
    jackeyzhe   1
    ls /user/jackeyzhe/output
    =====================
    name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/_SUCCESS, folder: false, size: 0
    name: hdfs://127.0.0.1:9000/user/jackeyzhe/output/part-00000, folder: false, size: 35
    =====================
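Output like this comes from calls along the following lines, with the paths taken from the printed listing:

    IHdfsService hdfs = new HdfsService();
    hdfs.cat("/user/jackeyzhe/output/part-00000"); // prints the WordCount results
    hdfs.ls("/user/jackeyzhe/output");             // lists _SUCCESS and part-00000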

The other tests aren't listed one by one; a sketch of what a full test run might look like follows.
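TestHdfs itself is not shown in the post; here is a minimal hypothetical version that exercises the remaining methods (all paths are made up for illustration):

    import java.io.IOException;

    public class TestHdfs {
        public static void main(String[] args) throws IOException {
            IHdfsService hdfs = new HdfsService();
            hdfs.mkdirs("/user/jackeyzhe/demo");                          // create a directory
            hdfs.createFile("/user/jackeyzhe/demo/hello.txt", "hello");   // write a small file
            hdfs.copyToLocal("/user/jackeyzhe/demo/hello.txt", "/tmp/hello.txt");     // download it
            hdfs.copyFromLocal("/tmp/hello.txt", "/user/jackeyzhe/demo/hello2.txt");  // upload it back
            hdfs.cat("/user/jackeyzhe/demo/hello2.txt");                  // print the contents
            hdfs.ls("/user/jackeyzhe/demo");                              // list the directory
            hdfs.rmr("/user/jackeyzhe/demo");                             // clean up
        }
    }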
At this point, we can call HDFS programmatically.

Reference

Hadoop编程调用HDFS
