Hadoop HDFS API Operations


Basic operations with the Hadoop HDFS API

A brief introduction

Hadoop provides convenient shell commands for HDFS (similar to the Linux file-manipulation commands). It also provides an HDFS Java API so that we developers can operate on HDFS programmatically: copying files (from local to HDFS and from HDFS to local), deleting files or directories, reading a file's contents, inspecting file metadata, listing all files under a directory, and appending content to a file. (Note: HDFS does not support modifying a line inside a file; it only allows appending content to the end of a file.)
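All of the examples below assume the Hadoop client libraries are on the classpath. With Maven, a dependency along these lines would be needed (the version shown is an assumption; pick the one matching your cluster):

```xml
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <!-- assumption: use the version that matches your cluster -->
    <version>2.7.3</version>
</dependency>
```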

First, I initialize the FileSystem before each test and close it afterwards:

private static final String HDFS_PATH = "hdfs://localhost:8020";
private Configuration conf = null;
private FileSystem fs = null;

@Before
public void beforeClass() throws IOException {
    conf = new Configuration();
    fs = FileSystem.get(URI.create(HDFS_PATH), conf);
}

@After
public void AfterClass() throws IOException {
    fs.close();
}


Copying a file from local to HDFS, and from HDFS to local

@Test
public void testCopyLocalFileToHDFS() throws IOException {
    String[] args = { "/test.txt1", "hdfs://localhost:8020/user/root/test.txt" };
    if (args.length != 2) {
        System.err.println("Usage: filecopy <source> <target>");
        System.exit(2);
    }
    InputStream in = new BufferedInputStream(new FileInputStream(args[0]));
    FileSystem fs = FileSystem.get(URI.create(args[1]), conf);
    OutputStream out = fs.create(new Path(args[1]));
    IOUtils.copyBytes(in, out, conf);
    // Alternatively, use the convenience method:
    // fs.copyFromLocalFile(new Path("/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"),
    //         new Path(HDFS_PATH + "/user/root/"));
    fs.copyToLocalFile(
            new Path("hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"),
            new Path("/user/"));
}
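Under the hood, IOUtils.copyBytes is just a buffered read/write loop. The following is a plain-JDK sketch of the same pattern (no Hadoop dependency; the class and method names are my own, not Hadoop's):

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class StreamCopy {
    // Copy everything from in to out using a fixed-size buffer,
    // mirroring what IOUtils.copyBytes(in, out, conf) does.
    static void copy(InputStream in, OutputStream out, int bufSize) throws IOException {
        byte[] buf = new byte[bufSize];
        int n;
        while ((n = in.read(buf)) != -1) {
            out.write(buf, 0, n);
        }
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream("hello hdfs".getBytes());
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        copy(in, out, 4096);
        System.out.println(out.toString()); // prints "hello hdfs"
    }
}
```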

Deleting a file

@Test
public void deleteFile() throws IOException {
    // The second argument enables recursive deletion for directories.
    fs.delete(new Path("hdfs://localhost:8020/user/root/out1"), true);
}

Reading a file into an output stream

@Test
public void readFile() {
    InputStream in = null;
    try {
        in = fs.open(new Path(HDFS_PATH + "/user/root/test.txt"));
        IOUtils.copyBytes(in, System.out, conf);
    } catch (IOException e) {
        e.printStackTrace();
    } finally {
        IOUtils.closeStream(in);
    }
}

Getting file metadata

@Test
public void getFileInfo() throws IllegalArgumentException, IOException {
    FileStatus fSta = fs.getFileStatus(new Path(HDFS_PATH + "/user/root/test.txt"));
    System.out.println(fSta.getAccessTime());       // last access time, epoch millis
    System.out.println(fSta.getBlockSize());        // block size in bytes
    System.out.println(fSta.getModificationTime()); // last modification time, epoch millis
    System.out.println(fSta.getOwner());
    System.out.println(fSta.getGroup());
    System.out.println(fSta.getLen());              // file length in bytes
    System.out.println(fSta.getPath());
    System.out.println(fSta.isSymlink());
}
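getAccessTime and getModificationTime return raw epoch milliseconds, which are hard to read as printed above. A plain-JDK snippet to render such a value (the sample timestamp is illustrative, not from a real file):

```java
import java.time.Instant;

public class TimeDemo {
    public static void main(String[] args) {
        // Illustrative epoch-millis value, as returned by FileStatus.getModificationTime()
        long modificationTime = 1700000000000L;
        // Instant renders it as an ISO-8601 UTC timestamp
        System.out.println(Instant.ofEpochMilli(modificationTime)); // prints "2023-11-14T22:13:20Z"
    }
}
```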

Listing all files in a directory

@Test
public void listFile() throws FileNotFoundException, IllegalArgumentException, IOException {
    // Recursively list all files under /user/root
    RemoteIterator<LocatedFileStatus> iterator = fs.listFiles(new Path(HDFS_PATH + "/user/root/"), true);
    while (iterator.hasNext()) {
        System.out.println(iterator.next());
    }
    // List the direct children of the root directory
    FileStatus[] fss = fs.listStatus(new Path(HDFS_PATH + "/"));
    Path[] ps = FileUtil.stat2Paths(fss);
    for (Path p : ps) {
        System.out.println(p);
    }
    // Print the block locations of a large file
    FileStatus sta = fs.getFileStatus(new Path(
            "hdfs://localhost:8020/user/root/eclipse-jee-luna-R-linux-gtk-x86_64.tar.gz"));
    BlockLocation[] bls = fs.getFileBlockLocations(sta, 0, sta.getLen());
    for (BlockLocation b : bls) {
        for (String s : b.getTopologyPaths())
            System.out.println(s);
        for (String s : b.getHosts())
            System.out.println(s);
    }
}

Appending content to the end of a file (append)

First, configure HDFS to support appending to files by adding the following to hdfs-site.xml:
<property>
    <name>dfs.support.append</name>
    <value>true</value>
</property>
The code is:
@Test
public void appendFile() {
    String hdfsPath = "hdfs://localhost:8020/user/root/input/test.txt"; // target file on HDFS
    // conf.setBoolean("dfs.support.append", true); // programmatic alternative to hdfs-site.xml
    String inPath = "/test.txt1"; // local file whose contents will be appended
    try {
        InputStream in = new BufferedInputStream(new FileInputStream(inPath));
        OutputStream out = fs.append(new Path(hdfsPath));
        IOUtils.copyBytes(in, out, 4096, true);
    } catch (IOException e) {
        e.printStackTrace();
    }
}
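Conceptually, fs.append behaves like opening a local file in append-only mode: existing bytes are never rewritten, and new bytes land at the end. A plain-JDK analogy (this is not the HDFS API; it only illustrates the append-only semantics):

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class AppendDemo {
    public static void main(String[] args) throws IOException {
        Path p = Files.createTempFile("append-demo", ".txt");
        Files.write(p, "line1\n".getBytes());
        // APPEND adds bytes at the end; existing content is untouched,
        // which is the same guarantee HDFS append gives.
        Files.write(p, "line2\n".getBytes(), StandardOpenOption.APPEND);
        System.out.print(new String(Files.readAllBytes(p))); // prints "line1" then "line2"
        Files.delete(p);
    }
}
```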





