The Difference Between 'Hadoop DFS' and 'Hadoop FS'

While exploring HDFS, I came across these two syntaxes for querying HDFS:
> hadoop dfs
> hadoop fs

Initially I couldn't tell the two apart, and kept wondering why there are two different syntaxes for the same purpose. I found a number of people online asking the same question; their thoughts are below:
Per Chris's explanation, there seems to be no difference between the two syntaxes. Look at the definitions of the two commands (hadoop fs and hadoop dfs) in $HADOOP_HOME/bin/hadoop:
    ...
    elif [ "$COMMAND" = "datanode" ] ; then
      CLASS='org.apache.hadoop.hdfs.server.datanode.DataNode'
      HADOOP_OPTS="$HADOOP_OPTS $HADOOP_DATANODE_OPTS"
    elif [ "$COMMAND" = "fs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
      HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    elif [ "$COMMAND" = "dfs" ] ; then
      CLASS=org.apache.hadoop.fs.FsShell
      HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    elif [ "$COMMAND" = "dfsadmin" ] ; then
      CLASS=org.apache.hadoop.hdfs.tools.DFSAdmin
      HADOOP_OPTS="$HADOOP_OPTS $HADOOP_CLIENT_OPTS"
    ...
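
To make Chris's point concrete, here is a minimal sketch that mirrors the branch logic of the excerpt above. The function name resolve_class is hypothetical and not part of Hadoop; only the subcommand names and class names are taken from the excerpt.

```shell
# Hypothetical helper (not part of Hadoop): mirrors the elif chain in the
# excerpt and prints the Java class a given subcommand would launch.
resolve_class() {
  case "$1" in
    datanode) echo 'org.apache.hadoop.hdfs.server.datanode.DataNode' ;;
    fs)       echo 'org.apache.hadoop.fs.FsShell' ;;
    dfs)      echo 'org.apache.hadoop.fs.FsShell' ;;
    dfsadmin) echo 'org.apache.hadoop.hdfs.tools.DFSAdmin' ;;
    *)        echo "unknown subcommand: $1" >&2; return 1 ;;
  esac
}

# Compare what "fs" and "dfs" resolve to.
if [ "$(resolve_class fs)" = "$(resolve_class dfs)" ]; then
  echo "fs and dfs launch the same class: $(resolve_class fs)"
else
  echo "fs and dfs launch different classes"
fi
```

In the version of the script quoted here, both branches also append the same $HADOOP_CLIENT_OPTS, so the launcher itself gives no hint of a difference between the two.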

That was his reasoning. Unconvinced, I kept looking for a more persuasive answer, and this excerpt made more sense to me:

FS relates to a generic file system, which can point to any file system such as local, HDFS, etc. DFS, by contrast, is specific to HDFS. So when we use FS, it can perform operations from/to the local file system or the Hadoop distributed file system. Specifying DFS, however, relates the operation to HDFS.

Below are two excerpts from the Hadoop documentation that describe these two as different shells.

FS Shell

The FileSystem (FS) shell is invoked by bin/hadoop fs. All the FS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenodehost/parent/child or simply as /parent/child (given that your configuration is set to point to hdfs://namenodehost). Most of the commands in FS shell behave like corresponding Unix commands.

DFShell

The HDFS shell is invoked by bin/hadoop dfs. All the HDFS shell commands take path URIs as arguments. The URI format is scheme://authority/path. For HDFS the scheme is hdfs, and for the local filesystem the scheme is file. The scheme and authority are optional. If not specified, the default scheme specified in the configuration is used. An HDFS file or directory such as /parent/child can be specified as hdfs://namenode:namenodeport/parent/child or simply as /parent/child (given that your configuration is set to point to namenode:namenodeport). Most of the commands in HDFS shell behave like corresponding Unix commands.
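
The path-resolution rule described in the two excerpts above can be sketched in a few lines of shell. This is only an illustration: qualify_path is a hypothetical helper, not a Hadoop command, and the default filesystem hdfs://namenodehost is the placeholder used in the documentation, not a real host.

```shell
# Hypothetical helper (not a Hadoop command): shows how a path argument
# could be completed using a default filesystem URI.
#   $1 = default filesystem URI, e.g. hdfs://namenodehost (placeholder)
#   $2 = path argument as typed on the command line
qualify_path() {
  case "$2" in
    /*)    printf '%s%s\n' "${1%/}" "$2" ;;  # no scheme: prepend the default
    *://*) printf '%s\n' "$2" ;;             # scheme given: use the URI as-is
    *)     echo "relative paths are not handled in this sketch" >&2; return 1 ;;
  esac
}

qualify_path hdfs://namenodehost /parent/child
qualify_path hdfs://namenodehost hdfs://namenodehost/parent/child
qualify_path hdfs://namenodehost file:///tmp/data
```

The first two calls print the same URI, which is the point of the documentation excerpts: a bare /parent/child and its fully qualified form name the same location once the default is known. The third call shows that an explicit file scheme is left untouched whatever the default is.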

So, based on the above, we can conclude that it all depends on the scheme configuration. When these two commands are used with an absolute URI (i.e. scheme://a/b), their behavior is identical. The only cause of a difference in behavior is the default configured scheme value: file for fs and hdfs for dfs, respectively.

From:http://java.dzone.com/articles/difference-between-hadoop-dfs