Several ways to load a file in Spark

Source: Internet · Editor: 程序博客网 · Published: 2024/05/18 02:05

Several ways to load a file in Spark:

1. Load a local file directly, rather than from HDFS:
sc.textFile("file:///path/to/the/file")
For example: sc.textFile("file:///home/spark/Desktop/README.md")
Note:
When HADOOP_CONF_DIR is set, i.e. a cluster environment is configured, a call like sc.textFile("path/README.md") resolves the path against the default filesystem, so it automatically becomes hdfs://master:9000/user/spark/README.md.
If the file does not exist in HDFS, Spark reports: input path does not exist.
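The resolution behavior above can be sketched in plain Python. `resolve` is a hypothetical helper (not a Spark or Hadoop API) that mirrors, in simplified form, how Hadoop qualifies a path against the default filesystem; `default_fs` and `working_dir` are example values matching the master:9000 setup described above:

```python
from urllib.parse import urlparse

def resolve(path, default_fs="hdfs://master:9000", working_dir="/user/spark"):
    """Simplified sketch of Hadoop path qualification."""
    if urlparse(path).scheme:
        # Already fully qualified (file://, hdfs://, ...): used as-is.
        return path
    if path.startswith("/"):
        # Absolute path: resolved on the default filesystem.
        return default_fs + path
    # Relative path: resolved against the user's working directory.
    return f"{default_fs}{working_dir}/{path}"
```

This is why sc.textFile("file:///...") reads the local disk while a bare relative path ends up pointing into HDFS.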
2. Passing an HDFS path also works, e.g. sc.textFile("hdfs://master:9000/user/spark/README.md")



Related content:

1.
Spark Quick Start - call to open README.md needs explicit fs prefix
Good catch; the Spark cluster on EC2 is configured to use HDFS as its default filesystem, so
it can’t find this file. The quick start was written to run on a single machine with an
out-of-the-box install. If you’d like to upload this file to the HDFS cluster on EC2, use
the following command:
2.
This has been discussed on the Spark mailing list; please refer to that thread.
You should use hadoop fs -put <localsrc> ... <dst> to copy the file into HDFS:
${HADOOP_COMMON_HOME}/bin/hadoop fs -put /path/to/README.md README.md
So I ran: /bin/hadoop -fs -put /home/spark/Desktop/README.md README.md

But no matter how I tried this, it failed with "no such file or directory"; I am still investigating. (Note that `-fs` here is likely a typo for the `fs` subcommand, and `/bin/hadoop` is probably not the actual location of the hadoop binary; either could explain the error.)
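For reference, a corrected upload sequence might look like the following. This is environment-specific and unverified: it assumes a running HDFS cluster, HADOOP_COMMON_HOME set as in the mailing-list excerpt above, and the example paths from earlier in this post:

```shell
# Use the 'fs' subcommand (no leading dash) and the full path to the hadoop binary.
${HADOOP_COMMON_HOME}/bin/hadoop fs -mkdir -p /user/spark
${HADOOP_COMMON_HOME}/bin/hadoop fs -put /home/spark/Desktop/README.md README.md

# Verify the upload landed in the HDFS home directory:
${HADOOP_COMMON_HOME}/bin/hadoop fs -ls README.md
```

After this, sc.textFile("README.md") should resolve to the copy in HDFS.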


