Accessing Hadoop from Python via Thrift: error java.lang.IllegalArgumentException: Wrong FS: hdfs:/ expected file:///

Source: Internet · Editor: 程序博客网 · 2024/05/22 16:55

Recently I have been looking into manipulating files on an HDFS cluster from Python. Since I did not want to pull in a third-party library for now, I went with the Thrift approach.
After consulting a reference (linked in the original post as "here"), I wrote a simple script, hdfs-test.py:

```python
import sys
sys.path.append('gen-py')
from hdfs import hadoopthrift_cli

host = '10.33.28.200'
port = 10086
fs_con = hadoopthrift_cli(host, port)
fs_con.connect()
fs_con.do_ls(r'hdfs://10.33.28.200:9000/')
```

Then I modified the server-side script start_thrift_server.sh, mainly to update the locations of the jar files it references.
Start the server:

```
[root@hadoop1 scripts]# sh start_thrift_server.sh 10086
Starting the hadoop thrift server on port [10086]...
15/04/18 21:30:52 INFO hadoop.thrift: Starting the hadoop thrift server on port [10086]...
```

Then start the client:

```
python hdfs-test.py
```

Unexpectedly, the client failed with:

```
[root@test py-hdfs]# python hdfs-test.py
Traceback (most recent call last):
  File "hdfs-test.py", line 11, in <module>
    fs_con.do_ls(r'hdfs://10.33.28.200:9000/')
  File "/root/py-hdfs/hdfs.py", line 297, in do_ls
    status = self.client.stat(path)
  File "gen-py/hadoopfs/ThriftHadoopFileSystem.py", line 452, in stat
    return self.recv_stat()
  File "gen-py/hadoopfs/ThriftHadoopFileSystem.py", line 463, in recv_stat
    (fname, mtype, rseqid) = self._iprot.readMessageBegin()
  File "build/bdist.linux-i686/egg/thrift/protocol/TBinaryProtocol.py", line 126, in readMessageBegin
  File "build/bdist.linux-i686/egg/thrift/protocol/TBinaryProtocol.py", line 206, in readI32
  File "build/bdist.linux-i686/egg/thrift/transport/TTransport.py", line 58, in readAll
  File "build/bdist.linux-i686/egg/thrift/transport/TTransport.py", line 159, in read
  File "build/bdist.linux-i686/egg/thrift/transport/TSocket.py", line 120, in read
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
```

And the server side reported an error as well:

```
[root@hadoop1 scripts]# sh start_thrift_server.sh 10086
Starting the hadoop thrift server on port [10086]...
15/04/18 22:49:39 INFO hadoop.thrift: Starting the hadoop thrift server on port [10086]...
java.lang.IllegalArgumentException: Wrong FS: hdfs://10.33.28.200:9000/, expected: file:///
        at org.apache.hadoop.fs.FileSystem.checkPath(FileSystem.java:390)
        at org.apache.hadoop.fs.RawLocalFileSystem.pathToFile(RawLocalFileSystem.java:55)
        at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:398)
        at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:255)
        at org.apache.hadoop.thriftfs.HadoopThriftServer$HadoopThriftHandler.stat(HadoopThriftServer.java:425)
        at org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem$Processor$stat.process(Unknown Source)
        at org.apache.hadoop.thriftfs.api.ThriftHadoopFileSystem$Processor.process(Unknown Source)
        at com.facebook.thrift.server.TThreadPoolServer$WorkerProcess.run(Unknown Source)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1146)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
        at java.lang.Thread.run(Thread.java:701)
```
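The stack trace is telling: the "Wrong FS" exception is thrown by FileSystem.checkPath, which compares the scheme and authority of the requested path against those of the filesystem the server actually instantiated. When Hadoop cannot find fs.default.name in its configuration, it falls back to the default of file:///, i.e. RawLocalFileSystem, which rejects any hdfs:// path. A rough Python sketch of that check (the function check_path here is a hypothetical illustration, not Hadoop's actual code) looks like this:

```python
from urllib.parse import urlparse

def check_path(path, fs_uri):
    """Rough sketch of Hadoop's FileSystem.checkPath: the requested
    path must match the filesystem's scheme and authority."""
    p, fs = urlparse(path), urlparse(fs_uri)
    # A scheme-less path such as '/' is always accepted.
    if p.scheme and (p.scheme, p.netloc) != (fs.scheme, fs.netloc):
        raise ValueError("Wrong FS: %s, expected: %s" % (path, fs_uri))

# With no core-site.xml, the server defaults to the local filesystem:
check_path('/', 'file:///')  # accepted
try:
    check_path('hdfs://10.33.28.200:9000/', 'file:///')
except ValueError as e:
    print(e)  # Wrong FS: hdfs://10.33.28.200:9000/, expected: file:///
```

This explains both symptoms below: a plain '/' succeeds (against the local filesystem), while any hdfs:// URI fails.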

However, if the path in the client script is changed to

```python
fs_con.do_ls(r'/')
```

then the server's local filesystem is listed instead:

```
[root@test py-hdfs]# python hdfs-test.py
1       4096    1429308924000   rwxrwxrwx       root    root    file:/tmp
1       12288   1400066227000   r-xr-xr-x       root    root    file:/sbin
1       4096    1401269647000   rwxrwxrwx       root    root    file:/share
1       4096    1316778468000   rwxr-xr-x       root    root    file:/mnt
1       0       1429300092000   rw-r--r--       root    root    file:/.autofsck
1       12288   1400066213000   r-xr-xr-x       root    root    file:/lib
1       16384   1396620511000   rwx------       root    root    file:/lost+found
1       4096    1400066199000   rwxr-xr-x       root    root    file:/var
1       4096    1429311134000   r-xr-x---       root    root    file:/root
1       4096    1316778468000   rwxr-xr-x       root    root    file:/srv
1       4096    1396620633000   rwxr-xr-x       root    root    file:/selinux
1       1024    1396620854000   r-xr-xr-x       root    root    file:/boot
1       0       1429300085000   rwxr-xr-x       root    root    file:/sys
1       0       1400066782000   rw-r--r--       root    root    file:/.autorelabel
1       4096    1316778468000   rwxr-xr-x       root    root    file:/home
1       4096    1401446050000   rwxr-xr-x       root    root    file:/media
1       0       1429300085000   r-xr-xr-x       root    root    file:/proc
1       4096    1429364692000   rwxr-xr-x       root    root    file:/etc
1       4096    1416433067000   rwxr-xr-x       root    root    file:/usr
1       4096    1316778468000   rwxr-xr-x       root    root    file:/opt
1       3720    1429300104000   rwxr-xr-x       root    root    file:/dev
1       4096    1400066213000   r-xr-xr-x       root    root    file:/bin
```
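The columns of this do_ls output appear to be replication, size, modification time in epoch milliseconds, permissions, owner, group, and path (an assumption based on the listings in this post). A small stdlib-only helper makes the entries easier to work with, in particular converting the millisecond timestamps to readable dates:

```python
from datetime import datetime, timezone

def parse_ls_line(line):
    """Split one line of do_ls output into named fields. Column
    meanings are inferred from the listings above: replication,
    size, mtime (epoch milliseconds), permissions, owner, group, path."""
    repl, size, mtime_ms, perms, owner, group, path = line.split()
    return {
        'replication': int(repl),
        'size': int(size),
        # the timestamp is in milliseconds, hence the division by 1000
        'mtime': datetime.fromtimestamp(int(mtime_ms) / 1000, tz=timezone.utc),
        'perms': perms, 'owner': owner, 'group': group, 'path': path,
    }

entry = parse_ls_line('1\t4096\t1429308924000\trwxrwxrwx\troot\troot\tfile:/tmp')
print(entry['path'], entry['mtime'].date())  # file:/tmp 2015-04-17
```

Note that every path here carries the file: scheme, confirming the server is serving the local filesystem rather than HDFS.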

My guess was that the server could not find the filesystem configuration, i.e. the HDFS address set in core-site.xml. But adding the conf directory to the path still did not solve it.

I then went through the relevant source, including HadoopThriftServer.java, checking how the various path and pathname fields were handled, but found nothing wrong there either.

As the saying goes: for foreign matters ask Google, for domestic matters ask Baidu. Perhaps my search skills were lacking, but after a full day on Google I found nothing, while Baidu turned up the solution. Details are in the post linked as "here" in the original.

Sure enough, the problem was that the project could not find the configuration file. In this setup, the configuration file needs to be placed in the project directory itself: for this project, simply put core-site.xml in the same directory as start_thrift_server.sh.
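Concretely, the server needs a core-site.xml that names the HDFS NameNode. A minimal file with this cluster's address would look like the following (fs.default.name is the Hadoop 1.x property name; Hadoop 2.x and later call it fs.defaultFS):

```xml
<configuration>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://10.33.28.200:9000</value>
  </property>
</configuration>
```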

After restarting the Thrift server, the client lists the HDFS root correctly:

```
[root@test py-hdfs]# python hdfs-test.py
0       0       1413390909861   rwxr-xr-x       root    supergroup      hdfs://10.33.28.200:9000/root
0       0       1413412534130   rwxr-xr-x       root    supergroup      hdfs://10.33.28.200:9000/user
```