How-to: resolve Spark "/usr/bin/python: No module named pyspark" issue

Error:

Error from python worker:
  /usr/bin/python: No module named pyspark
PYTHONPATH was:
  /home/hadoop/tmp/nm-local-dir/usercache/chenfangfang/filecache/43/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar
java.io.EOFException
    at java.io.DataInputStream.readInt(DataInputStream.java:392)

Root cause:
I am running JDK 1.7.0_45. PySpark on YARN has a known issue that breaks it when Spark is built with JDK 7: jars packaged by JDK 7 can end up in the ZIP64 format once they grow large enough, and Python cannot read ZIP64 archives, so the worker fails to import pyspark from the assembly jar (https://issues.apache.org/jira/browse/SPARK-1520). The stock CDH 5.4.1 Spark did not have this issue: although CDH 5.4.1 states that it runs on JDK 1.7.0_45, its Spark assembly was built with JDK 6.
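
You can confirm this is the cause before changing anything: the worker fails exactly when Python's zipimport cannot open the assembly jar. A quick check (a sketch; it assumes a local copy of the jar under spark/lib, matching the paths used below, and that python is the same interpreter the workers use):

python -c "
import zipimport
# Raises ZipImportError if Python cannot read the archive,
# e.g. a ZIP64 jar produced by JDK 7.
zipimport.zipimporter('spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar')
print('jar is readable by zipimport')
"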

Solution:
Rebuilding Spark with JDK 6 is not a reasonable option for us, since the build itself runs into problems. One workable alternative is the following.

Regenerate the assembly jar with a JDK 6 jar tool:
unzip -d foo spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar
cd foo
$JAVA6_HOME/bin/jar cvmf META-INF/MANIFEST.MF ../spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar .
Don't neglect the dot at the end of the jar command; it tells jar to package everything under the current directory. Note that this overwrites the original assembly jar in place, so keep a backup if you might need it.
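
After repackaging, a quick sanity check (again a sketch, assuming the jar sits at the same spark/lib path used above) is to confirm that Python can now open the jar and locate the pyspark package inside it:

python -c "
import zipimport
# zipimporter raises ZipImportError if the archive is still unreadable;
# find_module returns the importer itself when pyspark exists in the jar.
zi = zipimport.zipimporter('spark/lib/spark-assembly-1.3.0-cdh5.4.1-hadoop2.6.0-cdh5.4.1.jar')
print(zi.find_module('pyspark') is not None)
"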
