Spark Environment Setup (Python)

1. Install the libraries

Downloads:

- spark: http://spark.apache.org/downloads.html
- hadoop: http://hadoop.apache.org/releases.html
- jdk: http://www.oracle.com/technetwork/java/javase/downloads/index-jsp-138363.html
- hadoop-common: https://github.com/srccodes/hadoop-common-2.2.0-bin/archive/master.zip (for Windows 7)

Download versions that match each other: the Spark build must correspond to the Hadoop version (e.g. spark-2.0.0-bin-hadoop2.7 goes with Hadoop 2.7).

Steps:

a. Install the JDK with the default options.
b. Unpack Spark (e.g. D:\spark-2.0.0-bin-hadoop2.7).
c. Unpack Hadoop (e.g. D:\hadoop2.7).
d. Unpack hadoop-common (Windows 7 only).
e. Copy the contents of hadoop-common/bin into hadoop/bin (Windows 7 only); a verification sketch follows this list.
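The reason for step e is that the stock Hadoop download ships no Windows binaries; the hadoop-common archive supplies winutils.exe, which Hadoop needs on Windows. A minimal sketch to check that the copy landed where Hadoop expects it, assuming the example paths above:

```python
import os

# Example path from the steps above; adjust to your install location.
hadoop_home = r"D:\hadoop2.7"
winutils = os.path.join(hadoop_home, "bin", "winutils.exe")

if os.path.isfile(winutils):
    print("winutils.exe found:", winutils)
else:
    print("winutils.exe missing - copy hadoop-common/bin into",
          os.path.join(hadoop_home, "bin"))
```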
2. Set the environment variables
SPARK_HOME = D:\spark-2.0.0-bin-hadoop2.7
HADOOP_HOME = D:\hadoop2.7
PATH (append) = D:\spark-2.0.0-bin-hadoop2.7\bin;D:\hadoop2.7\bin
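If you prefer not to touch the system-wide settings, the same variables can be set for a single Python session before pyspark is loaded. A minimal sketch, assuming the example paths above:

```python
import os

# Session-only equivalents of the environment variables above (example paths).
os.environ["SPARK_HOME"] = r"D:\spark-2.0.0-bin-hadoop2.7"
os.environ["HADOOP_HOME"] = r"D:\hadoop2.7"
os.environ["PATH"] = (
    r"D:\spark-2.0.0-bin-hadoop2.7\bin;D:\hadoop2.7\bin;"
    + os.environ.get("PATH", "")
)
```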
3. Set up the Python libraries
a. Copy D:\spark-2.0.0-bin-hadoop2.7\python\pyspark to [Your-Python-Home]\Lib\site-packages
b. pip install py4j
c. pip install psutil (on Windows, prebuilt wheels are available at http://www.lfd.uci.edu/~gohlke/pythonlibs/#psutil)
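Copying the package works, but the copy goes stale whenever Spark is upgraded. An alternative (not part of the original steps) is to point sys.path at the pyspark sources that ship inside the Spark distribution. A sketch, assuming the example SPARK_HOME above; the bundled py4j zip name varies by Spark version, so check the python\lib directory:

```python
import os
import sys

# Assumption: this is where Spark was unpacked.
spark_home = r"D:\spark-2.0.0-bin-hadoop2.7"

# Spark ships its own pyspark sources and a bundled py4j; add both to
# sys.path instead of copying them into site-packages.
sys.path.insert(0, os.path.join(spark_home, "python"))
# Assumption: zip name as shipped with Spark 2.0; verify in python\lib.
sys.path.insert(0, os.path.join(spark_home, "python", "lib",
                                "py4j-0.10.1-src.zip"))

import pyspark  # should now import cleanly
print(pyspark.__file__)
```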
4. Testing

Open a cmd window and run pyspark: it should start without errors and drop you into the PySpark shell prompt.
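Beyond launching the shell, a tiny local-mode job confirms the whole stack (JVM, winutils, py4j) works end to end. A minimal sketch using the standard pyspark API:

```python
from pyspark import SparkConf, SparkContext

# Local-mode smoke test: runs on this machine, no cluster needed.
conf = SparkConf().setMaster("local[*]").setAppName("smoke-test")
sc = SparkContext(conf=conf)

# Distribute a small list, square each element, and collect the results.
rdd = sc.parallelize(range(10))
print(rdd.map(lambda x: x * x).collect())  # [0, 1, 4, 9, ..., 81]

sc.stop()
```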
