Big Data Basics (8): Installing and Configuring IPython and Notebook for Spark 2.0.0


Environment:

Spark 2.0.0, Anaconda2

1. Installing and configuring IPython and Notebook for Spark

Method 1:

With this method you can open the IPython Notebook in a browser, and still use pyspark from a separate terminal.
If Anaconda is installed, you can get the IPython interface directly as shown below; if Anaconda is not installed, refer to the link at the bottom and install the IPython-related packages yourself.
vi ~/.bashrc
export PYSPARK_DRIVER_PYTHON=ipython
export PYSPARK_DRIVER_PYTHON_OPTS="notebook --NotebookApp.open_browser=False --NotebookApp.ip='*' --NotebookApp.port=8880"
source ~/.bashrc


Restart pyspark.
The following is quoted from the Cloudera documentation:
Starting a Notebook with PySpark
On the driver host, choose a directory notebook_directory to run the Notebook. notebook_directory contains the .ipynb files that represent the different notebooks that can be served.
In notebook_directory, run pyspark with your desired runtime options. You should see output like the following:
Reference:
IPython and Jupyter on Spark 2.0.0
http://www.cloudera.com/documentation/enterprise/5-5-x/topics/spark_ipython.html


Method 2:
With this method ipython works, but jupyter has problems; I am not sure whether that is just my setup.
It is also possible to launch the PySpark shell in IPython, the enhanced Python interpreter. PySpark works with IPython 1.0.0 and later. To use IPython, set the PYSPARK_DRIVER_PYTHON variable to ipython when running bin/pyspark:


$ PYSPARK_DRIVER_PYTHON=ipython ./bin/pyspark
To use the Jupyter notebook (previously known as the IPython notebook),


$ PYSPARK_DRIVER_PYTHON=jupyter ./bin/pyspark
You can customize the ipython or jupyter commands by setting PYSPARK_DRIVER_PYTHON_OPTS.
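For example, a sketch of combining the two for a headless server (the port, IP and the --no-browser flag are assumptions; adjust them to your environment):

$ PYSPARK_DRIVER_PYTHON=jupyter PYSPARK_DRIVER_PYTHON_OPTS="notebook --no-browser --ip=0.0.0.0 --port=8880" ./bin/pyspark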


root@py-server:/server/bin# PYSPARK_DRIVER_PYTHON=ipython $SPARK_HOME/bin/pyspark
Python 2.7.12 |Anaconda 4.1.1 (64-bit)| (default, Jul  2 2016, 17:42:40) 
Type "copyright", "credits" or "license" for more information.


IPython 4.2.0 -- An enhanced Interactive Python.
?         -> Introduction and overview of IPython's features.
%quickref -> Quick reference.
help      -> Python's own help system.
object?   -> Details about 'object', use 'object??' for extra details.
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel).
16/08/03 22:24:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.0.0
      /_/


Using Python version 2.7.12 (default, Jul  2 2016 17:42:40)
SparkSession available as 'spark'.


In [1]: 






2. Usage:


Open http://notebook_host:8880/ in a browser.
For example: http://spark01:8880/
Choose New -> Python to open a Python notebook.
Press Shift+Enter or Shift+Return to execute a cell.
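As a quick sanity check, something like the following can be run in the first cell (a minimal sketch; sc and spark are the SparkContext and SparkSession that pyspark pre-creates, nothing else is assumed):

# confirm the notebook kernel is attached to the Spark driver
print(sc.version)                   # should report 2.0.0
print(spark.range(1000).count())    # runs a trivial job through the SparkSession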


Note:

Once PYSPARK_DRIVER_PYTHON is set to IPython, pyspark can only be started through IPython, unless you restore the environment variables.
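If you only need the plain pyspark shell temporarily, a sketch that clears the variables for the current terminal session without editing ~/.bashrc:

unset PYSPARK_DRIVER_PYTHON PYSPARK_DRIVER_PYTHON_OPTS
$SPARK_HOME/bin/pyspark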


3. Test example

Source: Spark for Python Developers

Replace file_in with your own file: for a local file, use the commented-out (#) line; for HDFS, keep the default line and just change the address. A sketch is shown below.
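The book's code is not reproduced here; the following is a minimal word-count sketch in the same spirit (the file paths and the HDFS host/port are assumptions; replace them with your own):

# file_in = sc.textFile("file:///home/hadoop/data/input.txt")       # local file (assumed path)
file_in = sc.textFile("hdfs://spark01:9000/user/hadoop/input.txt")  # HDFS (assumed address)

words  = file_in.flatMap(lambda line: line.split())
counts = words.map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)
print(counts.take(10))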

