PySpark in PyCharm on a remote server
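Procedure
1. Share a local directory with the virtual machine; see the earlier post 《通过virtualbox实现虚拟机中共享本地目录》 (sharing a local directory with a virtual machine via VirtualBox).
2. For installing Python and related problems, see "Install Python 3 on CentOS 6.5 Server".
3. Spark is of course required; see 《centos单机安装Spark1.4.0》 (single-node Spark 1.4.0 on CentOS). Spark uses Hadoop; see 《centos单机安装Hadoop2.6》 (single-node Hadoop 2.6 on CentOS).
4. Install and configure the remote side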
vi /etc/profile
Add one line: export PYTHONPATH=$SPARK_HOME/python/:$SPARK_HOME/python/lib/py4j-0.8.2.1-src.zip
source /etc/profile
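As a sanity check for the PYTHONPATH line above, the following sketch derives the same two entries from a Spark home directory and reports whether each one exists. It locates the bundled py4j archive by pattern rather than hard-coding its version. The function name `spark_python_paths` is hypothetical, and the layout `python/lib/py4j-*-src.zip` is assumed from the path given above.

```python
import glob
import os


def spark_python_paths(spark_home):
    """Return the entries PYTHONPATH needs for PySpark under spark_home.

    Assumes the layout used above: <spark_home>/python plus a bundled
    <spark_home>/python/lib/py4j-*-src.zip archive.
    """
    python_dir = os.path.join(spark_home, "python")
    # The py4j version differs between Spark releases, so match by pattern.
    py4j_zips = sorted(glob.glob(os.path.join(python_dir, "lib", "py4j-*-src.zip")))
    return [python_dir] + py4j_zips


if __name__ == "__main__":
    home = os.environ.get("SPARK_HOME", "")
    for entry in spark_python_paths(home):
        status = "ok" if os.path.exists(entry) else "MISSING"
        print(status, entry)
```

Run on the remote machine after `source /etc/profile`, a MISSING entry (or no py4j line at all) suggests that SPARK_HOME is unset or points at the wrong directory.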
# Install pip and py4j
Download pip-7.1.2.tar
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
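To confirm that the installation worked before moving on to PyCharm, a small check like the one below can be run with the remote interpreter. It is only a sketch: the helper `missing_modules` is hypothetical, and it assumes Python 3.4 or later, where `importlib.util.find_spec` is available.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # py4j and pyspark are the two packages this setup relies on.
    missing = missing_modules(["py4j", "pyspark"])
    if missing:
        print("Not importable:", ", ".join(missing))
    else:
        print("py4j and pyspark are importable")
```

If pyspark is reported as not importable while py4j is fine, the PYTHONPATH entry from /etc/profile is probably not visible to this interpreter.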
# Avoid the tty check when running over ssh
cd /etc
chmod 640 sudoers
vi /etc/sudoers
#Defaults    requiretty    # comment out this line
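Whether the edit took effect can be checked by scanning the file for an active requiretty setting. The sketch below works on the file's text, since reading /etc/sudoers itself needs root. The function name `requiretty_enabled` is hypothetical, and the check ignores any files pulled in through include directives.

```python
import re

# Matches an uncommented "Defaults ... requiretty" line. A negated
# "!requiretty" switches the option off and is therefore not matched.
_REQUIRETTY = re.compile(r"^\s*Defaults\b[^#\n]*(?<!!)\brequiretty\b")


def requiretty_enabled(sudoers_text):
    """Return True if any active line in sudoers_text turns on requiretty."""
    return any(_REQUIRETTY.match(line) for line in sudoers_text.splitlines())


if __name__ == "__main__":
    # Sample content after the line has been commented out.
    sample = "Defaults    env_reset\n#Defaults    requiretty\n"
    print(requiretty_enabled(sample))  # prints False
```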
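5. Local PyCharm setup
Settings > Project Interpreter:
Project Interpreter > Add remote (prerequisite: Python is installed successfully on the remote side):
Note: if Python is installed in a different location, change the interpreter path accordingly (the second test below, for example, runs /root/programs/python3/bin/python).
Run > Edit Configuration (prerequisite: the local directory is shared with the virtual machine successfully):
6. Test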
import os
import sys

os.environ['SPARK_HOME'] = '/root/spark-1.4.0-bin-hadoop2.6'
sys.path.append("/root/spark-1.4.0-bin-hadoop2.6/python")

try:
    from pyspark import SparkContext
    from pyspark import SparkConf
    print("Successfully imported Spark Modules")
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://root@192.168.22.250:22/usr/bin/python -u /mnt/shared/test01/test01a.py
Successfully imported Spark Modules

Process finished with exit code 0
A slightly more complex example:
import sys

sys.path.append("/root/programs/spark-1.4.0-bin-hadoop2.6/python")

try:
    import numpy as np
    import scipy.sparse as sps
    from pyspark.mllib.linalg import Vectors

    dv1 = np.array([1.0, 0.0, 3.0])
    dv2 = [1.0, 0.0, 3.0]
    sv1 = Vectors.sparse(3, [0, 2], [1.0, 3.0])
    sv2 = sps.csc_matrix((np.array([1.0, 3.0]), np.array([0, 2]), np.array([0, 2])), shape=(3, 1))
    print(sv2)
except ImportError as e:
    print("Can not import Spark Modules", e)
    sys.exit(1)
Result:
ssh://root@192.168.22.250:22/root/programs/python3/bin/python -u /mnt/shared/test01/test01a.py
  (0, 0)    1.0
  (2, 0)    3.0

Process finished with exit code 0
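The printed result lists the non-zero entries of sv2 as (row, column) value pairs. The three arrays passed to sps.csc_matrix are, in order, the values, their row indices, and the column pointers. The plain-Python sketch below decodes that triple by hand to show where the two output lines come from. The helper `csc_entries` is hypothetical and not part of SciPy.

```python
def csc_entries(data, indices, indptr):
    """List the (row, column, value) entries of a CSC-encoded sparse matrix.

    Column j owns the slice indptr[j]:indptr[j + 1] of `data` (the values)
    and of `indices` (the row numbers of those values).
    """
    entries = []
    for col in range(len(indptr) - 1):
        for k in range(indptr[col], indptr[col + 1]):
            entries.append((indices[k], col, data[k]))
    return entries


if __name__ == "__main__":
    # Same three arrays as passed to sps.csc_matrix in the test script.
    for row, col, value in csc_entries([1.0, 3.0], [0, 2], [0, 2]):
        print((row, col), value)
```

With a single column, indptr is [0, 2], so both values belong to column 0 and land in rows 0 and 2. This is the same 3-element sparse vector that sv1 describes through Vectors.sparse(3, [0, 2], [1.0, 3.0]).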
Q&A
Q: sudo: sorry, you must have a tty to run sudo
A:
cd /etc
chmod 640 sudoers
vi /etc/sudoers
#Defaults    requiretty    # Comment out the "Defaults requiretty" line. By default sudo requires a tty; with this line commented out, sudo can also run in the background.
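Q: VirtualBox's shared folder feature reports a "broken shared folder" error
A: See the post on sharing a local directory with the virtual machine mentioned above.
Q: Errors such as "cannot import name accumulators" at one moment and "cannot import name py4j" at the next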
A:
Download pip-7.1.2.tar
tar -xvf pip-7.1.2.tar
cd pip-7.1.2
python setup.py install
pip install py4j
Done!