Installing GraphFrames on Spark
Source: Internet · Editor: 程序博客网 · Published: 2024/06/06
Installation environment
java:1.8
centos:6
spark:2.1.0
graphframes:0.5
1、Installing and testing graphframes (as root)
a、Download the graphframes jar (0.5.0 for Spark 2.1 / Scala 2.11 here) into the python/lib directory of the Spark installation:
cd /usr/hdp/2.6.0.3-8/spark2/python/lib
wget http://dl.bintray.com/spark-packages/maven/graphframes/graphframes/0.5.0-spark2.1-s_2.11/graphframes-0.5.0-spark2.1-s_2.11.jar
(Note: Bintray has since been shut down; if this URL no longer resolves, fetch the same artifact from the current spark-packages repository.)
b、Edit /etc/profile and append this line:
PYTHONPATH=/usr/hdp/2.6.0.3-8/spark2/python/lib/graphframes-0.5.0-spark2.1-s_2.11.jar:$PYTHONPATH
c、Run the following to make the change take effect:
source /etc/profile
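Putting a .jar file directly on PYTHONPATH works because a jar is an ordinary zip archive, and CPython's zipimport mechanism can load packages straight out of zip files; the graphframes jar bundles its Python package alongside the Scala classes. A minimal, self-contained sketch of that mechanism (the jar and package names below are made up for illustration, not the real graphframes artifact):

```python
import os
import sys
import tempfile
import zipfile

# Build a tiny "jar" (really just a zip archive) containing a Python package.
tmpdir = tempfile.mkdtemp()
jar_path = os.path.join(tmpdir, "demo.jar")  # hypothetical jar, stands in for the graphframes jar
with zipfile.ZipFile(jar_path, "w") as zf:
    zf.writestr("demopkg/__init__.py", "ANSWER = 42\n")

# Adding the archive to sys.path is exactly what PYTHONPATH does at interpreter startup.
sys.path.insert(0, jar_path)
import demopkg

print(demopkg.ANSWER)  # -> 42
```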
d、Pull graphframes and its dependencies with spark-shell:
spark-shell --packages graphframes:graphframes:0.5.0-spark2.1-s_2.11
This leaves five jar files in ~/.ivy2/jars/.
e、Copy those five jars into a sharelib subdirectory under the Spark home:
mkdir /usr/hdp/2.6.0.3-8/spark2/sharelib
cp -r ~/.ivy2/jars/*.jar /usr/hdp/2.6.0.3-8/spark2/sharelib/
f、Copy the sharelib directory to every node in the cluster (repeat for each host):
scp -r /usr/hdp/2.6.0.3-8/spark2/sharelib/ cloud102:/usr/hdp/2.6.0.3-8/spark2/sharelib
scp -r /usr/hdp/2.6.0.3-8/spark2/sharelib/ cloud103:/usr/hdp/2.6.0.3-8/spark2/sharelib
...
scp -r /usr/hdp/2.6.0.3-8/spark2/sharelib/ cloud10*:/usr/hdp/2.6.0.3-8/spark2/sharelib
g、Edit spark-env.sh:
vim /usr/hdp/2.6.0.3-8/spark2/conf/spark-env.sh
and add the following:
# Build a colon-separated classpath from every jar in sharelib
jarsPath=.
for i in $SPARK_HOME/sharelib/*.jar; do
  jarsPath=$i:$jarsPath
done
# Strip the trailing ":." left over from the initial value
len=${#jarsPath}
jarsPath=${jarsPath:0:len-2}
SPARK_CLASSPATH=$jarsPath
echo $SPARK_CLASSPATH
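The loop above simply joins the jar paths with colons (prepending each new jar, so the list comes out in reverse order) and then trims the placeholder "." it started from. The same result in a quick Python sketch, using made-up jar paths:

```python
# Equivalent of the spark-env.sh loop: prepend each jar, colon-separated,
# then drop the initial "." placeholder.
jars = ["/opt/spark/sharelib/a.jar", "/opt/spark/sharelib/b.jar"]  # hypothetical paths

classpath = ":".join(reversed(jars))
print(classpath)  # -> /opt/spark/sharelib/b.jar:/opt/spark/sharelib/a.jar
```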
Spark can now locate the graphframes classes. If jobs are launched through Oozie, spark-env.sh must be copied to every node as well.
2、Testing from Jupyter
(Jupyter's built-in New Terminal feature gives a shell as the launching user, so for safety we start it under a dedicated account.)
a、Create a jupyter group and user:
groupadd jupyter
useradd -g jupyter jupyter
Running Jupyter as this user instead of root is a security precaution.
b、Create jupyter's home directory on HDFS and set its ownership and permissions:
hdfs dfs -mkdir /user/jupyter
hdfs dfs -chown -R jupyter:jupyter /user/jupyter
hdfs dfs -chmod 755 /user/jupyter
c、Start Jupyter:
nohup jupyter-notebook --notebook-dir=./ipython-code/ --no-browser --ip='172.16.11.92' --port=8889 >jupyter_nohup.log 2>&1 &
d、Open Jupyter in a browser at the --ip address used above:
http://172.16.11.92:8889/
If a token is requested, look it up on the server with the jupyter notebook list command.
e、Create a new Python notebook in Jupyter and run:
from graphframes import *
If this raises no error, graphframes was imported successfully.
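A slightly more informative check uses the standard library's importlib, which reports whether a module is findable on the current sys.path without actually importing it:

```python
import importlib.util

def module_available(name):
    """Return True if a module can be located on the current sys.path."""
    return importlib.util.find_spec(name) is not None

# On a correctly configured node, both of these should print True.
for mod in ("pyspark", "graphframes"):
    print(mod, "available:", module_available(mod))
```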
f、Complete example
from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
from graphframes import *
conf = SparkConf().setAppName("jupyter_xuyufei").setMaster("yarn").set("spark.submit.deployMode", "client")
# conf.set("num-executors", "6").set("executor-cores", 1).set("executor-memory", "3g").set("driver-memory", "1g")
sc = SparkContext(conf=conf)
sqlContext = SQLContext(sc)
v = sqlContext.createDataFrame([
("a", "Alice", 34),
("b", "Bob", 36),
("c", "Charlie", 30),
], ["id", "name", "age"])
e = sqlContext.createDataFrame([
("a", "b", "friend"),
("b", "c", "follow"),
("c", "b", "follow"),
], ["src", "dst", "relationship"])
g = GraphFrame(v, e)
g.inDegrees.show()
result = g.labelPropagation(maxIter=2)
result.select("id", "label").show()
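For intuition, g.inDegrees on the toy graph above just counts incoming edges per destination vertex. A plain-Python sketch of the same computation, with no Spark required:

```python
from collections import Counter

# Same edge list as in the GraphFrame example: (src, dst, relationship)
edges = [("a", "b", "friend"), ("b", "c", "follow"), ("c", "b", "follow")]

# in-degree of a vertex = number of edges whose dst is that vertex
in_degrees = Counter(dst for _, dst, _ in edges)
print(dict(in_degrees))  # -> {'b': 2, 'c': 1}
```

Vertex "a" has no incoming edges, so (as with g.inDegrees) it simply does not appear in the result.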
3、Testing with spark-submit
Save the complete example above as demo.py and run:
spark-submit --master yarn --deploy-mode client --num-executors 6 --driver-memory 1g --executor-memory 1g --executor-cores 1 demo.py