Installing Apache Spark
Install Java and Python
> sudo apt-get install openjdk-8-jdk
> vim /etc/environment
# add the export to the global environment file
> export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
# to avoid a reboot, source the file directly
> source /etc/environment
> sudo apt-get install python python3
Install Hadoop
> wget http://apache.fayea.com/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz
> tar -zxf hadoop-2.7.3.tar.gz
> cd hadoop-2.7.3
> sudo mkdir input
> sudo chmod -R 777 input
> cp etc/hadoop/*.xml input/
> sudo ./bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep input output 'dfs[a-z.]+'
> cat output/*
Install Apache Spark
> wget http://mirrors.tuna.tsinghua.edu.cn/apache/spark/spark-2.1.1/spark-2.1.1-bin-hadoop2.7.tgz
> tar -zxf spark-2.1.1-bin-hadoop2.7.tgz
> cd spark-2.1.1-bin-hadoop2.7
Link Hadoop with Spark
> sudo vim /etc/environment
export LD_LIBRARY_PATH=/vagrant/hadoop-2.7.3/lib/native/:$LD_LIBRARY_PATH
Install pyspark via pip
> cd /vagrant/spark-2.1.1-bin-hadoop2.7/python
> pip install -e .
Run a Test
> ./bin/spark-shell --master local[2]
# Python version of the Spark API
> ./bin/pyspark --master local[2]
Note: the --master option specifies the master URL for a distributed cluster; local[N] runs locally with N threads.
Spark Standalone Cluster Deploy
> sudo ./bin/spark-submit \
    --master spark://192.168.33.67:7077 \
    --executor-memory 4G \
    --deploy-mode client \
    --packages org.apache.spark:spark-streaming-kafka-0-8_2.11:2.1.1 \
    /vagrant/stream_kafka.py
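The contents of stream_kafka.py are not shown in the article. A minimal sketch of what such a script might look like with the spark-streaming-kafka-0-8 package pulled in by --packages (the topic name, broker address, and word-splitting helper below are illustrative assumptions):

```python
# Hypothetical sketch of a stream_kafka.py-style script; the topic and
# broker values are placeholders, not taken from the original article.
def split_words(line):
    # Tokenize one Kafka message payload into words.
    return line.split()

if __name__ == "__main__":
    # Imports kept inside main so the helper above is testable standalone.
    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="KafkaWordCount")
    ssc = StreamingContext(sc, 10)  # 10-second micro-batches

    # Direct stream via the kafka-0-8 integration shipped by --packages.
    stream = KafkaUtils.createDirectStream(
        ssc, ["test-topic"], {"metadata.broker.list": "localhost:9092"})

    counts = (stream.map(lambda kv: kv[1])   # drop the Kafka message key
                    .flatMap(split_words)
                    .map(lambda w: (w, 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()
```

The script would be submitted exactly as in the spark-submit command above; the driver keeps running until the streaming context is stopped.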
Spark Standalone client deploy mode and cluster deploy mode:
Client:
- Driver runs on a dedicated server (the Master node) inside a dedicated process, so it has all available resources at its disposal to execute work.
- Driver opens a dedicated Netty HTTP server and distributes the specified JAR files to all Worker nodes (a big advantage).
- Because the Master node has dedicated resources of its own, you don't need to "spend" worker resources on the driver program.
- If the driver process dies, you need an external monitoring system to restart it.
Cluster:
- Driver runs on one of the cluster's Worker nodes; the worker is chosen by the Master leader.
- Driver runs as a dedicated, standalone process inside the Worker.
- The driver program takes up at least 1 core and a dedicated amount of memory from one of the workers (this can be configured).
- The driver program can be monitored from the Master node using the --supervise flag and restarted if it dies.
- When working in cluster mode, all JARs related to the execution of your application need to be available to all the workers. This means you can either place them manually in a shared location or in a folder on each of the workers.