Installing Spark on CentOS 6.5 (32-bit)
1. Check the system environment and disable the firewall
- cat /etc/redhat-release
- uname -r
- uname -m
- /etc/init.d/iptables stop
- chkconfig iptables off
- chkconfig --list iptables
2. Cluster planning

| Host | IP | Hadoop | Spark |
| --- | --- | --- | --- |
| Node1 | 192.168.2.128 | Master | Master |
| Node2 | 192.168.2.130 | Slave | Slave |
| Node3 | 192.168.2.131 | Slave | Slave |
| Node4 | 192.168.2.132 | Slave | Slave |
3. Download Hadoop 2.6.0, the matching Spark 1.6.0 build, and jdk-7u65-linux-x64.rpm
Create a /soft directory on node1 and download the packages into it:
- mkdir /soft/
- cd /soft/
- wget http://archive.apache.org/dist/hadoop/core/hadoop-2.6.0/hadoop-2.6.0.tar.gz
- wget http://archive.apache.org/dist/spark/spark-1.6.0/spark-1.6.0-bin-hadoop2.6.tgz
4. Deploy the Spark distributed cluster
4.1. Map IP addresses to hostnames
4.1.1. Add an IP-to-hostname mapping for each node to /etc/hosts so that later steps can use hostnames directly:
- vim /etc/hosts
- #add the following entries
- 192.168.2.128 node1
- 192.168.2.130 node2
- 192.168.2.131 node3
- 192.168.2.132 node4
4.1.2. Copy the /etc/hosts file from node1 to the other three hosts:
- scp /etc/hosts root@192.168.2.130:/etc/hosts
- scp /etc/hosts root@192.168.2.131:/etc/hosts
- scp /etc/hosts root@192.168.2.132:/etc/hosts
4.2. Create a spark user on all four servers
- #node1: create the user
- useradd spark
- #node2: create the user
- useradd spark
- #node3: create the user
- useradd spark
- #node4: create the user
- useradd spark
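One assumption here: the scp commands in 4.3.2 authenticate as the spark user with a password, so the spark account on each node also needs a password set first:
- passwd spark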
4.3. Configure passwordless SSH login between the hosts
Set up passwordless SSH login among the four hosts so that Hadoop and Spark can later send commands to the slave nodes without being prompted for a password.
4.3.1. Generate an SSH key pair as the spark user on each node
- #node1
- ssh-keygen -t rsa -P ''
- #node2
- ssh-keygen -t rsa -P ''
- #node3
- ssh-keygen -t rsa -P ''
- #node4
- ssh-keygen -t rsa -P ''
4.3.2. Copy /home/spark/.ssh/id_rsa.pub from node2, node3, and node4 to node1, renaming each file, then build and distribute authorized_keys:
- #node2
- scp /home/spark/.ssh/id_rsa.pub spark@node1:/home/spark/.ssh/node2.id_rsa.pub
- #node3
- scp /home/spark/.ssh/id_rsa.pub spark@node1:/home/spark/.ssh/node3.id_rsa.pub
- #node4
- scp /home/spark/.ssh/id_rsa.pub spark@node1:/home/spark/.ssh/node4.id_rsa.pub
- #node1
- cat /home/spark/.ssh/id_rsa.pub >> /home/spark/.ssh/authorized_keys
- cat /home/spark/.ssh/node2.id_rsa.pub >> /home/spark/.ssh/authorized_keys
- cat /home/spark/.ssh/node3.id_rsa.pub >> /home/spark/.ssh/authorized_keys
- cat /home/spark/.ssh/node4.id_rsa.pub >> /home/spark/.ssh/authorized_keys
- chmod 600 /home/spark/.ssh/authorized_keys
- scp /home/spark/.ssh/authorized_keys spark@node2:/home/spark/.ssh/authorized_keys
- scp /home/spark/.ssh/authorized_keys spark@node3:/home/spark/.ssh/authorized_keys
- scp /home/spark/.ssh/authorized_keys spark@node4:/home/spark/.ssh/authorized_keys
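With the keys distributed, passwordless login can be verified from node1 as the spark user; each command should print the remote hostname without asking for a password:
- ssh spark@node2 hostname
- ssh spark@node3 hostname
- ssh spark@node4 hostname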
4.4. Install the JDK
Copy /soft/jdk-7u65-linux-x64.rpm to the other nodes with scp, then install the JDK on each node:
- #node1
- rpm -ivh /soft/jdk-7u65-linux-x64.rpm
- vim /etc/profile
- export JAVA_HOME=/usr/java/latest
- export JRE_HOME=$JAVA_HOME/jre
- export PATH=$JAVA_HOME/bin:$PATH
- export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
After editing, make the changes take effect:
- source /etc/profile
Install and configure the JDK on the other servers the same way as on node1.
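A quick way to confirm the JDK and the environment variables took effect (the exact version string depends on the installed package; jdk-7u65 is assumed here):
- java -version
- echo $JAVA_HOME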
4.5. Install Hadoop
Unpack the Hadoop archive on node1 into /usr/local/:
- tar xf /soft/hadoop-2.6.0.tar.gz -C /usr/local/
- ln -sv /usr/local/hadoop-2.6.0/ /usr/local/hadoop
1. Add Hadoop to the environment variables in /etc/profile:
- export HADOOP_HOME=/usr/local/hadoop-2.6.0
- export HADOOP_CONF_DIR=$HADOOP_HOME/etc/hadoop
- export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
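After sourcing /etc/profile, the Hadoop binaries should be on the PATH; a quick sanity check on node1:
- source /etc/profile
- hadoop version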
Modify the Hadoop configuration files.
1. File slaves: write the hostname of each DataNode into this file, one per line. The default is localhost, so in a pseudo-distributed setup the node acts as both NameNode and DataNode. For a fully distributed setup you can keep localhost or remove it so that the master node serves only as the NameNode.
- vim /usr/local/hadoop-2.6.0/etc/hadoop/slaves
- node2
- node3
- node4
2. Modify /usr/local/hadoop/etc/hadoop/core-site.xml:
- <configuration>
- <property>
- <name>fs.defaultFS</name>
- <value>hdfs://node1:9000</value>
- </property>
- <property>
- <name>hadoop.tmp.dir</name>
- <value>/hadoopdata/tmp</value>
- <description>Abase for other temporary directories.</description>
- </property>
- </configuration>
3. File hdfs-site.xml; dfs.replication is normally set to 3:
- <configuration>
- <property>
- <name>dfs.namenode.secondary.http-address</name>
- <value>node1:50090</value>
- </property>
- <property>
- <name>dfs.replication</name>
- <value>3</value>
- </property>
- <property>
- <name>dfs.namenode.name.dir</name>
- <value>/hadoopdata/tmp/dfs/name</value>
- </property>
- <property>
- <name>dfs.datanode.data.dir</name>
- <value>/hadoopdata/tmp/dfs/data</value>
- </property>
- </configuration>
4. Modify /usr/local/hadoop/etc/hadoop/mapred-site.xml (create it from mapred-site.xml.template if it does not exist):
- <configuration>
- <property>
- <name>mapreduce.framework.name</name>
- <value>yarn</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.address</name>
- <value>node1:10020</value>
- </property>
- <property>
- <name>mapreduce.jobhistory.webapp.address</name>
- <value>node1:19888</value>
- </property>
- </configuration>
5. Modify /usr/local/hadoop/etc/hadoop/yarn-site.xml:
- <configuration>
- <property>
- <name>yarn.resourcemanager.hostname</name>
- <value>node1</value>
- </property>
- <property>
- <name>yarn.nodemanager.aux-services</name>
- <value>mapreduce_shuffle</value>
- </property>
- </configuration>
6. Create the Hadoop data directories on node1 and hand them over to the spark user (a sketch that repeats this on the other nodes follows these commands):
- mkdir /hadoopdata/tmp -pv
- mkdir -p /hadoopdata/tmp/dfs/{name,data}
- chown -R spark:spark /hadoopdata/
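The data directories also have to exist on the DataNode hosts (node2, node3, node4), since dfs.datanode.data.dir points at them there. A minimal sketch that repeats the setup remotely, assuming root SSH access (password prompts will appear):
- for h in node2 node3 node4; do ssh root@$h "mkdir -p /hadoopdata/tmp/dfs/{name,data} && chown -R spark:spark /hadoopdata/"; done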
7. Distribute Hadoop from node1 to the other nodes (delete /usr/local/hadoop/share/doc/ first, since it only contains documentation). The commands below show node2; repeat them for node3 and node4, or use the loop sketched after this block:
- scp -r /usr/local/hadoop-2.6.0/ root@node2:/usr/local/hadoop-2.6.0/
- scp /etc/profile root@node2:/etc/profile
- #then, on node2:
- source /etc/profile
- chown -R spark:spark /usr/local/hadoop-2.6.0/
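Instead of repeating the scp commands by hand, node3 and node4 can be handled with a small loop; a sketch under the same assumptions (root password entered at each prompt, chown run afterwards on each node):
- for h in node3 node4; do scp -r /usr/local/hadoop-2.6.0/ root@$h:/usr/local/; scp /etc/profile root@$h:/etc/profile; ssh root@$h "chown -R spark:spark /usr/local/hadoop-2.6.0/"; done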
10. Format HDFS:
- hdfs namenode -format
11. Start HDFS and YARN, then check the cluster status:
- start-dfs.sh
- start-yarn.sh
- hdfs dfsadmin -report
- http://node1:50070/dfshealth.html#tab-datanode
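If everything started correctly, jps (from the JDK) should show the expected daemons on each node:
- jps
- #node1: NameNode, SecondaryNameNode, ResourceManager
- #node2/node3/node4: DataNode, NodeManager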
4.6. Install Scala
Spark 1.6 depends on Scala 2.10.x, so scala-2.10.6.tgz is used.
1. Unpack Scala into /usr/local/:
- tar xf /soft/scala-2.10.6.tgz -C /usr/local/
- vim /etc/profile
- #add to /etc/profile:
- export SCALA_HOME=/usr/local/scala-2.10.6
- export PATH=$SCALA_HOME/bin:$PATH
- source /etc/profile
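Scala must be present on every node that will run a Spark worker, so repeat the unpacking (or copy /usr/local/scala-2.10.6) and the /etc/profile change on node2, node3, and node4 as well. A quick version check:
- scala -version
- #expected: Scala code runner version 2.10.6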
4.7. Install Spark
1. Unpack the Spark archive on node1 into /usr/local/:
- tar xf /soft/spark-1.6.0-bin-hadoop2.6.tgz -C /usr/local/
2. Create conf/spark-env.sh from the template and add the following settings:
- cp /usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh.template /usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-env.sh
- export JAVA_HOME=/usr/java/latest
- export SCALA_HOME=/usr/local/scala-2.10.6
- export HADOOP_HOME=/usr/local/hadoop-2.6.0
- export HADOOP_CONF_DIR=/usr/local/hadoop-2.6.0/etc/hadoop
- export SPARK_MASTER_IP=node1
- export SPARK_WORKER_MEMORY=1g
- export SPARK_EXECUTOR_MEMORY=1g
- export SPARK_DRIVER_MEMORY=1g
- export SPARK_WORKER_CORES=1
3. List the Spark worker nodes in conf/slaves (created from slaves.template):
- node2
- node3
- node4
4. Create conf/spark-defaults.conf from the template:
- cp /usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf.template /usr/local/spark-1.6.0-bin-hadoop2.6/conf/spark-defaults.conf
- #contents:
- spark.executor.extraJavaOptions -XX:+PrintGCDetails -Dkey=value -Dnumbers="one two three"
- spark.eventLog.enabled true
- spark.eventLog.dir hdfs://node1:9000/historyserverforSpark
- spark.yarn.historyServer.address node1:18080
- spark.history.fs.logDirectory hdfs://node1:9000/historyserverforSpark
Note: with the history settings above, the historyserverforSpark directory must exist in HDFS; if it is missing, the Spark history server process cannot start. Create it with:
- hadoop dfs -mkdir /historyserverforSpark
5. Configure the system environment variables
- vim /etc/profile
- #add to /etc/profile:
- export SPARK_HOME=/usr/local/spark-1.6.0-bin-hadoop2.6
- export PATH=$SPARK_HOME/bin:$SPARK_HOME/sbin:$PATH
- source /etc/profile
6. Distribute Spark to the other nodes (node2 shown; repeat for node3 and node4):
- scp -r /usr/local/spark-1.6.0-bin-hadoop2.6/ root@node2:/usr/local/spark-1.6.0-bin-hadoop2.6/
- scp /etc/profile root@node2:/etc/profile
- #then, on each node:
- chown -R spark:spark /usr/local/spark-1.6.0-bin-hadoop2.6/
7. Start the Spark standalone cluster from node1:
- /usr/local/spark-1.6.0-bin-hadoop2.6/sbin/start-all.sh
- #Spark master web UI: http://node1:8080
8. Start the HistoryServer process to record Spark application history:
- /usr/local/spark-1.6.0-bin-hadoop2.6/sbin/start-history-server.sh
5. Run a test Spark program
- /usr/local/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 /usr/local/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar 1000
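If the job succeeds, the result is printed by the driver among the INFO logging; one way to filter it out (the stdout/stderr split assumed here matches Spark's default log4j settings):
- /usr/local/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.SparkPi --master spark://node1:7077 /usr/local/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar 100 2>/dev/null | grep "Pi is roughly"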
The general spark-submit syntax is:
./bin/spark-submit \
--class <main-class> \
--master <master-url> \
--deploy-mode <deploy-mode> \
--conf <key>=<value> \
... # other options
<application-jar> \
[application-arguments]
1. --class: The entry point for your application (e.g. org.apache.spark.examples.SparkPi).
2. --master: The master URL for the cluster (e.g. spark://23.195.26.187:7077).
3. --deploy-mode: Whether to deploy your driver on the worker nodes (cluster) or locally as an external client (client) (default: client).
4. --conf: Arbitrary Spark configuration property in key=value format. For values that contain spaces, wrap "key=value" in quotes.
5. application-jar: Path to a bundled jar including your application and all dependencies. The URL must be globally visible inside of your cluster, for instance, an hdfs:// path or a file:// path that is present on all nodes.
6. application-arguments: Arguments passed to the main method of your main class, if any.
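For comparison, the same example can also be submitted to the YARN cluster configured earlier rather than to the standalone master; a sketch, assuming HADOOP_CONF_DIR is exported as in spark-env.sh:
- /usr/local/spark-1.6.0-bin-hadoop2.6/bin/spark-submit --class org.apache.spark.examples.SparkPi --master yarn --deploy-mode cluster --conf spark.executor.memory=1g /usr/local/spark-1.6.0-bin-hadoop2.6/lib/spark-examples-1.6.0-hadoop2.6.0.jar 100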