hadoop+spark+zookeeper
Source: Internet | Editor: 程序博客网 | 2024/05/21 04:38
Hostname | IP address     | Installed software            | Running processes
Node10   | 192.168.18.23  | jdk, hadoop, spark            | NameNode, ResourceManager, zkfc, Master
Node20   | 192.168.18.230 | jdk, hadoop, spark            | NameNode, ResourceManager, zkfc, Master
Node30   | 192.168.18.248 | jdk, hadoop, zookeeper, spark | DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node40   | 192.168.18.246 | jdk, hadoop, zookeeper, spark | DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
Node50   | 192.168.18.232 | jdk, hadoop, zookeeper, spark | DataNode, NodeManager, JournalNode, QuorumPeerMain, Worker
1. Disable SELinux and iptables
vi /etc/sysconfig/selinux
SELINUX=disabled
service iptables stop
chkconfig iptables off
2. Configure /etc/hosts
vi /etc/hosts
192.168.1.23 node10
192.168.1.230 node20
192.168.1.248 node30
192.168.1.246 node40
192.168.1.232 node50
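The five entries above can also be appended with a small helper; a sketch, where the add_hosts function and the HOSTS_FILE override are illustrative additions, not part of the original steps:

```shell
#!/bin/sh
# Append one "IP hostname" line per node, skipping entries that already
# exist. HOSTS_FILE defaults to a local file for a dry run; set it to
# /etc/hosts (as root) to install the entries for real.
HOSTS_FILE="${HOSTS_FILE:-./hosts.generated}"

add_hosts() {
    for pair in "$@"; do
        ip="${pair%%:*}"
        name="${pair##*:}"
        # Skip hostnames that are already present in the file.
        grep -q " $name\$" "$HOSTS_FILE" 2>/dev/null && continue
        echo "$ip $name" >> "$HOSTS_FILE"
    done
}

add_hosts 192.168.1.23:node10 192.168.1.230:node20 \
          192.168.1.248:node30 192.168.1.246:node40 192.168.1.232:node50
```

Re-running the script is safe: existing hostnames are not duplicated.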
3. Create a user and configure passwordless ssh login (run these on every node)
[root@localhost ~]# useradd heren
[root@localhost ~]# passwd heren
[root@localhost ~]# su - heren
[heren@localhost ~]$ ssh-keygen -t rsa
[heren@localhost ~]$ cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
[heren@localhost ~]$ chmod 600 .ssh/authorized_keys
The following commands need to be run only on node10:
[heren@node10 ~]$ ssh node20 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[heren@node10 ~]$ ssh node30 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[heren@node10 ~]$ ssh node40 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[heren@node10 ~]$ ssh node50 cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[heren@node10 ~]$ scp ~/.ssh/authorized_keys node20:/home/heren/.ssh/authorized_keys
[heren@node10 ~]$ scp ~/.ssh/authorized_keys node30:/home/heren/.ssh/authorized_keys
[heren@node10 ~]$ scp ~/.ssh/authorized_keys node40:/home/heren/.ssh/authorized_keys
[heren@node10 ~]$ scp ~/.ssh/authorized_keys node50:/home/heren/.ssh/authorized_keys
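The eight commands above can be collapsed into one loop. A sketch, assuming the hostnames from step 2; gather_and_push_keys and the RUN_KEY_SYNC guard are illustrative additions:

```shell
#!/bin/sh
# Collect every node's public key into node10's authorized_keys,
# then push the merged file back out to all nodes.
NODES="node20 node30 node40 node50"

gather_and_push_keys() {
    for node in $NODES; do
        # Append the remote public key to the local authorized_keys.
        ssh "$node" cat .ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    done
    for node in $NODES; do
        # Distribute the merged file.
        scp ~/.ssh/authorized_keys "$node:.ssh/authorized_keys"
    done
}

# Guarded so the file can be sourced without side effects;
# set RUN_KEY_SYNC=1 to actually run it on node10.
if [ "${RUN_KEY_SYNC:-0}" = "1" ]; then
    gather_and_push_keys
fi
```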
4. Install JDK 1.7 and configure environment variables
Remove any preinstalled JDK packages first:
rpm -qa | grep java | xargs rpm -e --nodeps
mkdir -p /usr/java
[root@localhost ~]# cp /root/jdk-7u80-linux-x64.tar.gz /usr/java/
[root@localhost ~]# cd /usr/java
[root@localhost java]# tar xf jdk-7u80-linux-x64.tar.gz
[root@localhost ~]# vi /etc/profile
export JAVA_HOME=/usr/java/jdk1.7.0_80
export JRE_HOME=/usr/java/jdk1.7.0_80/jre
export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib:$CLASSPATH
export PATH=$JAVA_HOME/bin:$JRE_HOME/bin:$PATH
export ZOOKEEPER_HOME=/usr/local/software/zookeeper
export CLASSPATH=$CLASSPATH:$ZOOKEEPER_HOME/lib
export PATH=$PATH:$ZOOKEEPER_HOME/bin
export HADOOP_HOME=/usr/local/software/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
export SCALA_HOME=/usr/local/software/scala
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_HOME=/usr/local/software/spark
export PATH=$SPARK_HOME/bin:$PATH
[root@localhost ~]# source /etc/profile
[root@localhost ~]# java -version
5. Create directories
Create the application directory:
[root@node10 ~]# mkdir -p /usr/local/software
Make it owned by the heren user:
[root@node10 ~]# chown -R heren:heren /usr/local/software
Create the data directories:
mkdir -p /data/hadoop/
mkdir -p /data/spark/
mkdir -p /data/zookeeper/
chown -R heren:heren /data/
6. Install the zookeeper cluster (node30, node40, node50)
[root@node30 software]# su - heren
[heren@node30 ~]$ cd /usr/local/software/
[heren@node30 software]$ tar xf zookeeper-3.4.6.tar.gz
[heren@node30 software]$ mv zookeeper-3.4.6 zookeeper
Edit the configuration file:
[heren@node30 software]$ cp zookeeper/conf/zoo_sample.cfg zookeeper/conf/zoo.cfg
[heren@node30 software]$ vi zookeeper/conf/zoo.cfg
dataLogDir=/data/zookeeper/logs
dataDir=/data/zookeeper/data
server.1=node30:2888:3888
server.2=node40:2888:3888
server.3=node50:2888:3888
Create the id file; the value must match that host's server.N entry in zoo.cfg:
mkdir -p /data/zookeeper/data/
echo '1' > /data/zookeeper/data/myid    # on node30
echo '2' > /data/zookeeper/data/myid    # on node40
echo '3' > /data/zookeeper/data/myid    # on node50
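Since the id must track the hostname, the same script can run unchanged on all three nodes; a sketch, where write_myid is an illustrative helper and the node-to-id mapping comes from zoo.cfg above:

```shell
#!/bin/sh
# Write the ZooKeeper myid file for a host, using the server.N
# numbering from zoo.cfg (1=node30, 2=node40, 3=node50).
write_myid() {
    host="$1"; dir="$2"
    case "$host" in
        node30) id=1 ;;
        node40) id=2 ;;
        node50) id=3 ;;
        *) echo "unknown host: $host" >&2; return 1 ;;
    esac
    mkdir -p "$dir"
    echo "$id" > "$dir/myid"
}

# On each real node: write_myid "$(hostname -s)" /data/zookeeper/data
```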
Start zookeeper on each node (node30, node40, node50):
[heren@node30 zookeeper]$ ./bin/zkServer.sh start
JMX enabled by default
Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED
[heren@node30 ~]$ /usr/local/software/zookeeper/bin/zkServer.sh status
JMX enabled by default
Using config: /usr/local/software/zookeeper/bin/../conf/zoo.cfg
Mode: leader
Start zookeeper:  sh /usr/local/software/zookeeper/bin/zkServer.sh start
Check whether zookeeper started successfully:  sh /usr/local/software/zookeeper/bin/zkServer.sh status
Stop zookeeper:  sh /usr/local/software/zookeeper/bin/zkServer.sh stop
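With passwordless ssh in place (step 3), all three nodes can be driven from one machine; a sketch, where zk_all is an illustrative helper and the zkServer.sh path matches the layout above:

```shell
#!/bin/sh
# Run a zkServer.sh subcommand (start|status|stop) on all three
# ZooKeeper nodes over ssh.
ZK_NODES="node30 node40 node50"
ZK_BIN=/usr/local/software/zookeeper/bin/zkServer.sh

zk_all() {
    for node in $ZK_NODES; do
        echo "== $node =="
        ssh "$node" "$ZK_BIN" "$1"
    done
}

# Examples: zk_all start ; zk_all status ; zk_all stop
```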
7. Install and configure the hadoop cluster (node10-node50)
[root@node10 ~]# su - heren
[heren@node10 ~]$ cd /usr/local/software/
[heren@node10 software]$ tar xf hadoop-2.6.0.tar.gz
[heren@node10 software]$ mv hadoop-2.6.0 hadoop
Edit hadoop-env.sh
[heren@node10 hadoop]$ vi hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_80
Edit core-site.xml
[heren@node10 hadoop]$ vi core-site.xml
<configuration>
<!-- Set the hdfs nameservice to masters -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://masters</value>
</property>
<!-- Hadoop temporary directory -->
<property>
<name>hadoop.tmp.dir</name>
<value>/usr/local/software/hadoop/tmp</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>ha.zookeeper.quorum</name>
<value>node30:2181,node40:2181,node50:2181</value>
</property>
</configuration>
Edit hdfs-site.xml
[heren@node10 hadoop]$ vi hdfs-site.xml
<configuration>
<!-- The hdfs nameservice; must match the value in core-site.xml -->
<property>
<name>dfs.nameservices</name>
<value>masters</value>
</property>
<!-- The masters nameservice has two NameNodes: node10 and node20 -->
<property>
<name>dfs.ha.namenodes.masters</name>
<value>node10,node20</value>
</property>
<!-- RPC address of node10 -->
<property>
<name>dfs.namenode.rpc-address.masters.node10</name>
<value>node10:9000</value>
</property>
<!-- HTTP address of node10 -->
<property>
<name>dfs.namenode.http-address.masters.node10</name>
<value>node10:50070</value>
</property>
<!-- RPC address of node20 -->
<property>
<name>dfs.namenode.rpc-address.masters.node20</name>
<value>node20:9000</value>
</property>
<!-- HTTP address of node20 -->
<property>
<name>dfs.namenode.http-address.masters.node20</name>
<value>node20:50070</value>
</property>
<!-- Where the NameNode edit log is stored on the JournalNodes -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node30:8485;node40:8485;node50:8485/masters</value>
</property>
<!-- Local directory where each JournalNode stores its data -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/usr/local/software/hadoop/journal</value>
</property>
<!-- Enable automatic NameNode failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<!-- Failover proxy provider implementation -->
<property>
<name>dfs.client.failover.proxy.provider.masters</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing methods; list multiple methods one per line -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>
sshfence
shell(/bin/true)
</value>
</property>
<!-- sshfence requires passwordless ssh -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/heren/.ssh/id_rsa</value>
</property>
<!-- sshfence connection timeout -->
<property>
<name>dfs.ha.fencing.ssh.connect-timeout</name>
<value>30000</value>
</property>
</configuration>
Edit mapred-site.xml
[heren@node10 hadoop]$ vi mapred-site.xml
<configuration>
<!-- Run MapReduce on yarn -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
</configuration>
Edit yarn-site.xml
[heren@node10 hadoop]$ vi yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties -->
<!-- Enable ResourceManager HA -->
<property>
<name>yarn.resourcemanager.ha.enabled</name>
<value>true</value>
</property>
<!-- The RM cluster id -->
<property>
<name>yarn.resourcemanager.cluster-id</name>
<value>RM_HA_ID</value>
</property>
<!-- Logical ids of the two RMs -->
<property>
<name>yarn.resourcemanager.ha.rm-ids</name>
<value>rm1,rm2</value>
</property>
<!-- Hostname of each RM -->
<property>
<name>yarn.resourcemanager.hostname.rm1</name>
<value>node10</value>
</property>
<property>
<name>yarn.resourcemanager.hostname.rm2</name>
<value>node20</value>
</property>
<property>
<name>yarn.resourcemanager.recovery.enabled</name>
<value>true</value>
</property>
<property>
<name>yarn.resourcemanager.store.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
</property>
<!-- ZooKeeper quorum address -->
<property>
<name>yarn.resourcemanager.zk-address</name>
<value>node30:2181,node40:2181,node50:2181</value>
</property>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
</configuration>
Edit slaves
[heren@node10 hadoop]$ vi slaves
node30
node40
node50
Copy the configured hadoop to the other nodes:
[heren@node10 etc]$ scp -r hadoop heren@192.168.1.230:/usr/local/software/hadoop/etc/
[heren@node10 etc]$ scp -r hadoop heren@192.168.1.246:/usr/local/software/hadoop/etc/
[heren@node10 etc]$ scp -r hadoop heren@192.168.1.248:/usr/local/software/hadoop/etc/
[heren@node10 etc]$ scp -r hadoop heren@192.168.1.232:/usr/local/software/hadoop/etc/
Format and start the hadoop cluster
Start the journalnodes (run on node30, node40, node50):
[heren@node30 software]$ hadoop-daemon.sh start journalnode
starting journalnode, logging to /usr/local/software/hadoop/logs/hadoop-heren-journalnode-node30.out
Format hdfs on node10, then copy the formatted metadata to the standby NameNode on node20:
[heren@node10 software]$ hdfs namenode -format
[heren@node10 hadoop]$ scp -r tmp/ heren@node20:/usr/local/software/hadoop/
Format the zkfc state in ZooKeeper (run on node10):
[heren@node10 hadoop]$ hdfs zkfc -formatZK
Start hdfs:
[heren@node10 hadoop]$ start-dfs.sh
Start yarn:
[heren@node10 hadoop]$ start-yarn.sh
The standby ResourceManager on node20 must be started manually:
[heren@node20 hadoop]$ yarn-daemon.sh start resourcemanager
Check the cluster status through the web UIs
Check the NameNodes:
http://node10:50070/
http://node20:50070/
Check the ResourceManagers:
http://node10:8088/
http://node20:8088/
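The same active/standby information is available on the command line via `yarn rmadmin`; rm1 and rm2 are the ids from yarn-site.xml, and the check_rm_states wrapper below is an illustrative addition (the analogous NameNode check is `hdfs haadmin -getServiceState node10`):

```shell
#!/bin/sh
# Query the HA state (active/standby) of both ResourceManagers.
# rm1 and rm2 are the yarn.resourcemanager.ha.rm-ids configured above.
check_rm_states() {
    for rm in rm1 rm2; do
        echo "$rm: $(yarn rmadmin -getServiceState "$rm")"
    done
}

# Example: check_rm_states
```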
Check the cluster status with the hdfs command:
[heren@node10 hadoop]$ hdfs dfsadmin -report
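The report is long; the live-DataNode count can be pulled out with sed. A sketch — the "Live datanodes (N):" header is how Hadoop 2.x formats the report, but verify the wording against your own output:

```shell
#!/bin/sh
# Extract N from the "Live datanodes (N):" header of a dfsadmin
# report read on stdin.
count_live_datanodes() {
    sed -n 's/^Live datanodes (\([0-9]*\)):.*/\1/p'
}

# Real usage: hdfs dfsadmin -report | count_live_datanodes
```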
After downloading hadoop-native-64-2.6.0.tar, extract it into hadoop's lib/native directory, overwriting the existing files:
[heren@node10 lib]$ tar xf hadoop-native-64-2.6.0.tar
[heren@node10 lib]$ mv libh* native
Verify HDFS HA
First upload a file to hdfs:
[heren@node10 hadoop]$ hadoop fs -put /etc/profile /profile
[heren@node10 hadoop]$ hadoop fs -ls /
Then kill the active NameNode (the PID below is from this particular run):
[heren@node10 hadoop]$ kill -9 10020
The file is still there:
[heren@node10 hadoop]$ hadoop fs -ls /
Manually restart the NameNode that was killed:
[heren@node10 hadoop]$ sbin/hadoop-daemon.sh start namenode
Verify YARN
Run the WordCount demo that ships with hadoop:
[heren@node10 mapreduce]$ hadoop jar /usr/local/software/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out
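What the job computes can be reproduced locally on any text file, which helps sanity-check what to expect in /out; a sketch of the same whitespace-token counting in plain shell:

```shell
#!/bin/sh
# Count occurrences of each whitespace-separated token on stdin,
# printing "word<TAB>count" like WordCount's reducer output.
wordcount() {
    tr -s '[:space:]' '\n' | grep -v '^$' | sort | uniq -c \
        | awk '{print $2 "\t" $1}'
}

# Compare against the real job output:
#   hadoop fs -cat /out/part-r-00000
```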
Commands to start and stop the cluster:
/usr/local/software/hadoop/sbin/start-dfs.sh
/usr/local/software/hadoop/sbin/stop-dfs.sh
8. Install the spark cluster
[root@node10 ~]#mkdir -p /usr/scala
[root@node10 scala]# tar xf scala-2.11.7.tgz
[heren@node10 software]$ tar xf spark-1.5.2-bin-hadoop2.6.tgz
[heren@node10 software]$ mv spark-1.5.2-bin-hadoop2.6 spark
[heren@node10 conf]$ cp spark-env.sh.template spark-env.sh
[heren@node10 conf]$ vi spark-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_80
export SCALA_HOME=/usr/scala/scala-2.11.7
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/usr/local/software/hadoop/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node30:2181,node40:2181,node50:2181 -Dspark.deploy.zookeeper.dir=/spark"
[heren@node10 conf]$ cp spark-defaults.conf.template spark-defaults.conf
[heren@node10 conf]$ vi spark-defaults.conf
spark.master spark://node10:7077,node20:7077
spark.serializer org.apache.spark.serializer.KryoSerializer
spark.eventLog.enabled true
spark.eventLog.dir hdfs://masters/sparklogs
[heren@node10 conf]$ vi slaves
node30
node40
node50
Commands to start and stop the spark cluster:
/usr/local/software/spark/sbin/start-all.sh
/usr/local/software/spark/sbin/stop-all.sh
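Once both masters are up, the cluster can be smoke-tested with the bundled SparkPi example. A sketch — the examples jar path follows the spark-1.5.2-bin-hadoop2.6 layout, and the DRY_RUN guard is an illustrative addition; with both masters in the URL, the driver fails over to whichever master is active:

```shell
#!/bin/sh
# Submit the bundled SparkPi example against the HA master pair.
SPARK_HOME="${SPARK_HOME:-/usr/local/software/spark}"
MASTER="spark://node10:7077,node20:7077"
JAR="$SPARK_HOME/lib/spark-examples-1.5.2-hadoop2.6.0.jar"

# DRY_RUN=1 (the default here) only prints the command.
if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$SPARK_HOME/bin/spark-submit --master $MASTER" \
         "--class org.apache.spark.examples.SparkPi $JAR 100"
else
    "$SPARK_HOME/bin/spark-submit" --master "$MASTER" \
        --class org.apache.spark.examples.SparkPi "$JAR" 100
fi
```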