Statistics System Deployment

I. Server Preparation

1. 192.168.1.23

2. 192.168.1.230

3. 192.168.1.248

4. 192.168.1.246

5. 192.168.1.232

II. Application Preparation

1. Zookeeper 3.4.6: manages NameNode master HA (high-availability) failover

2. Hadoop 2.4.1: stores the logs

3. Spark 1.3.1: performs the statistical computation

4. MySQL 5.6.23: stores the statistical results

5. JDK 1.6: runtime environment

6. Scala 2.10.4: runtime environment

III. Server Environment Configuration

1. Create a regular user, username heren, password heren2015#@!12

Create the user: useradd heren

Set the password: passwd heren (enter heren2015#@!12 at the prompt; passwd does not take the password as an argument, so a non-interactive alternative is echo 'heren:heren2015#@!12' | chpasswd)

Create the application directory: mkdir -p /usr/local/software

Make it owned by the heren user: chown -R heren:heren /usr/local/software

Create the data directories: mkdir -p /data/hrdata/hadoop/

mkdir -p /data/hrdata/spark1.3/

mkdir -p /data/hrdata/zookeeper/

chown -R heren:heren /data/hrdata

Affected servers: 192.168.1.23 192.168.1.230 192.168.1.248 192.168.1.246 192.168.1.232

2. Configure passwordless SSH login

Log in as the heren user

Generate the key pair: ssh-keygen -t rsa

Append the public key to the authorized-keys file: cat ~/.ssh/id_rsa.pub  >> ~/.ssh/authorized_keys

Restrict the file's permissions: chmod 600 ~/.ssh/authorized_keys

Copy the public key to each of the other servers with ssh-copy-id (once per host; see the sketch after the hosts file below)

Affected servers: 192.168.1.23 192.168.1.230 192.168.1.248 192.168.1.246 192.168.1.232

All five servers must be able to log in to one another without a password

Configure the /etc/hosts file:

192.168.1.23 node10

192.168.1.230 node20

192.168.1.248 node30

192.168.1.246 node40

192.168.1.232 node50

Affected servers: 192.168.1.23 192.168.1.230 192.168.1.248 192.168.1.246 192.168.1.232
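
A minimal sketch of pushing the key to every host and verifying the full mesh, run as heren on each server in turn (assumption: sshd listens on port 8085, as the HADOOP_SSH_OPTS/SPARK_SSH_OPTS settings later in this guide suggest; drop -p 8085 if sshd is on the default port 22, and note that older ssh-copy-id releases may not accept -p):

for host in node10 node20 node30 node40 node50; do
    ssh-copy-id -p 8085 heren@$host    # push this host's public key to $host
    ssh -p 8085 heren@$host hostname   # verify passwordless login works
done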

3. Disable the firewall

chkconfig iptables off

service iptables stop 

Affected servers: 192.168.1.23 192.168.1.230 192.168.1.248 192.168.1.246 192.168.1.232
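
The chkconfig/service commands above apply to SysV-init systems such as CentOS 6. On a systemd-based system the equivalent (an addition, not in the original guide) would be:

systemctl stop firewalld       # stop the firewall now
systemctl disable firewalld    # keep it off across reboots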

IV. Application Deployment

Hadoop, the JDK, Scala, Spark 1.3, and Zookeeper are all installed under /usr/local/software.

Configure the environment variables:

######jdk1.6##########

export JAVA_HOME=/usr/local/software/jdk

export PATH=$JAVA_HOME/bin:$PATH

export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar

export JRE_HOME=$JAVA_HOME/jre

 

######scala2.10.4##########

export SCALA_HOME=/usr/local/software/scala

export PATH=$SCALA_HOME/bin:$PATH

 

######hadoop2.4.1########

export HADOOP_HOME=/usr/local/software/hadoop

export PATH=$HADOOP_HOME/bin:$PATH

 

####spark1.3.1#######

export SPARK_HOME=/usr/local/software/spark

export PATH=$SPARK_HOME/bin:$PATH

 

#######mysql5.6########

export PATH=/usr/local/mysql/bin:$PATH

alias mysql_start="mysqld_safe&"

alias mysql_stop="mysqladmin -uroot -p shutdown"
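
The guide does not say which profile file these exports belong in; a common choice (an assumption, not stated in the original) is the heren user's ~/.bashrc, applied and spot-checked like so:

source ~/.bashrc
java -version     # should report 1.6.x
scala -version    # should report 2.10.4
hadoop version    # should report 2.4.1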

V. Application Configuration

1. Configure Zookeeper, directory /usr/local/software/zookeeper/conf/

Edit the Zookeeper configuration file /usr/local/software/zookeeper/conf/zoo.cfg:

dataLogDir=/data/hrdata/zookeeper/logs

dataDir=/data/hrdata/zookeeper/data

server.1=192.168.1.248:2888:3888

server.2=192.168.1.246:2888:3888

server.3=192.168.1.232:2888:3888

Affected servers: 192.168.1.248 192.168.1.246 192.168.1.232

Set each of the three Zookeeper servers' id; run exactly one of the following commands on its corresponding host, matching the number after server. in the lines above (1 on 192.168.1.248, 2 on 192.168.1.246, 3 on 192.168.1.232):

echo '1' > /data/hrdata/zookeeper/data/myid

echo '2' > /data/hrdata/zookeeper/data/myid

echo '3' > /data/hrdata/zookeeper/data/myid
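
The same can be done in one pass over SSH (a sketch; assumes the passwordless SSH from section III and, as elsewhere in this guide, sshd on port 8085):

id=1
for host in node30 node40 node50; do
    # write each server's id into its own myid file
    ssh -p 8085 heren@$host "echo $id > /data/hrdata/zookeeper/data/myid"
    id=$((id + 1))
done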

 

Start Zookeeper: sh /usr/local/software/zookeeper/bin/zkServer.sh start

Check whether Zookeeper started successfully: sh /usr/local/software/zookeeper/bin/zkServer.sh status

Stop Zookeeper: sh /usr/local/software/zookeeper/bin/zkServer.sh stop
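
As an extra liveness check beyond zkServer.sh status, Zookeeper's built-in four-letter command ruok can be sent to each server (assumes nc/netcat is installed); a healthy server answers imok:

echo ruok | nc node30 2181    # expect: imok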

2. Configure Hadoop, directory /usr/local/software/hadoop/etc/hadoop/

core-site.xml

<configuration>

<property>

<name>fs.defaultFS</name>

 <value>hdfs://mycluster</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>node30:2181,node40:2181,node50:2181</value>

</property>

<property>

<name>dfs.journalnode.edits.dir</name>

<value>/data/hrdata/hadoop/journal</value>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/data/hrdata/hadoop/tmp</value>

</property>

<property>

<name>hadoop.security.authorization</name>

<value>false</value>

</property>

</configuration>
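
Because fs.defaultFS names the logical nameservice mycluster rather than a single host, clients automatically follow whichever NameNode is currently active. Once the cluster is up (section V.3 below), a quick sanity check:

/usr/local/software/hadoop/bin/hadoop fs -mkdir -p /tmp/smoketest
/usr/local/software/hadoop/bin/hadoop fs -ls hdfs://mycluster/tmp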

 

hadoop-env.sh

export JAVA_HOME=/usr/local/software/jdk

export HADOOP_SSH_OPTS="-o StrictHostKeyChecking=no -p 8085"

export HADOOP_LOG_DIR='/data/hrdata/hadoop/logs'

export HADOOP_PID_DIR='/usr/local/software/hadoop/pid'

 

hdfs-site.xml

<configuration>

<property>

<name>dfs.nameservices</name>

<value>mycluster</value>

</property>

<property>

<name>dfs.namenode.name.dir</name>

<value>/data/hrdata/hadoop/dfs/name</value>

</property>

<property>

<name>dfs.ha.namenodes.mycluster</name>

<value>nn1,nn2</value>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.nn1</name>

<value>node10:9000</value>

</property>

<property>

<name>dfs.namenode.rpc-address.mycluster.nn2</name>

<value>node20:9000</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.nn1</name>

<value>node10:50070</value>

</property>

<property>

<name>dfs.namenode.http-address.mycluster.nn2</name>

<value>node20:50070</value>

</property>

<property>

<name>dfs.namenode.shared.edits.dir</name>

<value>qjournal://node30:8485;node40:8485;node50:8485/mycluster</value>

</property>

<property>

<name>dfs.client.failover.proxy.provider.mycluster</name>

<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

</property>

<property>

<name>dfs.ha.fencing.methods</name>

<value>sshfence(heren:8085)</value>

</property>

<property>

<name>dfs.ha.fencing.ssh.private-key-files</name>

<value>/home/heren/.ssh/id_rsa</value>

</property>

<property>

<name>dfs.ha.automatic-failover.enabled</name>

<value>true</value>

</property>

<property>

<name>ha.zookeeper.quorum</name>

<value>node30:2181,node40:2181,node50:2181</value>

</property>

<property>

<name>dfs.data.dir</name>

<value>/data/hrdata/hadoop/dfs/data</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>

<property>

<name>dfs.block.size</name>

<value>67108864</value>

</property>

</configuration>
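
Note that the dfs.block.size of 67108864 bytes is 64 MB (Hadoop 2.x defaults to 128 MB). Once HDFS is running, the effective value can be confirmed with:

/usr/local/software/hadoop/bin/hdfs getconf -confKey dfs.block.size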

 

 

slaves

node30

node40

node50

 

 

 

3. Hadoop formatting preparation

Start the journalnodes on 192.168.1.248, 192.168.1.246, and 192.168.1.232 (node30, node40, node50, matching dfs.namenode.shared.edits.dir above):

/usr/local/software/hadoop/sbin/hadoop-daemon.sh start journalnode

Format the NameNode on 192.168.1.23:

/usr/local/software/hadoop/bin/hadoop namenode -format

/usr/local/software/hadoop/sbin/hadoop-daemon.sh start namenode

On 192.168.1.230, sync the Hadoop metadata from 192.168.1.23:

/usr/local/software/hadoop/bin/hdfs namenode -bootstrapStandby -force

Initialize the Zookeeper znode on 192.168.1.23:

/usr/local/software/hadoop/bin/hdfs zkfc -formatZK

Commands to start and stop the cluster:

/usr/local/software/hadoop/sbin/start-dfs.sh

/usr/local/software/hadoop/sbin/stop-dfs.sh
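
After start-dfs.sh, the HA state of the two NameNodes can be verified (nn1 and nn2 are the IDs from dfs.ha.namenodes.mycluster; one should report active and the other standby):

/usr/local/software/hadoop/bin/hdfs haadmin -getServiceState nn1
/usr/local/software/hadoop/bin/hdfs haadmin -getServiceState nn2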

 

 

4. Configure Spark, directory /usr/local/software/spark1.3/conf/

spark-defaults.conf

spark.master spark://node10:7077,node20:7077

spark.serializer org.apache.spark.serializer.KryoSerializer

spark.eventLog.enabled true

spark.eventLog.dir      hdfs://mycluster/spark1.3logs

spark.eventLog.compress true

spark.shuffle.manager hash

spark.sql.shuffle.partitions 20

spark.cores.max 8

spark.executor.memory 3G

 

spark.executor.extraClassPath /usr/local/software/spark1.3/lib/mysql-connector-java-5.1.13.jar
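
The executor classpath above makes the MySQL JDBC driver visible to executors; the driver also has to be on the driver-side classpath when a job writes its results to MySQL. A hypothetical submission sketch (the jar path and class name are placeholders, not from this guide):

/usr/local/software/spark1.3/bin/spark-submit \
  --class com.example.StatsJob \
  --driver-class-path /usr/local/software/spark1.3/lib/mysql-connector-java-5.1.13.jar \
  /path/to/stats-job.jar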

spark-env.sh

export SPARK_DRIVER_MEMORY=2G

export HADOOP_CONF_DIR=/usr/local/software/hadoop/etc/hadoop

export SPARK_DAEMON_JAVA_OPTS="-Dspark.deploy.recoveryMode=ZOOKEEPER -Dspark.deploy.zookeeper.url=node30:2181,node40:2181,node50:2181 -Dspark.deploy.zookeeper.dir=/spark1.3"

export SPARK_HISTORY_OPTS="-Dspark.history.ui.port=7777 -Dspark.history.retainedApplications=3 -Dspark.history.fs.logDirectory=hdfs://mycluster/spark1.3logs"

export SPARK_SSH_OPTS="-o StrictHostKeyChecking=no -p 8085"

export SCALA_HOME=/usr/local/software/scala

export JAVA_HOME=/usr/local/software/jdk

export SPARK_WORKER_MEMORY=7G

export SPARK_LOCAL_DIRS=/data/hrdata/spark1.3/local

export SPARK_WORKER_DIR=/data/hrdata/spark1.3/work

export SPARK_PID_DIR=/usr/local/software/spark1.3/pid

export SPARK_WORKER_OPTS="-Dspark.worker.cleanup.enabled=true"
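
spark.eventLog.dir and SPARK_HISTORY_OPTS both point at hdfs://mycluster/spark1.3logs; that directory must exist before the first job runs or the history server starts. A minimal sketch:

/usr/local/software/hadoop/bin/hadoop fs -mkdir -p /spark1.3logs
/usr/local/software/spark1.3/sbin/start-history-server.sh    # UI on port 7777 per SPARK_HISTORY_OPTS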

slaves

node30

node40

node50

 

 

Start and stop the Spark cluster:

/usr/local/software/spark1.3/sbin/start-all.sh

/usr/local/software/spark1.3/sbin/stop-all.sh
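
Since spark.master lists two masters (node10 and node20) with Zookeeper-based recovery, the standby master has to be started by hand: start-all.sh only launches a master on the host it runs on. A sketch, assuming the master web UI is on its default port 8080:

# on node20:
/usr/local/software/spark1.3/sbin/start-master.sh
# one master should report ALIVE, the other STANDBY:
curl -s http://node10:8080 | grep -o 'ALIVE\|STANDBY'
curl -s http://node20:8080 | grep -o 'ALIVE\|STANDBY'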
