Installing Hadoop 2.6.4, ZooKeeper 3.4.6, HBase 1.2.2, Hive 1.2.1, Sqoop 1.99.7, and Spark 1.6.2
I. Preparation
1. Install the virtual machines and build Hadoop
Create three virtual machines: hsm01, hss01, hss02.
2. Configure the servers
2.1 Disable the firewall
# Stop the firewall
service iptables stop
# Verify
service iptables status
# Disable it at boot
chkconfig iptables off
# Verify
chkconfig --list | grep iptables
2.2 Set the hostname
hostname hss01
vim /etc/sysconfig/network
# Bind each IP to its hostname
vim /etc/hosts
2.3 Passwordless SSH login
# Set up passwordless SSH (run on all three nodes)
ssh-keygen -t rsa
# ~/.ssh/id_rsa.pub is the generated public key; merge the contents of all three id_rsa.pub files into the following file
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
# Copy it to the other nodes
scp ~/.ssh/authorized_keys zkpk@hss01:~/.ssh/
scp ~/.ssh/authorized_keys zkpk@hss02:~/.ssh/
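The merge step above can be simulated locally to see exactly what ends up in authorized_keys. This is only a sketch: the key strings and the temporary directory are stand-ins, not real keys from the cluster.

```shell
# Sketch: merge per-node public keys into one authorized_keys file.
set -e
workdir=$(mktemp -d)
# Stand-ins for the three id_rsa.pub files gathered from hsm01/hss01/hss02
echo "ssh-rsa AAAA...key1 zkpk@hsm01" > "$workdir/hsm01.pub"
echo "ssh-rsa AAAA...key2 zkpk@hss01" > "$workdir/hss01.pub"
echo "ssh-rsa AAAA...key3 zkpk@hss02" > "$workdir/hss02.pub"
# Concatenate all keys; once this file is on every node, each node trusts all the others
cat "$workdir"/*.pub >> "$workdir/authorized_keys"
# sshd refuses keys in a group- or world-writable file
chmod 600 "$workdir/authorized_keys"
wc -l < "$workdir/authorized_keys"
```

The same idea scales to more nodes: collect every node's id_rsa.pub in one place, concatenate, then distribute the merged file.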
3. Install the JDK
# As root (any other user works too)
vim /etc/profile
export JAVA_HOME=/opt/jdk1.8.0_45
export PATH=$PATH:$JAVA_HOME/bin
export CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
source /etc/profile
4. Versions
5. Cluster layout
II. Installation
All Hadoop-related components are operated as the zkpk user and installed under /home/zkpk.
1. ZooKeeper
1.1 Extract
tar -xf zookeeper-3.4.6.tar.gz
1.2 Configure
cd ~/zookeeper-3.4.6/conf
cp zoo_sample.cfg zoo.cfg
vim zoo.cfg
# Change
dataDir=/home/zkpk/zookeeper-3.4.6/data
# Add
dataLogDir=/home/zkpk/zookeeper-3.4.6/logs
# Append at the end
server.1=hsm01:2888:3888
server.2=hss01:2888:3888
server.3=hss02:2888:3888
1.3 Create the directories and the myid file
# Run in the zookeeper root directory
mkdir data
mkdir logs
# Create a myid file under dataDir containing 1
vim data/myid
1.4 Copy ZooKeeper to the other nodes
scp -r ~/zookeeper-3.4.6/ zkpk@hss01:~/
scp -r ~/zookeeper-3.4.6/ zkpk@hss02:~/
# Change myid to 2 on hss01 and to 3 on hss02
vim ~/zookeeper-3.4.6/data/myid
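The myid value on each node must match the N in its server.N line, which is easy to get wrong by hand. A sketch of deriving every myid from zoo.cfg itself (paths and the zoo.cfg fragment are illustrative, written to a temporary directory instead of real nodes):

```shell
# Sketch: derive each node's myid from the server.N entries in zoo.cfg.
set -e
dir=$(mktemp -d)
cat > "$dir/zoo.cfg" <<'EOF'
server.1=hsm01:2888:3888
server.2=hss01:2888:3888
server.3=hss02:2888:3888
EOF
# For every server.N=host:... line, data/myid on <host> must contain exactly N
while IFS='=' read -r key value; do
    id=${key#server.}                 # "server.2" -> "2"
    host=${value%%:*}                 # "hss01:2888:3888" -> "hss01"
    mkdir -p "$dir/$host/data"
    echo "$id" > "$dir/$host/data/myid"
done < "$dir/zoo.cfg"
cat "$dir/hss02/data/myid"
```

On a real cluster the inner mkdir/echo would run over ssh on each host; the mapping logic is the same.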
1.5 Configure environment variables
vim ~/.bash_profile
export ZOOKEEPER_HOME=/home/zkpk/zookeeper-3.4.6
export PATH=$PATH:$ZOOKEEPER_HOME/bin
source ~/.bash_profile
1.6 Start and verify each node
zkServer.sh start
zkServer.sh status
1.7 Troubleshooting
Pitfalls in setting up ZooKeeper ("Error contacting service. It is probably not running"): analysis and fix
http://www.paymoon.com/index.php/2015/06/04/zookeeper-building/
The ZooKeeper process starts, but the status check reports: Error contacting service. It is probably not running
http://www.cnblogs.com/xiaohua92/p/5460515.html
The system clocks of all nodes must be synchronized:
# As root
date -s "yyyyMMdd HH:mm:ss"
clock -w
Sending ZooKeeper logs to a dedicated directory:
http://www.tuicool.com/articles/MbUb63n
2. Hadoop
2.1 Extract (in /home/zkpk)
tar -xf hadoop-2.6.4.tar.gz
2.2 Create the required directories
cd hadoop-2.6.4
# Directory for NameNode metadata
mkdir name
# Directory for DataNode data
mkdir data
2.3 Set JAVA_HOME
cd etc/hadoop
vim yarn-env.sh
vim hadoop-env.sh
vim mapred-env.sh
export JAVA_HOME=/opt/jdk1.8.0_45
2.4 Configure core-site.xml
vim core-site.xml

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://ns1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/home/zkpk/hadoop-2.6.4/tmp</value>
    </property>
    <property>
        <name>ha.zookeeper.quorum</name>
        <value>hsm01:2181,hss01:2181,hss02:2181</value>
    </property>
</configuration>
Note: do not forget to create the tmp directory.
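Hand-editing these XML files invites typos in the tag nesting. A small helper that prints well-formed property blocks can make the edits reproducible; this is just a sketch, and the prop function name and output path are made up for illustration:

```shell
# Sketch: generate Hadoop <property> blocks instead of hand-editing XML.
set -e
out=$(mktemp)
# prop <name> <value> prints one well-formed property block
prop() {
    printf '    <property>\n        <name>%s</name>\n        <value>%s</value>\n    </property>\n' "$1" "$2"
}
{
    echo '<configuration>'
    prop fs.defaultFS hdfs://ns1
    prop hadoop.tmp.dir /home/zkpk/hadoop-2.6.4/tmp
    prop ha.zookeeper.quorum hsm01:2181,hss01:2181,hss02:2181
    echo '</configuration>'
} > "$out"
grep -c '<property>' "$out"
```

The generated file can then be diffed against the hand-maintained core-site.xml to catch drift between nodes.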
2.5 Configure hdfs-site.xml
vim hdfs-site.xml

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
    </property>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>/home/zkpk/hadoop-2.6.4/name</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>/home/zkpk/hadoop-2.6.4/data</value>
        <final>true</final>
    </property>
    <property>
        <name>dfs.nameservices</name>
        <value>ns1</value>
    </property>
    <property>
        <name>dfs.ha.namenodes.ns1</name>
        <value>nn1,nn2</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn1</name>
        <value>hsm01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn1</name>
        <value>hsm01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.rpc-address.ns1.nn2</name>
        <value>hss01:9000</value>
    </property>
    <property>
        <name>dfs.namenode.http-address.ns1.nn2</name>
        <value>hss01:50070</value>
    </property>
    <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://hsm01:8485;hss01:8485;hss02:8485/ns1</value>
    </property>
    <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/home/zkpk/hadoop-2.6.4/journal</value>
    </property>
    <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.client.failover.proxy.provider.ns1</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
    </property>
    <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
            sshfence
            shell(/bin/true)
        </value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/home/zkpk/.ssh/id_rsa</value>
    </property>
    <property>
        <name>dfs.ha.fencing.ssh.connect-timeout</name>
        <value>30000</value>
    </property>
</configuration>
2.6 Edit mapred-site.xml
cp mapred-site.xml.template mapred-site.xml
vim mapred-site.xml

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
2.7 Edit yarn-site.xml
vim yarn-site.xml

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>yrc</value>
    </property>
    <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hsm01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hss01</value>
    </property>
    <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hsm01:2181,hss01:2181,hss02:2181</value>
    </property>
    <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
    </property>
</configuration>
2.8 Edit slaves
vim slaves
hss01
hss02
2.9 Copy to the other nodes
scp -r ~/hadoop-2.6.4 hss01:~/
scp -r ~/hadoop-2.6.4 hss02:~/
2.10 Configure environment variables on each node
# Open
vim ~/.bash_profile
# Add
export HADOOP_HOME=/home/zkpk/hadoop-2.6.4
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
# Reload
source ~/.bash_profile
# Verify: if the Hadoop version is printed, the configuration succeeded
hadoop version
2.11 Start the cluster (follow the steps below strictly)
a. Start the ZooKeeper cluster (on hsm01, hss01 and hss02)
zkServer.sh start
# Check the status: one leader, two followers
zkServer.sh status
b. Start the JournalNodes (on hsm01, hss01 and hss02)
hadoop-daemon.sh start journalnode
# Run jps to confirm that a JournalNode process now exists on hsm01, hss01 and hss02
c. Format HDFS
# On hsm01
hdfs namenode -format
d. Copy the name directory to the other nodes
scp -r ~/hadoop-2.6.4/name hss01:~/hadoop-2.6.4/
scp -r ~/hadoop-2.6.4/name hss02:~/hadoop-2.6.4/
e. Format the HA state in ZooKeeper
# On hsm01
hdfs zkfc -formatZK
f. Start HDFS and YARN
# Start
zkServer.sh start
start-dfs.sh
start-yarn.sh
# Stop
stop-dfs.sh
stop-yarn.sh
zkServer.sh stop
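The strict ordering (ZooKeeper first, then HDFS, then YARN) can be captured in a small wrapper script. This is a sketch only: the run helper, the DRY_RUN flag, and the node list are inventions for illustration, and with DRY_RUN=1 (the default here) the commands are merely echoed so the ordering can be inspected without a live cluster.

```shell
# Sketch: encode the startup order in one script; DRY_RUN=1 only echoes commands.
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = 1 ]; then echo "would run: $*"; else "$@"; fi
}
# 1. ZooKeeper first: the HA failover state and ZKFC coordination live there
for node in hsm01 hss01 hss02; do
    run ssh "$node" zkServer.sh start
done
# 2. Then HDFS (NameNodes, DataNodes, JournalNodes, ZKFCs)
run start-dfs.sh
# 3. YARN last
run start-yarn.sh
```

Setting DRY_RUN=0 would execute the real commands; stopping would run the same steps in reverse.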
2.12 Troubleshooting
To be continued.
3. Installing Hive
3.1 Install MySQL
http://blog.csdn.net/u013980127/article/details/52261400
# Create the hadoop user
grant all on *.* to hadoop@'%' identified by 'hadoop';
grant all on *.* to hadoop@'localhost' identified by 'hadoop';
grant all on *.* to hadoop@'hsm01' identified by 'hadoop';
flush privileges;
# Create the database
create database hive_121;
3.2 Extract
tar -xf apache-hive-1.2.1-bin.tar.gz
# Rename the directory to hive-1.2.1
mv apache-hive-1.2.1-bin/ hive-1.2.1
3.3 Rename the configuration templates
# In hive-1.2.1/conf, rename the template files
mv hive-default.xml.template hive-site.xml
mv hive-log4j.properties.template hive-log4j.properties
mv hive-exec-log4j.properties.template hive-exec-log4j.properties
mv hive-env.sh.template hive-env.sh
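The renames follow a pattern (strip .template) with one irregular case: hive-default.xml.template becomes hive-site.xml, not hive-default.xml. A sketch of the same renames as a loop, run against fabricated empty files in a temporary directory rather than a real Hive install:

```shell
# Sketch: the conf/ renames as a loop, with the one special case called out.
set -e
conf=$(mktemp -d)    # stand-in for hive-1.2.1/conf
touch "$conf/hive-default.xml.template" \
      "$conf/hive-log4j.properties.template" \
      "$conf/hive-exec-log4j.properties.template" \
      "$conf/hive-env.sh.template"
for f in "$conf"/*.template; do
    case $f in
        # Irregular: the default template is the basis for hive-site.xml
        */hive-default.xml.template) mv "$f" "$conf/hive-site.xml" ;;
        # Regular: just drop the .template suffix
        *)                           mv "$f" "${f%.template}" ;;
    esac
done
ls "$conf"
```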
3.4 hive-env.sh
export HADOOP_HOME=/home/zkpk/hadoop-2.6.4
export HIVE_CONF_DIR=/home/zkpk/hive-1.2.1/conf
3.5 hive-log4j.properties
hive.log.dir=/home/zkpk/hive-1.2.1/logs
# Create the log directory
mkdir /home/zkpk/hive-1.2.1/logs
3.6 hive-site.xml
Delete all of the existing content and add the following:
<configuration>
    <property>
        <name>hive.metastore.warehouse.dir</name>
        <value>hdfs://ns1/hive/warehouse</value>
    </property>
    <property>
        <name>hive.exec.scratchdir</name>
        <value>hdfs://ns1/hive/scratchdir</value>
    </property>
    <property>
        <name>hive.querylog.location</name>
        <value>/home/zkpk/hive-1.2.1/logs</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionURL</name>
        <value>jdbc:mysql://hss02:3306/hive_121?characterEncoding=UTF-8</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionDriverName</name>
        <value>com.mysql.jdbc.Driver</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionUserName</name>
        <value>hadoop</value>
    </property>
    <property>
        <name>javax.jdo.option.ConnectionPassword</name>
        <value>hadoop</value>
    </property>
</configuration>
3.7 Environment variables
vim ~/.bash_profile
export HIVE_HOME=/home/zkpk/hive-1.2.1
export PATH=$PATH:$HIVE_HOME/bin
source ~/.bash_profile
hive/lib ships a jline jar; replace Hadoop's copy with the same version, otherwise Hive will fail to start.
Copy the MySQL JDBC driver mysql-connector-java-5.1.29.jar into hive-1.2.1/lib.
# Run
hive
# Open http://hsm01:50070 and check whether a hive directory now exists.
3.8 Issues and references
Hive configuration options explained in detail
Troubleshooting a Hive "Lock table" pitfall
Comparing Hive, Spark SQL and Impala
4. Installing Sqoop
4.1 Extract
tar -xf sqoop-1.99.7-bin-hadoop200.tar.gz
# Rename the directory
mv sqoop-1.99.7-bin-hadoop200/ sqoop-1.99.7
4.2 Configure the Hadoop proxy user
# Configure the proxy user
vim $HADOOP_HOME/etc/hadoop/core-site.xml

# zkpk is the user that runs the Sqoop server
<property>
    <name>hadoop.proxyuser.zkpk.hosts</name>
    <value>*</value>
</property>
<property>
    <name>hadoop.proxyuser.zkpk.groups</name>
    <value>*</value>
</property>

# Because the user id is below 1000 (check with the id command), also set:
vim $HADOOP_HOME/etc/hadoop/container-executor.cfg
allowed.system.users=zkpk
4.3 sqoop.properties
# Change @LOGDIR@ to /home/zkpk/sqoop-1.99.7/logs
# Change @BASEDIR@ to /home/zkpk/sqoop-1.99.7
# Path to the Hadoop configuration files
org.apache.sqoop.submission.engine.mapreduce.configuration.directory=/home/zkpk/hadoop-2.6.4/etc/hadoop/
# Set the authentication mechanism (uncomment these lines)
org.apache.sqoop.security.authentication.type=SIMPLE
org.apache.sqoop.security.authentication.handler=org.apache.sqoop.security.authentication.SimpleAuthenticationHandler
org.apache.sqoop.security.authentication.anonymous=true
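The @LOGDIR@/@BASEDIR@ substitutions can be done with sed instead of by hand. A sketch on a fabricated three-line fragment (the property names below are illustrative, not the full sqoop.properties shipped with Sqoop; sed -i as used here assumes GNU sed):

```shell
# Sketch: rewrite the sqoop.properties placeholders with sed.
set -e
f=$(mktemp)
cat > "$f" <<'EOF'
org.apache.sqoop.log4j.appender.file.File=@LOGDIR@/sqoop.log
org.apache.sqoop.repository.sysprop.derby.stream.error.file=@LOGDIR@/derbyrepo.log
org.apache.sqoop.repository.jdbc.url=jdbc:derby:@BASEDIR@/repository/db;create=true
EOF
# | as the delimiter avoids escaping the slashes in the paths
sed -i -e 's|@LOGDIR@|/home/zkpk/sqoop-1.99.7/logs|g' \
       -e 's|@BASEDIR@|/home/zkpk/sqoop-1.99.7|g' "$f"
grep -c 'sqoop-1.99.7' "$f"   # -> 3
```

Running the same two expressions against the real $SQOOP_HOME/conf/sqoop.properties rewrites every placeholder in one pass.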
4.4 Configure the path for third-party jars
Copy the MySQL driver jar into $SQOOP_HOME/extra (create the extra directory first):
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
4.5 Environment variables
vim ~/.bash_profile
export SQOOP_HOME=/home/zkpk/sqoop-1.99.7
export SQOOP_SERVER_EXTRA_LIB=$SQOOP_HOME/extra
export PATH=$PATH:$SQOOP_HOME/bin
source ~/.bash_profile
4.6 Start and verify
# Verify the configuration
sqoop2-tool verify
# Start the server
sqoop2-server start
# Verify from the client
sqoop2-shell
show connector
4.7 Issues and references
Sqoop 1.99.7 installation, configuration and usage (part 1)
Sqoop 1.99.7 installation, configuration and usage (part 2)
Installing and using Sqoop2
Architectural differences between Sqoop 1.x and Sqoop 2
Flume and Sqoop: data collection and ingestion for Hadoop
5. Installing HBase
5.1 Extract
tar -xf hbase-1.2.2-bin.tar.gz
5.2 Update the libs
cd hbase-1.2.2/lib
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/lib/hadoop-annotations-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/tools/lib/hadoop-auth-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/common/hadoop-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/hdfs/hadoop-hdfs-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-app-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-core-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-jobclient-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/mapreduce/hadoop-mapreduce-client-shuffle-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-api-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-client-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-common-2.6.4.jar .
cp ~/hadoop-2.6.4/share/hadoop/yarn/hadoop-yarn-server-common-2.6.4.jar .
# Fixes java.lang.NoClassDefFoundError: org/htrace/Trace
cp ~/hadoop-2.6.4/share/hadoop/common/lib/htrace-core-3.0.4.jar .
# Remove the old-version jars
rm *-2.5.1.jar
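The copy-then-delete pattern above ("new jars in, every 2.5.1 jar out") can be expressed once with globs. This sketch fakes the directory layout with empty files in temporary directories, just to show the logic; on a real install $lib and $share would be hbase-1.2.2/lib and the corresponding ~/hadoop-2.6.4/share subdirectories.

```shell
# Sketch: swap HBase's bundled hadoop-*-2.5.1 jars for the 2.6.4 ones.
set -e
OLD=2.5.1 NEW=2.6.4
lib=$(mktemp -d)     # stand-in for hbase-1.2.2/lib
share=$(mktemp -d)   # stand-in for a hadoop share directory
touch "$lib/hadoop-common-$OLD.jar" "$lib/hadoop-hdfs-$OLD.jar"
touch "$share/hadoop-common-$NEW.jar" "$share/hadoop-hdfs-$NEW.jar"
# Copy the new jars in, then delete every jar of the old version
cp "$share"/hadoop-*-"$NEW".jar "$lib/"
rm -f "$lib"/*-"$OLD".jar
ls "$lib"
```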
5.3 hbase-env.sh
export JAVA_HOME=/opt/jdk1.8.0_45
export HBASE_MANAGES_ZK=false
export HBASE_CLASSPATH=/home/zkpk/hadoop-2.6.4/etc/hadoop
# Comment out the following lines (JDK 8 no longer supports these options)
#export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
#export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -XX:PermSize=128m -XX:MaxPermSize=128m"
5.4 hbase-site.xml
<configuration>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
    <property>
        <name>hbase.tmp.dir</name>
        <value>/home/zkpk/hbase-1.2.2/tmp</value>
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://ns1/hbase</value>
    </property>
    <property>
        <name>zookeeper.session.timeout</name>
        <value>120000</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.tickTime</name>
        <value>6000</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>hsm01,hss01,hss02</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>/home/zkpk/zookeeper-3.4.6/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>hbase.master.maxclockskew</name>
        <value>180000</value>
    </property>
</configuration>
5.5 regionservers
hss01
hss02
5.6 Copy HBase to the other nodes
Put Hadoop's hdfs-site.xml and core-site.xml into hbase/conf first:
cp hadoop-2.6.4/etc/hadoop/hdfs-site.xml hbase-1.2.2/conf/
cp hadoop-2.6.4/etc/hadoop/core-site.xml hbase-1.2.2/conf/
scp -r /home/zkpk/hbase-1.2.2 hss01:~/
scp -r /home/zkpk/hbase-1.2.2 hss02:~/
5.7 Configure environment variables
# On each node separately
vim ~/.bash_profile
export HBASE_HOME=/home/zkpk/hbase-1.2.2
export PATH=$PATH:$HBASE_HOME/bin
source ~/.bash_profile
5.8 Start and verify
# Start
start-hbase.sh
# HMaster web UI
http://hsm01:16010
# HRegionServer web UIs
http://hss01:16030
http://hss02:16030
# Verify with the shell
hbase shell
# Verify with list
list
# Verify by creating a table
create 'user','name','sex'
5.9 Issues and references
HBase has version-compatibility requirements with Hadoop; the usual fix is to replace the Hadoop-related jars bundled with HBase with the jars from your Hadoop version. Also remember to synchronize the cluster clocks (adjust the time zone and format through the system settings):
date -s "yyyyMMdd HH:mm:ss"
clock -w
HBase fails to start with java.lang.NoClassDefFoundError: org/htrace/Trace
http://www.blogjava.net/anchor110/archive/2015/05/06/424888.aspx
Alternatively, synchronize with NTP:
Linux NTP (Network Time Protocol) configuration explained
6. Installing Spark
6.1 Install Scala
# Install as root (any other user works too)
tar -xf scala-2.11.7.tgz
mv scala-2.11.7/ /opt/
# Environment variables
vim /etc/profile
export SCALA_HOME=/opt/scala-2.11.7
export PATH=$PATH:$SCALA_HOME/bin
source /etc/profile
# Verify
scala -version
# Copy scala to the other nodes and configure their environment variables
scp -r scala-2.11.7 root@hss01:/opt
scp -r scala-2.11.7 root@hss02:/opt
6.2 Extract Spark
tar -xf spark-1.6.2-bin-hadoop2.6.tgz
mv spark-1.6.2-bin-hadoop2.6/ spark-1.6.2
6.3 spark-env.sh
# In the conf directory
cp spark-env.sh.template spark-env.sh
vim spark-env.sh
export JAVA_HOME=/opt/jdk1.8.0_45
export SCALA_HOME=/opt/scala-2.11.7
export SPARK_MASTER_IP=hsm01
export SPARK_WORKER_MEMORY=1g
export HADOOP_CONF_DIR=/home/zkpk/hadoop-2.6.4/etc/hadoop
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib:$HADOOP_HOME/lib/native"
6.4 slaves
cp slaves.template slaves
hsm01
hss01
hss02
6.5 Copy Spark to the other nodes
scp -r spark-1.6.2/ hss01:~/
scp -r spark-1.6.2/ hss02:~/
6.6 Environment variables
vim ~/.bash_profile
export SPARK_HOME=/home/zkpk/spark-1.6.2
export PATH=$PATH:$SPARK_HOME/bin
source ~/.bash_profile
6.7 Start and verify
# Start (note: the script has the same name as Hadoop's start-all.sh, so use the full path)
$SPARK_HOME/sbin/start-all.sh
# Check the cluster status
http://hsm01:8080/
# Verify interactively
./bin/spark-shell

scala> val textFile = sc.textFile("file:///home/zkpk/spark-1.6.2/README.md")
textFile: org.apache.spark.rdd.RDD[String] = file:///home/zkpk/spark-1.6.2/README.md MapPartitionsRDD[1] at textFile at <console>:27

scala> textFile.count()
res0: Long = 95

scala> textFile.first()
res1: String = # Apache Spark