Single-Node Hadoop Platform Setup
This first write-up is a bit disorganized; I'll tidy it up when I find the time.
Environment
My host runs Ubuntu 16.04, 64-bit.
Tools
JDK: jdk-8u151-linux-x64.tar.gz
Hadoop: hadoop-2.6.0-cdh5.7.0.tar.gz
Hive: hive-1.1.0-cdh5.7.0.tar.gz
Spark: spark-2.2.0.tgz
Maven: apache-maven-3.5.2-bin.tar.gz
Scala: scala-2.11.8.tgz
Create a user
1) Create the hadoop user: adduser hadoop
2) Edit /etc/sudoers and add the line: hadoop ALL=(ALL:ALL) ALL
Configure the Java environment
1) Unpack:
tar -xvf jdk-8u151-linux-x64.tar.gz -C /usr/java/
2) Edit /etc/profile:
# set java environment
export JAVA_HOME=/usr/java/jdk1.8.0_151/
export JRE_HOME=/usr/java/jdk1.8.0_151/jre
export CLASSPATH=.:$CLASSPATH:$JAVA_HOME/lib:$JRE_HOME/lib
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
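These profile edits (and the later Hadoop, Hive, and Maven ones) can be scripted so that re-running the setup does not duplicate lines. A small sketch; the `add_export` helper is mine, not from the original post, and a temp file stands in for /etc/profile (which needs sudo on a real machine):

```shell
# Demo against a temp file; on a real host point $profile at /etc/profile
profile=$(mktemp)

# Append a line only if it is not already present (idempotent re-runs)
add_export() {
  grep -qF "$1" "$profile" || echo "$1" >> "$profile"
}

add_export 'export JAVA_HOME=/usr/java/jdk1.8.0_151'
add_export 'export PATH=$PATH:$JAVA_HOME/bin'
add_export 'export JAVA_HOME=/usr/java/jdk1.8.0_151'   # duplicate: skipped

grep -c '^export JAVA_HOME=' "$profile"   # prints 1: the duplicate was not re-added
```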
Install Hadoop
1) Unpack the archive into /usr:
tar -xvf hadoop-2.6.0-cdh5.7.0.tar.gz -C /usr
2) Housekeeping:
# rename the directory
mv hadoop-2.6.0-cdh5.7.0 hadoop-2.6.0
# give the hadoop user ownership of the tree
sudo chown -R hadoop:hadoop hadoop-2.6.0
3) Edit /etc/profile accordingly:
# set hadoop environment
export HADOOP_HOME=/usr/hadoop-2.6.0
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_ROOT_LOGGER=INFO,console
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
4) Edit $HADOOP_HOME/etc/hadoop/hadoop-env.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_151
5) Edit $HADOOP_HOME/etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://106.14.32.248:9000</value>
  </property>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>file:/usr/hadoop-2.6.0/tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
</configuration>
6) Edit $HADOOP_HOME/etc/hadoop/hdfs-site.xml:
<property>
  <name>dfs.namenode.secondary.http-address</name>
  <value>106.14.32.248:50090</value>
</property>
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.namenode.name.dir</name>
  <value>file:/usr/hadoop-2.6.0/hdfs/name</value>
</property>
<property>
  <name>dfs.datanode.data.dir</name>
  <value>file:/usr/hadoop-2.6.0/hdfs/data</value>
</property>
<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
7) Copy the template to create mapred-site.xml:
cp mapred-site.xml.template mapred-site.xml
8) Edit $HADOOP_HOME/etc/hadoop/mapred-site.xml:
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>hadoop001:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>hadoop001:19888</value>
</property>
9) Edit $HADOOP_HOME/etc/hadoop/yarn-site.xml:
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
  <value>org.apache.hadoop.mapred.ShuffleHandler</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>hadoop001:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>hadoop001:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>hadoop001:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>hadoop001:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>hadoop001:8088</value>
</property>
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>20480</value>
</property>
<property>
  <name>yarn.scheduler.minimum-allocation-mb</name>
  <value>2048</value>
</property>
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>2.1</value>
</property>
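The memory values in yarn-site.xml bound how many containers one NodeManager can run: the 20480 MB node budget divided by the 2048 MB minimum allocation allows at most 10 concurrent containers. A quick arithmetic check:

```shell
# Values taken from the yarn-site.xml above
node_mem_mb=20480       # yarn.nodemanager.resource.memory-mb
min_alloc_mb=2048       # yarn.scheduler.minimum-allocation-mb

max_containers=$((node_mem_mb / min_alloc_mb))
echo "$max_containers"  # prints 10: containers at the minimum allocation
```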
Set up passwordless SSH login
1) Install SSH:
sudo apt-get install openssh-server
2) Generate a key pair and append the public key to the machine's own authorized_keys:
ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
Install Hive
1) Unpack the archive:
tar -xvf hive-1.1.0-cdh5.7.0.tar.gz -C /usr
2) Edit /etc/profile:
export HIVE_HOME=/usr/hive-1.1.0
export PATH=$HIVE_HOME/bin:$PATH
3) Housekeeping:
mv hive-1.1.0-cdh5.7.0/ hive-1.1.0
sudo chown -R hadoop:hadoop hive-1.1.0/
4) Edit hive-env.sh (after mv hive-env.sh.template hive-env.sh):
export HADOOP_HOME=/usr/hadoop-2.6.0
5) Edit hive-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://hadoop001:3306/hive?createDatabaseIfNotExist=true</value>
    <description>JDBC connect string for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
    <description>Driver class name for a JDBC metastore</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hive</value>
    <description>username to use against metastore database</description>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>hive</value>
    <description>password to use against metastore database</description>
  </property>
</configuration>
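One property this file does not set is the warehouse location; Hive defaults it to /user/hive/warehouse on HDFS. It can be pinned explicitly with an optional fragment like this (shown with the default value):

```xml
<property>
  <name>hive.metastore.warehouse.dir</name>
  <value>/user/hive/warehouse</value>
  <description>HDFS directory where managed-table data is stored</description>
</property>
```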
6) Edit bin/hive-config.sh:
export JAVA_HOME=/usr/java/jdk1.8.0_151
export HIVE_HOME=/usr/hive-1.1.0
export HADOOP_HOME=/usr/hadoop-2.6.0
Install Spark
1) Unpack Maven:
tar -xvf apache-maven-3.5.2-bin.tar.gz -C /usr
2) Configure the Maven environment and MAVEN_OPTS:
export MAVEN_HOME=/usr/apache-maven-3.5.2
export PATH=$MAVEN_HOME/bin:$PATH
# on JDK 8; with JDK 7 add -XX:MaxPermSize=512M
export MAVEN_OPTS="-Xmx2g -XX:ReservedCodeCacheSize=512m"
3) Change the location of Maven's local repository:
cd /usr/apache-maven-3.5.2/conf
vi settings.xml
# add inside <settings>:
<localRepository>/usr/apache-maven-3.5.2/maven_repo</localRepository>
4) Install Scala:
tar -xvf scala-2.11.8.tgz -C /usr/spark-2.2.0/build/
5) Configure the environment:
export SCALA_HOME=/usr/spark-2.2.0/build
export PATH=$SCALA_HOME/bin:$PATH
6) Unpack Spark:
tar -xvf spark-2.2.0.tgz -C /usr
7) Add the Cloudera repository to pom.xml:
<repository>
  <id>cloudera</id>
  <name>cloudera Repository</name>
  <url>https://repository.cloudera.com/artifactory/cloudera-repos</url>
</repository>
8) Build Spark (attempt 1):
./dev/change-scala-version.sh 2.11
./dev/make-distribution.sh \
  --name 2.6.0-cdh5.7.0 \
  --tgz \
  -Dscala-2.11 \
  -Dhadoop.version=2.6.0-cdh5.7.0 \
  -Phadoop-2.6 \
  -Phive -Phive-thriftserver \
  -Pyarn
or
./build/mvn -Pyarn -Dscala-2.11 -Phadoop-2.6 -Phive -Phive-thriftserver -Dhadoop.version=2.6.0-cdh5.7.0 -DskipTests clean package
Build Spark (attempt 2)
Since the Maven build kept failing, I tried building from sources cloned with git instead (fingers crossed!).
1) Install git:
sudo apt-get install git
2) Create a directory and clone the Spark source:
mkdir -p /usr/projects/opensource
cd /usr/projects/opensource
git clone https://github.com/apache/spark.git
Spark environment configuration
Edit spark-env.sh:
SPARK_MASTER_HOST=hadoop001
SPARK_WORKER_CORES=2
SPARK_WORKER_MEMORY=2g
SPARK_WORKER_INSTANCES=1
export JAVA_HOME=/usr/java/jdk1.8.0_151/
export PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin
export SCALA_HOME=/usr/scala-2.11.8
export PATH=$SCALA_HOME/bin:$PATH
export SPARK_MASTER_IP=106.14.32.248
export HADOOP_CONF_DIR=/usr/hadoop-2.6.0/etc/hadoop
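With spark-env.sh in place, the standalone master and worker are started from sbin. A hedged sketch, assuming the built Spark was unpacked to /usr/spark-2.2.0 and that hadoop001 resolves to this host (the post does not show this step):

```shell
# Start a standalone master and one worker (sketch, not from the original post)
SPARK_HOME=/usr/spark-2.2.0
if [ -x "$SPARK_HOME/sbin/start-master.sh" ]; then
  "$SPARK_HOME/sbin/start-master.sh"                        # master web UI on :8080
  "$SPARK_HOME/sbin/start-slave.sh" spark://hadoop001:7077  # worker registers with the master
else
  echo "Spark not found at $SPARK_HOME"
fi
```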
Problems encountered
1. Errors when starting the NameNode and DataNode:
NameNode ==> java.lang.IllegalArgumentException: Does not contain a valid host:port authority: 50070
DataNode ==> java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
SecondaryNameNode ==> java.net.BindException: Port in use: 0.0.0.0:50090
Fix: I killed the processes occupying the ports. netstat -ap | grep 50070 shows what is holding the port, and kill 1068 (the reported pid) frees it. After that the DataNode started, but the NameNode still failed: the error asks for host:port, and indeed the value in my hdfs-site.xml was just 50070 when it should have been hadoop001:50070. With that fixed, both the NameNode and DataNode came up.
Note: if the DataNode cannot reach the NameNode, open the port with iptables -I INPUT -p tcp --dport 8020 -j ACCEPT to allow connections on it.
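The netstat-then-kill dance can be shortened with a tiny probe. A sketch (my helper, not from the original post) that uses bash's /dev/tcp to ask whether a port already has a listener, so no netstat is needed:

```shell
# Report whether a TCP port on localhost already has a listener.
# Assumes bash: /dev/tcp is a bash-only pseudo-device.
port_status() {
  port=$1
  if (echo > "/dev/tcp/127.0.0.1/$port") 2>/dev/null; then
    echo "in use"    # something accepted the connection
  else
    echo "free"      # nothing listening there
  fi
}

port_status 50070
```

If the port is "in use", fall back to netstat -ap | grep <port> to find the pid to kill.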
2. Installing the MySQL package failed with E: Unable to locate package mysql-server.
"Unable to locate package" just means apt cannot find the package in its index, so the obvious move is sudo apt-get update; running it solved the problem.
3. Starting ./hive failed with:
com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
==> Caused by: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient
==> Caused by: com.mysql.jdbc.exceptions.jdbc4.CommunicationsException: Communications link failure
Fix: drop the hive user I had created earlier:
DROP USER 'hive'@'%';
create it again:
CREATE USER 'hive' IDENTIFIED BY 'hive';
grant it privileges:
GRANT ALL PRIVILEGES ON *.* TO 'hive'@'106.14.32.248' IDENTIFIED BY 'hive';
then restart MySQL:
sudo /etc/init.d/mysql restart
After that, logging in as the hive user worked.
4. The Spark build failed (maddening): Failed to execute goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile (scala-compile-first) on project spark-core_2.11: Execution scala-compile-first of goal net.alchim31.maven:scala-maven-plugin:3.2.2:compile failed. Search results were scarce; the few hits only suggested ./dev/change-scala-version.sh 2.11, but the official docs say that script is only needed for Scala 2.10, not 2.11, and trying it anyway did not help. In the end the URL in the error message itself had the answer: add one line under the <build> tag: <defaultGoal>install</defaultGoal>. With that, core compiled fine.
5. Then catalyst and sql failed in turn; resuming the build from the failed module (skipping tests) took care of it: mvn clean install -rf :spark-catalyst_2.11 -Dmaven.test.skip=true
References:
- Spark installation and configuration: http://blog.csdn.net/tian_li/article/details/49328517
- Big-data pitfalls I stepped in: https://www.cnblogs.com/zlslch/p/5865707.html
- Learning Spark: building and deploying: http://blog.csdn.net/yesuhuangsi/article/details/51830459
Thanks to the authors for sharing!