Installing Hadoop + Hive + HBase + Spark on a Mac


I'm new to big data. While debugging the installation I didn't record every bug, and I no longer remember exactly how some of them were fixed, so if anything in the steps below is wrong, corrections are welcome.


I. Hadoop

1. Install the JDK and Hadoop

The Mac already comes with a JDK; install Hadoop with Homebrew. Note that everything brew installs lives under /usr/local/Cellar/.

brew install hadoop
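Optionally, confirm where things landed before moving on (a quick check; /usr/libexec/java_home is the standard macOS helper for locating JDKs):

/usr/libexec/java_home -V    # list installed JDKs and their paths
brew info hadoop             # shows the keg path under /usr/local/Cellar/hadoop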

2. Configure passwordless SSH login

ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys

Test it: ssh localhost

If you get ssh: connect to host localhost port 22: Connection refused

then go to System Preferences > Sharing and turn on Remote Login (or use the command-line alternative sketched below).
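If you prefer the terminal to System Preferences, Remote Login can also be toggled with the stock macOS systemsetup utility (shown here only as an alternative; it needs sudo):

sudo systemsetup -setremotelogin on    # enable the SSH (Remote Login) service
sudo systemsetup -getremotelogin       # verify: should report "Remote Login: On"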

3. Configure the relevant files (pseudo-distributed mode)

(1) core-site.xml

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/Users/glenn/.hadoop_tmp</value>
    <description>A base for other temporary directories.</description>
  </property>
  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>

Note that hadoop.tmp.dir is the base path for Hadoop's local storage: the NameNode, DataNode and MapReduce metadata live under it, and the contents of HDFS files end up there as well. By default it points to a directory under /tmp (/tmp/hadoop-${user.name}), which can be cleared at any time and is wiped on every reboot, so HDFS data would be lost; the path must be changed.

If you don't change it, the symptom is that jps cannot find the DataNode (or NameNode). The usual reaction is to re-format HDFS with bin/hadoop namenode -format; after doing that a few times you get: ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: java.io.IOException: Incompatible namespaceIDs
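If you have already hit the Incompatible namespaceIDs error, one way out (a sketch, assuming the hadoop.tmp.dir from core-site.xml above; this wipes everything stored in HDFS) is to stop Hadoop, clear the tmp directory and re-format:

stop-all.sh                        # or the stop-hadoop alias defined in section 4
rm -rf /Users/glenn/.hadoop_tmp    # removes all HDFS data, including the stale namespace IDs
bin/hadoop namenode -format        # re-create the filesystem metadata
start-all.sh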

(2) hdfs-site.xml

In pseudo-distributed mode there is no need to replicate blocks:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>

(3) mapred-site.xml

<configuration>
  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9001</value>
  </property>
</configuration>

This apparently follows the Hadoop 1.x settings... If you need to configure YARN, see other guides such as the link below (a minimal sketch also follows it):
http://www.cnblogs.com/micrari/p/5716851.html
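For reference only, a minimal YARN setup (these are the standard Hadoop 2.x keys, not something the rest of this walkthrough relies on) would look roughly like this.

mapred-site.xml:

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>

yarn-site.xml:

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>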

4. Environment setup (~/.bash_profile)

export HADOOP_HOME="/usr/local/Cellar/hadoop/2.8.0"
export PATH=$HADOOP_HOME/sbin:$PATH
export HADOOP_OPTS="$HADOOP_OPTS -Djava.library.path=$HADOOP_HOME/lib/native"
alias start-hadoop='$HADOOP_HOME/sbin/start-all.sh'
alias stop-hadoop='$HADOOP_HOME/sbin/stop-all.sh'
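After editing the profile, reload it and make sure the command and aliases resolve (a quick sanity check):

source ~/.bash_profile
hadoop version      # should print the Hadoop 2.8.0 banner
type start-hadoop   # should report that the alias points at start-all.sh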

5. Testing

Format HDFS (for reference): bin/hadoop namenode -format

Start Hadoop: start-hadoop

You may see the warning: "WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform… using builtin-java classes where applicable"

Cause: the bundled native libraries are 32-bit while the system is 64-bit; fixing it requires recompiling the native libs, but most functionality works fine if you simply ignore it.

Check from the command line: jps

Jps
SecondaryNameNode
ResourceManager
NodeManager
DataNode
NameNode

At a minimum you should see DataNode, ResourceManager and NameNode.

Check in the browser:

NameNode: http://localhost:50070
ResourceManager: http://localhost:8088
NodeManager (node information): http://localhost:8042
DataNode: http://localhost:50075
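To exercise HDFS itself, a short smoke test helps (the file name is just an example; if hdfs is not on your PATH, hadoop fs works the same way):

echo "hello hdfs" > /tmp/hello.txt
hdfs dfs -mkdir -p /user/$(whoami)        # create a home directory in HDFS
hdfs dfs -put /tmp/hello.txt /user/$(whoami)/
hdfs dfs -ls /user/$(whoami)              # the uploaded file should be listed
hdfs dfs -cat /user/$(whoami)/hello.txt   # prints "hello hdfs"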

6. Bugs

(1) hdfs.DFSClient: DataStreamer Exception: org.apache.hadoop.ipc.RemoteException: java.io.IOException: … could only be replicated to 0 nodes, instead of 1 …

The DataNode failed to start:

stop-hadoop
hadoop namenode -format

Also check whether the hadoop.tmp.dir path is the problem.

(2)It looks like you are making an HTTP request to a Hadoop IPC port. This is not the correct port for the web interface on this daemon

This means an HTTP request was sent to Hadoop's IPC port (9000) on the single-node setup; the web UIs use the ports listed in section 5 instead.

Reference: http://www.jianshu.com/p/d19ce17234b7

II. Hive

1. Hive overview

Hive lets you treat data stored on HDFS as database tables. To do that it needs a table schema for the data, and that schema information lives in the metastore database. In other words, Hive relies on a database for its metadata management, so a database has to be configured for the node.

The Hive metastore can be configured in three modes:

(1) Embedded metastore: an embedded Derby database, where only one session at a time can access the database files on disk; this is Hive's default configuration.

(2) Local metastore: supports multiple concurrent users, but the metastore service runs in the same process as the Hive service.

(3) Remote metastore: the metastore service runs in a separate process from the Hive service, and the database can sit behind a firewall.

2. Install Hive

With brew: brew install hive

3. Environment setup (~/.bash_profile)

export HIVE_HOME="/usr/local/Cellar/hive/2.1.1"
export PATH=$HIVE_HOME/bin:$PATH

4. Configure the metastore

The local metastore configuration is used here.

(1) Install MySQL: brew install mysql

(2) Test MySQL:

mysql.server start

mysql_secure_installation

mysql -u root -p

(3) Create the metastore database and a Hive user in MySQL

mysql> CREATE DATABASE metastore;
mysql> USE metastore;
mysql> CREATE USER 'hiveuser'@'localhost' IDENTIFIED BY 'password';
mysql> GRANT SELECT,INSERT,UPDATE,DELETE,ALTER,CREATE ON metastore.* TO 'hiveuser'@'localhost';

This creates the database metastore and the local user hiveuser.

(4) Download the MySQL JDBC driver and drop it into Hive's lib directory:

curl -L 'http://www.mysql.com/get/Downloads/Connector-J/mysql-connector-java-5.1.22.tar.gz/from/http://mysql.he.net/' | tar xz
sudo cp mysql-connector-java-5.1.22/mysql-connector-java-5.1.22-bin.jar /usr/local/Cellar/hive/hive.version.no/libexec/lib/

5. Configure Hive

(1) hive-default.xml

Simply copy the template: cp hive-default.xml.template hive-default.xml

(2) hive-site.xml

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://localhost/metastore?createDatabaseIfNotExist=true</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <value>com.mysql.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>hiveuser</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>password</value>
  </property>
  <property>
    <name>datanucleus.autoCreateSchema</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.fixedDatastore</name>
    <value>true</value>
  </property>
  <property>
    <name>datanucleus.autoCreateTables</name>
    <value>true</value>
  </property>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <!-- base hdfs path -->
    <value>/user/hive/warehouse</value>
    <description>location of default database for the warehouse</description>
  </property>
  <property>
    <name>hive.metastore.schema.verification</name>
    <value>false</value>
    <description>
      Enforce metastore schema version consistency.
      True: Verify that version information stored in metastore matches with one from Hive jars. Also disable automatic
      schema migration attempt. Users are required to manually migrate schema after Hive upgrade which ensures
      proper metastore schema migration. (Default)
      False: Warn if the version information stored in metastore doesn't match with one from in Hive jars.
    </description>
  </property>
</configuration>

The main settings: javax.jdo.option.ConnectionURL points at the metastore database created above, javax.jdo.option.ConnectionDriverName is the JDBC driver class, javax.jdo.option.ConnectionUserName is the hiveuser account created above (and javax.jdo.option.ConnectionPassword must match the password used in the CREATE USER statement), and hive.metastore.warehouse.dir is the HDFS root directory for Hive's managed tables.
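With Hive 2.x the metastore schema usually also has to be initialised once before the first start (a sketch using the MySQL settings above; schematool ships with Hive, and if it is not on the PATH look for it under $HIVE_HOME/libexec/bin):

schematool -dbType mysql -initSchema    # creates the metastore tables in the MySQL database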

6. Testing

Start Hadoop: start-hadoop
Start Hive: hive

SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/log4j-slf4j-impl-2.4.1.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/local/Cellar/hadoop/2.8.0/libexec/share/hadoop/common/lib/slf4j-log4j12-1.7.10.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.apache.logging.slf4j.Log4jLoggerFactory]
Logging initialized using configuration in jar:file:/usr/local/Cellar/hive/2.1.1/libexec/lib/hive-common-2.1.1.jar!/hive-log4j2.properties Async: true
Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
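Once the hive> prompt appears, a small smoke test (the table name is just an example) confirms that the metastore and the HDFS warehouse directory are wired up:

hive> SHOW DATABASES;
hive> CREATE TABLE smoke_test (id INT, name STRING);
hive> SHOW TABLES;
hive> DROP TABLE smoke_test;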

7. Bugs

(1) Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStore

Hive could not reach the corresponding metastore. Likely causes: the database was never created or the MySQL service is not running; it is also possible that Hadoop was not started first.

(2)Version information not found

hive-site.xml does not set hive.metastore.schema.verification to false.

(3) metastore_db cannot be created

“ERROR Datastore.Schema (Log4JLogger.java:error(125)) - Failed initialising database.
Failed to create database ‘metastore_db’, see the next exception for details”

Check the hive-site.xml path; on the first start the write permission may be insufficient, in which case sudo hive works (I'm not sure the second point holds; I don't remember exactly how it was solved).

Reference: http://www.cnblogs.com/ToDoToTry/p/5349753.html

III. HBase

1. Install HBase

With brew: brew install hbase

2. Configuration

(1)hbase-env.sh

Here I mainly enable HBase's bundled ZooKeeper and point HBase at the Hadoop configuration:

export HBASE_MANAGES_ZK=true
export HBASE_CLASSPATH="/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop"

(2)hbase-site.xml

<configuration>
  <property>
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.clientPort</name>
    <value>2181</value>
  </property>
  <property>
    <name>hbase.zookeeper.property.dataDir</name>
    <value>/usr/local/var/zookeeper</value>
  </property>
  <property>
    <name>hbase.zookeeper.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.regionserver.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.master.dns.interface</name>
    <value>lo0</value>
  </property>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
</configuration>

The key points: the HDFS port in hbase.rootdir must match the one configured for Hadoop (fs.default.name, hdfs://localhost:9000), ZooKeeper uses the system defaults, and the HBase master web UI port hbase.master.info.port is set to 60010.

3. Environment setup (~/.bash_profile)

export HBASE_HOME="/usr/local/Cellar/hbase/1.2.6"
export PATH=$HBASE_HOME/bin:$PATH

4. Testing

(1) Check from the shell: hbase shell

(2) Check the service: run start-hbase.sh, then open localhost:60010 in the browser.
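Back in hbase shell, a minimal round trip (table and column-family names are just examples) shows that the master and region server are working:

create 'test_table', 'cf'                     # create a table with one column family
put 'test_table', 'row1', 'cf:a', 'value1'    # write a cell
scan 'test_table'                             # should show the row written above
disable 'test_table'
drop 'test_table'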

5. Bugs

(1) The HBase web console won't open

For HBase 1.0 and later you have to configure the master web UI port yourself; add the following to hbase-site.xml:

<property>
  <name>hbase.master.info.port</name>
  <value>60010</value>
</property>

IV. Spark

1. Install Scala

With brew: brew install scala

2. Download Spark

From the official site: http://spark.apache.org/downloads.html

3. Install Spark

Unpack the downloaded archive into /usr/local/spark/.

4. Configure Spark

(1) Copy the configuration templates (in the conf/ directory):

cp slaves.template slaves
cp spark-env.sh.template spark-env.sh

(2) spark-env.sh

export SCALA_HOME=/usr/local/Cellar/scala/2.12.3
export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
export HADOOP_HOME=/usr/local/Cellar/hadoop/2.8.0
export JAVA_HOME=/Library/Java/JavaVirtualMachines/jdk1.8.0_121.jdk/Contents/Home
export HADOOP_CONF_DIR=/usr/local/Cellar/hadoop/2.8.0/libexec/etc/hadoop
export SPARK_WORKER_MEMORY=1g
export SPARK_MASTER_IP=localhost
export SPARK_WORKER_CORES=2
export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_WEBUI_PORT=1080

5. Environment setup (~/.bash_profile)

export SPARK_HOME=/usr/local/spark/spark-2.2.0-bin-hadoop2.7
export PATH=$SPARK_HOME/bin:$PATH
alias start-spark='sudo $SPARK_HOME/sbin/start-all.sh'
alias stop-spark='sudo $SPARK_HOME/sbin/stop-all.sh'

6. Testing

Start Spark: start-spark
Run the demo:

./bin/spark-submit --class org.apache.spark.examples.SparkPi --master local examples/jars/spark-examples_2.11-2.2.0.jar

Shell test: spark-shell
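Inside spark-shell, a one-line sanity check (an illustrative computation; sc is the SparkContext the shell creates for you):

scala> sc.parallelize(1 to 100).sum()    // should return res0: Double = 5050.0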

7. Bugs

(1) java.net.BindException: Cannot assign requested address: Service 'sparkDriver' failed after 16 retries (starting from 0)! Consider explicitly setting the appropriate port for the service 'sparkDriver' (for example spark.ui.port for SparkUI) to an available port or increasing spark.port.maxRetries.

As the message suggests, the driver address/port was not set; check whether spark-env.sh contains the following two settings:

export SPARK_LOCAL_IP=127.0.0.1
export SPARK_MASTER_WEBUI_PORT=1080

(2)Directory /usr/local/spark/spark-2.2.0-bin-hadoop2.7/metastore_db cannot be created.

There is no permission to create the db directory at that path; run sudo spark-shell instead.

(3)mac root@localhost’s password: localhost: Permission denied, please try again

If you have forgotten the password, reset the root password:

sudo passwd root

Otherwise, the Remote Login service may not be enabled:

sudo launchctl load -w /System/Library/LaunchDaemons/ssh.plist

Or simply turn on Remote Login under System Preferences > Sharing.