Configuring HBase on Multiple HDFS File Systems

Source: Internet | Published by: js 匹配中文 | Edited by: 程序博客网 | Time: 2024/06/07 21:42

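HBase runs on top of HDFS. Can a single HBase deployment be pointed at different HDFS instances, so that data ends up stored on different file systems? I spent two days investigating this, and the answer is yes. The configuration process is recorded below: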

1. Install Hadoop. I won't go into detail here; there are plenty of blog posts on installing and configuring Hadoop, and I followed them for my own installation.

http://www.cnblogs.com/wayne1017/archive/2007/03/20/678724.html

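2. I tested with single-node Hadoop, so to run multiple file systems, the namenode and datanode of every file system have to be installed on the same machine. In that case, add a distinct hostname for each file system to /etc/hosts so they can be told apart (the hostnames are used in the configuration files).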
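As a rough illustration of where those hostnames end up, here is a minimal core-site.xml sketch for one of the instances. It assumes /etc/hosts maps a hypothetical name such as hdfs2.example to the local machine; that name and the port 9001 are placeholders chosen for this example, not values from my setup.

```xml
<?xml version="1.0"?>
<!-- core-site.xml for one HDFS instance (sketch only).
     ASSUMPTION: "hdfs2.example" is a hypothetical hostname added to /etc/hosts,
     and 9001 is an arbitrary free port for the namenode RPC endpoint. -->
<configuration>
  <property>
    <name>fs.default.name</name>
    <!-- Each instance gets its own hostname/port pair here. -->
    <value>hdfs://hdfs2.example:9001</value>
  </property>
</configuration>
```

Each instance would keep a file like this in its own conf directory, with a different hostname and port.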

3. Give each file system its own hdfs-site.xml. Its contents include:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs2/name</value>
    <description>Determines where on the local filesystem the DFS name </description>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs2/data</value>
    <description>Determines where on the local filesystem an DFS data</description>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:50090</value>
    <description>The secondary namenode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:50010</value>
    <description>The address where the datanode server will listen to. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:50075</value>
    <description>The datanode http server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:50020</value>
    <description>The datanode ipc server address and port. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:50070</value>
    <description>The address and the base port where the dfs namenode web ui will listen on. If the port is 0 then the server will start on a free port.</description>
  </property>
  <property>
    <name>dfs.datanode.https.address</name>
    <value>0.0.0.0:50475</value>
  </property>
  <property>
    <name>dfs.https.address</name>
    <value>0.0.0.0:50470</value>
  </property>
</configuration>
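Because all instances share one machine, two daemons cannot bind the same port, so the listening addresses above need to differ from one instance to the next. The fragment below is a sketch of what the overrides for another instance might look like; the "dfs3" directory name and the specific port numbers are arbitrary values assumed for illustration, and any free ports would do.

```xml
<?xml version="1.0"?>
<!-- hdfs-site.xml overrides for an additional instance (sketch only).
     ASSUMPTION: the "dfs3" subdirectory and all port numbers below are
     placeholders; pick any ports not used by the other instances. -->
<configuration>
  <property>
    <name>dfs.name.dir</name>
    <value>${hadoop.tmp.dir}/dfs3/name</value>
  </property>
  <property>
    <name>dfs.data.dir</name>
    <value>${hadoop.tmp.dir}/dfs3/data</value>
  </property>
  <property>
    <name>dfs.http.address</name>
    <value>0.0.0.0:51070</value>
  </property>
  <property>
    <name>dfs.secondary.http.address</name>
    <value>0.0.0.0:51090</value>
  </property>
  <property>
    <name>dfs.datanode.address</name>
    <value>0.0.0.0:51010</value>
  </property>
  <property>
    <name>dfs.datanode.http.address</name>
    <value>0.0.0.0:51075</value>
  </property>
  <property>
    <name>dfs.datanode.ipc.address</name>
    <value>0.0.0.0:51020</value>
  </property>
</configuration>
```

Separate name and data directories matter just as much as separate ports, since the instances would otherwise overwrite each other's metadata and blocks.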

4. Starting an HDFS instance launches three processes (jobtracker and tasktracker are not covered here): namenode, datanode, and secondarynamenode.

The start-dfs.sh and stop-dfs.sh scripts call hadoop-daemon.sh, which relies on several environment variables:

#   HADOOP_CONF_DIR Alternate conf dir. Default is ${HADOOP_HOME}/conf.
#   HADOOP_LOG_DIR   Where log files are stored. PWD by default.
#   HADOOP_MASTER    host:path where hadoop code should be rsync'd from
#   HADOOP_PID_DIR   The pid files are stored. /tmp by default.
#   HADOOP_IDENT_STRING   A string representing this instance of hadoop. $USER by default
#   HADOOP_NICENESS The scheduling priority for daemons. Defaults to 0.

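Reading the scripts shows that these processes are started and stopped through the files $HADOOP_PID_DIR/*.pid: the process ID is written there at startup and read back at shutdown. Each HDFS file system should therefore use a different directory; otherwise the process IDs of the instances will collide.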

5. Make sure the addresses in ./conf/masters and ./conf/slaves are set correctly for each instance.

6. Once HDFS is configured, format the file system before starting it for the first time; otherwise the namenode will not start.

7. With multiple HDFS instances configured as above, HBase can be deployed. See this blog post:

http://hi.baidu.com/webcell/blog/item/4b289125465aab6935a80f28.html

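At this point you can edit the HBase configuration file (the hbase.rootdir property in hbase-site.xml) to point HBase at a different HDFS file system.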
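A minimal sketch of such a setting is shown below. The hostname hdfs2.example and port 9001 are hypothetical and stand for whatever namenode address the chosen HDFS instance was configured with; the /hbase path is likewise just an example.

```xml
<?xml version="1.0"?>
<!-- hbase-site.xml (sketch only).
     ASSUMPTION: "hdfs2.example:9001" is a placeholder for the namenode
     address of the HDFS instance that HBase should store its data on. -->
<configuration>
  <property>
    <name>hbase.rootdir</name>
    <!-- Changing the host/port here switches HBase to another HDFS instance. -->
    <value>hdfs://hdfs2.example:9001/hbase</value>
  </property>
</configuration>
```

The host and port in hbase.rootdir need to match the namenode address of the target instance exactly.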

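8. One problem to watch for here is port conflicts: both the HBase source code and its configuration files specify that HMaster binds to port 60000, so none of the Hadoop instances started earlier may use a port that conflicts with it (it took me a long time to track this down).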
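One way to keep track of this is to spell the HBase ports out in hbase-site.xml, so it is obvious which numbers the Hadoop instances must avoid. The sketch below lists the values I understand to be the defaults in HBase releases of this generation; treat them as an assumption and check them against the hbase-default.xml shipped with your version.

```xml
<?xml version="1.0"?>
<!-- hbase-site.xml port settings (sketch only).
     ASSUMPTION: these are the defaults of older HBase releases; verify
     against the hbase-default.xml bundled with your HBase version. -->
<configuration>
  <property>
    <name>hbase.master.port</name>
    <value>60000</value>
  </property>
  <property>
    <name>hbase.master.info.port</name>
    <value>60010</value>
  </property>
  <property>
    <name>hbase.regionserver.port</name>
    <value>60020</value>
  </property>
  <property>
    <name>hbase.regionserver.info.port</name>
    <value>60030</value>
  </property>
</configuration>
```

None of the namenode, datanode, or secondarynamenode addresses of any HDFS instance should reuse these port numbers.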

9. The addresses in ./conf/regionservers must match the addresses configured for the file system.

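10. When something goes wrong, checking the logs is essential. Hadoop and HBase both write their logs to the logs directory under their home directories, and the layout is very clear.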

TODO:

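As future work, I plan to change the configuration programmatically from within HBase, so that different data can be stored on different file systems.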