Hadoop, Part 2: Setting up a Single Node Cluster


I. Local (Standalone) Mode

Run the example jar directly with the hadoop command. The official example:

root@ubuntu:~/hadoop/output# cd $HADOOP_HOME
root@ubuntu:~/hadoop# mkdir input
root@ubuntu:~/hadoop# cp etc/hadoop/*.xml input
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
root@ubuntu:~/hadoop# cat output/*
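Note: in standalone mode both input and output are plain directories on the local filesystem, and MapReduce refuses to write into an output directory that already exists. If you want to re-run the example, remove the old output first, roughly like this:

root@ubuntu:~/hadoop# rm -r output
root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'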

II. Pseudo-Distributed Mode

1. Set up passwordless SSH login

root@ubuntu:~# ssh localhost

2. If it prompts for a password, you need to import your public key

root@ubuntu:~# ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
Generating public/private dsa key pair.
Your identification has been saved in /root/.ssh/id_dsa.
Your public key has been saved in /root/.ssh/id_dsa.pub.
The key fingerprint is:
SHA256:9xUa1hH5XJJQYr3G5AU5rapYcYXNICK8hIgshfI/SWQ root@ubuntu
The key's randomart image is:
+---[DSA 1024]----+
|.+.. o. . . =O=B |
|=.. E o. . o.o%.+|
|o. o . .    o=oO.|
|  . . .   ...o*.o|
|   o .  S .o.o.  |
|    +    .....   |
|     .   o ..    |
|        . .      |
|                 |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized.keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized.keys
root@ubuntu:~# ssh localhost
root@localhost's password:
PS: Following the official document exactly, passwordless login still failed; switching to RSA worked (which puzzled me at first, since the two steps look identical). In hindsight there are two real differences: the DSA attempt above appended the key to ~/.ssh/authorized.keys instead of ~/.ssh/authorized_keys, and the OpenSSH shipped with Ubuntu 16.04 (7.2) disables DSA (ssh-dss) public keys by default, so an RSA key is the better choice anyway.
root@ubuntu:~# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
Generating public/private rsa key pair.
Your identification has been saved in /root/.ssh/id_rsa.
Your public key has been saved in /root/.ssh/id_rsa.pub.
The key fingerprint is:
SHA256:bc4qQR2NMHsGVL6NzNSJ5ycuUBiqXtS9jB1BQJGXxsM root@ubuntu
The key's randomart image is:
+---[RSA 2048]----+
|     oXX++       |
|     o.OE+..     |
|    o ++Xo+      |
|   o  .@.X       |
|  . ..o S * .    |
| . .  .. = o     |
|  .    .. +      |
|      .  o       |
|       ..        |
+----[SHA256]-----+
root@ubuntu:~# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
root@ubuntu:~# chmod 0600 ~/.ssh/authorized_keys
root@ubuntu:~# ssh localhost
Welcome to Ubuntu 16.04 LTS (GNU/Linux 4.4.0-21-generic x86_64)
 * Documentation:  https://help.ubuntu.com/
Last login: Mon Jun 27 16:57:53 2016 from 192.168.80.1
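If you want to check which key types the SSH client and server support before regenerating keys, OpenSSH can report this itself. These are standard OpenSSH commands; the exact output depends on your version:

root@ubuntu:~# ssh -Q key
root@ubuntu:~# sshd -T | grep -i pubkeyacceptedkeytypes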

3. Format the filesystem

root@ubuntu:~# hdfs namenode -format

PS: According to the configuration documentation at http://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/core-default.xml, the default value of hadoop.tmp.dir is /tmp/hadoop-${user.name}. Since /tmp is cleared when the system reboots, the filesystem data would be lost after a restart. To avoid this, edit etc/hadoop/core-site.xml and add the following:

<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
</configuration>
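If core-site.xml is changed after the NameNode has already been formatted, re-run the format so the metadata ends up under the new directory. A rough way to check where it landed (assuming everything runs as root, as in the transcripts above, so ${user.name} resolves to root, and that dfs.namenode.name.dir is left at its default of ${hadoop.tmp.dir}/dfs/name):

root@ubuntu:~# hdfs namenode -format
root@ubuntu:~# ls /data/hadoop-root/dfs/name/current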

4. Start the NameNode and DataNode daemons

root@ubuntu:~# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: Error: JAVA_HOME is not set and could not be found.
localhost: Error: JAVA_HOME is not set and could not be found.
Starting secondary namenodes [0.0.0.0]
0.0.0.0: Error: JAVA_HOME is not set and could not be found.

ERROR 1: JAVA_HOME could not be found, even though it was already configured earlier.
Steps to resolve:
1) Search for the error message; the problem turns out to come from the hadoop-config.sh script.

root@ubuntu:~/hadoop# grep -R "JAVA_HOME is not set and could not be found" .
./libexec/hadoop-config.sh:    echo "Error: JAVA_HOME is not set and could not be found." 1>&2

2) The $JAVA_HOME variable used by hadoop-config.sh is exported from etc/hadoop/hadoop-env.sh. Change ${JAVA_HOME} there to the absolute path of the JDK.

# The java implementation to use.
#export JAVA_HOME=${JAVA_HOME}
export JAVA_HOME=/root/jdk
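If you are not sure what absolute path to use, one common way to find the installed JDK (assuming java is on the PATH) is to resolve the symlink behind it; JAVA_HOME should point to the directory that contains bin/java:

root@ubuntu:~# readlink -f $(which java)
root@ubuntu:~# dirname $(dirname $(readlink -f $(which java)))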
root@ubuntu:~/hadoop# start-dfs.sh
Incorrect configuration: namenode address dfs.namenode.servicerpc-address or dfs.namenode.rpc-address is not configured.
Starting namenodes on []
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
0.0.0.0: Exception in thread "main" java.lang.IllegalArgumentException: Invalid URI for NameNode address (check fs.defaultFS): file:/// has no authority.
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:471)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:461)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.NameNode.getServiceAddress(NameNode.java:454)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.initialize(SecondaryNameNode.java:229)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.<init>(SecondaryNameNode.java:192)
0.0.0.0:        at org.apache.hadoop.hdfs.server.namenode.SecondaryNameNode.main(SecondaryNameNode.java:671)

ERROR 2: On the next attempt to start, an invalid filesystem URI error appears.
Steps to resolve:

root@ubuntu:~/hadoop# vi etc/hadoop/core-site.xml
<configuration>
  <property>
    <name>hadoop.tmp.dir</name>
    <value>/data/hadoop-${user.name}</value>
  </property>
  <!-- add the following property -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
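Before starting again, you can double-check that the new default filesystem is actually being picked up; hdfs getconf reads the effective configuration and should print the value just added:

root@ubuntu:~/hadoop# hdfs getconf -confKey fs.defaultFS
hdfs://localhost:9000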

5. Start-up succeeds

root@ubuntu:~/hadoop# start-dfs.sh
Starting namenodes on [localhost]
localhost: starting namenode, logging to /root/hadoop/logs/hadoop-root-namenode-ubuntu.out
localhost: starting datanode, logging to /root/hadoop/logs/hadoop-root-datanode-ubuntu.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /root/hadoop/logs/hadoop-root-secondarynamenode-ubuntu.out
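A quick sanity check that all three daemons are actually up is jps, a standard JDK tool that lists running Java processes; it should show NameNode, DataNode and SecondaryNameNode (plus Jps itself). If one is missing, look at the corresponding log file under /root/hadoop/logs.

root@ubuntu:~/hadoop# jps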

6. View the NameNode through the web interface: http://localhost:50070/
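The same basic status information (live DataNodes, capacity, remaining space) is also available from the command line:

root@ubuntu:~/hadoop# hdfs dfsadmin -report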

7. Create the HDFS directories

root@ubuntu:~/hadoop# hdfs dfs -mkdir /user
root@ubuntu:~/hadoop# hdfs dfs -mkdir /user/root
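/user/root is the HDFS home directory for the root user: relative HDFS paths such as the input used below resolve to /user/<username>, which is why this directory has to exist first. You can confirm it was created with:

root@ubuntu:~/hadoop# hdfs dfs -ls /user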

8. Put the files into HDFS

root@ubuntu:~/hadoop# hdfs dfs -put etc/hadoop input
root@ubuntu:~/hadoop# hdfs dfs -ls
Found 1 items
drwxr-xr-x   - root supergroup          0 2016-06-28 09:44 input

9. Run a MapReduce job using the provided examples jar, grepping the input for strings matching a pattern

root@ubuntu:~/hadoop# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar grep input output 'dfs[a-z.]+'
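When the job finishes, listing the output directory shows what was produced; for this example it is typically a _SUCCESS marker plus a part-r-00000 file containing the results (these file names are what MapReduce usually produces, not taken from the original transcript):

root@ubuntu:~/hadoop# hdfs dfs -ls output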

10. Check the results

1) You can view them directly in HDFS

root@ubuntu:~/hadoop# hdfs dfs -cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.file

2) Or fetch them from HDFS into a local directory first, then view them there

root@ubuntu:~/hadoop# hdfs dfs -get output output
root@ubuntu:~/hadoop# cat output/*
6       dfs.audit.logger
4       dfs.class
3       dfs.server.namenode.
2       dfs.period
2       dfs.audit.log.maxfilesize
2       dfs.audit.log.maxbackupindex
1       dfsmetrics.log
1       dfsadmin
1       dfs.servers
1       dfs.file
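When you are done experimenting, the official single-node guide stops the HDFS daemons the same way they were started:

root@ubuntu:~/hadoop# stop-dfs.sh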

Summary

Even when you follow the documentation strictly, unexpected things still happen, like step 4 above where starting the NameNode and DataNode daemons failed. In cases like that, if there is an error message, follow the clues and check and fix things step by step. If you cannot solve it on your own, there is plenty of reference material and help available online.
TO ME: KEEP GOING, JUST DO IT!!!
