Hadoop 2.7.5 Pseudo-Distributed Installation

  • Copy the installation package to the /soft directory


  • Extract the archive
[hadoop@localhost soft]$ sudo tar -zxvf hadoop-2.7.5.tar.gz

  • Delete the installation package
[hadoop@localhost soft]$ sudo rm -rf hadoop-2.7.5.tar.gz

  • Create symbolic links
[hadoop@localhost soft]$ sudo ln -s hadoop-2.7.5 hadoop2.7
[hadoop@localhost soft]$ ll
total 8
lrwxrwxrwx. 1 root  root   12 Dec 21 03:02 hadoop2.7 -> hadoop-2.7.5
drwxr-xr-x. 9 20415  101 4096 Dec 15 20:12 hadoop-2.7.5
lrwxrwxrwx. 1 root  root   11 Dec 21 02:43 jdk1.8 -> jdk1.8.0_66
drwxr-xr-x. 8    10  143 4096 Oct  6  2015 jdk1.8.0_66
[hadoop@localhost soft]$

  • Edit the environment variables
[hadoop@localhost sbin]$ sudo nano /etc/profile
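What to add (a minimal sketch, assuming the /soft symlinks created above; adjust the paths if yours differ):

export JAVA_HOME=/soft/jdk1.8
export HADOOP_HOME=/soft/hadoop2.7
export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin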



  • Apply the environment variables and check whether Hadoop installed successfully
[hadoop@localhost sbin]$ source /etc/profile
[hadoop@localhost sbin]$ hadoop version
Hadoop 2.7.5
Subversion https://shv@git-wip-us.apache.org/repos/asf/hadoop.git -r 18065c2b6806ed4aa6a3187d77cbe21bb3dba075
Compiled by kshvachk on 2017-12-16T01:06Z
Compiled with protoc 2.5.0
From source with checksum 9f118f95f47043332d51891e37f736e9
This command was run using /soft/hadoop-2.7.5/share/hadoop/common/hadoop-common-2.7.5.jar

  • Enter the etc/hadoop directory
[hadoop@localhost hadoop2.7]$ cd etc/hadoop

  • Configure hadoop-env.sh (set up the Java environment)
[hadoop@localhost hadoop]$ nano hadoop-env.sh
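The line that matters here is JAVA_HOME: the stock file sets export JAVA_HOME=${JAVA_HOME}, which is not inherited when the daemons are launched over SSH, so point it at an explicit path (a sketch, assuming the jdk1.8 symlink created earlier):

export JAVA_HOME=/soft/jdk1.8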



  • Configure core-site.xml ===> specify the HDFS address (i.e. the namenode) and the directory where Hadoop stores its runtime files
[hadoop@localhost hadoop]$ sudo nano core-site.xml
[sudo] password for hadoop:

Add the following:
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://hadoop:9000</value>
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/soft/hadoop2.7/tmp</value>
</property>

  • Configure hdfs-site.xml ====> specify the number of HDFS replicas
<property>
  <name>dfs.replication</name>
  <value>1</value>
</property>
<property>
  <name>dfs.permissions</name>
  <value>false</value>
</property>

  • Configure mapred-site.xml ===> specify that MapReduce runs on YARN
[hadoop@localhost hadoop]$ sudo cp mapred-site.xml.template mapred-site.xml    // rename the template
[hadoop@localhost hadoop]$ sudo nano mapred-site.xml
[hadoop@localhost hadoop]$
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>hadoop:9001</value>
</property>
  • Configure yarn-site.xml ====> specify the address of the YARN resource manager and how reducers fetch data
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>hadoop</value>
</property>
I added the second property at first, but startup then kept failing with java.net.BindException: Problem binding to [hadoop:8031] java.net.BindException: Cannot assign requested address; after removing the property, it worked.
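If you would rather keep yarn.resourcemanager.hostname, that BindException usually means the hostname does not resolve to an address on this machine. A minimal check (assuming hadoop is meant to name this host):

[hadoop@localhost hadoop]$ ping -c 1 hadoop          // should resolve to a local IP
[hadoop@localhost hadoop]$ sudo nano /etc/hosts      // if not, add a line such as: 127.0.0.1   hadoop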

  • Start configuring SSH
  • Check whether the SSH-related packages are installed
[hadoop@localhost hadoop]$ yum list installed|grep ssh
libssh2.x86_64                         1.4.3-8.el7                     @anaconda
openssh.x86_64                         6.4p1-8.el7                     @anaconda
openssh-clients.x86_64                 6.4p1-8.el7                     @anaconda
openssh-server.x86_64                  6.4p1-8.el7                     @anaconda
[hadoop@localhost hadoop]$

  • Check whether the sshd process is running
[hadoop@localhost hadoop]$ ps -Af |grep sshd
root       2208      1  0 02:22 ?        00:00:00 /usr/sbin/sshd -D
root      14509   2208  0 02:28 ?        00:00:00 sshd: hadoop [priv]
hadoop    14574  14509  0 02:29 ?        00:00:00 sshd: hadoop@pts/1
root      16620   2208  0 02:40 ?        00:00:03 sshd: root@notty
root      45692   2208  0 03:31 ?        00:00:00 sshd: hadoop [priv]
hadoop    45704  45692  0 03:31 ?        00:00:00 sshd: hadoop@pts/2
hadoop    45985  16486  0 03:48 pts/1    00:00:00 grep --color=auto sshd
[hadoop@localhost hadoop]$

  • Generate a public/private key pair on the client
[hadoop@localhost hadoop]$ ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa    // write the key pair to ~/.ssh/id_rsa
Generating public/private rsa key pair.
Created directory '/home/hadoop/.ssh'.
Your identification has been saved in /home/hadoop/.ssh/id_rsa.
Your public key has been saved in /home/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
36:db:ae:21:48:b6:e4:ed:5e:80:cf:1c:cc:47:db:d9 hadoop@localhost.localdomain
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|                 |
|        .        |
|     + . o o     |
|    = = S o E    |
|   = B = +       |
|    + * + .      |
|     . o o       |
|     .o ...      |
+-----------------+

  • View the key pair (the public key and private key)
[hadoop@localhost ~]$ cd .ssh
[hadoop@localhost .ssh]$ ls
id_rsa  id_rsa.pub
  • Append the public key to the server's authorized_keys
[hadoop@localhost .ssh]$ cat id_rsa.pub>>authorized_keys

  • Change the permissions on authorized_keys to 644
[hadoop@localhost .ssh]$ chmod 644 authorized_keys
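As a side note, openssh-clients on CentOS 7 also ships ssh-copy-id, which performs the append and the permission setup in one step:

[hadoop@localhost .ssh]$ ssh-copy-id hadoop@localhost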

  • Test whether we can log in over SSH without a password after exiting the session
[hadoop@localhost .ssh]$ ssh localhost
The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is a7:5b:2c:55:73:e9:9a:2e:8d:48:a5:8b:98:dd:f8:05.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
Last login: Thu Dec 21 03:31:45 2017 from 192.168.109.1
[hadoop@localhost ~]$ exit
logout
Connection to localhost closed.
[hadoop@localhost .ssh]$ ssh localhost
Last login: Thu Dec 21 04:01:32 2017 from localhost
[hadoop@localhost ~]$
  • Hadoop fully distributed
  • Switching between local mode, pseudo-distributed mode, and fully distributed mode
  • Make a copy for local mode
[hadoop@localhost etc]$ sudo cp -r hadoop local
  • Clear out the configuration just added from core-site.xml, mapred-site.xml, .....
[hadoop@localhost local]$ nano core-site.xml
[hadoop@localhost local]$ nano mapred-site.xml
[hadoop@localhost local]$ nano hdfs-site.xml
[hadoop@localhost local]$ nano yarn-site.xml
[hadoop@localhost local]$

  • Make copies for pseudo-distributed mode and fully distributed mode
[hadoop@localhost etc]$ sudo cp -r hadoop pseudo
[hadoop@localhost etc]$ ls
hadoop  local  pseudo
[hadoop@localhost etc]$ sudo cp -r hadoop full
[hadoop@localhost etc]$ ls
full  hadoop  local  pseudo
[hadoop@localhost etc]$
  • Delete the hadoop folder and create a symbolic link to switch between modes
[hadoop@localhost etc]$ sudo rm -rf hadoop
[hadoop@localhost etc]$ sudo ln -s pseudo hadoop
[hadoop@localhost etc]$ ll
total 12
drwxr-xr-x. 2 root   root   4096 Dec 21 04:28 full
lrwxrwxrwx. 1 root   root      6 Dec 21 04:30 hadoop -> pseudo
drwxr-xr-x. 2 hadoop hadoop 4096 Dec 21 04:10 local
drwxr-xr-x. 2 root   root   4096 Dec 21 04:28 pseudo
[hadoop@localhost etc]$
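Switching modes later is then just a matter of re-pointing the symlink; for example, back to local mode:

[hadoop@localhost etc]$ sudo rm hadoop          // removes only the symlink, not the directory
[hadoop@localhost etc]$ sudo ln -s local hadoop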

  • Now switch to pseudo-distributed mode. We need to start the Hadoop processes, and before starting them the HDFS file system must be formatted
[hadoop@localhost:/soft/hadoop2.7/bin]hadoop namenode -format

  • Start all the processes. Starting start-all.sh as the hadoop user kept failing here; everything I found said ssh was not running, but my sshd had started successfully, and to this day I do not know the reason. (In hindsight, the "connect to host hadoop port 22" line below suggests the hostname hadoop was not resolving to this machine, much like the BindException earlier.)
The error was as follows:
hadoop: ssh: connect to host hadoop port 22: Connection refused
localhost: chown: changing ownership of '/soft/hadoop-2.7.5/logs': Operation not permitted
localhost: starting datanode, logging to /soft/hadoop-2.7.5/logs/hadoop-hadoop-datanode-localhost.localdomain.out
Check the ssh processes:
[hadoop@localhost:/home/hadoop]yum list installed|grep ssh
libssh2.x86_64                         1.4.3-8.el7                     @anaconda
openssh.x86_64                         6.4p1-8.el7                     @anaconda
openssh-clients.x86_64                 6.4p1-8.el7                     @anaconda
openssh-server.x86_64                  6.4p1-8.el7                     @anaconda
[hadoop@localhost:/home/hadoop]ps -Af|grep sshd
root       1712      1  0 06:51 ?        00:00:00 /usr/sbin/sshd -D    // sshd here runs as root; I suspected this was the problem and that hadoop needed to be added to sshd -D
root      12975   1712  0 06:53 ?        00:00:00 sshd: hadoop [priv]
hadoop    12981  12975  0 06:53 ?        00:00:00 sshd: hadoop@pts/1
root      13876   1712  0 07:04 ?        00:00:00 sshd: hadoop [priv]
hadoop    13879  13876  0 07:04 ?        00:00:00 sshd: hadoop@pts/2
hadoop    18834  18788  0 07:37 pts/2    00:00:00 grep --color=auto sshd
[hadoop@localhost:/home/hadoop]ssh localhost
Later I switched to the root user to start it, and that succeeded, but when starting as root every node asks for root's password, which is tedious.
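The repeated password prompts should disappear once root also has passwordless SSH to localhost, set up the same way as for the hadoop user above (a sketch repeating those steps as root):

[root@localhost:/root]ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
[root@localhost:/root]cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
[root@localhost:/root]chmod 644 ~/.ssh/authorized_keys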
[root@localhost:/soft/hadoop2.7/etc]start-all.sh
[root@localhost:/soft/hadoop2.7/etc]jps
18579 NodeManager
18691 Jps
18423 ResourceManager
18249 SecondaryNameNode
18043 DataNode

Start the namenode process on its own (NameNode is missing from the jps output above):
[root@localhost:/soft/hadoop2.7/sbin]hadoop-daemon.sh start namenode
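The other daemons can be started one at a time in the same way, which helps isolate whichever one fails; hadoop-daemon.sh and yarn-daemon.sh both live under sbin in Hadoop 2.x:

[root@localhost:/soft/hadoop2.7/sbin]hadoop-daemon.sh start datanode
[root@localhost:/soft/hadoop2.7/sbin]hadoop-daemon.sh start secondarynamenode
[root@localhost:/soft/hadoop2.7/sbin]yarn-daemon.sh start resourcemanager
[root@localhost:/soft/hadoop2.7/sbin]yarn-daemon.sh start nodemanager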

  • View the HDFS file system
This problem cropped up again, a real headache:
Call From localhost/127.0.0.1 to hadoop:9000 failed on connection exception: java.net.ConnectException: Connection refused;
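Before changing the config, it is worth checking whether a namenode is actually listening on port 9000 (jps and ss are both available here; a ConnectException like this usually means the namenode is down or bound to a different address):

[root@localhost:/root]jps
[root@localhost:/root]ss -lnt | grep 9000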
It finally worked after modifying core-site.xml. The change is as follows:
[root@localhost:/soft/hadoop2.7/etc/hadoop]cat core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://localhost:9000</value>    <!-- changed hadoop to localhost -->
</property>
<property>
  <name>hadoop.tmp.dir</name>
  <value>/root/hadoop_tmp</value>    <!-- created a new folder under the home directory to serve as Hadoop's tmp directory; see my other post http://blog.csdn.net/xiaoqiu_cr/article/details/78868420, which is also related to the namenode needing a manual start every time I run start-all.sh -->
</property>
</configuration>
Honestly it has been dispiriting, every step a struggle. Now let's bring the HDFS file system back up and take a look:
[root@localhost:/soft/hadoop2.7/etc/hadoop]hadoop fs -ls 
  • Create a folder
[root@localhost:/soft/hadoop2.7/etc/hadoop]hadoop fs -mkdir -p /user/centos/hadoop
[root@localhost:/soft/hadoop2.7/etc/hadoop]hadoop fs -lsr /
lsr: DEPRECATED: Please use 'ls -R' instead.
drwxr-xr-x   - root supergroup          0 2017-12-21 09:38 /user
drwxr-xr-x   - root supergroup          0 2017-12-21 09:38 /user/centos
drwxr-xr-x   - root supergroup          0 2017-12-21 09:38 /user/centos/hadoop
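As a quick end-to-end write test, upload any small local file and read it back (here /etc/profile is used purely as an example):

[root@localhost:/soft/hadoop2.7/etc/hadoop]hadoop fs -put /etc/profile /user/centos/hadoop
[root@localhost:/soft/hadoop2.7/etc/hadoop]hadoop fs -cat /user/centos/hadoop/profile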

  • Access port 50070 in a web browser
Access failed yet again; this install really is ill-fated. It turned out to be the firewall.
First stop the firewall:
[root@localhost:/soft/hadoop2.7/etc/hadoop]systemctl stop firewalld.service
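Note that systemctl stop only lasts until the next reboot. To keep the web UI reachable across reboots, either disable the service outright, or leave firewalld running and just open port 50070 (standard firewalld commands on CentOS 7):

[root@localhost:/root]systemctl disable firewalld.service
// or, with firewalld still running:
[root@localhost:/root]firewall-cmd --permanent --add-port=50070/tcp
[root@localhost:/root]firewall-cmd --reload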

  • Visit http://192.168.109.130:50070 again
Finally, the long-awaited page appears.