Pseudo-Distributed Hadoop 2.7.3 Setup on Ubuntu 16.04

Source: Internet  Editor: 程序博客网  Date: 2024/05/17 01:06

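I. Configure the Java environment variables on Ubuntu

I downloaded JDK 1.8.0_151, the latest version at the time. Official site: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

(My Linux system is 64-bit, so I downloaded jdk-8u151-linux-x64.tar.gz.)
1. Edit /etc/profile and append the lines below:

sudo vim /etc/profile

# My Java root directory is /java
export JAVA_HOME=/java/jdk1.8.0_151
export JRE_HOME=${JAVA_HOME}/jre
export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
export PATH=${JAVA_HOME}/bin:$PATH

Run source /etc/profile (or log in again) so that the change takes effect.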
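
Before moving on, it can help to confirm that JAVA_HOME really points at a JDK. The following is a minimal sketch, not part of the original steps: check_java_home is a hypothetical helper name, and the fallback path is simply the one used in this article.

```shell
# check_java_home DIR: succeed only if DIR/bin/java exists and is executable.
# (check_java_home is a hypothetical helper, not a standard tool.)
check_java_home() {
    local home="$1"
    if [ -x "${home}/bin/java" ]; then
        echo "OK: ${home}/bin/java is executable"
    else
        echo "ERROR: no executable java under ${home}/bin" >&2
        return 1
    fi
}

# Example call; falls back to the path used in this article if JAVA_HOME is unset.
check_java_home "${JAVA_HOME:-/java/jdk1.8.0_151}" || true
```

If the check reports an error, the JDK was probably extracted somewhere other than the directory named in /etc/profile.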

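II. Install ssh-server and set up passwordless login
1. Install ssh-server on Ubuntu

sudo apt-get install openssh-server

2. Start ssh-server

sudo /etc/init.d/ssh start

You should see the following message:

[ ok ] Starting ssh (via systemctl): ssh.service.

Check whether the ssh-server service is running:

ps -ef | grep ssh

If the output looks like this:

root       1073      1  0 13:05 ?        00:00:00 /usr/sbin/sshd -D
root       6799   2245  0 14:02 pts/19   00:00:00 grep --color=auto ssh

then ssh-server has started successfully (the /usr/sbin/sshd -D line is the server process).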

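3. Set up passwordless login for ssh-server

Run the following command and keep pressing Enter to accept the defaults until the RSA key pair is generated:

ssh-keygen -t rsa

Append the public key to authorized_keys:

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys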
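
If ssh still asks for a password after this step, file permissions are a common cause: with the default StrictModes setting, sshd may ignore authorized_keys when the file or the .ssh directory is writable by other users. The sketch below tightens them; fix_ssh_perms is a hypothetical helper name, and the 700/600 modes are the conventional choice rather than something stated in the original article.

```shell
# fix_ssh_perms DIR: restrict DIR to its owner (700) and, if present,
# DIR/authorized_keys to owner read/write only (600).
# (fix_ssh_perms is a hypothetical helper, not a standard tool.)
fix_ssh_perms() {
    local dir="$1"
    chmod 700 "$dir" || return 1
    if [ -f "$dir/authorized_keys" ]; then
        chmod 600 "$dir/authorized_keys"
    fi
}
```

Call it as fix_ssh_perms ~/.ssh for the user that will run Hadoop.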

Test whether passwordless login works:

ssh localhost

If you are logged in without a password prompt and see a banner like the following, the setup works:

Welcome to Ubuntu 16.04.3 LTS (GNU/Linux 4.10.0-28-generic x86_64)

 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage

packages can be updated.
updates are security updates.

Last login: Sat Oct 28 15:05:51 2017 from 127.0.0.1

Disable the firewall:

sudo ufw disable

III. Install Hadoop in standalone mode and pseudo-distributed mode

1. Download hadoop-2.7.3.tar.gz and extract it to /usr/local (standalone mode setup):

Download site: http://archive.apache.org/dist/hadoop/core/hadoop-2.7.3/?C=S;O=A

Change to /usr/local and rename hadoop-2.7.3 to hadoop:

cd /usr/local
sudo mv hadoop-2.7.3 hadoop

Change the permissions on /usr/local/hadoop:

sudo chmod 777 /usr/local/hadoop

Configure the .bashrc file:

vim ~/.bashrc

Append the following to the end of the file, then save:

#HADOOP VARIABLES START
export HADOOP_INSTALL=/usr/local/hadoop
export HADOOP_MAPRED_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_HOME=$HADOOP_INSTALL
export HADOOP_HDFS_HOME=$HADOOP_INSTALL
export YARN_HOME=$HADOOP_INSTALL
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_INSTALL/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_INSTALL/lib"
export PATH=$PATH:$HADOOP_INSTALL/sbin
export PATH=$PATH:$HADOOP_INSTALL/bin
#HADOOP VARIABLES END

Run the following command so that the new environment variables take effect:

source ~/.bashrc
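
A quick way to see whether the PATH entries took effect is to ask the shell where it finds the Hadoop commands. This is a small sketch under the assumption that the bin and sbin directories were added as above; require_cmd is a hypothetical helper name.

```shell
# require_cmd NAME: report whether NAME can be found on the current PATH.
# (require_cmd is a hypothetical helper, not a standard tool.)
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        echo "missing: $1 (check the PATH lines in ~/.bashrc)" >&2
        return 1
    fi
}

# hadoop and hdfs live in bin, start-dfs.sh lives in sbin.
for cmd in hadoop hdfs start-dfs.sh; do
    require_cmd "$cmd" || true
done
```

A "missing" line usually means the install directory in .bashrc does not match where Hadoop was actually extracted.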

Hadoop configuration (pseudo-distributed mode setup)

1. Configure hadoop-env.sh:

sudo vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

Add the following lines to the file:

# The Java implementation to use.
export JAVA_HOME=/java/jdk1.8.0_151
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:/usr/local/hadoop/bin
export HADOOP_OPTS="-Djava.library.path=${HADOOP_HOME}/lib/native"

Configure yarn-env.sh:

sudo vim /usr/local/hadoop/etc/hadoop/yarn-env.sh

Add the following lines at the end of the file:

# JAVA_HOME
export JAVA_HOME=/java/jdk1.8.0_151

Configure core-site.xml. Create the /usr/local/hadoop/tmp directory, then open core-site.xml:

sudo mkdir /usr/local/hadoop/tmp
sudo vim /usr/local/hadoop/etc/hadoop/core-site.xml

Replace the empty <configuration> element at the end of the file with the following. hadoop4 is the hostname of my machine; use your own hostname (it must resolve, for example through /etc/hosts) or localhost:

<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://hadoop4:9000</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>/usr/local/hadoop/tmp</value>
        <description>A base for other temporary directories.</description>
    </property>
    <property>
        <name>dfs.name.dir</name>
        <value>/home/hdfs/name</value>
    </property>
    <property>
        <name>dfs.permissions</name>
        <value>false</value>
        <description>
            If "true", enable permission checking in HDFS. If "false", permission checking is turned off, but all other behavior is unchanged. Switching from one parameter value to the other does not change the mode, owner or group of files or directories.
        </description>
    </property>
</configuration>
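
For a single machine, the two properties that matter most in core-site.xml are fs.defaultFS and hadoop.tmp.dir. The sketch below generates a minimal file with just those two, so the values can be inspected before they are copied into the Hadoop configuration directory. write_core_site is a hypothetical helper, and hdfs://localhost:9000 is an assumed value for a machine without a dedicated hostname, not the author's setting.

```shell
# write_core_site CONF_DIR FS_URI TMP_DIR
# Writes a minimal core-site.xml into CONF_DIR, overwriting any existing file.
# (write_core_site is a hypothetical helper, not part of Hadoop.)
write_core_site() {
    conf_dir="$1"; fs_uri="$2"; tmp_dir="$3"
    cat > "${conf_dir}/core-site.xml" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>${fs_uri}</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>${tmp_dir}</value>
    </property>
</configuration>
EOF
}

# Example: write into a scratch directory first and look at the result
# before copying it to /usr/local/hadoop/etc/hadoop.
scratch="$(mktemp -d)"
write_core_site "$scratch" "hdfs://localhost:9000" "/usr/local/hadoop/tmp"
cat "$scratch/core-site.xml"
```

Writing to a scratch directory keeps the real configuration untouched until the generated file has been checked.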

Configure hdfs-site.xml:

sudo vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

Replace the empty <configuration> element at the end of the file with the following:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.data.dir</name>
        <value>/usr/local/hadoop/data</value>
    </property>
</configuration>

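Configure mapred-site.xml. Hadoop 2.7.3 ships only mapred-site.xml.template, so copy it first if mapred-site.xml does not exist yet:

sudo cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
sudo vim /usr/local/hadoop/etc/hadoop/mapred-site.xml

Replace the empty <configuration> element at the end of the file with the following:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
    <property>
        <name>mapred.job.tracker</name>
        <value>localhost:9001</value>
    </property>
</configuration>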

Configure yarn-site.xml:

sudo vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

Replace the empty <configuration> element at the end of the file with the following (hadoop4 is again my hostname):

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>hadoop4:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>hadoop4:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>hadoop4:8031</value>
    </property>
</configuration>

Reboot the system:

sudo reboot

Test whether Hadoop is installed and configured correctly

Verify the standalone installation:

hadoop version

If the Hadoop version information appears, the standalone installation works.
For example:

Hadoop 2.7.3
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r baa91f7c6bc9cb92be5982de4719c1c8af91ccff
Compiled by root on 2016-08-18T01:41Z
Compiled with protoc 2.5.0
From source with checksum 2e4ce5f957ea4db193bce3734ff29ff4
This command was run using /usr/local/bigdata/hadoop-2.7.3/share/hadoop/common/hadoop-common-2.7.3.ja

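Start Hadoop in pseudo-distributed mode

Format the NameNode:

hdfs namenode -format

If a message such as "... has been successfully formatted" appears, formatting succeeded.
Start HDFS and YARN:

start-dfs.sh
start-yarn.sh

List the running Java processes:

jps

If the following six processes appear, startup succeeded:

ResourceManager
Jps
DataNode
SecondaryNameNode
NameNode
NodeManager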
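
Checking the jps list by eye is easy to get wrong, since NameNode is also a substring of SecondaryNameNode. The sketch below reads jps output and prints any of the five expected daemons that is absent. missing_daemons is a hypothetical helper name; it assumes the usual jps line format of a process ID followed by the class name.

```shell
# missing_daemons: read jps output on stdin and print each expected daemon
# name that does not appear. The exit status is non-zero if any is missing.
# (missing_daemons is a hypothetical helper, not part of Hadoop or the JDK.)
missing_daemons() {
    output="$(cat)"
    rc=0
    for name in NameNode DataNode SecondaryNameNode ResourceManager NodeManager; do
        # Compare the last field exactly, so NameNode does not match
        # SecondaryNameNode.
        if ! printf '%s\n' "$output" |
            awk -v n="$name" '$NF == n { found = 1 } END { exit !found }'; then
            echo "$name"
            rc=1
        fi
    done
    return "$rc"
}

# Typical use on the Hadoop machine (requires the JDK's jps on PATH):
#   jps | missing_daemons && echo "all daemons are running"
```

When a daemon is reported missing, its log file under the logs directory of the Hadoop installation is the first place to look.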

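Open http://localhost:50070/ in a browser to check the NameNode web UI.

Open http://localhost:8088/ to check the YARN ResourceManager web UI and confirm that the pseudo-distributed setup works.

Stop Hadoop (HDFS and YARN):

stop-all.sh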

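Run wordcount

Start Hadoop (HDFS and YARN):

start-all.sh

List the files and directories under the HDFS root:

hdfs dfs -ls /

On the first run of HDFS, nothing is listed.
Create a directory named input in HDFS and upload /usr/local/hadoop/README.txt to it:

hdfs dfs -mkdir /input
hadoop fs -put /usr/local/hadoop/README.txt /input

Run wordcount with the following command and write the result to /output. The /output directory must not exist beforehand, otherwise the job fails:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar wordcount /input /output

After the job succeeds, the /output directory contains two files: _SUCCESS, an empty marker file that signals success, and part-r-00000, which holds the result. View the result with:

hadoop fs -cat /output/part-r-00000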
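
To sanity-check the job output, the same count can be approximated locally with standard tools. The sketch below splits on whitespace and prints one "token<TAB>count" line per token in byte order, which is the format the wordcount example is expected to produce. local_wordcount is a hypothetical helper name, and the sample file stands in for README.txt.

```shell
# local_wordcount FILE: count whitespace-separated tokens in FILE and print
# "token<TAB>count" lines sorted by token in byte order.
# (local_wordcount is a hypothetical helper, not part of Hadoop.)
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (w in count) printf "%s\t%d\n", w, count[w] }' "$1" |
        LC_ALL=C sort
}

# Example with a tiny sample file instead of README.txt
sample="$(mktemp)"
printf 'hello hadoop\nhello hdfs\n' > "$sample"
local_wordcount "$sample"
rm -f "$sample"
```

Running local_wordcount /usr/local/hadoop/README.txt and comparing it with the contents of part-r-00000 gives a quick consistency check for small inputs.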