Big Data Learning 5: Notes on HDFS and YARN


=======================

1. Analysis of the HDFS startup process

2. HDFS configuration parameters

3. YARN resource scheduling configuration

4. Using HDFS and checking YARN jobs

5. Troubleshooting during configuration

========================

1. Analysis of the HDFS startup process

In a pseudo-distributed Hadoop deployment, start HDFS:

[hadoop@hadoop001 sbin]$ ./start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
localhost: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out

During startup, three different addresses appear (hadoop001, localhost, and 0.0.0.0). Since this is a pseudo-distributed deployment, all three should be the same address. Where do they come from, and how can they be adjusted?

[hadoop@hadoop001 sbin]$ vi start-dfs.sh

#---------------------------------------------------------
# namenodes

NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -namenodes)

echo "Starting namenodes on [$NAMENODES]"

"$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
  --config "$HADOOP_CONF_DIR" \
  --hostnames "$NAMENODES" \
  --script "$bin/hdfs" start namenode $nameStartOpt

#---------------------------------------------------------
# datanodes (using default slaves file)

if [ -n "$HADOOP_SECURE_DN_USER" ]; then
  echo \
    "Attempting to start secure cluster, skipping datanodes. " \
    "Run start-secure-dns.sh as root to complete startup."
else
  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
    --config "$HADOOP_CONF_DIR" \
    --script "$bin/hdfs" start datanode $dataStartOpt
fi

#---------------------------------------------------------
# secondary namenodes (if any)

SECONDARY_NAMENODES=$($HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes 2>/dev/null)

if [ -n "$SECONDARY_NAMENODES" ]; then
  echo "Starting secondary namenodes [$SECONDARY_NAMENODES]"

  "$HADOOP_PREFIX/sbin/hadoop-daemons.sh" \
      --config "$HADOOP_CONF_DIR" \
      --hostnames "$SECONDARY_NAMENODES" \
      --script "$bin/hdfs" start secondarynamenode
fi

#---------------------------------------------------------

As the script shows, the addresses printed during startup are resolved as follows:

namenode: from $HADOOP_PREFIX/bin/hdfs getconf -namenodes

datanode: from the slaves file, /opt/software/hadoop/etc/hadoop/slaves (replacing the default localhost here with hadoop001 is what makes the datanode line print hadoop001 after the restart below)

secondary namenode: from the default configuration; it has to be overridden manually in hdfs-site.xml. The defaults are listed on hadoop.apache.org -> Documentation -> stable, in the default-configuration XML files linked at the bottom right:

dfs.namenode.secondary.http-address    0.0.0.0:50090    The secondary namenode http server address and port.

dfs.namenode.secondary.https-address   0.0.0.0:50091    The secondary namenode HTTPS server address and port.
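As a quick sanity check, the same getconf calls the script uses can be run by hand, and pointing the slaves file at hadoop001 lines up the datanode address too (a sketch of the session; the slaves path follows this article's install layout):

# What start-dfs.sh will resolve for the namenode and secondary namenode
$HADOOP_PREFIX/bin/hdfs getconf -namenodes
$HADOOP_PREFIX/bin/hdfs getconf -secondarynamenodes

# The datanode list comes from the slaves file; replacing the default
# "localhost" with the hostname makes all three addresses match
echo "hadoop001" > /opt/software/hadoop-2.8.1/etc/hadoop/slaves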

Now configure hdfs-site.xml manually:

[hadoop@hadoop001 hadoop]$ vi hdfs-site.xml, adding the following inside <configuration>:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.137.11:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>192.168.137.11:50091</value>
    </property>
</configuration>

After restarting (stop-dfs.sh, then start-dfs.sh), the output becomes:

[hadoop@hadoop001 sbin]$ ./start-dfs.sh
Starting namenodes on [hadoop001]
hadoop001: starting namenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-namenode-hadoop001.out
hadoop001: starting datanode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-datanode-hadoop001.out
Starting secondary namenodes [hadoop001]
hadoop001: starting secondarynamenode, logging to /opt/software/hadoop-2.8.1/logs/hadoop-hadoop-secondarynamenode-hadoop001.out
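To confirm all three daemons actually came up, jps (shipped with the JDK) is a quick check; the output should look roughly like this (PIDs are illustrative):

[hadoop@hadoop001 sbin]$ jps
1201 NameNode
1333 DataNode
1478 SecondaryNameNode
1562 Jps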

 

Visit http://192.168.137.11:50070/ to check the state of HDFS.
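As a command-line alternative to the web UI, hdfs dfsadmin gives a similar summary (a sketch; run as the hadoop user):

[hadoop@hadoop001 sbin]$ hdfs dfsadmin -report    # capacity, live datanodes, etc.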

 

2. HDFS configuration parameters

On hadoop.apache.org -> Documentation -> stable, the default-configuration XML files linked at the bottom right list every default parameter.

To change one, add the corresponding property to the matching xml file, in this format:

<configuration>
    <property>
        <name>dfs.replication</name>
        <value>1</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>192.168.137.11:50090</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.https-address</name>
        <value>192.168.137.11:50091</value>
    </property>
</configuration>

 

3. YARN resource scheduling configuration

On hadoop.apache.org, find the YARN configuration section and follow the official documentation.

[Note: there may only be mapred-site.xml.template and yarn-site.xml.template files; copy each one and drop the .template suffix, as sketched below.] Then add the configuration.
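A minimal sketch of creating the files from the templates (paths follow this article's install layout):

[hadoop@hadoop001 hadoop]$ cd /opt/software/hadoop-2.8.1/etc/hadoop
[hadoop@hadoop001 hadoop]$ cp mapred-site.xml.template mapred-site.xml
[hadoop@hadoop001 hadoop]$ cp yarn-site.xml.template yarn-site.xml   # only if yarn-site.xml does not already exist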

Edit etc/hadoop/mapred-site.xml:

<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>

Edit etc/hadoop/yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
</configuration>

Start YARN:

$ sbin/start-yarn.sh

Check it in the browser:

ResourceManager - http://192.168.137.11:8088/

If this reports errors, see Part 5.
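Assuming YARN came up cleanly, two quick command-line checks (a sketch; output will vary):

[hadoop@hadoop001 sbin]$ jps           # should now also list ResourceManager and NodeManager
[hadoop@hadoop001 sbin]$ yarn node -list   # shows each registered NodeManager and its state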

 

4. Using HDFS and checking YARN jobs

(1) Start a job.

The example program jar ships in this directory:

[hadoop@hadoop001 hadoop]$ cd /opt/software/hadoop-2.8.1/share/hadoop/mapreduce

[hadoop@hadoop001 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.8.1.jar pi 10 10

This estimates π; the two arguments are the number of map tasks and the number of samples per map.

(2) Check the job.

[Screenshots: the job listed as RUNNING, then as FINISHED, in the ResourceManager UI.]

Clicking the job link opens its details page, where the logs and other information can be viewed.


Note: clicking the log link here may fail to load the page. The client machine has no DNS entry for the Hadoop hostname and cannot resolve it; on Windows 7, add the mapping to the hosts file.

Add to C:\Windows\System32\drivers\etc\hosts:

192.168.137.11 hadoop001
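After saving the hosts file, a quick way to verify the mapping from the Windows side:

C:\> ping hadoop001    # should now resolve to 192.168.137.11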

 

(3) Some simple HDFS usage.

Upload a file to the HDFS root directory: hadoop fs -put hadoop.log /

List the files in the root directory: hadoop fs -ls /

View a file: hadoop fs -cat /hadoop.log

Files can also be browsed, and even uploaded and downloaded, from the Browse Directory page of the web UI.
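As a follow-up that ties HDFS and YARN together, the uploaded file can be fed to the bundled wordcount example (a sketch; /wc-out is a hypothetical output path and must not already exist):

[hadoop@hadoop001 mapreduce]$ hadoop jar hadoop-mapreduce-examples-2.8.1.jar wordcount /hadoop.log /wc-out
[hadoop@hadoop001 mapreduce]$ hadoop fs -cat /wc-out/part-r-00000    # word counts, one per line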


 

5. Troubleshooting during configuration

[root@hadoop001 ~]# cat /opt/software/hadoop-2.8.1/logs/yarn-hadoop-resourcemanager-hadoop001.out

[Fatal Error] yarn-site.xml:20:6: The markup in the document following the root element must be well-formed.

This means yarn-site.xml is malformed at line 20, column 6; go and check that line.
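To catch this kind of mistake before starting the daemons, the config files can be validated for well-formedness with xmllint (assuming the libxml2 tools are installed):

[hadoop@hadoop001 hadoop]$ xmllint --noout /opt/software/hadoop-2.8.1/etc/hadoop/yarn-site.xml
# silent if well-formed; otherwise prints the offending line and column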

 

 
