A Roundup of MapReduce Development Problems and Solutions

Source: Internet | Editor: 程序博客网 | Time: 2024/05/26 09:54

Problems are unavoidable when developing MapReduce jobs. This post records the ones I ran into and how I solved them.

1. The ResourceManager cannot be found

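I packaged the finished MapReduce client code as a jar and submitted it to the server where the Hadoop cluster is deployed. Running it produced the following error: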

17/08/08 10:02:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/08/08 10:02:23 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
17/08/08 10:02:25 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:26 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:27 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:28 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
17/08/08 10:02:29 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:8032. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)

Analysis:

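The key line in this output is "Connecting to ResourceManager at /0.0.0.0:8032". It shows that the client is trying to reach the ResourceManager on port 8032. In Hadoop 2.x, resources are managed by YARN, which also uses a master/slave architecture. Once YARN is started, the master node runs a ResourceManager process and each slave node runs a NodeManager process.

When configuring YARN you normally set the yarn.resourcemanager.address parameter. This is the address the ResourceManager exposes to clients, and clients use it to submit applications to the RM, kill applications, and so on. Its default value is ${yarn.resourcemanager.hostname}:8032, which is exactly port 8032. Because yarn.resourcemanager.hostname itself defaults to 0.0.0.0, a client with no YARN configuration ends up at 0.0.0.0:8032. The repeated lines retrying 0.0.0.0/0.0.0.0:8032 therefore show that the client keeps failing to reach a ResourceManager. At this point the cause is clear: either the ResourceManager process is not running, or YARN has not been configured. Checking yarn-site.xml and yarn-env.sh showed that neither file contained any configuration.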
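The following minimal Python sketch shows how a lookup could arrive at the 0.0.0.0 fallback. It assumes a simplified model in which a value set in yarn-site.xml overrides the built-in default and ${...} references are expanded recursively. The two defaults are the yarn-default.xml values mentioned above. The function resolve and the dict-based site_conf are hypothetical. Hadoop's real Configuration class handles more cases than this.

```python
import re

# Defaults for the two properties discussed above, as listed in yarn-default.xml.
# Only these two are modelled; everything else is out of scope for this sketch.
YARN_DEFAULTS = {
    "yarn.resourcemanager.hostname": "0.0.0.0",
    "yarn.resourcemanager.address": "${yarn.resourcemanager.hostname}:8032",
}

_VAR = re.compile(r"\$\{([^}]+)\}")


def resolve(name, site_conf, defaults=YARN_DEFAULTS, depth=0):
    """Hypothetical helper: effective value of `name`, where site_conf
    (standing in for yarn-site.xml) wins over the defaults and ${...}
    references are expanded recursively. Returns None if `name` is unknown."""
    if depth > 20:
        raise ValueError("variable expansion too deep while resolving " + name)
    raw = site_conf.get(name, defaults.get(name))
    if raw is None:
        return None

    def expand(match):
        value = resolve(match.group(1), site_conf, defaults, depth + 1)
        # Leave unknown ${...} references untouched instead of failing.
        return match.group(0) if value is None else value

    return _VAR.sub(expand, raw)


if __name__ == "__main__":
    # No yarn-site.xml settings: falls back to the address seen in the log.
    print(resolve("yarn.resourcemanager.address", {}))  # 0.0.0.0:8032
    # Setting only the hostname already moves the client address to node1.
    print(resolve("yarn.resourcemanager.address",
                  {"yarn.resourcemanager.hostname": "node1"}))  # node1:8032
```

The second call shows why configuring the ResourceManager host is enough to make the retry loop in the log go away, provided a ResourceManager is actually running there.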

Solution:

Configure yarn-site.xml:

<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>node1</value>
    </property>
    <property>
        <name>yarn.resourcemanager.address</name>
        <value>node1:8032</value>
    </property>
    <property>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>node1:8030</value>
    </property>
    <property>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>node1:8031</value>
    </property>
    <property>
        <name>yarn.resourcemanager.admin.address</name>
        <value>node1:8033</value>
    </property>
    <property>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>node1:8088</value>
    </property>
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
</configuration>
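Before restarting anything, a small script can parse yarn-site.xml and check that none of the ResourceManager endpoints above would still resolve to 0.0.0.0. This sketch assumes that an endpoint not set explicitly inherits yarn.resourcemanager.hostname, which matches how those defaults are written in yarn-default.xml. load_site_xml and check_rm_endpoints are hypothetical helper names, not Hadoop tooling.

```python
import xml.etree.ElementTree as ET

# The five ResourceManager endpoints set in the yarn-site.xml above. In
# yarn-default.xml each defaults to ${yarn.resourcemanager.hostname}:<port>.
RM_ENDPOINT_KEYS = (
    "yarn.resourcemanager.address",
    "yarn.resourcemanager.scheduler.address",
    "yarn.resourcemanager.resource-tracker.address",
    "yarn.resourcemanager.admin.address",
    "yarn.resourcemanager.webapp.address",
)
_HOST_REF = "${yarn.resourcemanager.hostname}"


def load_site_xml(text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(text)
    conf = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            conf[name.strip()] = value.strip()
    return conf


def check_rm_endpoints(conf):
    """Return one message per RM endpoint whose host would still be 0.0.0.0."""
    # yarn.resourcemanager.hostname itself defaults to 0.0.0.0.
    fallback_host = conf.get("yarn.resourcemanager.hostname", "0.0.0.0")
    problems = []
    for key in RM_ENDPOINT_KEYS:
        # Unset endpoints inherit the hostname, just like an explicit ${...} reference.
        host = conf[key].rsplit(":", 1)[0] if key in conf else _HOST_REF
        if host == _HOST_REF:
            host = fallback_host
        if host == "0.0.0.0":
            problems.append(key + " resolves to host 0.0.0.0")
    return problems


if __name__ == "__main__":
    sample = """<configuration>
      <property><name>yarn.resourcemanager.hostname</name><value>node1</value></property>
      <property><name>yarn.resourcemanager.address</name><value>node1:8032</value></property>
    </configuration>"""
    print(check_rm_endpoints(load_site_xml(sample)))  # [] (every endpoint is on node1)
    print(len(check_rm_endpoints({})))  # 5 (an empty yarn-site.xml)
```

Under this assumption, setting yarn.resourcemanager.hostname alone already moves all five endpoints off 0.0.0.0 when the standard ports are used. The explicit address entries mainly matter if a different port is wanted.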

Configure yarn-env.sh:

Add the following settings:

export JAVA_HOME=/opt/package/jdk1.7.0_76
export YARN_LOG_DIR=/opt/package/hadoop-2.7.2/logs
export YARN_ROOT_LOGGER=DEBUG

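Restart the cluster, and when starting it, run the following on node1:

start-yarn.sh

This starts a ResourceManager process on node1 and NodeManager processes on node2 and node3. After that, the MapReduce client code ran normally.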
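After running start-yarn.sh, you can confirm from the client machine that something is listening on the ResourceManager's client port by opening a TCP connection to it. The sketch below mimics the fixed-count, fixed-sleep retry shape named in the log (RetryUpToMaximumCountWithFixedSleep). It is not Hadoop's implementation, and wait_for_port is a hypothetical helper.

```python
import socket
import time


def wait_for_port(host, port, max_retries=10, sleep_seconds=1.0, timeout=2.0):
    """Try to open a TCP connection to host:port, retrying up to max_retries
    times with a fixed sleep between attempts.

    Returns the number of retries that were needed, or None if the port
    never accepted a connection (or the host name could not be resolved)."""
    for attempt in range(max_retries + 1):
        try:
            # The connection is closed again immediately; we only test reachability.
            with socket.create_connection((host, port), timeout=timeout):
                return attempt
        except OSError:  # refused, timed out, or DNS failure
            if attempt < max_retries:
                time.sleep(sleep_seconds)
    return None
```

For the cluster in this article the call would be wait_for_port("node1", 8032). A None result points back to the diagnosis above: nothing is listening at the ResourceManager address. A successful connection only proves that the port is open. Whether the ResourceManager process is actually running on node1 still has to be checked on that node.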

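Note that a Hadoop HA cluster must be started and stopped in a specific order. I will explain the start and stop order for this kind of cluster in a dedicated follow-up article.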

