Test-running Hadoop in pseudo-distributed mode
Pseudo-distributed mode reads its data from HDFS. To use HDFS, first create a user directory in it:
hdfs dfs -mkdir -p /user/hadoop
Next, copy local files into HDFS to use as job input, for example Hadoop's XML configuration files into /user/hadoop/input on HDFS. Note that relative HDFS paths such as input resolve under the current user's directory, here /user/hadoop:
hdfs dfs -mkdir input
hdfs dfs -put ./etc/hadoop/*.xml input
Running a MapReduce job in pseudo-distributed mode works the same way as in standalone mode, except that reads and writes go through HDFS:
hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.0.0-alpha4.jar wordcount input output
In theory that should be it: wait for the job to finish and look for the results under output. Instead, it failed with this error:
2017-11-03 16:56:34,091 INFO mapreduce.Job: Job job_1509699271441_0001 failed with state FAILED due to: Application application_1509699271441_0001 failed 2 times due to AM Container for appattempt_1509699271441_0001_000002 exited with exitCode: 1
Failing this attempt.
Diagnostics: [2017-11-03 16:56:33.411]Exception from container-launch.
Container id: container_1509699271441_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
    at org.apache.hadoop.util.Shell.runCommand(Shell.java:994)
    at org.apache.hadoop.util.Shell.run(Shell.java:887)
    at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:1212)
    at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:295)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.launchContainer(ContainerLaunch.java:455)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:275)
    at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:90)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
The fix, following the official docs, is to add the following configuration to mapred-site.xml and yarn-site.xml respectively:
<property>
  <name>mapreduce.application.classpath</name>
  <value>
    /usr/local/hadoop/share/hadoop/mapreduce/*,
    /usr/local/hadoop/share/hadoop/mapreduce/lib/*
  </value>
</property>
<property>
  <name>yarn.application.classpath</name>
  <value>
    /usr/local/hadoop/etc/hadoop,
    /usr/local/hadoop/share/hadoop/common/*,
    /usr/local/hadoop/share/hadoop/common/lib/*,
    /usr/local/hadoop/share/hadoop/hdfs/*,
    /usr/local/hadoop/share/hadoop/hdfs/lib/*,
    /usr/local/hadoop/share/hadoop/yarn/*,
    /usr/local/hadoop/share/hadoop/yarn/lib/*
  </value>
</property>
Do not use variables like $HADOOP_HOME the way the official docs do; write absolute paths instead. I tried the variable form many times and it did not work.
With that configured and the cluster restarted, on to the next pitfall. The job appeared to run normally this time, but output turned out to be empty when it finished. (When rerunning a job, remember to delete the old output directory first with hdfs dfs -rm -r output, since MapReduce refuses to write to an existing output directory.) The logs showed this:
2017-11-03 21:25:37,004 INFO mapreduce.Job: Task Id : attempt_1509715385291_0001_m_000005_2, Status : FAILED
[2017-11-03 21:25:34.662]Container [pid=17753,containerID=container_1509715385291_0001_01_000018] is running beyond virtual memory limits. Current usage: 126.4 MB of 1 GB physical memory used; 2.4 GB of 2.1 GB virtual memory used. Killing container.
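The 2.1 GB cap in that log is not arbitrary: YARN multiplies each container's physical-memory allocation (1 GB here) by yarn.nodemanager.vmem-pmem-ratio, which defaults to 2.1. A quick sanity check of that arithmetic (the 1024 MB container size and 2.1 ratio are the assumed defaults, not values read from this cluster):

```shell
# YARN virtual-memory cap = physical allocation * vmem-pmem ratio.
physical_mb=1024   # assumed default container allocation (1 GB)
vmem_ratio=2.1     # assumed default yarn.nodemanager.vmem-pmem-ratio
vmem_cap_mb=$(awk "BEGIN { printf \"%.0f\", $physical_mb * $vmem_ratio }")
echo "virtual memory cap: ${vmem_cap_mb} MB"   # about 2.1 GB, matching the log
```

The job used 2.4 GB of virtual memory, exceeding that cap, so the NodeManager killed the container.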
A bit of searching suggests this is a virtual-memory overrun. The fix is yarn.nodemanager.vmem-check-enabled in yarn-site.xml: it defaults to true, so add an entry setting it to false to disable the virtual-memory check:
<property>
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
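Disabling the check works, but a less blunt alternative (not tested in this run; the property is from YARN's standard configuration) is to keep the check and raise the ratio instead, e.g.:

```xml
<!-- Allow up to 4x the physical allocation in virtual memory
     instead of the default 2.1x. -->
<property>
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
```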
OK, at this point wordcount finally ran to completion. View the output:
hdfs dfs -cat output/*
You can also copy it from HDFS to the local filesystem:
hdfs dfs -get output ./output
Done!