My Hadoop: Hadoop 0.23 setup
1 Download
choose a mirror http://www.apache.org/dyn/closer.cgi/hadoop/core/
download hadoop-0.23.0.tar.gz for version 0.23 (e.g. from the renren mirror)
1.1 untar
tar zxfv hadoop-0.23.0.tar.gz
2 Run first hadoop program (locally)
2.1 compute pi
bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar pi -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars modules/hadoop-mapreduce-client-jobclient-0.23.0.jar 16 10000
Job Finished in 6.014 seconds
Estimated value of Pi is 3.14127500000000000000
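For intuition, the pi example is a Monte Carlo estimate: sample random points in the unit square and count the fraction that fall inside the quarter circle. A minimal single-process sketch of the same idea in awk (the real job spreads the sampling over the 16 map tasks requested above):

```shell
# Monte Carlo pi estimate, same math as the QuasiMonteCarlo example job.
# n plays the role of (maps x samples-per-map); seed fixed for repeatability.
awk 'BEGIN {
  srand(42); n = 100000; inside = 0
  for (i = 0; i < n; i++) {
    x = rand(); y = rand()
    if (x*x + y*y <= 1) inside++     # point lies inside the quarter circle
  }
  printf "%.4f\n", 4 * inside / n    # prints an estimate near 3.14
}'
```

The MapReduce version simply splits the loop across map tasks and sums the `inside` counters in the reducer.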
2.2 word count
bin/hadoop jar hadoop-mapreduce-examples-0.23.0.jar wordcount -Dmapreduce.clientfactory.class.name=org.apache.hadoop.mapred.YarnClientFactory -libjars modules/hadoop-mapreduce-client-jobclient-0.23.0.jar LICENSE.txt output
The result is written to the output directory.
Congratulations, you have run your first MapReduce program.
Hadoop is built for parallel, distributed computing, so next let's configure a cluster step by step.
3 Setup the first node (master)
3.1 SSH
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
id_dsa.pub is the public key of localhost.
authorized_keys contains all the public keys trusted by the current host.
Append the localhost public key to authorized_keys; then you can ssh to localhost without a passphrase.
Similarly, append id_dsa.pub to another host's authorized_keys file; then you can ssh to that host without a passphrase.
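The key setup above can be sketched end to end. Note this is a safe sketch against a scratch directory standing in for ~/.ssh; also, newer OpenSSH releases reject DSA keys, so the sketch uses RSA instead of the dsa type shown above:

```shell
# Sketch: passphraseless SSH key setup for the Hadoop daemons.
# SSH_DIR is a scratch stand-in for ~/.ssh so this is safe to experiment with.
SSH_DIR="$(mktemp -d)"                              # in real use: ~/.ssh
ssh-keygen -t rsa -P '' -f "$SSH_DIR/id_rsa" -q     # empty passphrase
cat "$SSH_DIR/id_rsa.pub" >> "$SSH_DIR/authorized_keys"
# sshd ignores authorized_keys unless the permissions are strict:
chmod 700 "$SSH_DIR" && chmod 600 "$SSH_DIR/authorized_keys"
grep -c 'ssh-rsa' "$SSH_DIR/authorized_keys"        # prints 1
```

For the other hosts, the same append is done remotely, e.g. `cat ~/.ssh/id_rsa.pub | ssh otherhost 'cat >> ~/.ssh/authorized_keys'`.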
3.2 Config HDFS
etc/hadoop/core-site.xml (defaults are listed in core-default.xml)
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://172.16.100.122:9000</value>
  </property>
</configuration>
etc/hadoop/hdfs-site.xml (defaults are listed in hdfs-default.xml)
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:/home/tntuser/hadoop-0.23.0/data/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:/home/tntuser/hadoop-0.23.0/data/hdfs/datanode</value>
  </property>
</configuration>
A full file: URI is required for the name dir and the data dir.
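Writing these files by hand is error-prone, so a small script can generate them from one set of variables. A sketch, assuming the master IP from above (adjust MASTER_IP and the paths for your cluster; CONF_DIR is a scratch stand-in for etc/hadoop):

```shell
# Sketch: generate core-site.xml from one variable so the master address
# only has to be changed in one place.
MASTER_IP=172.16.100.122
CONF_DIR="$(mktemp -d)"            # in real use: etc/hadoop
cat > "$CONF_DIR/core-site.xml" <<EOF
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://$MASTER_IP:9000</value>
  </property>
</configuration>
EOF
# quick sanity check that the substitution happened
grep 'fs.defaultFS' "$CONF_DIR/core-site.xml"
```

hdfs-site.xml can be generated the same way, substituting the replication factor and the file: URIs.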
3.3 Format HDFS
mkdir -p data/hdfs/namenode
mkdir -p data/hdfs/datanode
bin/hdfs namenode -format
3.4 Start HDFS
sbin/hadoop-daemon.sh start|stop namenode
sbin/hadoop-daemon.sh start|stop datanode
Check
jps should show NameNode and DataNode.
Run a few HDFS commands:
bin/hadoop fs -ls
bin/hadoop fs -mkdir test
bin/hadoop fs -rm -r test
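The jps check above can be scripted so it is easy to rerun after every start/stop. A sketch; the function takes the jps output as an argument so it can also be exercised against canned text:

```shell
# Sketch: verify the HDFS daemons from step 3.4 are up by scanning jps output.
check_daemons() {
  local jps_out="$1"
  for d in NameNode DataNode; do
    if echo "$jps_out" | grep -q "$d"; then
      echo "$d: running"
    else
      echo "$d: NOT running"
    fi
  done
}
# in real use, feed it live jps output:
check_daemons "$(jps 2>/dev/null)"
```

The same function works later for the YARN daemons by swapping in ResourceManager and NodeManager.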
3.5 Config MapReduce
etc/hadoop/mapred-site.xml
<?xml version="1.0"?>
<?xml-stylesheet href="configuration.xsl"?>
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
conf/yarn-site.xml
<?xml version="1.0"?>
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce.shuffle</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
    <value>org.apache.hadoop.mapred.ShuffleHandler</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>172.16.100.122:8025</value>
  </property>
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>172.16.100.122:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.address</name>
    <value>172.16.100.122:8040</value>
  </property>
</configuration>
conf/yarn-env.sh
export HADOOP_CONF_DIR="${HADOOP_CONF_DIR:-$YARN_HOME/etc/hadoop}"
export HADOOP_COMMON_HOME="${HADOOP_COMMON_HOME:-$YARN_HOME}"
export HADOOP_HDFS_HOME="${HADOOP_HDFS_HOME:-$YARN_HOME}"
The conf directory that ships with Hadoop is no longer the default configuration directory; Hadoop now looks in etc/hadoop for configuration files.
sbin/hadoop-daemon.sh calls hdfs-config.sh, which in turn calls hadoop-config.sh in $HADOOP_COMMON_HOME/libexec/.
3.6 Start MapReduce (YARN) Daemon
bin/yarn-daemon.sh start resourcemanager
bin/yarn-daemon.sh start nodemanager
bin/yarn-daemon.sh start historyserver
The NodeManager may fail to start because port 8080 is already in use (e.g. by Tomcat). In that case move the shuffle port in conf/yarn-site.xml:
<property>
  <name>mapreduce.shuffle.port</name>
  <value>8090</value>
</property>
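Before picking a replacement port, it helps to check which ports are already taken. A sketch using bash's /dev/tcp feature (a connect attempt that succeeds means something is listening; 8080 and 8090 are the ports discussed above):

```shell
# Sketch: report whether a local TCP port is free, using bash's /dev/tcp.
port_free() {
  # connect attempt succeeds only if something is listening on the port
  ! (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

for p in 8080 8090; do
  if port_free "$p"; then
    echo "port $p is free"
  else
    echo "port $p is in use; pick another for mapreduce.shuffle.port"
  fi
done
```

This only requires bash (no netstat/nc), which is convenient on minimal cluster nodes.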
4 Run the hadoop program in single node
MapReduce JobHistory Server: http://jhs_host:port/ (default HTTP port is 19888).
Open the job details to see which node executed each task.
NameNode: http://nn_host:port/ (default HTTP port is 50070); browse HDFS and the HDFS nodes.
ResourceManager: http://rm_host:port/ (default HTTP port is 8088); browse the MapReduce nodes.
5 Setup the slave node
5.1 untar on the slave
5.2 copy config from master
scp 172.16.100.122:/home/tntuser/hadoop-0.23.0/etc/hadoop/*.xml etc/hadoop
scp 172.16.100.122:/home/tntuser/hadoop-0.23.0/conf/yarn-* conf
5.3 (Re)format HDFS on the master
Shut down the daemons on the master first.
bin/hdfs namenode -format -clusterid hadoop_cluster
5.4 add slave hosts
conf/slaves
172.16.100.122 //master
172.16.100.130
5.5 Start Master Daemons
sbin/hadoop-daemon.sh start|stop namenode
sbin/hadoop-daemon.sh start|stop datanode
bin/yarn-daemon.sh start resourcemanager
bin/yarn-daemon.sh start nodemanager
bin/yarn-daemon.sh start historyserver
5.6 Start Slave Daemons
sbin/hadoop-daemon.sh start|stop datanode
bin/yarn-daemon.sh start nodemanager
6 Run the hadoop program in cluster
Issue 1: the temp directory already exists
hdfs://172.16.100.122:9000/user/tntuser/QuasiMonteCarlo_TMP_3_141592654 already exists. Please remove it first.
bin/hadoop fs -rm -r QuasiMonteCarlo_TMP_3_141592654
Issue 2: FileNotFoundException on the temp output
java.io.FileNotFoundException: File does not exist: hdfs://172.16.100.122:9000/user/tntuser/QuasiMonteCarlo_TMP_3_141592654/out/reduce-out
    at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:764)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1614)
    at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1638)
    at org.apache.hadoop.examples.QuasiMonteCarlo.estimatePi(QuasiMonteCarlo.java:314)
    at org.apache.hadoop.examples.QuasiMonteCarlo.run(QuasiMonteCarlo.java:351)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:69)
    at org.apache.hadoop.examples.QuasiMonteCarlo.main(QuasiMonteCarlo.java:360)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:4
1) DNS config: check /etc/resolv.conf and make sure the DNS nameserver is correct.
2) Add the master/slave hostnames to each other's /etc/hosts:
172.16.100.122 dev122
172.16.100.130 dev130
3) Check the Hadoop slaves config file conf/slaves; make sure each hostname or IP is correct.
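Checks 1) and 2) above can be scripted from the master. A sketch, assuming the dev122/dev130 names and IPs from the hosts entries above (getent reads the same resolver path the daemons use):

```shell
# Sketch: confirm each cluster hostname resolves to the expected IP.
check_host() {
  local ip="$1" host="$2"
  if getent hosts "$host" | grep -qw "$ip"; then
    echo "$host resolves to $ip: ok"
  else
    echo "WARNING: $host does not resolve to $ip (check /etc/hosts and DNS)"
  fi
}

check_host 172.16.100.122 dev122
check_host 172.16.100.130 dev130
```

Running this on every node catches the asymmetric case where the master resolves the slaves but not vice versa.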
Reference
http://www.crobak.org/2011/12/getting-started-with-apache-hadoop-0-23-0/
http://www.cloudera.com/blog/2011/11/building-and-deploying-mr2/
http://www.rpark.com/2011/05/building-hadoop-cluster.html
http://hadoop.apache.org/common/docs/current/single_node_setup.html
http://hadoop.apache.org/common/docs/current/cluster_setup.html
http://hadoop.apache.org/common/docs/r0.23.0/hadoop-yarn/hadoop-yarn-site/SingleCluster.html