hadoop-2.3.0 Configuration


1: Download hadoop-2.3.0.tar.gz, ubuntu-12.04.4-desktop-i386.iso, jdk-7u25-linux-i586.tar.gz, and VMware.


2: Install VMware and Ubuntu.


3: Install JDK 7

Reference: http://blog.csdn.net/shaohui2014/article/details/22271845
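
A minimal sketch of the manual JDK install (paths and the directory name are assumptions; the linked post covers the details):

# unpack the JDK tarball under ~/opt
mkdir -p ~/opt
tar -zxvf jdk-7u25-linux-i586.tar.gz -C ~/opt
# append to ~/.bashrc so java/javac are on the PATH
export JAVA_HOME=/home/spishhw/opt/jdk1.7.0_25
export PATH=$JAVA_HOME/bin:$PATH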


4: Configure Hadoop

#hadoop setting 27mar14
export HADOOP_HOME=/home/spishhw/opt/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
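
These exports are assumed to be appended to ~/.bashrc. Hadoop itself also needs to know where the JDK is; a minimal sketch for etc/hadoop/hadoop-env.sh, assuming the JDK from step 3 was unpacked under ~/opt:

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/spishhw/opt/jdk1.7.0_25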



core-site.xml

<configuration>
    <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:9000</value>
    </property>
</configuration>



hdfs-site.xml

<configuration>
    <property>
        <!-- NameNode working directory; must exist beforehand -->
        <name>dfs.namenode.name.dir</name>
        <value>file:/tmp/hadoop-2.3.0/dfs-name</value>
    </property>
    <property>
        <!-- DataNode working directory -->
        <name>dfs.datanode.data.dir</name>
        <value>file:/tmp/hadoop-2.3.0/dfs-data</value>
    </property>
    <property>
        <!-- Number of replicas stored per file block -->
        <name>dfs.replication</name>
        <!-- Must be less than or equal to the number of DataNodes; default is 3 -->
        <value>1</value>
    </property>
    <property>
        <!-- Enable WebHDFS so HDFS can be monitored from the web interface -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>


mapred-site.xml

<configuration>
    <property>
        <!-- Legacy MRv1 JobTracker address; not used when running on YARN -->
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
    </property>
    <property>
        <!-- MapReduce runtime framework -->
        <name>mapreduce.framework.name</name>
        <!-- yarn: run jobs on YARN (distributed mode) -->
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:18040</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:18030</value>
    </property>
    <property>
        <description>The address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>127.0.0.1:18088</value>
    </property>
    <property>
        <description>The address of the resource tracker interface.</description>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8025</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
        <description>Physical memory, in MB, to be made available to running containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
</configuration>


Create the directories (they must match the paths in hdfs-site.xml)

mkdir -p /tmp/hadoop-2.3.0/dfs-name
mkdir -p /tmp/hadoop-2.3.0/dfs-data


Start the services

hdfs namenode -format

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode    # if the DataNode fails to start, delete the data under /tmp/hadoop-2.3.0/dfs-data and start it again
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
jps    # after a successful start, the output looks like this:

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ jps
394 Jps
30015 NodeManager
29843 NameNode
29962 ResourceManager
29897 DataNode
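
To stop everything again, the matching stop commands of the same daemon scripts can be used (a sketch):

yarn-daemon.sh stop nodemanager
yarn-daemon.sh stop resourcemanager
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode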

Visit the following pages:

http://localhost:18088/cluster/apps/FINISHED          # ResourceManager web UI (port 18088, as set in yarn-site.xml)
http://spishhw-vm:8042/node/allApplications           # NodeManager web UI (default port 8042)
http://127.0.0.1:50070/dfshealth.html#tab-overview    # NameNode / HDFS overview (default port 50070)

Prepare to run wordcount

hdfs dfs -mkdir -p /user/spishhw/input
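
file1.txt and file2.txt are assumed to be small local text files; hypothetical contents, chosen only to be consistent with the word counts shown at the end:

echo "hello hadoop hello world hadoop 2.3" > file1.txt
echo "hello hadoop ok bill fail hadoop" > file2.txt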


hadoop fs -put file*.txt /user/spishhw/input

(upload the files to input)

List the files

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ hadoop fs -ls /user/spishhw/input
Found 2 items
-rw-r--r--   1 spishhw supergroup         36 2014-04-03 17:10 /user/spishhw/input/file1.txt
-rw-r--r--   1 spishhw supergroup         33 2014-04-03 17:10 /user/spishhw/input/file2.txt


hdfs dfsadmin -safemode leave    # leave safe mode (if HDFS is stuck in safe mode)

sudo ufw disable    # disable the firewall


To open up the SSH service on this machine, openssh-server needs to be installed:
sudo apt-get install openssh-server


Check whether the SSH server is running:

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ ps -e |grep ssh
 2313 ?        00:00:01 ssh-agent
14480 ?        00:00:00 sshd

Start the SSH server:
sudo /etc/init.d/ssh start
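
The per-daemon start commands above do not need it, but if the start-dfs.sh / start-yarn.sh wrapper scripts are used instead, passwordless SSH to localhost is required; a typical setup sketch:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without asking for a password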


Go to the following path to run the example:

cd ~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources


hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount  input output
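
This runs WordCount out of the sources jar; the compiled examples jar one directory up should work as well (a sketch, using the lowercase driver name registered in that jar):

cd ~/opt/hadoop-2.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.3.0.jar wordcount input output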


spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount  input output
14/04/03 17:11:10 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:18040
14/04/03 17:11:14 INFO input.FileInputFormat: Total input paths to process : 2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: number of splits:2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396516102162_0001
14/04/03 17:11:16 INFO impl.YarnClientImpl: Submitted application application_1396516102162_0001
14/04/03 17:11:17 INFO mapreduce.Job: The url to track the job: http://spishhw-vm:18088/proxy/application_1396516102162_0001/
14/04/03 17:11:17 INFO mapreduce.Job: Running job: job_1396516102162_0001
14/04/03 17:11:51 INFO mapreduce.Job: Job job_1396516102162_0001 running in uber mode : false
14/04/03 17:11:51 INFO mapreduce.Job:  map 0% reduce 0%
14/04/03 17:12:55 INFO mapreduce.Job:  map 100% reduce 0%
14/04/03 17:13:25 INFO mapreduce.Job:  map 100% reduce 100%
14/04/03 17:13:29 INFO mapreduce.Job: Job job_1396516102162_0001 completed successfully
14/04/03 17:13:30 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=97
        FILE: Number of bytes written=255490
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=299
        HDFS: Number of bytes written=50
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=1092408
        Total time spent by all reduces in occupied slots (ms)=130960
        Total time spent by all map tasks (ms)=136551
        Total time spent by all reduce tasks (ms)=16370
        Total vcore-seconds taken by all map tasks=136551
        Total vcore-seconds taken by all reduce tasks=16370
        Total megabyte-seconds taken by all map tasks=139828224
        Total megabyte-seconds taken by all reduce tasks=16762880
    Map-Reduce Framework
        Map input records=6
        Map output records=12
        Map output bytes=117
        Map output materialized bytes=103
        Input split bytes=230
        Combine input records=12
        Combine output records=8
        Reduce input groups=7
        Reduce shuffle bytes=103
        Reduce input records=8
        Reduce output records=7
        Spilled Records=16
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=862
        CPU time spent (ms)=6640
        Physical memory (bytes) snapshot=352227328
        Virtual memory (bytes) snapshot=1084698624
        Total committed heap usage (bytes)=257761280
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=69
    File Output Format Counters
        Bytes Written=50


View the output
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -ls /user/spishhw/output
Found 2 items
-rw-r--r--   1 spishhw supergroup          0 2014-04-03 17:13 /user/spishhw/output/_SUCCESS
-rw-r--r--   1 spishhw supergroup         50 2014-04-03 17:13 /user/spishhw/output/part-r-00000
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -more /user/spishhw/output/part-r-00000
-more: Unknown command
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -cat /user/spishhw/output/part-r-00000
2.3    1
bill    1
fail    1
hadoop    4
hello    3
ok    1
world    1
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$
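
To run the job again, the output directory must be removed first; otherwise the job refuses to start because the output path already exists (a sketch):

hadoop fs -rm -r /user/spishhw/output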

