hadoop-2.3.0 Configuration


1: Download hadoop-2.3.0.tar.gz, ubuntu-12.04.4-desktop-i386.iso, jdk-7u25-linux-i586.tar.gz, and VMware.


2: Install VMware and Ubuntu.


3: Install JDK 7

Reference: http://blog.csdn.net/shaohui2014/article/details/22271845
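
A minimal sketch of the manual JDK install (paths and the directory name are assumptions; the linked post covers the details):

# unpack the JDK tarball under ~/opt
mkdir -p ~/opt
tar -zxvf jdk-7u25-linux-i586.tar.gz -C ~/opt
# append to ~/.bashrc so java/javac are on the PATH
export JAVA_HOME=/home/spishhw/opt/jdk1.7.0_25
export PATH=$JAVA_HOME/bin:$PATH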


4: Configure Hadoop

#hadoop setting 27mar14
export HADOOP_HOME=/home/spishhw/opt/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
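
These exports are assumed to be appended to ~/.bashrc. Hadoop itself also needs to know where the JDK is; a minimal sketch for etc/hadoop/hadoop-env.sh, assuming the JDK from step 3 was unpacked under ~/opt:

# etc/hadoop/hadoop-env.sh
export JAVA_HOME=/home/spishhw/opt/jdk1.7.0_25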



core-site.xml

<configuration>
    <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:9000</value>
    </property>
</configuration>



hdfs-site.xml

<configuration>
    <property>
        <!-- NameNode working directory; must exist beforehand -->
        <name>dfs.namenode.name.dir</name>
        <value>file:/tmp/hadoop-2.3.0/dfs-name</value>
    </property>
    <property>
        <!-- DataNode working directory -->
        <name>dfs.datanode.data.dir</name>
        <value>file:/tmp/hadoop-2.3.0/dfs-data</value>
    </property>
    <property>
        <!-- Number of replicas stored per file block -->
        <name>dfs.replication</name>
        <!-- Must be less than or equal to the number of DataNodes; default is 3 -->
        <value>1</value>
    </property>
    <property>
        <!-- Enable WebHDFS so HDFS can be monitored from the web interface -->
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
</configuration>


mapred-site.xml

<configuration>
    <property>
        <!-- Legacy MRv1 JobTracker address; not used when running on YARN -->
        <name>mapred.job.tracker</name>
        <value>master:9001</value>
    </property>
    <property>
        <!-- MapReduce runtime framework -->
        <name>mapreduce.framework.name</name>
        <!-- yarn: run jobs on YARN (distributed mode) -->
        <value>yarn</value>
    </property>
</configuration>

yarn-site.xml

<configuration>
    <!-- Site specific YARN configuration properties -->
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <description>The address of the applications manager interface in the RM.</description>
        <name>yarn.resourcemanager.address</name>
        <value>127.0.0.1:18040</value>
    </property>
    <property>
        <description>The address of the scheduler interface.</description>
        <name>yarn.resourcemanager.scheduler.address</name>
        <value>127.0.0.1:18030</value>
    </property>
    <property>
        <description>The address of the RM web application.</description>
        <name>yarn.resourcemanager.webapp.address</name>
        <value>127.0.0.1:18088</value>
    </property>
    <property>
        <description>The address of the resource tracker interface.</description>
        <name>yarn.resourcemanager.resource-tracker.address</name>
        <value>127.0.0.1:8025</value>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>128</value>
        <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>2048</value>
        <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
    </property>
    <property>
        <name>yarn.scheduler.minimum-allocation-vcores</name>
        <value>1</value>
        <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and the specified value will get allocated the minimum.</description>
    </property>
    <property>
        <name>yarn.scheduler.maximum-allocation-vcores</name>
        <value>2</value>
        <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>4096</value>
        <description>Physical memory, in MB, to be made available to running containers.</description>
    </property>
    <property>
        <name>yarn.nodemanager.resource.cpu-vcores</name>
        <value>4</value>
        <description>Number of CPU cores that can be allocated for containers.</description>
    </property>
</configuration>


Create the directories (they must match the paths in hdfs-site.xml)

mkdir -p /tmp/hadoop-2.3.0/dfs-name
mkdir -p /tmp/hadoop-2.3.0/dfs-data


Start the services

hdfs namenode -format

hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode    # if the DataNode fails to start, delete the data under /tmp/hadoop-2.3.0/dfs-data and start it again
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
jps    # after a successful start, the output looks like this:

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ jps
394 Jps
30015 NodeManager
29843 NameNode
29962 ResourceManager
29897 DataNode
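
To stop everything again, the matching stop commands of the same daemon scripts can be used (a sketch):

yarn-daemon.sh stop nodemanager
yarn-daemon.sh stop resourcemanager
hadoop-daemon.sh stop datanode
hadoop-daemon.sh stop namenode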

Visit the following pages:

http://localhost:18088/cluster/apps/FINISHED          # ResourceManager web UI (port 18088, as set in yarn-site.xml)
http://spishhw-vm:8042/node/allApplications           # NodeManager web UI (default port 8042)
http://127.0.0.1:50070/dfshealth.html#tab-overview    # NameNode / HDFS overview (default port 50070)

Prepare to run wordcount

hdfs dfs -mkdir -p /user/spishhw/input
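
file1.txt and file2.txt are assumed to be small local text files; hypothetical contents, chosen only to be consistent with the word counts shown at the end:

echo "hello hadoop hello world hadoop 2.3" > file1.txt
echo "hello hadoop ok bill fail hadoop" > file2.txt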


hadoop fs -put file*.txt /user/spishhw/input

(upload the files to input)

List the files

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ hadoop fs -ls /user/spishhw/input
Found 2 items
-rw-r--r--   1 spishhw supergroup         36 2014-04-03 17:10 /user/spishhw/input/file1.txt
-rw-r--r--   1 spishhw supergroup         33 2014-04-03 17:10 /user/spishhw/input/file2.txt


hdfs dfsadmin -safemode leave    # leave safe mode (if HDFS is stuck in safe mode)

sudo ufw disable    # disable the firewall


To open up the SSH service on this machine, openssh-server needs to be installed:
sudo apt-get install openssh-server


Check whether the SSH server is running:

spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ ps -e |grep ssh
 2313 ?        00:00:01 ssh-agent
14480 ?        00:00:00 sshd

Start the SSH server:
sudo /etc/init.d/ssh start
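
The per-daemon start commands above do not need it, but if the start-dfs.sh / start-yarn.sh wrapper scripts are used instead, passwordless SSH to localhost is required; a typical setup sketch:

ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
ssh localhost    # should now log in without asking for a password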


Go to the following path to run the example:

cd ~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources


hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount  input output
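
This runs WordCount out of the sources jar; the compiled examples jar one directory up should work as well (a sketch, using the lowercase driver name registered in that jar):

cd ~/opt/hadoop-2.3.0/share/hadoop/mapreduce
hadoop jar hadoop-mapreduce-examples-2.3.0.jar wordcount input output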


spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount  input output
14/04/03 17:11:10 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:18040
14/04/03 17:11:14 INFO input.FileInputFormat: Total input paths to process : 2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: number of splits:2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396516102162_0001
14/04/03 17:11:16 INFO impl.YarnClientImpl: Submitted application application_1396516102162_0001
14/04/03 17:11:17 INFO mapreduce.Job: The url to track the job: http://spishhw-vm:18088/proxy/application_1396516102162_0001/
14/04/03 17:11:17 INFO mapreduce.Job: Running job: job_1396516102162_0001
14/04/03 17:11:51 INFO mapreduce.Job: Job job_1396516102162_0001 running in uber mode : false
14/04/03 17:11:51 INFO mapreduce.Job:  map 0% reduce 0%
14/04/03 17:12:55 INFO mapreduce.Job:  map 100% reduce 0%
14/04/03 17:13:25 INFO mapreduce.Job:  map 100% reduce 100%
14/04/03 17:13:29 INFO mapreduce.Job: Job job_1396516102162_0001 completed successfully
14/04/03 17:13:30 INFO mapreduce.Job: Counters: 49
    File System Counters
        FILE: Number of bytes read=97
        FILE: Number of bytes written=255490
        FILE: Number of read operations=0
        FILE: Number of large read operations=0
        FILE: Number of write operations=0
        HDFS: Number of bytes read=299
        HDFS: Number of bytes written=50
        HDFS: Number of read operations=9
        HDFS: Number of large read operations=0
        HDFS: Number of write operations=2
    Job Counters
        Launched map tasks=2
        Launched reduce tasks=1
        Data-local map tasks=2
        Total time spent by all maps in occupied slots (ms)=1092408
        Total time spent by all reduces in occupied slots (ms)=130960
        Total time spent by all map tasks (ms)=136551
        Total time spent by all reduce tasks (ms)=16370
        Total vcore-seconds taken by all map tasks=136551
        Total vcore-seconds taken by all reduce tasks=16370
        Total megabyte-seconds taken by all map tasks=139828224
        Total megabyte-seconds taken by all reduce tasks=16762880
    Map-Reduce Framework
        Map input records=6
        Map output records=12
        Map output bytes=117
        Map output materialized bytes=103
        Input split bytes=230
        Combine input records=12
        Combine output records=8
        Reduce input groups=7
        Reduce shuffle bytes=103
        Reduce input records=8
        Reduce output records=7
        Spilled Records=16
        Shuffled Maps =2
        Failed Shuffles=0
        Merged Map outputs=2
        GC time elapsed (ms)=862
        CPU time spent (ms)=6640
        Physical memory (bytes) snapshot=352227328
        Virtual memory (bytes) snapshot=1084698624
        Total committed heap usage (bytes)=257761280
    Shuffle Errors
        BAD_ID=0
        CONNECTION=0
        IO_ERROR=0
        WRONG_LENGTH=0
        WRONG_MAP=0
        WRONG_REDUCE=0
    File Input Format Counters
        Bytes Read=69
    File Output Format Counters
        Bytes Written=50


View the output
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -ls /user/spishhw/output
Found 2 items
-rw-r--r--   1 spishhw supergroup          0 2014-04-03 17:13 /user/spishhw/output/_SUCCESS
-rw-r--r--   1 spishhw supergroup         50 2014-04-03 17:13 /user/spishhw/output/part-r-00000
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -more /user/spishhw/output/part-r-00000
-more: Unknown command
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -cat /user/spishhw/output/part-r-00000
2.3    1
bill    1
fail    1
hadoop    4
hello    3
ok    1
world    1
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$
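
To run the job again, the output directory must be removed first; otherwise the job refuses to start because the output path already exists (a sketch):

hadoop fs -rm -r /user/spishhw/output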

