hadoop-2.3.0 Configuration
Source: Internet | Site: 程序博客网 | Date: 2024/05/16 17:45
1: Download hadoop-2.3.0.tar.gz, ubuntu-12.04.4-desktop-i386.iso, jdk-7u25-linux-i586.tar.gz, and VMware.
2: Install VMware and Ubuntu.
3: Install JDK 7.
Reference: http://blog.csdn.net/shaohui2014/article/details/22271845
4: Configure Hadoop
#hadoop setting 27mar14
export HADOOP_HOME=/home/spishhw/opt/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
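These exports only apply to the current shell. To make them survive new logins, they can be appended to ~/.bashrc (the paths are the ones used above):

```shell
# Append the Hadoop environment to ~/.bashrc so new shells pick it up.
cat >> ~/.bashrc <<'EOF'
#hadoop setting 27mar14
export HADOOP_HOME=/home/spishhw/opt/hadoop-2.3.0
export PATH=$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$PATH
EOF
```

Open a new terminal (or run `. ~/.bashrc`) afterwards so `hadoop`, `hdfs`, and the `*-daemon.sh` scripts are found.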
core-site.xml
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
hdfs-site.xml
<configuration>
  <property>
    <!-- NameNode working directory; must exist beforehand -->
    <name>dfs.namenode.name.dir</name>
    <value>file:/tmp/hadoop-2.3.0/dfs-name</value>
  </property>
  <property>
    <!-- DataNode working directory -->
    <name>dfs.datanode.data.dir</name>
    <value>file:/tmp/hadoop-2.3.0/dfs-data</value>
  </property>
  <property>
    <!-- Number of replicas to keep for each file block -->
    <name>dfs.replication</name>
    <!-- Must not exceed the number of DataNodes; the default is 3 -->
    <value>1</value>
  </property>
  <property>
    <!-- Allow monitoring HDFS through the web interface -->
    <name>dfs.webhdfs.enabled</name>
    <value>true</value>
  </property>
</configuration>
mapred-site.xml
<configuration>
  <property>
    <!-- JobTracker address; not actually used when the framework below is yarn -->
    <name>mapred.job.tracker</name>
    <value>master:9001</value>
  </property>
  <property>
    <!-- MapReduce runtime framework -->
    <name>mapreduce.framework.name</name>
    <!-- yarn: run on YARN (distributed mode) -->
    <value>yarn</value>
  </property>
</configuration>
yarn-site.xml
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <!-- Site specific YARN configuration properties -->
  <property>
    <description>The address of the applications manager interface in the RM.</description>
    <name>yarn.resourcemanager.address</name>
    <value>127.0.0.1:18040</value>
  </property>
  <property>
    <description>The address of the scheduler interface.</description>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>127.0.0.1:18030</value>
  </property>
  <property>
    <description>The address of the RM web application.</description>
    <name>yarn.resourcemanager.webapp.address</name>
    <value>127.0.0.1:18088</value>
  </property>
  <property>
    <description>The address of the resource tracker interface.</description>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>127.0.0.1:8025</value>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-mb</name>
    <value>128</value>
    <description>Minimum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-mb</name>
    <value>2048</value>
    <description>Maximum limit of memory to allocate to each container request at the Resource Manager.</description>
  </property>
  <property>
    <name>yarn.scheduler.minimum-allocation-vcores</name>
    <value>1</value>
    <description>The minimum allocation for every container request at the RM, in terms of virtual CPU cores. Requests lower than this won't take effect, and will be raised to this minimum.</description>
  </property>
  <property>
    <name>yarn.scheduler.maximum-allocation-vcores</name>
    <value>2</value>
    <description>The maximum allocation for every container request at the RM, in terms of virtual CPU cores. Requests higher than this won't take effect, and will get capped to this value.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.memory-mb</name>
    <value>4096</value>
    <description>Physical memory, in MB, to be made available to running containers.</description>
  </property>
  <property>
    <name>yarn.nodemanager.resource.cpu-vcores</name>
    <value>4</value>
    <description>Number of CPU cores that can be allocated for containers.</description>
  </property>
</configuration>
Create the directories (absolute paths, matching hdfs-site.xml above):
mkdir -p /tmp/hadoop-2.3.0/dfs-name
mkdir -p /tmp/hadoop-2.3.0/dfs-data
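Note that /tmp is usually cleared on reboot, which destroys the formatted NameNode metadata. A more durable layout (an assumption, not part of the original setup) is to keep the directories under the install tree and point dfs.namenode.name.dir / dfs.datanode.data.dir at them:

```shell
# Hypothetical reboot-safe alternative; if used, update the file: URLs
# in hdfs-site.xml to match and re-run "hdfs namenode -format".
HDFS_DIR="$HOME/opt/hadoop-2.3.0/dfs"
mkdir -p "$HDFS_DIR/name" "$HDFS_DIR/data"
```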
Start the services
hdfs namenode -format
hadoop-daemon.sh start namenode
hadoop-daemon.sh start datanode    # if the DataNode fails to start, delete the data under /tmp/hadoop-2.3.0/dfs-data and retry
yarn-daemon.sh start resourcemanager
yarn-daemon.sh start nodemanager
jps    # when everything started successfully, the output looks like this:
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ jps
394 Jps
30015 NodeManager
29843 NameNode
29962 ResourceManager
29897 DataNode
Visit the following web UIs:
http://localhost:18088/cluster/apps/FINISHED       (ResourceManager: finished applications)
http://spishhw-vm:8042/node/allApplications        (NodeManager: applications on this node)
http://127.0.0.1:50070/dfshealth.html#tab-overview (NameNode: HDFS health overview)
Prepare to run wordcount
hdfs dfs -mkdir -p /user/spishhw/input
hadoop fs -put file*.txt /user/spishhw/input
(upload the local files to input)
List the files:
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ hadoop fs -ls /user/spishhw/input
Found 2 items
-rw-r--r-- 1 spishhw supergroup 36 2014-04-03 17:10 /user/spishhw/input/file1.txt
-rw-r--r-- 1 spishhw supergroup 33 2014-04-03 17:10 /user/spishhw/input/file2.txt
hdfs dfsadmin -safemode leave    # leave safe mode
sudo ufw disable                 # disable the firewall
To run an SSH service on this machine, install openssh-server:
sudo apt-get install openssh-server
Check whether the SSH server is running:
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/etc/hadoop$ ps -e |grep ssh
2313 ? 00:00:01 ssh-agent
14480 ? 00:00:00 sshd
Start the SSH server:
sudo /etc/init.d/ssh start
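The start-dfs.sh / start-yarn.sh helper scripts log in over ssh, so passwordless ssh to localhost is worth setting up as well (a standard extra step, not shown in the original post):

```shell
# Authorize our own public key for logins to this machine.
mkdir -p ~/.ssh && chmod 700 ~/.ssh
# Generate an RSA key pair with an empty passphrase, if none exists yet.
[ -f ~/.ssh/id_rsa ] || ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys
```

Afterwards `ssh localhost` should log in without prompting for a password.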
Go to the directory containing the example jar and run the sample:
cd ~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources
hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount input output
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop jar hadoop-mapreduce-examples-2.3.0-sources.jar org.apache.hadoop.examples.WordCount input output
14/04/03 17:11:10 INFO client.RMProxy: Connecting to ResourceManager at /127.0.0.1:18040
14/04/03 17:11:14 INFO input.FileInputFormat: Total input paths to process : 2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: number of splits:2
14/04/03 17:11:14 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1396516102162_0001
14/04/03 17:11:16 INFO impl.YarnClientImpl: Submitted application application_1396516102162_0001
14/04/03 17:11:17 INFO mapreduce.Job: The url to track the job: http://spishhw-vm:18088/proxy/application_1396516102162_0001/
14/04/03 17:11:17 INFO mapreduce.Job: Running job: job_1396516102162_0001
14/04/03 17:11:51 INFO mapreduce.Job: Job job_1396516102162_0001 running in uber mode : false
14/04/03 17:11:51 INFO mapreduce.Job: map 0% reduce 0%
14/04/03 17:12:55 INFO mapreduce.Job: map 100% reduce 0%
14/04/03 17:13:25 INFO mapreduce.Job: map 100% reduce 100%
14/04/03 17:13:29 INFO mapreduce.Job: Job job_1396516102162_0001 completed successfully
14/04/03 17:13:30 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=97
FILE: Number of bytes written=255490
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=299
HDFS: Number of bytes written=50
HDFS: Number of read operations=9
HDFS: Number of large read operations=0
HDFS: Number of write operations=2
Job Counters
Launched map tasks=2
Launched reduce tasks=1
Data-local map tasks=2
Total time spent by all maps in occupied slots (ms)=1092408
Total time spent by all reduces in occupied slots (ms)=130960
Total time spent by all map tasks (ms)=136551
Total time spent by all reduce tasks (ms)=16370
Total vcore-seconds taken by all map tasks=136551
Total vcore-seconds taken by all reduce tasks=16370
Total megabyte-seconds taken by all map tasks=139828224
Total megabyte-seconds taken by all reduce tasks=16762880
Map-Reduce Framework
Map input records=6
Map output records=12
Map output bytes=117
Map output materialized bytes=103
Input split bytes=230
Combine input records=12
Combine output records=8
Reduce input groups=7
Reduce shuffle bytes=103
Reduce input records=8
Reduce output records=7
Spilled Records=16
Shuffled Maps =2
Failed Shuffles=0
Merged Map outputs=2
GC time elapsed (ms)=862
CPU time spent (ms)=6640
Physical memory (bytes) snapshot=352227328
Virtual memory (bytes) snapshot=1084698624
Total committed heap usage (bytes)=257761280
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=69
File Output Format Counters
Bytes Written=50
View the output
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -ls /user/spishhw/output
Found 2 items
-rw-r--r-- 1 spishhw supergroup 0 2014-04-03 17:13 /user/spishhw/output/_SUCCESS
-rw-r--r-- 1 spishhw supergroup 50 2014-04-03 17:13 /user/spishhw/output/part-r-00000
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -more /user/spishhw/output/part-r-00000
-more: Unknown command
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$ hadoop fs -cat /user/spishhw/output/part-r-00000
2.3 1
bill 1
fail 1
hadoop 4
hello 3
ok 1
world 1
spishhw@spishhw-vm:~/opt/hadoop-2.3.0/share/hadoop/mapreduce/sources$
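The same counting logic can be cross-checked locally with plain shell tools (the sample file below is made up; the contents of the original file1.txt/file2.txt were not shown in the post):

```shell
# Emulate WordCount on a local file: split lines into words, sort, count.
printf 'hello world\nhello hadoop\nhadoop ok\n' > /tmp/wc-sample.txt
tr -s ' ' '\n' < /tmp/wc-sample.txt | sort | uniq -c | awk '{print $2"\t"$1}'
# → hadoop  2
#   hello   2
#   ok      1
#   world   1
```

The output has the same word-TAB-count shape as the part-r-00000 file above.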