Hadoop Installation and Deployment (Pseudo-Distributed and Cluster)
@(HADOOP)[hadoop]
- Hadoop Installation and Deployment (Pseudo-Distributed and Cluster)
  - Part 1: Pseudo-Distributed
    - I. Environment Preparation
    - II. Installing HDFS
    - III. Installing YARN
  - Part 2: Cluster Installation
    - I. Planning
      - (1) Hardware Resources
      - (2) Basic Information
    - II. Environment Configuration
      - (1) Unify the username and password, and grant jediael permission to run all commands
      - (2) Create the directory /mnt/jediael
      - (3) Change the hostnames and edit /etc/hosts
Part 1: Pseudo-Distributed
I. Environment Preparation
1. Install Linux and the JDK.
2. Download hadoop-2.6.0 and unpack it.
3. Configure passwordless SSH.
(1) Check whether passwordless login already works:
$ ssh localhost
(2) If it does not:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
4. Add the following to /etc/profile:
# hadoop settings
export PATH=$PATH:/mnt/jediael/hadoop-2.6.0/bin:/mnt/jediael/hadoop-2.6.0/sbin
export HADOOP_HOME=/mnt/jediael/hadoop-2.6.0
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
II. Installing HDFS
1. Configure etc/hadoop/core-site.xml:
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
2. Configure etc/hadoop/hdfs-site.xml:
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
</configuration>
3. Format the NameNode:
$ bin/hdfs namenode -format
4. Start HDFS:
$ sbin/start-dfs.sh
5. Open the web page to verify that HDFS is running:
http://localhost:50070/
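Besides checking in a browser, the same test can be scripted. A dry-run sketch (port 50070 is the Hadoop 2.x default NameNode web port; the leading `echo` only prints the command, so drop it to actually issue the request with curl):

```shell
# Dry-run sketch: probe the NameNode web UI from the shell.
# Port 50070 is the Hadoop 2.x default; remove the `echo` to really probe.
NN_URL="http://localhost:50070/"
echo curl -fsS -o /dev/null "$NN_URL"
```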
6. Run the bundled example.
(1) Create directories:
$ bin/hdfs dfs -mkdir /user
$ bin/hdfs dfs -mkdir /user/jediael
(2) Copy files:
$ bin/hdfs dfs -put etc/hadoop input
(3) Run the example:
$ bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar grep input output 'dfs[a-z.]+'
(4) Check the output:
$ bin/hdfs dfs -cat output/*
6   dfs.audit.logger
4   dfs.class
3   dfs.server.namenode.
2   dfs.period
2   dfs.audit.log.maxfilesize
2   dfs.audit.log.maxbackupindex
1   dfsmetrics.log
1   dfsadmin
1   dfs.servers
1   dfs.replication
1   dfs.file
(5) Stop HDFS:
$ sbin/stop-dfs.sh
III. Installing YARN
1. Configure etc/hadoop/mapred-site.xml:
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
2. Configure etc/hadoop/yarn-site.xml:
<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
3. Start YARN:
$ sbin/start-yarn.sh
4. Open the web page to check YARN:
http://localhost:8088/
5. Run a MapReduce job:
$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -copyFromLocal /etc/profile /input
$ cd /mnt/jediael/hadoop-2.6.0/share/hadoop/mapreduce
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
View the result:
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop fs -cat /output/*
Part 2: Cluster Installation
I. Planning
(1) Hardware Resources
10.171.29.191 master
10.171.94.155 slave1
10.251.0.197 slave3
(2) Basic Information
User: jediael
Directory: /mnt/jediael/
II. Environment Configuration
(1) Unify the username and password, and grant jediael permission to run all commands
# passwd
# useradd jediael
# passwd jediael
# vi /etc/sudoers
Add the following line:
jediael ALL=(ALL) ALL
(2) Create the directory /mnt/jediael
$ sudo chown jediael:jediael /opt
$ cd /opt
$ sudo mkdir jediael
Note: /opt must be owned by jediael, otherwise formatting the NameNode will fail.
(3) Change the hostname and edit /etc/hosts
1. Edit /etc/sysconfig/network:
NETWORKING=yes
HOSTNAME=*******
2. Edit /etc/hosts:
10.171.29.191 master
10.171.94.155 slave1
10.251.0.197 slave3
Note: the hosts file must not map this node's hostname to 127.0.0.1, otherwise you will see exceptions such as: org.apache.hadoop.ipc.Client: Retrying connect to server: master/10.171.29.191:9000. Already tried ...
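This pitfall can be caught before starting the cluster. A sketch of such a check; for demonstration it writes a sample hosts file to a temp path, so point HOSTS_FILE at /etc/hosts and set NODE to your real hostname to use it:

```shell
# Sketch: detect the problematic 127.0.0.1 mapping before starting HDFS.
# HOSTS_FILE and the sample contents are for demonstration only;
# use HOSTS_FILE=/etc/hosts and NODE=$(hostname) for a real check.
HOSTS_FILE=$(mktemp)
NODE=master
printf '127.0.0.1 localhost master\n10.171.29.191 master\n' > "$HOSTS_FILE"
if grep -qE "^127\.0\.0\.1[[:space:]].*\b$NODE\b" "$HOSTS_FILE"; then
  RESULT="bad: $NODE resolves to 127.0.0.1"
else
  RESULT="ok"
fi
echo "$RESULT"
```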
3. Set the hostname with the hostname command:
hostname ****
(4) Configure passwordless login
Run the following commands on master as the jediael user:
$ ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
$ cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
Then copy authorized_keys to slave1 and slave3:
$ scp ~/.ssh/authorized_keys slave1:~/.ssh/
$ scp ~/.ssh/authorized_keys slave3:~/.ssh/
Notes:
(1) If you get a message that the .ssh directory does not exist, the machine has never run ssh; running ssh once will create the directory.
(2) ~/.ssh must have permission 700 and authorized_keys permission 600; neither looser nor stricter permissions will work.
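The permission fix can be sketched as follows. To keep it safe to run anywhere, this operates on a throwaway directory; replace "$DEMO" with "$HOME" to apply it to the real account:

```shell
# Sketch: set the permissions sshd requires for passwordless login.
# $DEMO is a throwaway directory standing in for $HOME.
DEMO=$(mktemp -d)
mkdir -p "$DEMO/.ssh"
touch "$DEMO/.ssh/authorized_keys"
chmod 700 "$DEMO/.ssh"                   # directory: 700
chmod 600 "$DEMO/.ssh/authorized_keys"   # key file: 600
stat -c '%a %n' "$DEMO/.ssh" "$DEMO/.ssh/authorized_keys"
```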
(5) Install Java on all three machines and set the related environment variables
See http://blog.csdn.net/jediael_lu/article/details/38925871
(6) Download hadoop-2.6.0.tar.gz and unpack it into /mnt/jediael
wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-2.6.0/hadoop-2.6.0.tar.gz
tar -zxvf hadoop-2.6.0.tar.gz
III. Edit the Configuration Files
[All three machines need these changes. It is usually easiest to finish the configuration on one machine and then copy it to the others with scp.]
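The scp step above can be written as a loop. A dry-run sketch: the `echo` prints the commands instead of running them, so remove it once passwordless ssh is in place (the install path follows the plan above):

```shell
# Dry-run sketch: push the finished config directory from master to the slaves.
# The `echo` prints each scp command; drop it to actually copy.
HADOOP_CONF=/mnt/jediael/hadoop-2.6.0/etc/hadoop
for host in slave1 slave3; do
  echo scp -r "$HADOOP_CONF" "$host:$(dirname "$HADOOP_CONF")"
done
```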
(1) hadoop-env.sh
export JAVA_HOME=/usr/java/jdk1.7.0_51
(2) core-site.xml
<property>
  <name>hadoop.tmp.dir</name>
  <value>/mnt/tmp</value>
  <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<property>
  <name>io.file.buffer.size</name>
  <value>4096</value>
</property>
(3) hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
(4) mapred-site.xml
<property>
  <name>mapreduce.framework.name</name>
  <value>yarn</value>
  <final>true</final>
</property>
<property>
  <name>mapreduce.jobtracker.http.address</name>
  <value>master:50030</value>
</property>
<property>
  <name>mapreduce.jobhistory.address</name>
  <value>master:10020</value>
</property>
<property>
  <name>mapreduce.jobhistory.webapp.address</name>
  <value>master:19888</value>
</property>
<property>
  <name>mapred.job.tracker</name>
  <value>http://master:9001</value>
</property>
(5) yarn-site.xml
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>
<property>
  <name>yarn.nodemanager.aux-services</name>
  <value>mapreduce_shuffle</value>
</property>
<property>
  <name>yarn.resourcemanager.address</name>
  <value>master:8032</value>
</property>
<property>
  <name>yarn.resourcemanager.scheduler.address</name>
  <value>master:8030</value>
</property>
<property>
  <name>yarn.resourcemanager.resource-tracker.address</name>
  <value>master:8031</value>
</property>
<property>
  <name>yarn.resourcemanager.admin.address</name>
  <value>master:8033</value>
</property>
<property>
  <name>yarn.resourcemanager.webapp.address</name>
  <value>master:8088</value>
</property>
(6) slaves
List the slave hostnames in etc/hadoop/slaves, one per line:
slave1
slave3
IV. Start and Verify
1. Format the NameNode:
[jediael@master hadoop-2.6.0]$ bin/hadoop namenode -format
2. Start Hadoop [this step only needs to be run on master]:
[jediael@master hadoop-2.6.0]$ sbin/start-all.sh
3. Verification 1: write to HDFS:
[jediael@master hadoop-2.6.0]$ bin/hadoop fs -ls /
[jediael@master hadoop-2.6.0]$ bin/hadoop fs -mkdir /test
[jediael@master hadoop-2.6.0]$ bin/hadoop fs -ls /
Found 1 items
drwxr-xr-x   - jediael supergroup          0 2015-04-19 23:41 /test
4. Verification 2: open the web page:
NameNode: http://ip:50070
5. Check the Java processes on each host.
(1) master:
$ jps
3694 NameNode
3882 SecondaryNameNode
7216 Jps
4024 ResourceManager
(2) slave1:
$ jps
1913 NodeManager
2673 Jps
1801 DataNode
(3) slave3:
$ jps
1942 NodeManager
2252 Jps
1840 DataNode
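Rather than logging in to each host, the jps checks can be driven from master in one loop. A dry-run sketch (the `echo` prints the ssh commands; drop it to execute them, which assumes the passwordless login set up earlier):

```shell
# Dry-run sketch: run jps on every node from master.
# Remove the `echo` to actually execute over ssh.
for host in master slave1 slave3; do
  echo "ssh $host jps"
done
```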
V. Run a Complete MapReduce Program: the Bundled wordcount Example
$ bin/hadoop fs -mkdir /input
$ bin/hadoop fs -ls /
Found 2 items
drwxr-xr-x   - jediael supergroup          0 2015-04-20 18:04 /input
drwxr-xr-x   - jediael supergroup          0 2015-04-19 23:41 /test
$ bin/hadoop fs -copyFromLocal etc/hadoop/mapred-site.xml.template /input
$ pwd
/mnt/jediael/hadoop-2.6.0/share/hadoop/mapreduce
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop jar hadoop-mapreduce-examples-2.6.0.jar wordcount /input /output
15/04/20 18:15:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/04/20 18:15:49 INFO input.FileInputFormat: Total input paths to process : 1
15/04/20 18:15:49 INFO mapreduce.JobSubmitter: number of splits:1
15/04/20 18:15:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local657082309_0001
15/04/20 18:15:50 INFO mapreduce.Job: Running job: job_local657082309_0001
...
15/04/20 18:15:51 INFO mapreduce.Job:  map 100% reduce 0%
15/04/20 18:15:52 INFO mapreduce.Job:  map 100% reduce 100%
15/04/20 18:15:52 INFO mapreduce.Job: Job job_local657082309_0001 completed successfully
15/04/20 18:15:52 INFO mapreduce.Job: Counters: 38
...
View the result:
$ /mnt/jediael/hadoop-2.6.0/bin/hadoop fs -cat /output/*