Hadoop Pseudo-Distributed Installation Guide
With the spread of the Internet, accumulated data has reached the tipping point where quantity turns into quality, and the "4 Vs" of big data draw more and more attention: Volume (sheer amount of data), Velocity (rapid growth), Variety (ever more kinds of data), and Value (enormous value). At the gigabyte-to-terabyte scale a single machine copes well, but data volumes have now reached the petabyte and exabyte scale; simply upgrading single-machine hardware can no longer keep processing times tolerable, and so Hadoop came into being. Google's three landmark papers (GFS, MapReduce, BigTable) led the industry's transformation. All three have been translated into high-quality Chinese, and, as it happens, all three translations were first published on CSDN blogs: the most recent is Zhang Lingyun's translation of the MapReduce paper (published about a month before this post), and the earliest is Xu Lei's translation of the GFS paper, from November 2005.
MapReduce:http://blog.csdn.net/active1001/archive/2007/07/02/1675920.aspx;
GFS:http://blog.csdn.net/xuleicsu/archive/2005/11/10/526386.aspx;
BigTable:http://blog.csdn.net/accesine960/archive/2006/02/09/595628.aspx。
Drawing on the ideas in these papers, Doug Cutting developed Hadoop (HDFS, MapReduce, HBase).
Minimal steps to configure a pseudo-distributed Hadoop cluster:
Environment: CentOS 6.7 + Hadoop 2.7.2
Step 1: configure the Java JDK, set the hostname, and configure passwordless SSH login
Set the environment variables in ~/.bashrc:

    [root@master home]# vi ~/.bashrc
    export HADOOP_HOME=/usr/local/hadoop-2.7.2
    export JAVA_HOME=/home/work/jdk1.7.0
    export PATH=$JAVA_HOME/bin:$HADOOP_HOME/bin:$PATH

Reload the file and verify that javac is found:

    [root@master jdk1.6.0_25]# source ~/.bashrc
    [root@master jdk1.6.0_25]# javac

Set the hostname:

    [root@master network-scripts]# vi /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=localhost.hadoop
    [root@master network-scripts]# hostname

Edit the hosts file:

    [root@localhost ~]# vim /etc/hosts
    192.168.145.129 localhost localhost.hadoop

Configure passwordless SSH login based on a public key. Generate the key pair (just press Enter at every prompt):

    [root@master ~]# ssh-keygen -t rsa -P '' -f ~/.ssh/id_rsa
    [root@master ~]# cd ~/.ssh        # root's SSH configuration directory
    [root@master ~]# ls
    id_rsa  id_rsa.pub

Authorize the key and tighten the permissions:

    [root@master ~]# cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
    [root@master ~]# chmod 700 ~/.ssh
    [root@master ~]# chmod 600 ~/.ssh/authorized_keys

Test it:

    [root@master ~]# ssh localhost
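The environment-variable step above is easy to get subtly wrong (a missing `bin` on PATH, a typo in a path). A small sanity check, sketched here in Python and assuming the variable names used in the ~/.bashrc fragment above, can confirm the shell actually picked them up:

```python
import os

def check_hadoop_env(env):
    """Return a list of problems with the Hadoop/Java environment.

    `env` is a mapping like os.environ; the expected variable names
    (JAVA_HOME, HADOOP_HOME) follow the ~/.bashrc fragment above.
    """
    problems = []
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        if not env.get(var):
            problems.append("%s is not set" % var)
    path_entries = env.get("PATH", "").split(":")
    for var in ("JAVA_HOME", "HADOOP_HOME"):
        home = env.get(var)
        if home and home + "/bin" not in path_entries:
            problems.append("%s/bin is not on PATH" % var)
    return problems

# Example: the configuration from ~/.bashrc above
env = {
    "JAVA_HOME": "/home/work/jdk1.7.0",
    "HADOOP_HOME": "/usr/local/hadoop-2.7.2",
    "PATH": "/home/work/jdk1.7.0/bin:/usr/local/hadoop-2.7.2/bin:/usr/bin",
}
print(check_hadoop_env(env))  # []
```

Run it against `os.environ` after `source ~/.bashrc`; an empty list means the variables are in place.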
Step 2: extract the Hadoop tarball and edit the configuration files
1. Edit core-site.xml:

       [root@localhost hadoop-2.7.2]# vim etc/hadoop/core-site.xml

       <property>
         <name>hadoop.tmp.dir</name>
         <value>file:/home/work/dfs/tmp</value>
       </property>
       <property>
         <name>fs.defaultFS</name>
         <value>hdfs://localhost:9000</value>
       </property>

2. Edit hdfs-site.xml (note that the DataNode data directory is configured with dfs.datanode.data.dir, not dfs.namenode.data.dir):

       [root@localhost hadoop-2.7.2]# vim etc/hadoop/hdfs-site.xml

       <property>
         <name>dfs.replication</name>
         <value>1</value>
       </property>
       <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:/home/work/dfs/name</value>
       </property>
       <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/home/work/dfs/data</value>
       </property>

3. In etc/hadoop/hadoop-env.sh, set JAVA_HOME (optional if it is already exported in your shell):

       export JAVA_HOME=/home/...../jdk1.7.0_45

4. Add your DataNode IP address (here 192.168.145.129) to etc/hadoop/slaves.

5. Format HDFS: bin/hadoop namenode -format (in Hadoop 2.x the preferred form is bin/hdfs namenode -format).

6. Start HDFS with sbin/start-dfs.sh, and YARN with sbin/start-yarn.sh (start-dfs.sh alone does not start the ResourceManager and NodeManager).

7. Run jps to check the Hadoop processes. Storage (HDFS): NameNode, SecondaryNameNode, DataNode. Compute (YARN): ResourceManager, NodeManager.

8. Inspect the HDFS file system with bin/hadoop fs -ls /. On the first run it is empty, since the file system was just formatted and nothing has been put in yet. More commands are listed by hadoop fs -help.

If the following endpoints all respond, everything is wired up:

- http://localhost:50070 (NameNode web UI)
- http://localhost:8088 (ResourceManager web UI)
- hdfs://localhost:9000 (HDFS RPC endpoint; this port speaks the HDFS protocol, not HTTP)
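The `jps` check is the quickest way to see whether the startup scripts worked. A small helper, a sketch in Python using the standard daemon names for a pseudo-distributed Hadoop 2.x setup, can report which expected daemons are missing from a `jps` dump:

```python
# Expected daemons in a pseudo-distributed Hadoop 2.x setup:
# HDFS: NameNode, SecondaryNameNode, DataNode
# YARN: ResourceManager, NodeManager
EXPECTED_DAEMONS = {
    "NameNode", "SecondaryNameNode", "DataNode",
    "ResourceManager", "NodeManager",
}

def missing_daemons(jps_output):
    """Given the text printed by `jps` (one "<pid> <name>" per line),
    return the expected daemons that are not running, sorted."""
    running = {line.split()[-1]
               for line in jps_output.splitlines() if line.strip()}
    return sorted(EXPECTED_DAEMONS - running)

# Example: YARN was not started yet
sample = """\
3121 NameNode
3260 DataNode
3443 SecondaryNameNode
3617 Jps
"""
print(missing_daemons(sample))  # ['NodeManager', 'ResourceManager']
```

An empty list means all five daemons are up; here the output points at a forgotten start-yarn.sh.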
Verify YARN by running the WordCount program from the demos that ship with Hadoop.

On the local file system, create an input folder containing two files: test1.txt with the content "hello word" and test2.txt with the content "hello hadoop". Then upload them into HDFS, since the job below reads /in/* from HDFS (for example with hadoop fs -mkdir -p /in/input followed by hadoop fs -put input/* /in/input).
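Before running the job, it may help to see what WordCount actually computes. Below is a minimal local sketch of the same map/shuffle/reduce logic in plain Python (not the Hadoop API) applied to the two test files; the three distinct words it finds match the "Reduce output records=3" counter in the job log further down:

```python
from collections import Counter

def wordcount(files):
    """Map: split each file's text into words (each an implicit
    (word, 1) pair); reduce: sum the counts per word.
    `files` maps file names to their contents."""
    counts = Counter()
    for text in files.values():
        counts.update(text.split())
    return dict(counts)

# The two input files created above
files = {
    "test1.txt": "hello word",
    "test2.txt": "hello hadoop",
}
print(wordcount(files))  # {'hello': 2, 'word': 1, 'hadoop': 1}
```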
    [root@localhost hadoop-2.7.2]# hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /in/* /out
    16/04/15 10:13:47 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    16/04/15 10:13:50 INFO input.FileInputFormat: Total input paths to process : 2
    16/04/15 10:13:50 INFO mapreduce.JobSubmitter: number of splits:2
    16/04/15 10:13:51 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_local866631907_0001
    16/04/15 10:13:51 INFO mapreduce.Job: Running job: job_local866631907_0001
    16/04/15 10:13:53 INFO mapreduce.Job:  map 0% reduce 0%
    ... [per-task map and reduce log lines omitted] ...
    16/04/15 10:13:58 INFO mapreduce.Job:  map 100% reduce 100%
    16/04/15 10:13:58 INFO mapreduce.Job: Job job_local866631907_0001 completed successfully
    16/04/15 10:13:58 INFO mapreduce.Job: Counters: 35
        File System Counters
            FILE: Number of bytes read=821775
            FILE: Number of bytes written=1664870
            HDFS: Number of bytes read=61
            HDFS: Number of bytes written=24
        Map-Reduce Framework
            Map input records=2
            Map output records=4
            Reduce input records=4
            Reduce output records=3
            Shuffled Maps =2
            Failed Shuffles=0
        File Input Format Counters
            Bytes Read=24
        File Output Format Counters
            Bytes Written=24
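The counter block at the end of the job output is the quickest way to confirm the job did what you expected. A small parser, sketched here for counter lines in the `Name=value` form shown above, turns that block into a dictionary you can assert against:

```python
def parse_counters(log):
    """Extract `Name=value` counter lines from MapReduce job output,
    ignoring timestamped INFO lines and anything non-numeric."""
    counters = {}
    for line in log.splitlines():
        line = line.strip()
        if "=" in line:
            name, value = line.rsplit("=", 1)
            if value.isdigit():
                counters[name.strip()] = int(value)
    return counters

# A fragment of the counter block from the run above
sample = """\
Map input records=2
Map output records=4
Reduce output records=3
Bytes Written=24
"""
c = parse_counters(sample)
print(c["Reduce output records"])  # 3
```

For this WordCount run, "Reduce output records=3" is the number of distinct words (hello, word, hadoop).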
Some commands for checking the working state of a cluster:

- bin/hdfs dfsadmin -report: show status information for each HDFS node
- bin/hdfs haadmin -getServiceState nn1: get the HA state of a NameNode (HA setups only)
- sbin/hadoop-daemon.sh start namenode: start a single NameNode process
- sbin/hadoop-daemon.sh start zkfc: start a single ZKFC (ZooKeeper failover controller) process
PS: after extracting the tarball, the files sometimes lose their owner and group; fix this with chown -R ydcun:ydcun hadoop-2.7.2 (substitute your own user and group).