Hadoop单节点部署步骤
Tags: pseudo-distributed, single-node, hadoop, big data
- 1. Hadoop Overview
- 2. Pre-installation Preparation
- 3. Hadoop Installation Steps
- 4. Starting the Hadoop Processes
- 5. Verifying the Configuration
A simple WordCount test case
Preface: A single-node Hadoop deployment mainly involves the YARN, MapReduce, and HDFS processes and their related configuration. The setup below has been tested and all functions work; corrections and suggestions are welcome.
1. Hadoop Overview
1.1 Common components of the Hadoop ecosystem:
1.1.1 YARN, MapReduce, HDFS
1.1.2 HBase: a non-relational database built on top of HDFS
1.1.3 Hive: provides SQL-style queries; acts as a client layer so that users who do not know Java can still query data
1.1.4 Oozie: a workflow scheduling and coordination framework
1.1.5 ZooKeeper: a highly available framework for distributed data management and system coordination
1.1.6 Flume: collects log files and loads them into an HDFS cluster
1.1.7 Sqoop: imports tables from relational databases into HDFS
1.2 The four main modules of Hadoop
The project includes these modules:
1.2.1 Hadoop Common: the common utilities that support the other Hadoop modules.
1.2.2 Hadoop Distributed File System (HDFS™): a distributed file system that provides high-throughput access to application data. (Design philosophy: write once, read many. It runs two kinds of processes: the NameNode, which stores metadata recording where each block of data lives, and DataNodes, which store the data itself.)
1.2.3 Hadoop YARN: a framework for job scheduling and cluster resource management. (Manages resources such as CPU, memory, and virtual cores; it consists of the ResourceManager and NodeManager processes.)
1.2.4 Hadoop MapReduce: a YARN-based system for parallel processing of large data sets.
Note: see the official site for reference: http://hadoop.apache.org/
1.3 Hadoop installation modes
1.3.1 Standalone (local) mode
1.3.2 Pseudo-distributed mode (single node)
1.3.3 Fully distributed mode
1.4 Hadoop distributions
1.4.1 Apache Hadoop: development of Hadoop 2.0 was led largely by Hortonworks, a company spun off from Yahoo; it is a free, open-source project.
1.4.2 Cloudera Hadoop: Cloudera's distribution, commonly called CDH (Cloudera's Distribution including Apache Hadoop).
2. Pre-installation Preparation
2.1 Prepare a server
2.2 Change the hostname
The hostname can be set to whatever your environment requires.
To change it: [hadoop@xxx hadoop-2.5.0]$ vi /etc/sysconfig/network
Set HOSTNAME to the desired name, e.g. HOSTNAME=xxx
2.3 Create a regular user
Create a hadoop user to run the deployment:
# Create the user
[root@xxx ~]# useradd hadoop
# Set its password
[root@xxx ~]# echo hadoop | passwd --stdin hadoop
2.4 Disable the firewall and SELinux
Stop the firewall: service iptables stop
Keep the firewall from starting at boot: chkconfig iptables off
Disable SELinux: vi /etc/sysconfig/selinux
Set the SELINUX value to disabled, e.g. SELINUX=disabled
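Editing the file by hand works; the SELINUX line can also be flipped non-interactively with sed. A minimal sketch (the `disable_selinux_in` helper name and the file-path parameter are mine, added so it can be tried safely; on a real CentOS 6 host you would point it at /etc/sysconfig/selinux as root and then reboot, or run `setenforce 0` for the current session):

```shell
# Sketch: set SELINUX=disabled in an selinux config file.
# disable_selinux_in is a hypothetical helper; in production, run it as root
# against /etc/sysconfig/selinux and reboot for the change to take full effect.
disable_selinux_in() {
    sed -i 's/^SELINUX=.*/SELINUX=disabled/' "$1"
}
```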
2.5 Edit /etc/hosts
vi /etc/hosts
Add a line in the form "ip hostname", e.g. 192.168.x.x xxx
2.6 Change the hadoop user's home directory
[root@xxx ~]# vi /etc/passwd
Point the hadoop user's home directory at a partition with plenty of free space.
2.7 Install the JDK
2.7.1 Download the JDK and upload it to the server
2.7.2 Extract it:
[hadoop@xxx software]$ tar zxvf jdk-7u67-linux-x64.tar.gz -C /data/hadoop/modules/
2.7.3 Set environment variables
vi .bash_profile and add:
JAVA_HOME=/data/hadoop/modules/jdk1.7.0_67
PATH=$PATH:$JAVA_HOME/bin
Then reload the profile: source .bash_profile
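The same edit can be scripted instead of done in vi. A minimal sketch (the PROFILE variable is my addition so the target file can be overridden for safe experimentation; the guide appends to ~/.bash_profile, and an explicit export is added so child processes inherit the variables):

```shell
# Sketch: append JAVA_HOME settings to a shell profile and reload it.
# PROFILE is parameterized for experimentation; the guide uses ~/.bash_profile.
PROFILE="${PROFILE:-$HOME/.bash_profile}"
cat >> "$PROFILE" <<'EOF'
JAVA_HOME=/data/hadoop/modules/jdk1.7.0_67
PATH=$PATH:$JAVA_HOME/bin
export JAVA_HOME PATH
EOF
. "$PROFILE"
```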
2.7.4 Remove the preinstalled JDK
# Find the installed JDK packages
[hadoop@xxx ~]$ rpm -qa | grep java
# Remove them
[hadoop@xxx ~]$ rpm -e --nodeps java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64 tzdata-java-2013g-1.el6.noarch java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
3. Hadoop Installation Steps
3.1 Download the Hadoop package
Download from http://hadoop.apache.org/releases.html (this guide installs 2.5.0) and upload the archive to the server.
3.2 Extract the package
[hadoop@xxx software]$ tar zxvf hadoop-2.5.0.tar.gz -C /data/hadoop/modules
3.3 Using Notepad++
Notepad++ can edit the config files remotely via the NppFTP plugin (Plugins -> NppFTP -> Show NppFTP Window). Tool download: http://pan.baidu.com/s/1mizCfpQ
3.4 Edit the configuration
3.4.1 Environment variables
In hadoop-env.sh, yarn-env.sh, and mapred-env.sh, set:
export JAVA_HOME=/data/hadoop/modules/jdk1.7.0_67
core-site.xml:
<property>
    <name>fs.defaultFS</name>
    <!-- Entry point for accessing the cluster -->
    <value>hdfs://xxx:8020</value>
</property>
<property>
    <name>hadoop.tmp.dir</name>
    <!-- Base directory where Hadoop stores its data -->
    <value>/data/hadoop/modules/hadoop-2.5.0/data</value>
</property>
hdfs-site.xml:
<property>
    <!-- Replication factor: the default is 3 for data safety, but one copy suffices on a single node -->
    <name>dfs.replication</name>
    <value>1</value>
</property>
<property>
    <name>dfs.namenode.http-address</name>
    <!-- Server that hosts the NameNode web UI -->
    <value>xxx:50070</value>
</property>
yarn-site.xml
<property>
    <!-- Which server runs the ResourceManager -->
    <name>yarn.resourcemanager.hostname</name>
    <value>hadoop-senior.ibeifeng.com</value>
</property>
<property>
    <!-- Log aggregation: upload logs to HDFS -->
    <name>yarn.log-aggregation-enable</name>
    <value>true</value>
</property>
<property>
    <!-- How long aggregated logs are retained, in seconds -->
    <name>yarn.log-aggregation.retain-seconds</name>
    <value>86400</value>
</property>
<property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
</property>
mapred-site.xml
<property>
    <!-- Web UI for browsing job history -->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>hadoop-senior.ibeifeng.com:19888</value>
</property>
<!-- Start the history server with: sbin/mr-jobhistory-daemon.sh start historyserver -->
<property>
    <!-- Run MapReduce on YARN -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
</property>
For details of these settings, see the official document: http://hadoop.apache.org/docs/r2.5.2/hadoop-project-dist/hadoop-common/SingleCluster.html
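Hadoop daemons can fail in confusing ways when a *-site.xml file is not well-formed, so a quick XML check before starting anything saves debugging time. A minimal sketch using Python's standard-library parser (the `check_xml` helper name is my own; assuming python3 is on the PATH, point it at each file under etc/hadoop/):

```shell
# Sketch: verify an XML config file is well-formed before starting Hadoop.
# check_xml is a hypothetical helper; in practice run it against
# etc/hadoop/*-site.xml. It exits non-zero on a parse error.
check_xml() {
    python3 -c 'import sys, xml.etree.ElementTree as ET; ET.parse(sys.argv[1])' "$1"
}
```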
4. Starting the Hadoop Processes
4.1 Starting HDFS
4.1.1 Before the first start, format the NameNode:
[hadoop@xxx hadoop-2.5.0]$ bin/hdfs namenode -format
4.1.2 Start the NameNode and DataNode:
[hadoop@xxx hadoop-2.5.0]$ sbin/hadoop-daemon.sh start namenode
[hadoop@xxx hadoop-2.5.0]$ sbin/hadoop-daemon.sh start datanode
4.1.3 Use jps to confirm the processes started:
[hadoop@xxx hadoop-2.5.0]$ jps
101865 DataNode
101753 NameNode
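Rather than eyeballing the jps output, the check can be scripted. A sketch (the `procs_running` helper is my own; it takes the jps output as its first argument, followed by the daemon names that must appear, so the logic can be exercised offline):

```shell
# Sketch: assert that every expected daemon name appears in jps output.
# procs_running is a hypothetical helper; on a live node call it as:
#   procs_running "$(jps)" NameNode DataNode
procs_running() {
    out=$1; shift
    for name in "$@"; do
        echo "$out" | grep -qw "$name" || { echo "missing: $name" >&2; return 1; }
    done
}
```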
4.1.4 HDFS web UI
http://137.32.126.106:50070
4.2 Starting YARN
4.2.1 Start the ResourceManager
[hadoop@xxx hadoop-2.5.0]$ sbin/yarn-daemon.sh start resourcemanager
4.2.2 Start the NodeManager
[hadoop@xxx hadoop-2.5.0]$ sbin/yarn-daemon.sh start nodemanager
4.2.3 Check the YARN processes with jps
[hadoop@xxx hadoop-2.5.0]$ jps
126366 NameNode
130149 NodeManager
130732 JobHistoryServer
5774 Jps
129891 ResourceManager
126477 DataNode
4.2.4 Web UI address
http://137.32.126.106:8088/cluster
5. Verifying the Configuration
A simple WordCount test case
5.1 Create a directory on HDFS
[hadoop@xxx hadoop-2.5.0]$ bin/hdfs dfs -mkdir /input
5.2 Upload a test file
[hadoop@xxx hadoop-2.5.0]$ bin/hdfs dfs -put sort.txt /input
5.3 View the uploaded file
[hadoop@xxx hadoop-2.5.0]$ bin/hdfs dfs -cat /input/sort.txt
5.4 Run the WordCount job
[hadoop@xxx hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/sort.txt /output
17/05/31 13:19:34 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/31 13:19:35 INFO client.RMProxy: Connecting to ResourceManager at xxx/137.32.126.106:8032
17/05/31 13:19:36 INFO input.FileInputFormat: Total input paths to process : 1
17/05/31 13:19:36 INFO mapreduce.JobSubmitter: number of splits:1
17/05/31 13:19:36 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1496207600784_0001
17/05/31 13:19:37 INFO impl.YarnClientImpl: Submitted application application_1496207600784_0001
17/05/31 13:19:37 INFO mapreduce.Job: The url to track the job: http://xxx:8088/proxy/application_1496207600784_0001/
17/05/31 13:19:37 INFO mapreduce.Job: Running job: job_1496207600784_0001
17/05/31 13:19:46 INFO mapreduce.Job: Job job_1496207600784_0001 running in uber mode : false
17/05/31 13:19:46 INFO mapreduce.Job:  map 0% reduce 0%
17/05/31 13:19:52 INFO mapreduce.Job:  map 100% reduce 0%
17/05/31 13:19:59 INFO mapreduce.Job:  map 100% reduce 100%
17/05/31 13:19:59 INFO mapreduce.Job: Job job_1496207600784_0001 completed successfully
17/05/31 13:19:59 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=59
		FILE: Number of bytes written=193963
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=166
		HDFS: Number of bytes written=37
		HDFS: Number of read operations=6
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=2
	Job Counters
		Launched map tasks=1
		Launched reduce tasks=1
		Data-local map tasks=1
		Total time spent by all maps in occupied slots (ms)=3925
		Total time spent by all reduces in occupied slots (ms)=4307
		Total time spent by all map tasks (ms)=3925
		Total time spent by all reduce tasks (ms)=4307
		Total vcore-seconds taken by all map tasks=3925
		Total vcore-seconds taken by all reduce tasks=4307
		Total megabyte-seconds taken by all map tasks=4019200
		Total megabyte-seconds taken by all reduce tasks=4410368
	Map-Reduce Framework
		Map input records=5
		Map output records=10
		Map output bytes=110
		Map output materialized bytes=59
		Input split bytes=96
		Combine input records=10
		Combine output records=4
		Reduce input groups=4
		Reduce shuffle bytes=59
		Reduce input records=4
		Reduce output records=4
		Spilled Records=8
		Shuffled Maps =1
		Failed Shuffles=0
		Merged Map outputs=1
		GC time elapsed (ms)=70
		CPU time spent (ms)=2390
		Physical memory (bytes) snapshot=434692096
		Virtual memory (bytes) snapshot=1822851072
		Total committed heap usage (bytes)=402653184
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters
		Bytes Read=70
	File Output Format Counters
		Bytes Written=37
[hadoop@xxx hadoop-2.5.0]$
5.5 View the results
[hadoop@xxx hadoop-2.5.0]$ bin/hdfs dfs -cat /output/part-r-00000
17/05/31 15:38:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
hadoop	4
mapreduce	2
verion	1
yarn	3
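The counts above can be cross-checked without the cluster: the wordcount example is essentially a whitespace tokenizer plus a counter. A minimal local awk sketch of the same computation (the `wordcount_local` helper is my own; run it against sort.txt and compare with the part-r-00000 output):

```shell
# Sketch: a local equivalent of the wordcount example, useful for
# sanity-checking the MapReduce result. Splits input on whitespace and
# prints "word<TAB>count", sorted by word.
wordcount_local() {
    awk '{ for (i = 1; i <= NF; i++) c[$i]++ }
         END { for (w in c) printf "%s\t%d\n", w, c[w] }' "$@" | sort
}
```

For example, `wordcount_local sort.txt` should print the same word/count pairs as `bin/hdfs dfs -cat /output/part-r-00000`.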
5.6 Viewing logs
.log files: written via log4j; contain most of the application log messages
.out files: capture standard output and standard error; usually brief
Log directory: /data/hadoop/modules/hadoop-2.5.0/logs
5.7 Common HDFS shell commands
-ls
-put … (upload)
-cat / -text (view file contents)
-mkdir [-p]
-mv
-cp
-du
-chmod