Learning HDFS in Hadoop
Lab Environment
Server list
namenode    192.168.3.69  (namenode.abc.local)
datanode1   192.168.3.70  (datanode1.abc.local)
datanode2   192.168.3.71  (datanode2.abc.local)
Environment Preparation
Three servers, each with a minimal installation of CentOS 6.6; set the hostname and a static IP address on each.
A minimal CentOS 6.6 install comes without a Java environment by default, so Java must be installed.
Download the Java runtime installation media: jre-7u80-linux-x64.tar.gz
# tar xvfz jre-7u80-linux-x64.tar.gz
# mv jre1.7.0_80/ /opt
Set the Java environment variables in /etc/profile:
export JAVA_HOME=/opt/jre1.7.0_80
PATH=$JAVA_HOME/bin:$PATH
export PATH
Log out of the console, log back in to the server, and check the Java runtime:
[root@namenode ~]# java -version
java version "1.7.0_80"
Java(TM) SE Runtime Environment (build 1.7.0_80-b15)
Java HotSpot(TM) 64-Bit Server VM (build 24.80-b11, mixed mode)
Install the Java runtime on the other two servers the same way.
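The installation on the other nodes can be repeated over the network. A minimal sketch that only prints the required scp/ssh commands (the IP addresses are the two DataNodes from this article's server list; passwordless ssh is not yet set up at this point, so executing them would prompt for passwords):

```shell
# Print the commands that repeat the JRE installation on other nodes.
# This is a dry run: pipe the output to sh (or drop the echo) to execute.
jre_install_cmds() {
    for ip in "$@"; do
        echo "scp jre-7u80-linux-x64.tar.gz root@$ip:/tmp"
        echo "ssh root@$ip tar xvfz /tmp/jre-7u80-linux-x64.tar.gz -C /opt"
    done
}

jre_install_cmds 192.168.3.70 192.168.3.71
```

The /etc/profile edit still has to be made on each node afterwards.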
Setting Up Passwordless SSH Between the Three Servers
The minimal CentOS install does not include scp or the ssh client programs. Install them from rpm packages:
# rpm -ivh libedit-2.11-4.20080712cvs.1.el6.x86_64.rpm
# rpm -ivh openssh-clients-5.3p1-104.el6.x86_64.rpm
libedit is a dependency of openssh-clients.
Note: if remote SSH access to the Linux server is very slow, disable SSH's reverse DNS lookups by adding the following line to /etc/ssh/sshd_config:
UseDNS no
Although the [UseDNS yes] line in the config file is commented out, the default is still yes (the SSH service enables reverse DNS lookups by default).
Also, on the SSH client, set up local name resolution by editing /etc/hosts and adding:
192.168.3.69 namenode namenode.abc.local
192.168.3.70 datanode1 datanode1.abc.local
192.168.3.71 datanode2 datanode2.abc.local
Note: after installing openssh-clients locally, trying to scp a local file to a remote host can still fail with "-bash: scp: command not found". This is because openssh-clients must also be installed on the remote side, so that the scp program exists there.
On namenode, generate the machine's public/private key pair.
[root@namenode ~]# ssh-keygen -t rsa
Generating public/private rsa key pair.
Enter file in which to save the key (/root/.ssh/id_rsa):   # accept the default key file location
Created directory '/root/.ssh'.
Enter passphrase (empty for no passphrase):   # press Enter, no passphrase
Enter same passphrase again:   # press Enter, no passphrase
Your identification has been saved in /root/.ssh/id_rsa.   # the private key file
Your public key has been saved in /root/.ssh/id_rsa.pub.   # the public key file
The key fingerprint is:
02:e0:5b:d0:53:19:25:48:e2:61:5a:a3:14:9e:d0:a6 root@namenode.abc.local
The key's randomart image is:
+--[ RSA 2048]----+
|.+Xo.o++.        |
|+B+*+ ..         |
|o=o o.           |
|E o .            |
| . .  S          |
|  .              |
|                 |
|                 |
|                 |
+-----------------+
Enable passwordless ssh from the machine to itself:
# cat /root/.ssh/id_rsa.pub >> /root/.ssh/authorized_keys
authorized_keys is a new file; it is referenced in the sshd configuration.
Also modify the SSH daemon configuration file /etc/ssh/sshd_config:
RSAAuthentication yes                      # uncomment to enable RSA authentication
PubkeyAuthentication yes                   # uncomment
AuthorizedKeysFile .ssh/authorized_keys    # uncomment
Restart the SSH service.
/etc/init.d/sshd restart
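The three uncommenting edits above can also be scripted with GNU sed. A sketch, demonstrated on a hypothetical sample file (/tmp/sshd_config.sample) so nothing live is touched; point the same sed command at /etc/ssh/sshd_config and restart sshd to apply it for real:

```shell
# Create a sample file containing the three commented-out settings.
cat > /tmp/sshd_config.sample <<'EOF'
#RSAAuthentication yes
#PubkeyAuthentication yes
#AuthorizedKeysFile     .ssh/authorized_keys
EOF

# Strip the leading '#' from exactly those three directives (GNU sed
# alternation syntax, available on CentOS).
sed -i 's/^#\(RSAAuthentication\|PubkeyAuthentication\|AuthorizedKeysFile\)/\1/' \
    /tmp/sshd_config.sample

cat /tmp/sshd_config.sample
```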
To enable passwordless remote login, the public key file must be uploaded to datanode1: copy /root/.ssh/id_rsa.pub from namenode to datanode1's /tmp directory.
# scp /root/.ssh/id_rsa.pub root@192.168.3.70:/tmp
At this point passwordless ssh is not yet configured, so a password is still required to upload the file.
On datanode1, import namenode's public key into the SSH authorized keys file.
[root@datanode1 ~]# cat /tmp/id_rsa.pub >> /root/.ssh/authorized_keys
authorized_keys is a new file; it is referenced in the sshd configuration.
Modify datanode1's SSH daemon configuration file /etc/ssh/sshd_config:
RSAAuthentication yes                      # uncomment to enable RSA authentication
PubkeyAuthentication yes                   # uncomment
AuthorizedKeysFile .ssh/authorized_keys    # uncomment
Restart the SSH service.
/etc/init.d/sshd restart
Verify passwordless ssh login from namenode:
[root@namenode ~]# ssh root@192.168.3.70
The authenticity of host '192.168.3.70 (192.168.3.70)' can't be established.
RSA key fingerprint is c4:1f:56:68:f8:44:c7:d9:cc:97:b9:47:1c:37:bb:a7.
Are you sure you want to continue connecting (yes/no)? yes
Warning: Permanently added '192.168.3.70' (RSA) to the list of known hosts.
Last login: Mon Aug 10 18:47:02 2015 from 192.168.3.64
[root@datanode1 ~]#
After uploading namenode's public key file to datanode2, repeat the same steps there.
Likewise, datanode1's public key must be installed on namenode and datanode2, and datanode2's public key on namenode and datanode1. All three servers must be able to ssh to one another without a password.
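The full mesh of key exchanges repeats the same copy-and-append pattern on every node. A sketch that prints the commands one node should run against its peers (hostnames are the /etc/hosts entries above; `ssh-copy-id`, shipped with openssh-clients, bundles the scp and append steps into a single command):

```shell
# Print the key-push commands a given node should run against its peers.
# Dry run only; remove the echo layer to execute for real.
key_push_cmds() {
    self=$1
    for host in namenode datanode1 datanode2; do
        [ "$host" = "$self" ] && continue    # skip the node itself
        echo "ssh-copy-id -i /root/.ssh/id_rsa.pub root@$host"
    done
}

# On a live node you would call: key_push_cmds "$(hostname -s)"
key_push_cmds namenode
```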
Installing Hadoop
Download the Hadoop installation media: hadoop-2.7.1.tar.gz.
Upload it to namenode and extract it to the /opt directory:
# tar xvfz hadoop-2.7.1.tar.gz -C /opt/
Under /opt/hadoop-2.7.1, create the folders used for data storage: tmp, hdfs, hdfs/data, hdfs/name.
[root@namenode hadoop-2.7.1]# mkdir tmp
[root@namenode hadoop-2.7.1]# mkdir hdfs
[root@namenode hadoop-2.7.1]# cd hdfs/
[root@namenode hdfs]# mkdir data
[root@namenode hdfs]# mkdir name
Configure Hadoop's runtime environment in /opt/hadoop-2.7.1/etc/hadoop/hadoop-env.sh:
# The java implementation to use.
export JAVA_HOME=/opt/jre1.7.0_80
Configure the NameNode parameters in /opt/hadoop-2.7.1/etc/hadoop/core-site.xml:
<configuration>
    <property>
        <name>fs.defaultFS</name>
        <value>hdfs://namenode.abc.local:9000</value>
    </property>
    <property>
        <name>hadoop.tmp.dir</name>
        <value>file:/opt/hadoop-2.7.1/tmp</value>
    </property>
    <property>
        <name>io.file.buffer.size</name>
        <value>131702</value>
    </property>
</configuration>
fs.defaultFS is set to the NameNode's URI; io.file.buffer.size is the buffer size used for reading and writing sequence files.
Configure the HDFS parameters in /opt/hadoop-2.7.1/etc/hadoop/hdfs-site.xml:
<configuration>
    <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:/opt/hadoop-2.7.1/hdfs/name</value>
    </property>
    <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:/opt/hadoop-2.7.1/hdfs/data</value>
    </property>
    <property>
        <name>dfs.replication</name>
        <value>2</value>
    </property>
    <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
    </property>
    <property>
        <name>dfs.namenode.secondary.http-address</name>
        <value>datanode1.abc.local:9000</value>
    </property>
</configuration>
In the NameNode's hdfs-site.xml, dfs.webhdfs.enabled must be set to true; otherwise WebHDFS operations that list file and directory status, such as LISTSTATUS and GETFILESTATUS, will not work, because that metadata is held by the NameNode.
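With dfs.webhdfs.enabled set to true, those listing operations become available over plain HTTP on the NameNode's web port (50070 by default in Hadoop 2.x, not the RPC port 9000 used in fs.defaultFS). A sketch that composes the REST URL for the /input_data path used later in this article; the commented curl line is what you would run against the live cluster:

```shell
# Compose a WebHDFS REST URL: host, HTTP port, HDFS path, operation name.
webhdfs_url() {
    echo "http://$1:$2/webhdfs/v1$3?op=$4"
}

url=$(webhdfs_url namenode.abc.local 50070 /input_data LISTSTATUS)
echo "$url"
# Against the running cluster this returns a JSON FileStatuses document:
# curl -s "$url"
```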
dfs.namenode.secondary.http-address places the secondary NameNode on datanode1. Note that despite its name, the secondary NameNode is a checkpoint helper that periodically merges the edit log into the fsimage; it is not a standby, and it does not by itself remove the NameNode's single point of failure (Hadoop 2.x provides a separate HA configuration for that).
Configure the MapReduce parameters in /opt/hadoop-2.7.1/etc/hadoop/mapred-site.xml:
<configuration>
    <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
    </property>
</configuration>
Configure the YARN parameters in /opt/hadoop-2.7.1/etc/hadoop/yarn-site.xml:
<configuration>
    <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
    </property>
    <property>
        <name>yarn.resourcemanager.hostname</name>
        <value>namenode.abc.local</value>
    </property>
</configuration>
Installing Hadoop on the DataNodes
Use scp to copy the Hadoop directory from namenode to each DataNode.
# cd /opt/
# scp -r hadoop-2.7.1 root@192.168.3.70:/opt/
# scp -r hadoop-2.7.1 root@192.168.3.71:/opt/
Starting Hadoop's HDFS Services
On namenode, run the Hadoop start script:
# cd /opt/hadoop-2.7.1/sbin
# ./start-dfs.sh
15/08/12 03:15:16 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Starting namenodes on [namenode.abc.local]
namenode.abc.local: starting namenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-namenode-namenode.abc.local.out
datanode2.abc.local: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-datanode2.abc.local.out
datanode1.abc.local: starting datanode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-datanode-datanode1.abc.local.out
Starting secondary namenodes [datanode1.abc.local]
datanode1.abc.local: starting secondarynamenode, logging to /opt/hadoop-2.7.1/logs/hadoop-root-secondarynamenode-datanode1.abc.local.out
15/08/12 03:15:36 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Working with HDFS
Formatting HDFS
Run the command:
# cd /opt/hadoop-2.7.1/bin
# ./hdfs namenode -format
..........................
15/08/12 03:48:58 INFO namenode.FSImage: Allocated new BlockPoolId: BP-486254444-192.168.3.69-1439322538827
15/08/12 03:48:59 INFO common.Storage: Storage directory /opt/hadoop-2.7.1/hdfs/name has been successfully formatted.
15/08/12 03:48:59 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
15/08/12 03:48:59 INFO util.ExitUtil: Exiting with status 0
15/08/12 03:48:59 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at namenode.abc.local/192.168.3.69
************************************************************/
This command stopped the Hadoop process on namenode, but the Hadoop processes on datanode1 and datanode2 kept running; those can be shut down with stop-dfs.sh.
After the command completes, files are generated in namenode's /opt/hadoop-2.7.1/hdfs/name directory.
[root@namenode name]# tree
.
└── current
    ├── fsimage_0000000000000000000
    ├── fsimage_0000000000000000000.md5
    ├── seen_txid
    └── VERSION
Restart the HDFS system.
In namenode's /opt/hadoop-2.7.1/hdfs/name directory, examine the files:
[root@namenode name]# tree
.
├── current
│   ├── edits_0000000000000000001-0000000000000000002
│   ├── edits_0000000000000000003-0000000000000000004
│   ├── edits_0000000000000000005-0000000000000000006
│   ├── edits_0000000000000000007-0000000000000000008
│   ├── edits_0000000000000000009-0000000000000000010
│   ├── edits_0000000000000000011-0000000000000000012
│   ├── edits_0000000000000000013-0000000000000000014
│   ├── edits_0000000000000000015-0000000000000000016
│   ├── edits_0000000000000000017-0000000000000000018
│   ├── edits_0000000000000000019-0000000000000000020
│   ├── edits_0000000000000000021-0000000000000000022
│   ├── edits_0000000000000000023-0000000000000000024
│   ├── edits_0000000000000000025-0000000000000000026
│   ├── edits_0000000000000000027-0000000000000000028
│   ├── edits_0000000000000000029-0000000000000000030
│   ├── edits_0000000000000000031-0000000000000000032
│   ├── edits_0000000000000000033-0000000000000000034
│   ├── edits_0000000000000000035-0000000000000000036
│   ├── edits_0000000000000000037-0000000000000000038
│   ├── edits_0000000000000000039-0000000000000000040
│   ├── edits_0000000000000000041-0000000000000000042
│   ├── edits_0000000000000000043-0000000000000000044
│   ├── edits_0000000000000000045-0000000000000000046
│   ├── edits_0000000000000000047-0000000000000000047
│   ├── edits_inprogress_0000000000000000048
│   ├── fsimage_0000000000000000046
│   ├── fsimage_0000000000000000046.md5
│   ├── fsimage_0000000000000000047
│   ├── fsimage_0000000000000000047.md5
│   ├── seen_txid
│   └── VERSION
└── in_use.lock    # this file indicates the NameNode is running
On datanode1 and datanode2, examine the files under /opt/hadoop-2.7.1/hdfs/data:
[root@datanode1 data]# tree
.
├── current
│   ├── BP-486254444-192.168.3.69-1439322538827
│   │   ├── current
│   │   │   ├── dfsUsed
│   │   │   ├── finalized
│   │   │   ├── rbw
│   │   │   └── VERSION
│   │   ├── scanner.cursor
│   │   └── tmp
│   └── VERSION
└── in_use.lock    # this file indicates the DataNode is running
Because datanode1 is configured as the secondary NameNode, files are also generated under its /opt/hadoop-2.7.1/tmp directory:
[root@datanode1 tmp]# tree
.
└── dfs
    └── namesecondary
        ├── current
        │   ├── edits_0000000000000000001-0000000000000000002
        │   ├── edits_0000000000000000003-0000000000000000004
        │   ├── edits_0000000000000000005-0000000000000000006
        │   ├── edits_0000000000000000007-0000000000000000008
        │   ├── edits_0000000000000000009-0000000000000000010
        │   ├── edits_0000000000000000011-0000000000000000012
        │   ├── edits_0000000000000000013-0000000000000000014
        │   ├── edits_0000000000000000015-0000000000000000016
        │   ├── edits_0000000000000000017-0000000000000000018
        │   ├── edits_0000000000000000019-0000000000000000020
        │   ├── edits_0000000000000000021-0000000000000000022
        │   ├── edits_0000000000000000023-0000000000000000024
        │   ├── edits_0000000000000000025-0000000000000000026
        │   ├── edits_0000000000000000027-0000000000000000028
        │   ├── edits_0000000000000000029-0000000000000000030
        │   ├── edits_0000000000000000031-0000000000000000032
        │   ├── edits_0000000000000000033-0000000000000000034
        │   ├── edits_0000000000000000035-0000000000000000036
        │   ├── edits_0000000000000000037-0000000000000000038
        │   ├── edits_0000000000000000039-0000000000000000040
        │   ├── edits_0000000000000000041-0000000000000000042
        │   ├── edits_0000000000000000043-0000000000000000044
        │   ├── edits_0000000000000000045-0000000000000000046
        │   ├── edits_0000000000000000048-0000000000000000049
        │   ├── fsimage_0000000000000000047
        │   ├── fsimage_0000000000000000047.md5
        │   ├── fsimage_0000000000000000049
        │   ├── fsimage_0000000000000000049.md5
        │   └── VERSION
        └── in_use.lock
Putting Files into HDFS
Create a test file:
# mkdir -p /root/input_data
# cd /root/input_data/
# echo "This is a test." >> test_data.txt
Run the Hadoop command to put the file in:
# cd /opt/hadoop-2.7.1/bin/
# ./hadoop fs -put /root/input_data/ /input_data
15/08/13 03:16:20 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
This copies the files under /root/input_data into HDFS's /input_data directory.
Run the Hadoop command to list the files:
# ./hadoop fs -ls /input_data
15/08/13 03:20:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Found 1 items
-rw-r--r--   2 root supergroup         16 2015-08-13 03:20 /input_data/test_data.txt
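A quick way to confirm that dfs.replication took effect: the second column of the -ls output is the file's replication factor, which should match the value 2 configured in hdfs-site.xml. A minimal sketch extracting it with awk from the sample line above:

```shell
# Sample -ls output line copied from the listing above.
ls_line='-rw-r--r--   2 root supergroup         16 2015-08-13 03:20 /input_data/test_data.txt'

# Column 2 of `hadoop fs -ls` output is the replication factor.
replication=$(echo "$ls_line" | awk '{print $2}')
echo "replication=$replication"
```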
HDFS operation commands
# ./hadoop fs
Usage: hadoop fs [generic options]
	[-appendToFile <localsrc> ... <dst>]
	[-cat [-ignoreCrc] <src> ...]
	[-checksum <src> ...]
	[-chgrp [-R] GROUP PATH...]
	[-chmod [-R] <MODE[,MODE]... | OCTALMODE> PATH...]
	[-chown [-R] [OWNER][:[GROUP]] PATH...]
	[-copyFromLocal [-f] [-p] [-l] <localsrc> ... <dst>]
	[-copyToLocal [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-count [-q] [-h] <path> ...]
	[-cp [-f] [-p | -p[topax]] <src> ... <dst>]
	[-createSnapshot <snapshotDir> [<snapshotName>]]
	[-deleteSnapshot <snapshotDir> <snapshotName>]
	[-df [-h] [<path> ...]]
	[-du [-s] [-h] <path> ...]
	[-expunge]
	[-find <path> ... <expression> ...]
	[-get [-p] [-ignoreCrc] [-crc] <src> ... <localdst>]
	[-getfacl [-R] <path>]
	[-getfattr [-R] {-n name | -d} [-e en] <path>]
	[-getmerge [-nl] <src> <localdst>]
	[-help [cmd ...]]
	**[-ls [-d] [-h] [-R] [<path> ...]]**
	[-mkdir [-p] <path> ...]
	[-moveFromLocal <localsrc> ... <dst>]
	[-moveToLocal <src> <localdst>]
	[-mv <src> ... <dst>]
	**[-put [-f] [-p] [-l] <localsrc> ... <dst>]**
	[-renameSnapshot <snapshotDir> <oldName> <newName>]
	**[-rm [-f] [-r|-R] [-skipTrash] <src> ...]**
	[-rmdir [--ignore-fail-on-non-empty] <dir> ...]
	[-setfacl [-R] [{-b|-k} {-m|-x <acl_spec>} <path>]|[--set <acl_spec> <path>]]
	[-setfattr {-n name [-v value] | -x name} <path>]
	[-setrep [-R] [-w] <rep> <path> ...]
	[-stat [format] <path> ...]
	[-tail [-f] <file>]
	[-test -[defsz] <path>]
	[-text [-ignoreCrc] <src> ...]
	[-touchz <path> ...]
	[-truncate [-w] <length> <path> ...]
	[-usage [cmd ...]]

Generic options supported are
-conf <configuration file>                    specify an application configuration file
-D <property=value>                           use value for given property
-fs <local|namenode:port>                     specify a namenode
-jt <local|resourcemanager:port>              specify a ResourceManager
-files <comma separated list of files>        specify comma separated files to be copied to the map reduce cluster
-libjars <comma separated list of jars>       specify comma separated jar files to include in the classpath.
-archives <comma separated list of archives>  specify comma separated archives to be unarchived on the compute machines.

The general command line syntax is
bin/hadoop command [genericOptions] [commandOptions]