(4-2) Blocks (data blocks)

A block is the most basic unit of storage in HDFS.


When an HDFS client uploads data to HDFS, it first buffers the data locally. Once the buffered data reaches one block size, the client asks the NameNode to allocate a block. The NameNode replies with the addresses of the DataNodes that will hold the block, and the client then communicates with those DataNodes directly, writing the data into a block file on each node.
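You can watch the result from the shell: upload a file, then print its length, block size, and replication with hdfs dfs -stat (a sketch; the file names are assumptions):

[root@i-love-you hadoop]# bin/hdfs dfs -put localfile.txt /dir/localfile.txt
# %b = length in bytes, %o = block size, %r = replication factor
[root@i-love-you hadoop]# bin/hdfs dfs -stat "%b %o %r" /dir/localfile.txt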



Setting the block size in hdfs-site.xml (134217728 bytes = 128 MB):

<property>
  <name>dfs.blocksize</name>
  <value>134217728</value>
</property>

The directory where a DataNode keeps its block files is set by dfs.datanode.data.dir in the same file.
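The effective value can be confirmed with hdfs getconf; the output below assumes the 128 MB setting above:

[root@i-love-you hadoop]# bin/hdfs getconf -confKey dfs.blocksize
134217728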



Block metadata (.meta files)
Block files and their .meta companions are stored under dfs.datanode.data.dir, e.g.:
/usr/local/mydata/dfs/data/current/BP-1476006134-192.168.1.10-1427374210743/current/finalized/subdir0/subdir0


For example:
-rw-r--r--. 1 root root  33574 Mar 29 18:07 blk_1073741851
-rw-r--r--. 1 root root    271 Mar 29 18:07 blk_1073741851_1027.meta
-rw-r--r--. 1 root root 103997 Mar 29 18:07 blk_1073741852
-rw-r--r--. 1 root root    823 Mar 29 18:07 blk_1073741852_1028.meta

Each blk_ file holds the raw block data, and the matching .meta file holds its checksums.
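To find out which block IDs make up a given HDFS file (and therefore which blk_ files to look for on disk), hdfs fsck can print them; the file name below is an assumption:

[root@i-love-you hadoop]# bin/hdfs fsck /dir/a.txt -files -blocks
# The block IDs printed (e.g. blk_1073741851) correspond to the blk_* file
# names under the DataNode's finalized directory shown above.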


If you delete the data from HDFS, these block files are deleted as well.
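For example (a sketch; note that DataNodes remove block files asynchronously, shortly after the NameNode schedules the deletion):

[root@i-love-you hadoop]# bin/hdfs dfs -rm -r /dir
# Once the DataNode has processed the deletion, the blk_* and *.meta files
# are gone from the finalized directory:
[root@i-love-you hadoop]# ls /usr/local/mydata/dfs/data/current/BP-1476006134-192.168.1.10-1427374210743/current/finalized/subdir0/subdir0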


HDFS storage necessarily relies on the operating system's own file management.


HDFS is a file-management layer built on top of each node's local file system; that layering is what makes it a distributed file system.


Data copied from the Linux file system into HDFS is stored uncompressed, which you can verify by looking at the block files and their metadata.
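A simple way to see this: upload a plain-text file, then cat the corresponding blk_ file on the DataNode; the original text appears verbatim. The block ID below is hypothetical and will differ on your cluster:

[root@i-love-you hadoop]# bin/hdfs dfs -put a.txt /dir/a.txt
# blk_1073741853 is a made-up ID; find the real one with
# 'bin/hdfs fsck /dir/a.txt -files -blocks' as shown earlier
[root@i-love-you hadoop]# cat /usr/local/mydata/dfs/data/current/BP-1476006134-192.168.1.10-1427374210743/current/finalized/subdir0/subdir0/blk_1073741853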




Replica management on DataNodes
The replication factor is configured in hdfs-site.xml (the default is 3):

<property>
  <name>dfs.replication</name>
  <value>3</value>
</property>
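dfs.replication only sets the default for newly written files; an existing file's replication can be changed with setrep (a sketch; the path is an assumption):

[root@i-love-you hadoop]# bin/hdfs dfs -setrep -w 2 /dir/a.txt
# -w blocks until the new replica count is actually reached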




Data storage: replication pipelining
Assume dfs.replication=3.
When an HDFS client uploads data, it requests a block from the NameNode, which replies with the addresses of three DataNodes. The client writes the data to a block on the first DataNode; the first DataNode forwards the data to the second, and the second forwards it to the third, forming a replication pipeline.
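Where each replica ended up can be checked with fsck (the file name is an assumption):

[root@i-love-you hadoop]# bin/hdfs fsck /dir/a.txt -files -blocks -locations
# With dfs.replication=3, each block should list three DataNode addresses.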




Archiving files in HDFS (HAR)
Hadoop archives merge many small HDFS files into a single archive. Suppose /dir holds two small files:
/dir
/dir/a.txt
/dir/b.txt


[root@i-love-you hadoop]# bin/hdfs dfs -ls /dir
Found 2 items
-rw-r--r--   1 root supergroup         13 2015-03-30 20:49 /dir/a.txt
-rw-r--r--   1 root supergroup         18 2015-03-30 20:49 /dir/b.txt


Create the archive (note that -p is required and names the parent path of the files being archived):
[root@i-love-you hadoop]# bin/hadoop archive -archiveName c.har -p /dir /dest




The archive is built by a MapReduce job; a full run looks like this:

[root@i-love-you hadoop]# bin/hadoop archive -archiveName c.har -p /dir /dest
15/03/30 21:02:41 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:45 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
15/03/30 21:02:47 INFO mapreduce.JobSubmitter: number of splits:1
15/03/30 21:02:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1427717039268_0001
15/03/30 21:02:52 INFO impl.YarnClientImpl: Submitted application application_1427717039268_0001
15/03/30 21:02:54 INFO mapreduce.Job: The url to track the job: http://i-love-you:8088/proxy/application_1427717039268_0001/
15/03/30 21:02:54 INFO mapreduce.Job: Running job: job_1427717039268_0001
15/03/30 21:03:36 INFO mapreduce.Job: Job job_1427717039268_0001 running in uber mode : false
15/03/30 21:03:36 INFO mapreduce.Job:  map 0% reduce 0%
15/03/30 21:04:25 INFO mapreduce.Job:  map 100% reduce 0%
15/03/30 21:04:53 INFO mapreduce.Job:  map 100% reduce 100%
15/03/30 21:04:55 INFO mapreduce.Job: Job job_1427717039268_0001 completed successfully
15/03/30 21:04:57 INFO mapreduce.Job: Counters: 49
        File System Counters
                FILE: Number of bytes read=206
                FILE: Number of bytes written=214211
                FILE: Number of read operations=0
                FILE: Number of large read operations=0
                FILE: Number of write operations=0
                HDFS: Number of bytes read=418
                HDFS: Number of bytes written=236
                HDFS: Number of read operations=17
                HDFS: Number of large read operations=0
                HDFS: Number of write operations=7
        Job Counters
                Launched map tasks=1
                Launched reduce tasks=1
                Other local map tasks=1
                Total time spent by all maps in occupied slots (ms)=48204
                Total time spent by all reduces in occupied slots (ms)=20795
                Total time spent by all map tasks (ms)=48204
                Total time spent by all reduce tasks (ms)=20795
                Total vcore-seconds taken by all map tasks=48204
                Total vcore-seconds taken by all reduce tasks=20795
                Total megabyte-seconds taken by all map tasks=49360896
                Total megabyte-seconds taken by all reduce tasks=21294080
        Map-Reduce Framework
                Map input records=3
                Map output records=3
                Map output bytes=194
                Map output materialized bytes=206
                Input split bytes=116
                Combine input records=0
                Combine output records=0
                Reduce input groups=3
                Reduce shuffle bytes=206
                Reduce input records=3
                Reduce output records=0
                Spilled Records=6
                Shuffled Maps =1
                Failed Shuffles=0
                Merged Map outputs=1
                GC time elapsed (ms)=467
                CPU time spent (ms)=3810
                Physical memory (bytes) snapshot=293769216
                Virtual memory (bytes) snapshot=1690853376
                Total committed heap usage (bytes)=136450048
        Shuffle Errors
                BAD_ID=0
                CONNECTION=0
                IO_ERROR=0
                WRONG_LENGTH=0
                WRONG_MAP=0
                WRONG_REDUCE=0
        File Input Format Counters
                Bytes Read=271
        File Output Format Counters
                Bytes Written=0





Viewing the archive's contents:
[root@i-love-you hadoop]# bin/hadoop fs -ls -R har:///dest/c.har
-rw-r--r--   1 root supergroup         13 2015-03-30 20:49 har:///dest/c.har/a.txt
-rw-r--r--   1 root supergroup         18 2015-03-30 20:49 har:///dest/c.har/b.txt




In HDFS itself, the archive shows up as a directory:
[root@i-love-you hadoop]# bin/hdfs dfs -ls /dest
Found 1 items
drwxr-xr-x   - root supergroup          0 2015-03-30 21:04 /dest/c.har
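Inside, a .har directory contains index files plus the concatenated data, and individual files can be copied back out through the har:// URI (a sketch; the destination path is an assumption):

[root@i-love-you hadoop]# bin/hdfs dfs -ls /dest/c.har
# Expect _index, _masterindex and part-0 inside the archive directory
[root@i-love-you hadoop]# bin/hdfs dfs -cp har:///dest/c.har/a.txt /tmp/a.txt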



