HDFS Summary

Source: Internet | Published by: 淄博用友软件 | Editor: 程序博客网 | Time: 2024/06/01 08:34
1、HDFS Motivation
(1)Based on Google’s GFS
(2)Redundant storage of massive amounts of data on cheap and unreliable computers
(3)Why not use an existing file system? 
        – Different workload and design priorities;
        – Handles much bigger dataset sizes than other filesystems
2、HDFS Design Decisions
(1)Files stored as blocks – much larger than in most filesystems (default 64 MB in Hadoop 1.x; 128 MB from Hadoop 2.x on)
(2)Reliability through replication
           – Each block is replicated on multiple DataNodes (default replication factor 3, configurable per file)
(3)Single master (NameNode) coordinates access, metadata
           – Simple centralized management
(4)No data caching – little benefit given large data sets and streaming reads
(5)Familiar interface, but a customized API
          – Simplifies the problem; focus on distributed apps
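To make decisions (1) and (2) concrete, here is a minimal Python sketch that cuts a file into fixed-size blocks and assigns each block to several distinct DataNodes. It is an illustration under stated assumptions, not HDFS code: the function names are hypothetical, the 64 MB block size follows the notes above, and replica placement is plain random choice. HDFS's actual default placement policy is rack-aware.

```python
import random

BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB as in the notes; Hadoop 2.x+ defaults to 128 MB
REPLICATION = 3                # HDFS default replication factor


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) for each block of a file (hypothetical helper).

    Every block is full-size except possibly the last, which only holds
    the remaining bytes rather than a whole block's worth.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, datanodes, replication=REPLICATION, seed=0):
    """Choose `replication` distinct DataNodes for each block.

    Simplification: uniform random choice. Real HDFS spreads replicas
    across racks so a single rack failure cannot destroy every copy.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the requested replication")
    rng = random.Random(seed)
    return [rng.sample(datanodes, replication) for _ in range(num_blocks)]


if __name__ == "__main__":
    mb = 1024 * 1024
    blocks = split_into_blocks(200 * mb)
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    for (offset, length), nodes in zip(blocks, placement):
        print(f"block at {offset // mb} MB, {length // mb} MB -> {nodes}")
```

For a 200 MB file this yields four blocks (three of 64 MB and one of 8 MB). With replication 3, the cluster stores 600 MB of raw data. That storage cost is what buys tolerance to individual DataNode failures.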
3、HDFS Client Block Diagram
4、Based on GFS Architecture
5、Metadata
(1)Single NameNode stores all metadata
          – File names and directory tree, the blocks that make up each file, and the DataNodes holding each block
(2)Maintained entirely in RAM for fast lookup
(3)DataNodes store opaque file contents in “block” objects on underlying local filesystem
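The sketch below models the NameNode metadata described above as two in-memory maps. One maps each file path to its ordered block IDs. The other maps each block ID to the DataNodes holding it. The class and method names are hypothetical and this is not Hadoop's implementation. It does reflect one real design point: in HDFS, block locations are learned from DataNode block reports rather than persisted by the NameNode. A client asks the NameNode only for block locations and then reads the block contents directly from DataNodes.

```python
from dataclasses import dataclass, field


@dataclass
class NameNodeMetadata:
    """Toy in-memory NameNode state (hypothetical, for illustration only)."""

    # path -> ordered list of block IDs making up the file
    files: dict[str, list[int]] = field(default_factory=dict)
    # block ID -> set of DataNode IDs currently holding a replica
    block_locations: dict[int, set[str]] = field(default_factory=dict)
    next_block_id: int = 0

    def add_file(self, path: str, num_blocks: int) -> list[int]:
        """Allocate fresh block IDs for a new file."""
        ids = list(range(self.next_block_id, self.next_block_id + num_blocks))
        self.next_block_id += num_blocks
        self.files[path] = ids
        for b in ids:
            self.block_locations.setdefault(b, set())
        return ids

    def block_report(self, datanode: str, block_ids: list[int]) -> None:
        """Record that `datanode` reports holding these blocks."""
        for b in block_ids:
            self.block_locations.setdefault(b, set()).add(datanode)

    def get_block_locations(self, path: str) -> list[tuple[int, list[str]]]:
        """Answer a client's lookup: each block of the file and where it lives."""
        return [(b, sorted(self.block_locations.get(b, set()))) for b in self.files[path]]

    def remove_datanode(self, datanode: str) -> None:
        """Forget a failed DataNode; its replicas no longer count."""
        for holders in self.block_locations.values():
            holders.discard(datanode)

    def under_replicated(self, target: int = 3) -> list[int]:
        """Block IDs with fewer live replicas than `target`."""
        return sorted(b for b, holders in self.block_locations.items() if len(holders) < target)
```

Because every lookup is a dictionary access, keeping this state in RAM makes metadata queries fast. The same choice also bounds how many files and blocks a single NameNode can track.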
6、HDFS Conclusions
(1)HDFS supports large-scale processing workloads on commodity hardware
            – Designed to tolerate frequent component failures;
            – Optimized for huge files that are mostly appended and read;
            – Filesystem interface is customized for the job, but still retains familiarity for developers;
            – Simple solutions can work (e.g., single master)
(2)Reliably stores several TB in individual clusters
