HDFS Summary
Source: Internet. Time: 2024/06/01 08:34
1. HDFS: Motivation
(1) Based on Google's GFS
(2) Redundant storage of massive amounts of data on cheap, unreliable computers
(3) Why not use an existing file system?
– Different workload and design priorities
– Handles much bigger dataset sizes than other filesystems
2. HDFS Design Decisions
(1) Files are stored as blocks
– Much larger than in most filesystems (default is 64 MB in Hadoop 1.x; 128 MB from Hadoop 2.x onward)
(2) Reliability through replication
– Each block is replicated across 3+ DataNodes (the default replication factor is 3)
(3) Single master (NameNode) coordinates access and metadata
– Simple centralized management
(4) No data caching
– Little benefit, given large data sets and streaming reads
(5) Familiar interface, but a customized API
– Simplify the problem; focus on distributed apps
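The block and replication decisions above can be sketched in a few lines of Python. This is a toy model only, not HDFS code: the function names `split_into_blocks` and `place_replicas` are hypothetical, and the round-robin placement stands in for the rack-aware placement policy that a real NameNode applies.

```python
BLOCK_SIZE = 64 * 1024 * 1024  # 64 MB, the default quoted in the notes
REPLICATION = 3                # each block is kept on 3 DataNodes


def split_into_blocks(file_size, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file.

    Every block is full-sized except possibly the last one.
    """
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_replicas(num_blocks, datanodes, replication=REPLICATION):
    """Assign each block to `replication` distinct DataNodes.

    Assumption: simple round-robin placement, chosen for illustration.
    """
    if replication > len(datanodes):
        raise ValueError("not enough DataNodes for the replication factor")
    placement = {}
    for block in range(num_blocks):
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(200 * 1024 * 1024)  # a 200 MB file
    print(len(blocks), "blocks")
    print(place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"]))
```

With these assumptions, a 200 MB file occupies three full 64 MB blocks plus one 8 MB block, and each block lands on three different DataNodes, so losing any single node leaves two copies of every block.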
3. HDFS Client Block Diagram
4. Based on GFS Architecture
5. Metadata
(1) A single NameNode stores all metadata
– Filenames, and the DataNode locations of each file's blocks
(2) Maintained entirely in RAM for fast lookup
(3) DataNodes store opaque file contents in "block" objects on the underlying local filesystem
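As a rough illustration of the metadata layout described above, the NameNode can be pictured as two in-memory maps: filename to block list, and block to DataNode locations. The class `ToyNameNode` and its method names are hypothetical and chosen for this sketch; the real NameNode also handles persistence, leases, and failure detection, none of which is modeled here.

```python
from collections import defaultdict


class ToyNameNode:
    """Toy in-memory metadata store; an illustration, not HDFS code."""

    def __init__(self):
        # filename -> ordered list of block ids that make up the file
        self.file_to_blocks = {}
        # block id -> set of DataNodes currently holding a replica
        self.block_to_nodes = defaultdict(set)

    def create_file(self, filename, block_ids):
        """Record which blocks, in order, form the file."""
        self.file_to_blocks[filename] = list(block_ids)

    def block_report(self, datanode, block_ids):
        """Record that a DataNode holds replicas of the given blocks."""
        for block_id in block_ids:
            self.block_to_nodes[block_id].add(datanode)

    def locate(self, filename):
        """Return [(block_id, [datanodes...])] in file order."""
        return [
            (block_id, sorted(self.block_to_nodes.get(block_id, ())))
            for block_id in self.file_to_blocks[filename]
        ]


if __name__ == "__main__":
    nn = ToyNameNode()
    nn.create_file("/logs/a.txt", ["blk_1", "blk_2"])
    nn.block_report("dn1", ["blk_1", "blk_2"])
    nn.block_report("dn2", ["blk_1"])
    print(nn.locate("/logs/a.txt"))
```

A client would ask for the locations once and then read the block contents directly from the DataNodes. Because both maps are plain dictionaries held in memory, lookups are fast, which matches the reason the notes give for keeping metadata in RAM.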
6. HDFS Conclusions
(1) HDFS supports large-scale processing workloads on commodity hardware
– Designed to tolerate frequent component failures
– Optimized for huge files that are mostly appended to and read
– The filesystem interface is customized for the job, but remains familiar to developers
– Simple solutions can work (e.g., a single master)
(2) Reliably stores several TB in individual clusters