Sector/Sphere: High Performance Distributed File System and Parallel Data Processing Engine
1. Overview
Sector/Sphere was created by Dr. Yunhong Gu in 2006 and is now maintained by a group of open source developers. It is available from: http://sector.sourceforge.net/
Sector: distributed file system
Sphere: parallel data processing framework
In at least one benchmark, Sector/Sphere was reported to be about twice as fast as Hadoop in some cases.
2. Sector
Sector system architecture:
Security Server: maintains user accounts, user passwords, file access information, and the IP addresses of the authorized slave nodes
Master: maintains the metadata of the files stored in the system, controls the running of all slave nodes, and responds to users' requests
Slaves: the nodes that store the files managed by the system and process the data upon the request of a Sector client
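To make the division of responsibilities concrete, here is a toy in-memory model in Python. It is only a sketch: the class and method names (SecurityServer, Master, register_slave_file, locate) are hypothetical and are not the Sector API, and the real components are separate networked processes.

```python
from dataclasses import dataclass, field


@dataclass
class SecurityServer:
    """Holds user credentials and the IP addresses of authorized slaves."""
    passwords: dict = field(default_factory=dict)      # user -> password
    allowed_slaves: set = field(default_factory=set)   # slave IP addresses

    def check_user(self, user, password):
        return self.passwords.get(user) == password

    def check_slave(self, ip):
        return ip in self.allowed_slaves


@dataclass
class Master:
    """Keeps file metadata: which slaves hold which file."""
    security: SecurityServer
    locations: dict = field(default_factory=dict)      # path -> list of slave IPs

    def register_slave_file(self, ip, path):
        # Only slaves known to the security server may report files.
        if not self.security.check_slave(ip):
            raise PermissionError(f"slave {ip} is not authorized")
        self.locations.setdefault(path, []).append(ip)

    def locate(self, user, password, path):
        # Answer a client request once the user has been verified.
        if not self.security.check_user(user, password):
            raise PermissionError("login failed")
        return list(self.locations.get(path, []))


security = SecurityServer({"alice": "secret"}, {"10.0.0.1", "10.0.0.2"})
master = Master(security)
master.register_slave_file("10.0.0.1", "/data/a.dat")
master.register_slave_file("10.0.0.2", "/data/a.dat")
print(master.locate("alice", "secret", "/data/a.dat"))
```

Keeping the credential checks in a separate object mirrors the point that the security server is independent of the master.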
The clients include:
1. Sector file system client API: access Sector files in applications using the C++ API
2. Sector system tools
3. FUSE: mount the Sector file system as a local directory
4. Sphere programming API
[Figure: a more detailed view of the Sector architecture]
Features:
1. Unlike Hadoop, Sector does not split user files into blocks. Instead, every Sector slice is stored as one single file in the native file system.
2. Sector runs an independent security server. This design allows different security service providers to be deployed. In addition, multiple Sector masters can use the same security service.
3. Topology aware and application aware
4. Uses UDP for message passing and UDT for data transfer
Replication:
1. Provides software-level fault tolerance (no hardware RAID is required)
2. All files are replicated to a specific number of copies by default
3. By default, a new replica is created on the furthest node
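The "furthest node" rule can be sketched as follows. This is a minimal illustration, assuming each node is described by a topology path such as /dc1/rack1/n1 and that distance is measured in the resulting tree; the path format and both function names are assumptions made for this example, not Sector's actual configuration or code.

```python
def distance(a, b):
    """Tree distance between two topology paths such as '/dc1/rack1/n1'."""
    pa, pb = a.strip("/").split("/"), b.strip("/").split("/")
    common = 0
    for x, y in zip(pa, pb):
        if x != y:
            break
        common += 1
    # Hops up from a to the shared ancestor plus hops down to b.
    return (len(pa) - common) + (len(pb) - common)


def pick_replica_target(holders, candidates):
    """Return the candidate whose nearest current holder is as far away as possible."""
    free = [c for c in candidates if c not in holders]
    return max(free, key=lambda c: min(distance(c, h) for h in holders))


nodes = ["/dc1/rack1/n1", "/dc1/rack1/n2", "/dc1/rack2/n3", "/dc2/rack1/n4"]
print(pick_replica_target(["/dc1/rack1/n1"], nodes))  # /dc2/rack1/n4
```

Placing the copy in a different data center protects against the loss of a whole rack or site, at the cost of a slower initial copy.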
UDT:
A high performance data transfer protocol designed for transferring large volumetric datasets over high speed wide area networks. Such settings are typically disadvantageous for the more common TCP protocol.
UDT uses UDP to transfer bulk data with its own reliability control and congestion control mechanisms. The new protocol can transfer data at a much higher speed than TCP does.
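The idea of adding reliability on top of an unreliable datagram channel can be illustrated with a small simulation. This is a generic sketch and not UDT's actual algorithm, which also includes timers and congestion control; the lossy channel, the loss rate, and the function names are all assumptions made for the example.

```python
import random


def lossy_send(packets, loss_rate, rng):
    """Simulate a datagram channel that silently drops some packets."""
    return [p for p in packets if rng.random() >= loss_rate]


def reliable_transfer(chunks, loss_rate=0.3, seed=1):
    """Deliver all chunks in order despite losses by resending what is missing."""
    rng = random.Random(seed)
    received = {}                       # sequence number -> payload
    pending = list(enumerate(chunks))   # (sequence number, payload) pairs
    rounds = 0
    while pending:
        rounds += 1
        for seq, payload in lossy_send(pending, loss_rate, rng):
            received[seq] = payload
        # The receiver reports the sequence numbers it has not seen yet,
        # and the sender retransmits only those.
        missing = [seq for seq in range(len(chunks)) if seq not in received]
        pending = [(seq, chunks[seq]) for seq in missing]
    return [received[i] for i in range(len(chunks))], rounds


data = [f"chunk-{i}".encode() for i in range(20)]
result, rounds = reliable_transfer(data)
print(result == data, rounds)
```

Sequence numbers let the receiver both detect gaps and reassemble the data in order, which is the part that plain UDP does not provide.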
Limitations:
1. File size is limited by the available space on individual storage nodes
2. Users may need to split their datasets into proper sizes
3. Sector is designed to provide high throughput on large datasets, rather than extremely low latency on small files
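Because a file has to fit on a single slave, a dataset may need to be cut into pieces before upload. The sketch below only plans the piece boundaries; the helper name plan_pieces, the slave names, and the free-space figures are hypothetical, and a real split should also respect record boundaries.

```python
def plan_pieces(total_size, max_piece):
    """Return (offset, length) pairs that cover total_size units of data."""
    if max_piece <= 0:
        raise ValueError("max_piece must be positive")
    return [(off, min(max_piece, total_size - off))
            for off in range(0, total_size, max_piece)]


# Hypothetical free space per slave, in GB.
free_space = {"slave1": 500, "slave2": 300, "slave3": 800}

# Naive choice: a piece no larger than the smallest free space fits on any node.
piece = min(free_space.values())
print(plan_pieces(1000, piece))  # [(0, 300), (300, 300), (600, 300), (900, 100)]
```

This ignores the fact that free space shrinks as pieces are stored, so it is a starting point rather than a placement policy.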
3. Sphere
Sphere is a parallel data processing engine integrated in Sector, and it can be used to process data stored in Sector in parallel.
Sphere uses a stream processing computing paradigm. A stream is an abstraction in Sphere that represents either a dataset or a part of a dataset. (A Sector dataset consists of one or more physical files.)
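The stream model can be sketched in a few lines of Python. This is not the Sphere API (which is C++); it is a minimal illustration, assuming a stream is a set of files, that each file is cut into segments of records, and that a user-defined function processes each segment independently. The names make_segments and word_count and the sample data are made up for the example.

```python
from concurrent.futures import ThreadPoolExecutor


def make_segments(files, records_per_segment):
    """Cut each file of a stream into segments of consecutive records."""
    segments = []
    for name, records in files.items():
        for start in range(0, len(records), records_per_segment):
            segments.append((name, records[start:start + records_per_segment]))
    return segments


def word_count(segment):
    """User-defined function applied to one segment independently."""
    _name, records = segment
    return sum(len(line.split()) for line in records)


stream = {
    "part1.txt": ["a b c", "d e", "f"],
    "part2.txt": ["g h", "i j k l"],
}
segments = make_segments(stream, records_per_segment=2)
with ThreadPoolExecutor(max_workers=4) as pool:   # workers stand in for processing engines
    counts = list(pool.map(word_count, segments))
print(sum(counts))  # 12
```

Since no segment depends on another, the work can be spread over as many workers as there are segments.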
[Figure: how Sphere processes the segments in a stream]
SPE: Sphere Processing Engine
1. Processing multiple input streams.
2. Shuffling input streams.
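Shuffling can be pictured as routing each output record to a bucket chosen from its key, similar to hash partitioning in MapReduce. The sketch below works under that assumption; the function names and the sample records are hypothetical and do not come from Sphere.

```python
import zlib
from collections import defaultdict


def bucket_of(key, num_buckets):
    """Stable bucket id for a key (crc32 instead of Python's salted str hash)."""
    return zlib.crc32(key.encode()) % num_buckets


def shuffle(segment_outputs, num_buckets):
    """Route (key, value) pairs from all segments into buckets by key."""
    buckets = defaultdict(list)
    for output in segment_outputs:
        for key, value in output:
            buckets[bucket_of(key, num_buckets)].append((key, value))
    return buckets


# Outputs of two independently processed segments.
outputs = [
    [("apple", 1), ("pear", 1)],
    [("apple", 1), ("plum", 1)],
]
buckets = shuffle(outputs, num_buckets=2)
for bucket_id in sorted(buckets):
    print(bucket_id, buckets[bucket_id])
```

All records with the same key land in the same bucket, so a later stage can process each bucket on its own.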
Interested readers can refer to: "Sector and Sphere: The Design and Implementation of a High Performance Data Cloud"
4. References
Sector and Sphere: The Design and Implementation of a High Performance Data Cloud
http://sector.sourceforge.net/
http://en.wikipedia.org/wiki/Sector/Sphere
http://dongxicheng.org/mapreduce/streaming-mapreduce-sphere/
http://en.wikipedia.org/wiki/UDP-based_Data_Transfer_Protocol
http://udt.sourceforge.net/