Bigtable: A Distributed Storage System for Structured Data: Part 10, Related Work

10 Related Work
The Boxwood project [24] has components that overlap in some ways with Chubby, GFS, and Bigtable, since it provides for distributed agreement, locking, distributed chunk storage, and distributed B-tree storage. 
In each case where there is overlap, it appears that Boxwood’s component is targeted at a somewhat lower level than the corresponding Google service. 
The Boxwood project’s goal is to provide infrastructure for building higher-level services such as file systems or databases, while the goal of Bigtable is to directly support client applications that wish to store data.
Many recent projects have tackled the problem of providing distributed storage or higher-level services over wide area networks, often at “Internet scale.” 
This includes work on distributed hash tables that began with projects such as CAN, Chord, Tapestry, and Pastry. 
These systems address concerns that do not arise for Bigtable, such as highly variable bandwidth, untrusted participants, or frequent reconfiguration; decentralized control and Byzantine fault tolerance are not Bigtable goals.

In terms of the distributed data storage model that one might provide to application developers, we believe the key-value pair model provided by distributed B-trees or distributed hash tables is too limiting. 
Key-value pairs are a useful building block, but they should not be the only building block one provides to developers. 
The model we chose is richer than simple key-value pairs, and supports sparse semi-structured data. 
Nonetheless, it is still simple enough that it lends itself to a very efficient flat-file representation, and it is transparent enough (via locality groups) to allow our users to tune important behaviors of the system.
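
To make the contrast concrete, here is a minimal sketch of the two models: the flat key-value map exposed by distributed hash tables or B-trees, versus a sparse (row, column, timestamp) → value map in the spirit of Bigtable's data model. All type and field names here are illustrative, not the actual Bigtable API.

```cpp
#include <cstdint>
#include <map>
#include <string>

// What a distributed hash table or B-tree essentially exposes:
using KeyValueStore = std::map<std::string, std::string>;

// Bigtable's richer model: a sparse, multi-dimensional sorted map.
// (Hypothetical types for illustration; not the real API.)
struct Cell {
  std::string row;     // e.g. "com.cnn.www"
  std::string column;  // "family:qualifier", e.g. "anchor:cnnsi.com"
  int64_t timestamp;   // versions of the same cell
  bool operator<(const Cell& other) const {
    if (row != other.row) return row < other.row;
    if (column != other.column) return column < other.column;
    return timestamp > other.timestamp;  // newest version sorts first
  }
};
using SparseTable = std::map<Cell, std::string>;

int main() {
  SparseTable t;
  t[{"com.cnn.www", "contents:", 3}] = "<html>...";
  t[{"com.cnn.www", "anchor:cnnsi.com", 9}] = "CNN";
  // Rows stay sorted, so a scan over a row range is an in-order
  // traversal -- the property that an efficient flat-file
  // representation and locality groups rely on.
}
```
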
Several database vendors have developed parallel databases that can store large volumes of data. 
Oracle’s Real Application Cluster database uses shared disks to store data (Bigtable uses GFS) and a distributed lock manager (Bigtable uses Chubby). 
IBM’s DB2 Parallel Edition is based on a shared-nothing architecture similar to Bigtable. 
Each DB2 server is responsible for a subset of the rows in a table which it stores in a local relational database. 
Both products provide a complete relational model with transactions.
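
As a rough sketch of what "responsible for a subset of the rows" means in a shared-nothing design, the following routes a row key to the server owning its range, much as Bigtable assigns contiguous row ranges (tablets) to tablet servers. The structures and names are hypothetical.

```cpp
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

// Hypothetical structures: each server owns one contiguous row range,
// identified here only by the range's exclusive upper bound.
struct Tablet {
  std::string end_row;
  int server_id;
};

// `tablets` is sorted by end_row; return the server whose range covers `row`.
int ServerForRow(const std::vector<Tablet>& tablets, const std::string& row) {
  auto it = std::lower_bound(
      tablets.begin(), tablets.end(), row,
      [](const Tablet& t, const std::string& r) { return t.end_row <= r; });
  return it == tablets.end() ? -1 : it->server_id;
}

int main() {
  // Three servers splitting the ASCII row space at "g" and "p".
  std::vector<Tablet> tablets = {{"g", 1}, {"p", 2}, {"\x7f", 3}};
  std::cout << ServerForRow(tablets, "com.cnn.www") << "\n";  // -> 1
  std::cout << ServerForRow(tablets, "www.example") << "\n";  // -> 3
}
```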

Bigtable locality groups realize similar compression and disk read performance benefits observed for other systems that organize data on disk using column-based rather than row-based storage, including C-Store and commercial products such as Sybase IQ, SenSage, KDB+, and the ColumnBM storage layer in MonetDB/X100.
Another system that does vertical and horizontal data partitioning into flat files and achieves good data compression ratios is AT&T’s Daytona database. 
Locality groups do not support CPU-cache-level optimizations, such as those described by Ailamaki.
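
The compression benefit of a column-oriented layout is easy to see with a toy example: values within a single column tend to resemble each other, so even naive run-length encoding pays off, whereas a row-oriented layout interleaves unrelated fields between them. This sketch is illustrative only, not any of these products' actual encoders.

```cpp
#include <iostream>
#include <string>
#include <utility>
#include <vector>

// Naive run-length encoder over one column's values.
std::vector<std::pair<std::string, int>> RunLengthEncode(
    const std::vector<std::string>& column) {
  std::vector<std::pair<std::string, int>> runs;
  for (const auto& v : column) {
    if (!runs.empty() && runs.back().first == v) {
      ++runs.back().second;    // extend the current run
    } else {
      runs.push_back({v, 1});  // start a new run
    }
  }
  return runs;
}

int main() {
  // One column gathered from many rows; a row-oriented layout would
  // interleave these values with unrelated fields, breaking the runs.
  std::vector<std::string> language_column = {"en", "en", "en", "en",
                                              "de", "de", "en"};
  for (const auto& [value, count] : RunLengthEncode(language_column))
    std::cout << value << " x" << count << "\n";  // en x4, de x2, en x1
}
```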

The manner in which Bigtable uses memtables and SSTables to store updates to tablets is analogous to the way that the Log-Structured Merge Tree stores updates to index data. 
In both systems, sorted data is buffered in memory before being written to disk, and reads must merge data from memory and disk.
C-Store and Bigtable share many characteristics: 
both systems use a shared-nothing architecture and have two different data structures, one for recent writes, and one for storing long-lived data, with a mechanism for moving data from one form to the other. 
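
A minimal sketch of that shared pattern, with invented names: recent writes land in an in-memory sorted map (the memtable), which is periodically frozen into an immutable sorted table (an SSTable), and a read merges the memtable with the tables from newest to oldest. Real implementations additionally keep a commit log and write the frozen tables to disk (to GFS, in Bigtable's case).

```cpp
#include <cstddef>
#include <iostream>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

class LsmStore {
 public:
  void Put(const std::string& key, const std::string& value) {
    memtable_[key] = value;
    if (memtable_.size() >= kFlushThreshold) Flush();
  }

  // Reads merge memory and disk: memtable first, then the
  // sorted tables from newest to oldest.
  std::optional<std::string> Get(const std::string& key) const {
    if (auto it = memtable_.find(key); it != memtable_.end())
      return it->second;
    for (auto t = sstables_.rbegin(); t != sstables_.rend(); ++t)
      if (auto it = t->find(key); it != t->end()) return it->second;
    return std::nullopt;
  }

 private:
  // The mechanism for moving data from one form to the other:
  // freeze the memtable into an immutable table. A real system
  // writes a sorted file to disk; we just move the map.
  void Flush() {
    sstables_.push_back(std::move(memtable_));
    memtable_.clear();
  }

  static constexpr std::size_t kFlushThreshold = 4;
  std::map<std::string, std::string> memtable_;
  std::vector<std::map<std::string, std::string>> sstables_;
};

int main() {
  LsmStore store;
  for (int i = 0; i < 6; ++i)
    store.Put("key" + std::to_string(i), "v" + std::to_string(i));
  store.Put("key0", "v0-new");  // newer write shadows the flushed one
  if (auto v = store.Get("key0")) std::cout << *v << "\n";  // v0-new
}
```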

The systems differ significantly in their API: 
C-Store behaves like a relational database, whereas Bigtable provides a lower level read and write interface and is designed to support many thousands of such operations per second per server.
C-Store is also a “read-optimized relational DBMS”, whereas Bigtable provides good performance on both read-intensive and write-intensive applications.

Bigtable’s load balancer has to solve some of the same kinds of load and memory balancing problems faced by shared-nothing databases. 
Our problem is somewhat simpler (a sketch follows the list below): 
(1) we do not consider the possibility of multiple copies of the same data, possibly in alternate forms due to views or indices; 
(2) we let the user tell us what data belongs in memory and what data should stay on disk, rather than trying to determine this dynamically;
(3) we have no complex queries to execute or optimize.
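
To suggest how much these simplifications buy, here is a toy balancer written under exactly those assumptions: a single copy of each tablet means rebalancing reduces to shifting tablets from the most loaded server toward the least loaded one. It is purely illustrative, not Bigtable's actual policy.

```cpp
#include <algorithm>
#include <iostream>
#include <vector>

struct Server {
  int id;
  double load;  // e.g. requests/sec summed over the tablets it serves
};

// Greedily shift one tablet's worth of load from the most loaded server
// to the least loaded one until the spread is small enough.
void Rebalance(std::vector<Server>& servers, double tablet_load,
               double tolerance) {
  while (true) {
    auto [min_it, max_it] = std::minmax_element(
        servers.begin(), servers.end(),
        [](const Server& a, const Server& b) { return a.load < b.load; });
    double spread = max_it->load - min_it->load;
    // Stop when balanced, or when one more move would just oscillate.
    if (spread <= tolerance || spread < 2 * tablet_load) break;
    // With a single copy of each tablet there is no replica or index
    // placement to reason about -- moving the tablet is the whole decision.
    max_it->load -= tablet_load;
    min_it->load += tablet_load;
  }
}

int main() {
  std::vector<Server> servers = {{1, 10.0}, {2, 2.0}, {3, 6.0}};
  Rebalance(servers, /*tablet_load=*/1.0, /*tolerance=*/2.0);
  for (const auto& s : servers)
    std::cout << "server " << s.id << ": " << s.load << "\n";  // 7, 5, 6
}
```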
