Cassandra - A Decentralized Structured Storage System

Source: Internet | Editor: 程序博客网 | Date: 2024/05/02 04:42

Related Work

Replicate files for high availability at the expense of consistency

Ficus

Coda

Distributed file systems (DFS) that do not use any centralized server

Farsite

DFS using a master/slave design

GFS (its master is now made fault tolerant using the Chubby abstraction)

Distributed relational database that allows disconnected operations and provides eventual data consistency

Bayou

Allow disconnected operations and guarantee eventual consistency

Allow application-level conflict resolution

Bayou

Perform system-level conflict resolution

Coda

Ficus

Data Model

table

a distributed multi-dimensional map indexed by a key

key

a string with no size restrictions

value

an object which is highly structured

operation

every operation under a single row key is atomic per replica

column

columns are grouped together into sets called column families, similar to Bigtable

Cassandra exposes two kinds of column families: Simple and Super column families.

Super column families can be visualized as a column family within a column family
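To make the nesting concrete, here is a minimal sketch of one row of the table as nested Python dictionaries. The row key, column family names, and column names are hypothetical, and the real storage is not laid out this way; the point is only that a simple column family maps column names to values, while a super column family adds one more level.

```python
from typing import Any, Dict

# table -> row key -> column family name -> contents (all names are hypothetical)
table: Dict[str, Dict[str, Any]] = {
    "user:42": {
        # Simple column family: column name -> value
        "Profile": {"name": "Alice", "city": "Paris"},
        # Super column family: super column name -> (column name -> value),
        # i.e. a column family nested inside a column family
        "Inbox": {
            "msg-001": {"from": "bob", "subject": "hi"},
            "msg-002": {"from": "carol", "subject": "lunch?"},
        },
    },
}


def lookup(row_key: str, column_family: str, *names: str) -> Any:
    """Walk row key -> column family -> [super column ->] column."""
    node = table[row_key][column_family]
    for name in names:
        node = node[name]
    return node
```

For example, `lookup("user:42", "Profile", "city")` reaches a simple column in two steps, while `lookup("user:42", "Inbox", "msg-002", "from")` needs one extra step through the super column.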

API

insert(table, key, rowMutation)

get(table, key, columnName)

delete(table, key, columnName)
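The three calls above can be sketched as an in-memory store for a single replica. The shape of `rowMutation` is not specified in these notes, so this sketch assumes it is a dict of column family to {column: value}, and it assumes a hypothetical `"Family:column"` spelling for `columnName`. A lock per (table, key) illustrates "atomic per replica": all columns in one mutation for a row become visible together.

```python
import threading
from collections import defaultdict
from typing import Dict, Optional

Row = Dict[str, Dict[str, str]]  # column family -> column name -> value


class Replica:
    """Hypothetical in-memory store for a single replica."""

    def __init__(self) -> None:
        self._tables: Dict[str, Dict[str, Row]] = defaultdict(dict)  # table -> key -> row
        self._row_locks: dict = {}      # (table, key) -> lock for that row
        self._guard = threading.Lock()  # protects creation of row locks

    def _lock(self, table: str, key: str):
        with self._guard:
            lock = self._row_locks.get((table, key))
            if lock is None:
                lock = self._row_locks[(table, key)] = threading.Lock()
            return lock

    @staticmethod
    def _split(column_name: str):
        # Assumption: column names are spelled "Family:column".
        family, column = column_name.split(":", 1)
        return family, column

    def insert(self, table: str, key: str, row_mutation: Row) -> None:
        # All column changes for this row are applied while holding its lock,
        # so readers of this replica see either none or all of them.
        with self._lock(table, key):
            row = self._tables[table].setdefault(key, {})
            for family, columns in row_mutation.items():
                row.setdefault(family, {}).update(columns)

    def get(self, table: str, key: str, column_name: str) -> Optional[str]:
        family, column = self._split(column_name)
        with self._lock(table, key):
            return self._tables[table].get(key, {}).get(family, {}).get(column)

    def delete(self, table: str, key: str, column_name: str) -> None:
        family, column = self._split(column_name)
        with self._lock(table, key):
            self._tables[table].get(key, {}).get(family, {}).pop(column, None)
```

The lock only gives per-replica atomicity; nothing here coordinates several replicas of the same row.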

System Architecture

Characteristics the system needs to have

load balancing

membership

failure detection

failure recovery

replica synchronization

overload handling

state transfer

concurrency

job scheduling

request marshalling

request routing

system monitoring

alarming

configuration management

Partitioning

consistent hashing

uses an order preserving hash function
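A minimal sketch of the ring lookup, assuming string keys and using the identity function as the order-preserving hash, so keys that sort next to each other land next to each other on the ring (which is what makes key-range scans possible). A key belongs to the first node whose token is at or after the key's position, wrapping around past the largest token. Node names and tokens are hypothetical.

```python
import bisect
from typing import List, Tuple


class Ring:
    """Token ring where each node owns the keys up to and including its token."""

    def __init__(self, nodes: List[Tuple[str, str]]) -> None:
        # nodes: (token, node_name); tokens live in the same ordered space as keys
        self._ring = sorted(nodes)
        self._tokens = [token for token, _ in self._ring]

    @staticmethod
    def position(key: str) -> str:
        # Order-preserving "hash": the identity on strings, so k1 < k2
        # implies position(k1) < position(k2).
        return key

    def coordinator(self, key: str) -> str:
        # Walk clockwise: first token >= the key's position, wrapping to index 0.
        i = bisect.bisect_left(self._tokens, self.position(key))
        return self._ring[i % len(self._ring)][1]
```

With tokens `"g"`, `"n"`, `"t"` for nodes A, B, C, the key `"hello"` maps to B and `"zebra"` wraps around to A.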

the basic consistent hashing algorithm presents some challenges:

1. the random position assignment of each node on the ring leads to non-uniform data and load distribution

2. the basic algorithm is oblivious to the heterogeneity in the performance of nodes

there are two ways to address these issues:

   1) every node gets assigned to multiple positions in the circle (like in Dynamo)

   2) analyze load information on the ring and have lightly loaded nodes move on the ring to alleviate heavily loaded nodes

Cassandra uses the second solution
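These notes do not spell out the rebalancing algorithm, so the following is only a hypothetical heuristic in the spirit of the second solution: count keys per node as a stand-in for load, then move the most lightly loaded node to the lower median key of the most heavily loaded node's range, so it takes over roughly half of that range. Integer tokens in a 0..99 space are assumed for readability.

```python
import bisect
from typing import Dict, List

SPACE = 100  # hypothetical token space [0, SPACE)


def owner(ring: Dict[str, int], key: int) -> str:
    # First node whose token is >= the key's position, wrapping past the end.
    nodes = sorted(ring, key=ring.get)
    tokens = [ring[n] for n in nodes]
    i = bisect.bisect_left(tokens, key)
    return nodes[i % len(nodes)]


def loads(ring: Dict[str, int], keys: List[int]) -> Dict[str, int]:
    # Load = number of keys owned; a stand-in for real load information.
    counts = {n: 0 for n in ring}
    for k in keys:
        counts[owner(ring, k)] += 1
    return counts


def rebalance_once(ring: Dict[str, int], keys: List[int]) -> Dict[str, int]:
    """Move the most lightly loaded node into the heaviest node's range."""
    counts = loads(ring, keys)
    heavy = max(counts, key=counts.get)
    light = min(counts, key=counts.get)
    if heavy == light or counts[heavy] < 2:
        return dict(ring)
    # Predecessor token of the heavy node, used to order its (possibly wrapping) range.
    tokens = sorted(ring.values())
    pred = tokens[tokens.index(ring[heavy]) - 1]
    mine = sorted((k for k in keys if owner(ring, k) == heavy),
                  key=lambda k: (k - pred) % SPACE)
    new_ring = dict(ring)
    # The light node leaves its old spot (its keys fall to its successor) and
    # rejoins at the lower median key, taking the first half of the heavy range.
    new_ring[light] = mine[len(mine) // 2 - 1]
    return new_ring
```

With nodes A=10, B=50, C=90 and keys 55, 60, 65, 70, 75, 80, 5, 20, node C owns 6 of the 8 keys; one step moves A to token 65, and the largest load drops to 3.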

Replication

Each key k is assigned to a coordinator node.

The coordinator is in charge of the replication of the data items that fall within its range

Cassandra elects a leader amongst its nodes using a system called Zookeeper

all nodes, on joining the cluster, contact the leader, who tells them which ranges they are replicas for

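preference list: the nodes responsible for a particular key range

every node is aware of every other node in the system, and hence of the ranges they are responsible for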
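Because every node knows the full ring, any node can compute the preference list for a key locally. This sketch assumes the simplest placement policy, which the Cassandra paper calls "Rack Unaware": the coordinator plus its N-1 clockwise successors. Rack-aware or datacenter-aware placement, needed for replicating rows across data centers, is not modeled, and the tokens are hypothetical integers.

```python
import bisect
from typing import Dict, List


def preference_list(ring: Dict[str, int], key: int, n: int) -> List[str]:
    """Coordinator for `key` followed by its N-1 clockwise successors."""
    nodes = sorted(ring, key=ring.get)          # nodes in ring order
    tokens = [ring[node] for node in nodes]
    start = bisect.bisect_left(tokens, key)     # coordinator: first token >= key
    count = min(n, len(nodes))                  # cannot have more replicas than nodes
    return [nodes[(start + i) % len(nodes)] for i in range(count)]
```

For a ring A=10, D=30, B=50, C=90 and N=3, key 40 is coordinated by B and replicated on C and A.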

Cassandra provides durability guarantees by relaxing the quorum requirements

each row is replicated across multiple data centers

data centers are connected through high speed network links

Membership

Cluster membership is based on Scuttlebutt, a very efficient anti-entropy, gossip-based mechanism
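A simplified sketch of Scuttlebutt-style reconciliation, assuming each node's state is a set of key/value pairs tagged with versions that increase per originating node. Peers first swap digests (the highest version seen per origin) and then send only the entries newer than the other side's digest. The real protocol adds ordering and flow control, which are left out; class, field, and key names are hypothetical.

```python
from typing import Dict, List, Tuple

# origin node -> key -> (value, version); versions increase per origin node
State = Dict[str, Dict[str, Tuple[str, int]]]
Delta = Tuple[str, str, str, int]  # (origin, key, value, version)


class Gossiper:
    """Simplified Scuttlebutt-style anti-entropy (a sketch, not the real protocol)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state: State = {name: {}}
        self.version = 0

    def set_local(self, key: str, value: str) -> None:
        # Each local update gets a fresh, higher version number.
        self.version += 1
        self.state[self.name][key] = (value, self.version)

    def digest(self) -> Dict[str, int]:
        # Highest version this node has seen from each origin.
        return {origin: max((v for _, v in kv.values()), default=0)
                for origin, kv in self.state.items()}

    def deltas_for(self, peer_digest: Dict[str, int]) -> List[Delta]:
        # Everything newer than what the peer reports having seen.
        out: List[Delta] = []
        for origin, kv in self.state.items():
            known = peer_digest.get(origin, 0)
            out.extend((origin, k, val, ver) for k, (val, ver) in kv.items() if ver > known)
        return out

    def apply(self, deltas: List[Delta]) -> None:
        # Keep an entry only if it is newer than the local copy.
        for origin, k, val, ver in deltas:
            current = self.state.setdefault(origin, {}).get(k)
            if current is None or ver > current[1]:
                self.state[origin][k] = (val, ver)


def gossip_round(a: Gossiper, b: Gossiper) -> None:
    # Push-pull exchange: both sides send digests, then the missing deltas.
    a_digest, b_digest = a.digest(), b.digest()
    b.apply(a.deltas_for(b_digest))
    a.apply(b.deltas_for(a_digest))
```

Information spreads transitively: if `a` gossips with `b` and then `b` gossips with `c`, node `c` learns `a`'s state without ever contacting `a`.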

Failure Detection

uses a modified version of the Accrual Failure Detector

- the failure detection module doesn't emit a Boolean value stating whether a node is up or down

- instead, the failure detection module emits a value that represents a suspicion level for each of the monitored nodes
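A sketch of an accrual detector for one monitored node. It assumes heartbeat inter-arrival times are exponentially distributed (the Cassandra paper reports that this fit its gossip traffic better than the Gaussian used by the original accrual detector), so the probability that the next heartbeat arrives later than t is exp(-t / mean), and phi is -log10 of that probability. The window size is an arbitrary assumption.

```python
import math
from collections import deque
from typing import Deque, Optional


class AccrualDetector:
    """Suspicion level (phi) for one monitored node, from heartbeat arrival times."""

    def __init__(self, window: int = 1000) -> None:
        self.intervals: Deque[float] = deque(maxlen=window)  # recent inter-arrival times
        self.last: Optional[float] = None

    def heartbeat(self, now: float) -> None:
        if self.last is not None:
            self.intervals.append(now - self.last)
        self.last = now

    def phi(self, now: float) -> float:
        # Exponential model: P(next heartbeat later than t) = exp(-t / mean),
        # so phi = -log10(exp(-t / mean)) = t / (mean * ln 10).
        if self.last is None or not self.intervals:
            return 0.0
        mean = sum(self.intervals) / len(self.intervals)
        t = now - self.last
        return t / (mean * math.log(10))
```

A caller suspects the node once phi crosses a threshold of its choosing. Since the modeled probability of a late heartbeat is 10^(-phi), a threshold of 1 corresponds to roughly a 10% chance that the suspicion is a mistake.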

Bootstrapping

when a node starts for the first time, it chooses a random token for its position in the ring

for fault tolerance, the mapping is persisted to disk locally and also in Zookeeper

the token information is then gossiped around the cluster

the node reads its configuration file which contains a list of a few contact points within the cluster

the initial contact points are called seeds of the cluster
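A sketch of the first-start logic, assuming hypothetical JSON files for the persisted token and for the configuration's seed list, and an arbitrary 2^127 token space. The Zookeeper write and the gossip exchange with the seeds are only marked in comments.

```python
import json
import random
from pathlib import Path
from typing import List

TOKEN_SPACE = 2**127  # hypothetical token space size


def bootstrap(state_file: Path, config_file: Path) -> dict:
    """Pick (or reload) this node's token and read the seed list from config."""
    if state_file.exists():
        # Restart: reuse the token persisted on a previous start.
        token = json.loads(state_file.read_text())["token"]
    else:
        # First start: choose a random position on the ring and persist it locally.
        token = random.randrange(TOKEN_SPACE)
        state_file.write_text(json.dumps({"token": token}))
        # (The mapping would also be stored in Zookeeper; omitted here.)
    seeds: List[str] = json.loads(config_file.read_text())["seeds"]
    # Next step (not shown): gossip the token to the seeds, which spread it further.
    return {"token": token, "seeds": seeds}
```

Calling `bootstrap` twice with the same state file returns the same token, which is the point of persisting the mapping.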

Scaling the Cluster

Local Persistence

relies on the local file system for data persistence

typical write operation:

- a write into a commit log for durability and recoverability

- an update into an in-memory data structure
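A minimal sketch of this write path, assuming a hypothetical tab-separated log format (so values must not contain tabs or newlines). The log record is appended and fsynced before the in-memory map is touched, so after a crash, replaying the log rebuilds everything the memory held.

```python
import os
from typing import Dict, Tuple


class WritePath:
    """Commit log first, then the in-memory data structure."""

    def __init__(self, log_path: str) -> None:
        self.log = open(log_path, "a", encoding="utf-8")
        self.memtable: Dict[Tuple[str, str], str] = {}

    def write(self, key: str, column: str, value: str) -> None:
        # 1. Durability: append the mutation to the commit log and force it to disk.
        self.log.write(f"{key}\t{column}\t{value}\n")
        self.log.flush()
        os.fsync(self.log.fileno())
        # 2. Only after the log write succeeds, update the in-memory structure.
        self.memtable[(key, column)] = value

    def close(self) -> None:
        self.log.close()


def replay(log_path: str) -> Dict[Tuple[str, str], str]:
    """Recoverability: rebuild the in-memory state by replaying the commit log."""
    memtable: Dict[Tuple[str, str], str] = {}
    with open(log_path, encoding="utf-8") as f:
        for line in f:
            key, column, value = line.rstrip("\n").split("\t")
            memtable[(key, column)] = value
    return memtable
```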

typical read operation:

- queries the in-memory data structure first

- then looks into the files on disk

- a bloom filter, summarizing the keys in the file, is stored in each data file and also kept in memory; it is consulted first, so files that cannot contain the key are skipped

- a key in a column family could have many columns, so retrieving columns that are further away from the key needs special indexing

- Cassandra maintains column indices which allow it to jump to the right chunk on disk for column retrieval
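A sketch of the read path under simplifying assumptions: each "file" is an in-memory stand-in holding a Bloom filter over its row keys plus, per row, a sorted list of column names that plays the role of the column index (one entry per column rather than per on-disk chunk). Reads check the in-memory structure first, then files from newest to oldest. The Bloom filter parameters (1024 bits, 3 hashes derived from SHA-256) are arbitrary.

```python
import bisect
import hashlib
from typing import Dict, Iterator, List, Optional, Tuple


class BloomFilter:
    """k bit positions per key in an m-slot array: false positives, no false negatives."""

    def __init__(self, m: int = 1024, k: int = 3) -> None:
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _positions(self, key: str) -> Iterator[int]:
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p] = 1

    def might_contain(self, key: str) -> bool:
        return all(self.bits[p] for p in self._positions(key))


class DataFile:
    """Stand-in for an on-disk file: rows of sorted columns plus a Bloom filter."""

    def __init__(self, rows: Dict[str, Dict[str, str]]) -> None:
        self.bloom = BloomFilter()
        self.rows: Dict[str, Tuple[List[str], List[str]]] = {}
        for key, cols in rows.items():
            self.bloom.add(key)
            names = sorted(cols)
            # Sorted column names act as the column index: binary search jumps
            # to the right entry instead of scanning every column in the row.
            self.rows[key] = (names, [cols[n] for n in names])

    def get(self, key: str, column: str) -> Optional[str]:
        if not self.bloom.might_contain(key):
            return None  # skip this file without touching "disk"
        names, values = self.rows.get(key, ([], []))
        i = bisect.bisect_left(names, column)
        return values[i] if i < len(names) and names[i] == column else None


def read(memtable: Dict[Tuple[str, str], str], files: List[DataFile],
         key: str, column: str) -> Optional[str]:
    # 1. Query the in-memory data structure first.
    if (key, column) in memtable:
        return memtable[(key, column)]
    # 2. Then look into the files, newest first.
    for f in files:
        value = f.get(key, column)
        if value is not None:
            return value
    return None
```

Ordering the files newest first means a newer value for a column shadows an older one; deletions are not modeled.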

Implementation Details

the Cassandra process on a single machine primarily consists of the following abstractions:

- partitioning module

- cluster membership and failure detection module

- storage engine module

each of these modules is implemented in Java
