
Are Cloud Based Memory Architectures the Next Big Thing?


We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

Let's take a short trip down web architecture lane:

  • It's 1993: Yahoo runs on FreeBSD, Apache, Perl scripts and a SQL database
  • It's 1995: Scale-up the database.
  • It's 1998: LAMP
  • It's 1999: Stateless + Load Balanced + Database + SAN
  • It's 2001: In-memory data-grid.
  • It's 2003: Add a caching layer.
  • It's 2004: Add scale-out and partitioning.
  • It's 2005: Add asynchronous job scheduling and maybe a distributed file system.
  • It's 2007: Move it all into the cloud.
  • It's 2008: Cloud + web scalable database.
  • It's 20??: Cloud + Memory Based Architectures

    You may disagree with the timing of various innovations and you would be correct. I couldn't find a history of the evolution of website architectures so I just made stuff up. If you have any better information please let me know.

    Why might cloud based memory architectures be the next big thing? For now we'll just address the memory based architecture part of the question; the cloud component is covered a little later.

    Behold the power of keeping data in memory:


    Google query results are now served in under 200ms, astonishingly fast, down from 1000ms in the olden days. The vast majority of this great performance improvement is due to holding indexes completely in memory. Thousands of machines process each query in order to make search results appear nearly instantaneously.

    This text was adapted from notes on Google Fellow Jeff Dean's keynote speech at WSDM 2009.

    Google isn't the only one getting a performance bang from moving data into memory. Both LinkedIn and Digg keep the graph of their social network in memory. Facebook has northwards of 800 memcached servers creating a reservoir of 28 terabytes of memory, enabling a 99% cache hit rate. Even little guys can handle 100s of millions of events per day by using memory instead of disk.

    With their new Unified Computing strategy Cisco is also entering the memory game. Their new machines "will be focusing on networking and memory" with servers crammed with 384 GB of RAM, fast processors, and blazingly fast processor interconnects. Just what you need when creating memory based systems.

    Memory is the System of Record

    What makes Memory Based Architectures different from traditional architectures is that memory is the system of record. Typically disk based databases have been the system of record. Disk has been King, safely storing data away within its castle walls. Because disk is slow we've ended up wrapping disks in complicated caching and distributed file systems to make them perform.

    Sure, memory is used all over the place as cache, but we're always supposed to pretend that cache can be invalidated at any time and old Mr. Reliable, the database, will step in and provide the correct values. In Memory Based Architectures memory is where the "official" data values are stored.

    Caching also serves a different purpose. The purpose behind cache based architectures is to minimize the data bottleneck through to disk. Memory based architectures can address the entire end-to-end application stack. Data in memory can be of higher reliability and availability than traditional architectures.

    Memory Based Architectures initially developed out of the need in some application spaces for very low latencies. The dramatic drop in RAM prices along with the ability of servers to handle larger and larger amounts of RAM has caused memory architectures to verge on going mainstream. For example, someone recently calculated that 1TB of RAM across 40 servers at 24 GB per server would cost an additional $40,000, which is really quite affordable given the cost of the servers. Projecting out, 1U and 2U rack-mounted servers will soon support a terabyte or more of memory.
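    The arithmetic behind that affordability claim is easy to check. The $40,000 figure and server counts are from the text; the per-GB cost is derived:

```python
# Back-of-envelope check of the RAM cost figures quoted above.
servers = 40
gb_per_server = 24
total_gb = servers * gb_per_server   # 960 GB, roughly 1 TB
total_cost = 40_000                  # USD, figure quoted in the text
cost_per_gb = total_cost / total_gb

print(f"{total_gb} GB across {servers} servers")
print(f"about ${cost_per_gb:.2f} per GB of cluster RAM")
```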

    RAM = High Bandwidth and Low Latency

    Why are Memory Based Architectures so attractive? Compared to disk, RAM is a high bandwidth and low latency storage medium. Depending on who you ask the bandwidth of RAM is around 5 GB/s. The bandwidth of disk is about 100 MB/s, so RAM is roughly 50 times faster. RAM wins. Modern hard drives have latencies under 13 milliseconds. When many applications are queued for disk reads, latencies can easily be in the many second range. Memory latency is in the 5 nanosecond range, millions of times lower. RAM wins again.
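    The ratios fall straight out of the figures quoted above (a quick back-of-envelope check, using the text's numbers):

```python
# Rough RAM-vs-disk ratios implied by the figures in the text.
ram_bandwidth = 5e9     # 5 GB/s
disk_bandwidth = 100e6  # 100 MB/s
ram_latency = 5e-9      # 5 ns
disk_latency = 13e-3    # 13 ms

print(f"bandwidth ratio: {ram_bandwidth / disk_bandwidth:.0f}x")  # 50x
print(f"latency ratio:   {disk_latency / ram_latency:,.0f}x")     # 2,600,000x
```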

    RAM is the New Disk

    The superiority of RAM is at the heart of the RAM is the New Disk paradigm. As an architecture it combines the holy quadrinity of computing:

  • Performance is better because data is accessed from memory instead of through a database to a disk.
  • Scalability is linear because as more servers are added data is transparently load balanced across the servers, so there is automated in-memory sharding.
  • Availability is higher because multiple copies of data are kept in memory and the entire system reroutes on failure.
  • Application development is faster because there's only one layer of software to deal with, the cache, and its API is simple. All the complexity is hidden from the programmer, which means all a developer has to do is get and put data.

    Accessing disk on the critical path of any transaction limits both throughput and latency. Committing a transaction over the network in-memory is faster than writing through to disk. Reading data from memory is also faster than reading data from disk. So the idea is to skip disk, except perhaps as an asynchronous write-behind option, archival storage, and for large files.
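    The skip-disk idea, with disk relegated to asynchronous write-behind, can be sketched in a few lines of Python. This is a minimal illustration; all names are made up:

```python
import queue
import threading

class MemoryStore:
    """Minimal sketch: RAM is the system of record; disk writes happen
    asynchronously off the critical path (write-behind)."""

    def __init__(self):
        self.data = {}                    # authoritative, in-memory copy
        self.write_queue = queue.Queue()  # pending archival writes
        threading.Thread(target=self._writer, daemon=True).start()

    def put(self, key, value):
        self.data[key] = value              # the write commits here, in memory
        self.write_queue.put((key, value))  # the disk write is deferred

    def get(self, key):
        return self.data[key]               # reads never touch disk

    def _writer(self):
        while True:
            key, value = self.write_queue.get()
            # in a real system this would append to archival storage
            self.write_queue.task_done()

store = MemoryStore()
store.put("user:1", {"name": "alice"})
print(store.get("user:1"))
```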

    Or is Disk the New RAM?

    To be fair there is also a Disk is the new RAM, RAM is the new Cache paradigm. This somewhat counterintuitive notion is that a cluster of about 50 disks has the same bandwidth as RAM, so the bandwidth problem is taken care of by adding more disks.

    The latency problem is handled by reorganizing data structures and low level algorithms. It's as simple as avoiding piecemeal reads, organizing algorithms around moving data to and from memory in very large batches, and writing highly parallelized programs. While I have no doubt this approach can be made to work by very clever people in many domains, a large chunk of applications spend most of their time in the random access domain, for which RAM based architectures are a better fit.
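    The bandwidth half of the argument is simple arithmetic, using the 100 MB/s per-disk figure from the text:

```python
# The "Disk is the new RAM" arithmetic: aggregate bandwidth of a disk cluster.
disks = 50
disk_bandwidth_mb = 100  # MB/s per drive, figure from the text
aggregate_gb = disks * disk_bandwidth_mb / 1000

print(f"{disks} disks give about {aggregate_gb} GB/s aggregate")  # RAM-like 5 GB/s
```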

    Grids and a Few Other Definitions

    There's a constellation of different concepts centered around Memory Based Architectures that we'll need to understand before we can understand the different products in this space. They include:

  • Compute Grid - parallel execution. A Compute Grid is a set of CPUs on which calculations/jobs/work is run. Problems are broken up into smaller tasks and spread across nodes in the grid. The result is calculated faster because it is happening in parallel.
  • Data Grid - a system that deals with data — the controlled sharing and management of large amounts of distributed data.
  • In-Memory Data Grid (IMDG) - parallel in-memory data storage. Data Grids are scaled horizontally, that is, by adding more nodes. Data contention is removed by partitioning data across nodes.
  • Colocation - Business logic and object state are colocated within the same process. Methods are invoked by routing to the object and having the object execute the method on the node it was mapped to. Latency is low because object state is not sent across the wire.
  • Grid Computing - Compute Grids + Data Grids
  • Cloud Computing - datacenter + API. The API allows the set of CPUs in the grid to be dynamically allocated and deallocated.
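    The Compute Grid idea above, splitting a problem into tasks and running them on many nodes in parallel, can be illustrated with a thread pool standing in for grid nodes. This is a toy sketch, not any particular product's API:

```python
from concurrent.futures import ThreadPoolExecutor

def task(chunk):
    # each "grid node" computes a partial result
    return sum(chunk)

data = list(range(1_000_000))
chunks = [data[i::4] for i in range(4)]  # split the problem across 4 "nodes"

with ThreadPoolExecutor(max_workers=4) as grid:
    partials = list(grid.map(task, chunks))

# combining partial results gives the same answer as the serial computation
print(sum(partials))
```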

    Who are the Major Players in this Space?

    With that bit of background behind us, there are several major players in this space (in alphabetical order):

  • Coherence - a peer-to-peer, clustered, in-memory data management system. Coherence is a good match for applications that need write-behind functionality when working with a database and that require multiple applications to have ACID transactions on the database. Java, JavaEE, C++, and .NET.
  • GemFire - an in-memory data caching solution that provides low-latency and near-zero downtime along with horizontal & global scalability. C++, Java and .NET.
  • GigaSpaces - GigaSpaces attacks the whole stack: Compute Grid, Data Grid, Messaging, Colocation, and Application Server capabilities. This makes for greater complexity, but it means there's less plumbing that needs to be written and developers can concentrate on writing business logic. Java, C, or .Net.
  • GridGain - A compute grid that can operate over many data grids. It specializes in the transparent and low configuration implementation of features. Java only.
  • Terracotta - Terracotta is network-attached memory that allows you to share memory and do anything across a cluster. Terracotta works its magic at the JVM level and provides: high availability, an end of messaging, distributed caching, and a single JVM image. Java only.
  • WebSphere eXtreme Scale - Operates as an in-memory data grid that dynamically caches, partitions, replicates, and manages application data and business logic across multiple servers.

    This class of products has generally been called In-Memory Data Grids (IMDG), though not all the products fit snugly in this category. There's quite a range of different features amongst the different products.

    I tossed the IMDG acronym in favor of Memory Based Architectures because the "in-memory" part seems redundant, the grid part has given way to the cloud, and the "data" part really can include both data and code. And there are other architectures that will exploit memory yet won't be classic IMDGs. So I just used Memory Based Architecture as that's the part that counts.

    Given the wide differences between the products there's no canonical architecture. As an example, here's a diagram of how the GigaSpaces In-Memory Data Grid on the Cloud works.

    Some key points to note are:

  • A POJO (Plain Old Java Object) is written through a proxy using a hash-based data routing mechanism to be stored in a partition on a Processing Unit. Attributes of the object are used as a key. This is straightforward hash based partitioning like you would use with memcached.
  • You are operating through GigaSpaces' framework/container so they can automatically handle things like messaging, sending change events, replication, failover, the master-worker pattern, map-reduce, transactions, parallel processing, parallel query processing, and write-behind to databases.
  • Scaling is accomplished by dividing your objects into more partitions and assigning the partitions to Processing Unit instances which run on nodes -- a scale-out strategy. Objects are kept in RAM and the objects contain both state and behavior. A Service Grid component supports the dynamic creation and termination of Processing Units.
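    The hash-based routing in the first bullet is the same trick memcached clients use: the key determines the partition. A toy sketch, with partition counts and names purely illustrative:

```python
# Sketch of hash-based data routing: an object's key determines which
# partition (Processing Unit, in GigaSpaces terms) stores it.
PARTITIONS = 4
partitions = [dict() for _ in range(PARTITIONS)]

def route(key):
    # the same key always routes to the same partition
    return hash(key) % PARTITIONS

def put(key, obj):
    partitions[route(key)][key] = obj

def get(key):
    return partitions[route(key)][key]

put("order:42", {"total": 99.5})
print(get("order:42"))
```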

    Not conceptually difficult, and familiar to anyone who has used caching systems like memcached. Only in this case memory is not just a cache, it's the system of record.

    Obviously there are a million more juicy details at play, but that's the gist of it. Admittedly GigaSpaces is on the full featured side of the product equation, but from a memory based architecture perspective the ideas should generalize. When you shard a database, for example, you generally lose the ability to execute queries; you have to do all the assembly yourself. By using the GigaSpaces framework you get a lot of very high-end features like parallel query processing for free.

    The power of this approach certainly comes in part from familiar concepts like partitioning. But the speed of memory versus disk also allows entire new levels of performance and reliability in a relatively simple and easy to understand and deploy package.

    NimbusDB - the Database in the Cloud

    Jim Starkey, President of NimbusDB, is not following the IMDG gang's lead. He's taking a completely fresh approach based on thinking of the cloud as a new platform unto itself. Starting from scratch, what would a database for the cloud look like?

    Jim is in a position to answer this question, as he created a transactional database engine for MySQL named Falcon and added multi-versioning support to InterBase, the first relational database to feature MVCC (Multiversion Concurrency Control).

    What defines the cloud as a platform? Here are some thoughts from Jim I copied out of the Cloud Computing group. You'll notice I've quoted Jim way way too much. I did that because Jim is an insightful guy, he has a lot of interesting things to say, and I think he has a different spin on the future of databases in the cloud than anyone else I've read. He also has the advantage of course of not having a shipping product, but we shall see.

  • I've probably said this before, but the cloud is a new computing platform that some have learned to exploit, others are scrambling to master, but most people will see as nothing but a minor variation on what they're already doing. This is not new. When timesharing was invented, the batch guys considered it remote job entry, just a variation on batch. When departmental computing came along (VAXes, et al), the timesharing guys considered it nothing but timesharing on a smaller scale. When PCs and client/server computing came along, the departmental computing guys (i.e. DEC) considered PCs to be a special case of smart terminals. And when the Internet blew into town, the client server guys considered it nothing more than a global scale LAN. So the batch guys are dead, the timesharing guys are dead, the departmental computing guys are dead, and the client server guys are dead. Notice a pattern?
  • The reason that databases are important to cloud computing is that virtually all applications involve the interaction of client data with a shared, persistent data store. And while application processing can be easily scaled, the limiting factor is the database system. So if you plan to do anything more than play Tetris in the cloud, the issue of database management should be foremost in your mind.
  • Disks are the limiting factor in contemporary database systems. Horrible things, disks. But conventional wisdom is that you build a clustered database system by starting with a distributed file system. Wrong. Evolution is faster processors, bigger memory, better tools. Revolution is a different way of thinking, a different topology, a different way of putting the parts together.
  • What I'm arguing is that a cloud is a different platform, and what works well for a single computer doesn't work at all well in a cloud, and things that work well in a cloud don't work at all on a single computer system. So it behooves us to re-examine a lot of ancient and honorable assumptions to see if they make any sense at all in this brave new world.
  • Sharing a high performance disk system is fine on a single computer, troublesome in a cluster, and miserable on a cloud.
  • I'm a database guy who's had it with disks. Didn't much like the IBM 1301, and disks haven't gotten much better since. Ugly, warty, slow things that require complex subsystems to hide their miserable characteristics. The alternative is to use the memory in a cloud as a distributed L2 cache. Yes, disks are still there, but they're out of the performance loop except for data so stale that nobody has it in memory.
  • Another machine or set of machines is just as good as a disk. You can quibble about reliable power, etc., but write queuing disks have the same problem.
  • Once you give up the idea of logs and page caches in favor of asynchronous replication, life gets a great deal brighter. It really does make sense to design to the strengths of clouds (redundancy) rather than their weaknesses (shared anything).
  • And while one guy is fetching his 100 MB per second, the disk is busy and everyone else is waiting in line contemplating existence. Even the cheapest of servers have two gigabit ethernet channels and a switch. The network serves everyone in parallel while the disk is single threaded.
  • I favor data sharing through a formal abstraction like a relational database. Shared objects are things most programmers are good at handling. The fewer the things that application developers need to manage, the more likely it is that the application will work.
  • I buy the model of object level replication, but only as a substrate for something with a more civilized API. Or in other words, it's a foundation, not a house.
  • I'd much rather have a pair of quad-core processors running as independent servers than contending for memory on a dual socket server. I don't object to more cores per processor chip, but I don't want to pay in die size for cores perpetually stalled for memory.
  • The object substrate worries about data distribution and who should see what. It doesn't even know it's a database. SQL semantics are applied by an engine layered on the object substrate. The SQL engine doesn't worry or even know that it's part of a distributed database -- it just executes SQL statements. The black magic is MVCC.
  • I'm a database developer building a database system for clouds. Tell me what you need. Here is my first approximation: a database that scales by adding more computers and degrades gracefully when machines are yanked out; a database system that never needs to be shut down; hardware and software fault tolerance; multi-site archiving for disaster survival; a facility to reach into the past to recover from human errors (drop table customers; oops;); automatic load balancing.
  • MySQL scales with read replication, which requires a full database copy to start up. For any cloud relevant application, that's probably hundreds of gigabytes. That makes it a mighty poor candidate for on-demand virtual servers.
  • Do remember that the primary function of a database system is to maintain consistency. You don't want a dozen people each draining the last thousand buckets from a bank account, or a debit to happen without the corresponding credit.
  • Whether the data moves to the work or the work moves to the data isn't that important as long as they both end up at the same place with as few intermediate round trips as possible.
  • In my area, for example, databases are either limited by the biggest, ugliest machine you can afford *or* you have to learn to operate without consistent, atomic transactions. A bad rock / hard place choice that sends the cost of scalable application development through the ceiling. Once we solve that, applications that serve 20,000,000 users will be simple and cheap to write. Who knows where that will go?
  • To paraphrase our new president, we must reject the false choice between data consistency and scalability.
  • Cloud computing is about using many computers to scale problems that were once limited by the capabilities of a single computer. That's what makes clouds exciting, at least to me. But most will argue that cloud computing is a better economic model for running many instances of a single computer. Bah, I say, bah!
  • Cloud computing is a wonderful new platform. Let's not let the dinosaurs waiting for extinction define it as a minor variation of what they've been doing for years. They will, of course, but this (and the dinosaurs) will pass.
  • The revolutionary idea is that applications don't run on a single computer but on an elastic cloud of computers that grows and contracts by demand. This, in turn, requires an applications infrastructure that can a) run a single application across as many machines as necessary, and b) run many applications on the same machines without any of the cross talk and software maintenance problems of years past. No, the software infrastructure required to enable this is not mature and certainly not off the shelf, but many smart folks are working on it.
  • There's nothing limiting in relational except the companies that build them. A relational database can scale as well as BigTable and SimpleDB but still be transactional. And, unlike BigTable and SimpleDB, a relational database can model relationships and do exotic things like transferring money from one account to another without "breaking the bank." It is true that existing relational database systems are largely constrained to a single CPU or a cluster with a shared file system, but we'll get over that.
  • Personally, I don't like masters any more than I like slaves. I strongly favor peer to peer architectures with no single point of failure. I also believe that database federation is a work-around rather than a feature. If a database system had sufficient capacity, reliability, and availability, nobody would ever partition or shard data. (If one database instance is a headache, a million tiny ones is a horrible, horrible migraine.)
  • Logic does need to be pushed to the data, which is why relational database systems destroyed hierarchical (IMS), network (CODASYL), and OODBMS. But there is a constant need to push semantics higher to further reduce the number of round trips between application semantics and the database systems. As for I/O, a database system that can use the cloud as an L2 cache breaks free from dependencies on file systems. This means that bandwidth and cycles are the limiting factors, not I/O capacity.
  • What we should be talking about is trans-server application architecture, trans-server application platforms, both, or whether one will make the other unnecessary.
  • If you scale, you don't/can't worry about server reliability. Money spent on (alleged) server reliability is money wasted.
  • If you view the cloud as a new model for scalable applications, it is a radical change in computing platform. But if, like most people, you see the cloud through the lens of EC2, which is just another way to run a server that you have to manage and control, then the cloud is little more than a rather boring business model. When clouds evolve to the point that applications and databases can utilize whatever resources they need to meet demand without the constraint of single machine limitations, we'll have something really neat.
  • On MVCC: Forget about the concept of master. Synchronizing slaves to a master is hopeless. Instead, think of a transaction as a temporal view of database state; different transactions will have different views. Certain critical operations must be serialized, but that still doesn't require that all nodes have identical views of database state.
  • Low latency is definitely good, but I'm designing the system to support geographically separated sub-clouds. How well that works under heavy load is probably application specific. If the amount of volatile data common to the sub-clouds is relatively low, it should work just fine provided there is enough bandwidth to handle the replication messages.
  • MVCC tracks multiple versions to provide a transaction with a view of the database consistent with the instant it started, while preventing a transaction from updating a piece of data that it could not see. MVCC is consistent, but it is not serializable. Opinions vary between academia and the real world, but most database practitioners recognize that the consistency provided by MVCC is sufficient for programmers of modest skills to produce robust applications.
  • MVCC, heretofore, has been limited to single node databases. Applied to the cloud with suitable bookkeeping to control visibility of updates on individual nodes, MVCC is as close to black magic as you are likely to see in your lifetime, enabling concurrency and consistency with mostly non-blocking, asynchronous messaging. It does, however, dispense with the idea that a cloud has at any given point in time a single definitive state. Serializability implemented with record locking is an attempt to make a distributed system march in lock-step so that the result is as if there were no parallelism between nodes. MVCC recognizes that parallelism is the key to scalability. Data that is a few microseconds old is not a problem as long as updates don't collide.
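    The MVCC visibility rule Jim describes, a transaction sees the newest version written before it began and nothing later, can be sketched in a few lines. This is a single-node toy with made-up names, not NimbusDB's design:

```python
# Minimal MVCC sketch: each write creates a new version stamped with a
# monotonically increasing transaction id; a reader sees only versions
# created before its own start.
class MVCCStore:
    def __init__(self):
        self.versions = {}  # key -> list of (txn_id, value)
        self.clock = 0

    def begin(self):
        self.clock += 1
        return self.clock   # the transaction's start timestamp

    def write(self, txn, key, value):
        self.versions.setdefault(key, []).append((txn, value))

    def read(self, txn, key):
        # newest version visible at this transaction's start time
        visible = [(t, v) for t, v in self.versions.get(key, []) if t <= txn]
        return max(visible)[1] if visible else None

store = MVCCStore()
t1 = store.begin()
store.write(t1, "balance", 100)
t2 = store.begin()                 # t2 starts after t1's write
t3 = store.begin()
store.write(t3, "balance", 50)     # t3's later write is invisible to t2
print(store.read(t2, "balance"))   # 100
```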

    Jim certainly isn't shy with his opinions :-)

    My summary of what he wants to do with NimbusDB is:

  • Make a scalable relational database in the cloud where you can use normal everyday SQL to perform summary functions, define referential integrity, and all that other good stuff.
  • Transactions scale using a distributed version of MVCC, which I do not believe has been done before. This is the key part of the plan and a lot depends on it working.
  • The database is stored primarily in RAM which makes cloud level scaling of an RDBMS possible.
  • The database will handle all the details of scaling in the cloud. To the developer it will look like just a very large, highly available database.

    I'm not sure if NimbusDB will support a compute grid and map-reduce type functionality. The low latency argument for data and code colocation is a good one, so I hope it integrates some sort of extension mechanism.

    Why might NimbusDB be a good idea?

  • Keeps simple things simple. Web scale databases like BigTable and SimpleDB make simple things difficult. They are full of quotas, limits, and restrictions because by their very nature they are just a key-value layer on top of a distributed file system. The database knows as little about the data as possible. If you want to build a sequence number for a comment system, for example, it takes complicated sharding logic to remove write contention. Developers are used to SQL and are comfortable working within the transaction model, so the transition to cloud computing would be that much easier. Now, to be fair, who knows if NimbusDB will be able to scale under high load either, but we need to make simple things simple again.
  • Language independence. Notice that the IMDG products are all language specific. They support some combination of .Net/Java/C/C++. This is because they need low level object knowledge to transparently implement their magic. This isn't bad, but it does mean that if you use Python, Erlang, Ruby, or any other unsupported language then you are out of luck. As many problems as SQL has, one of its great gifts is programmatic universal access.
  • Separates data from code. Data is forever, code changes all the time. That's one of the common reasons for preferring a database instead of an objectbase. This also dovetails with the language independence issue. Any application can access data from any language and any platform, now and into the future. That's a good quality to have.
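    The comment-sequence-number contention mentioned above is usually worked around with a sharded counter: spread increments over N shards so writers rarely collide, and pay an aggregation cost on reads. A toy sketch, with the shard count arbitrary:

```python
import itertools

# Sharded counter: increments are spread across shards to remove write
# contention on a single hot row; reads sum the shards.
SHARDS = 8
counters = [0] * SHARDS
pick = itertools.cycle(range(SHARDS))  # stand-in for random shard choice

def increment():
    counters[next(pick)] += 1          # writers rarely touch the same shard

def value():
    return sum(counters)               # reads pay the aggregation cost

for _ in range(1000):
    increment()
print(value())  # 1000
```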

    The smart money has been that cloud level scaling requires abandoning relational databases and distributed transactions. That's why we've seen an epidemic of key-value databases and eventually consistent semantics. It will be fascinating to see if Jim's combination of Cloud + Memory + MVCC can prove the insiders wrong.

    Are Cloud Based Memory Architectures the Next Big Thing?

    We've gone through a couple of different approaches to deploying Memory Based Architectures. So are they the next big thing?

    Adoption has been slow because it's new and different, and that inertia takes a while to overcome. Historically tools haven't made it easy for early adopters to make the big switch, but that is changing with easier to deploy cloud based systems. And current architectures, with a lot of elbow grease, have generally been good enough.

    But we are seeing a wide convergence on caching as a way to make slow disks perform. Truly enormous amounts of effort are going into adding cache and then trying to keep the database and applications all in sync with cache as bottom up and top down driven changes flow through the system.

    After all that work it's a simple step to wonder why that extra layer is needed when the data could just as well have been kept in memory from the start. Now add the ease of cloud deployments and the ease of creating scalable, low latency applications that are still easy to program, manage, and deploy. Building multiple complicated layers of application code just to make the disk happy will make less and less sense over time.

    We are on the edge of two potent technological changes: Clouds and Memory Based Architectures. This evolution will rip open a chasm where new players can enter and prosper. Google is the master of disk. You can't beat them at a game they perfected. Disk based databases like SimpleDB and BigTable are complicated beasts, typical last gasp products of any aging technology before a change. The next era is the age of Memory and Cloud which will allow for new players to succeed. The tipping point is soon.

    Related Articles

  • GridGain: One Compute Grid, Many Data Grids
  • GridGain vs Hadoop
  • Cameron Purdy: Defining a Data Grid
  • Compute Grids vs. Data Grids
  • Performance killer: Disk I/O by Nathanael Jones
  • RAM is the new disk... by Steven Robbins
  • Talk on disk as the new RAM by Greg Linden
  • Disk-Based Parallel Computation, Rubik's Cube, and Checkpointing by Gene Cooperman, Northeastern Professor, High Performance Computing Lab - Disk is the new RAM and RAM is the new cache
  • Disk is the new disk by David Hilley.
  • Latency lags bandwidth by David A. Patterson
  • InfoQ Article - RAM is the new disk... by Nati Shalom
  • Tape is Dead Disk is Tape Flash is Disk RAM Locality is King by Jim Gray

  • Product: ScaleOut StateServer is Memcached on Steroids
  • Latency is Everywhere and it Costs You Sales - How to Crush it
  • Virtualization for High Performance Computing by Shai Fultheim
  • Multi-Multicore Single System Image / Cloud Computing. A Good Idea? (part 1) by Greg Pfister
  • How do you design and handle peak load on the Cloud ? by Cloudiquity.
  • The Share-Nothing Architecture by Zef Hemel.
  • Scaling memcached at Facebook
  • Cache-aside, write-behind, magic and why it sucks being an Oracle customer by Stefan Norberg.
  • Introduction to Terracotta by Mike
  • The five-minute rule twenty years later, and how flash memory changes the rules by Goetz Graefe