Database Sharding


A great post by a well-known expert.

The most important aspect of scalable web architecture is data partitioning. Most components in a modern data center are completely stateless, meaning they just do batches of work that is handed to them, but don't store any data long-term. This is true of most web application servers, caches like memcached, and the entire network infrastructure that connects them. Data storage is becoming a specialized function, delegated most often to relational databases. This makes sense, because stateless servers are easiest to scale - you just keep adding more. Since they don't store anything, failures are easy to handle too - just take it out of rotation.

 

Stateful servers require more careful attention. If you are storing all of your data in a relational database, and the load on that database exceeds its capacity, there is no automatic solution that allows you to simply add more hardware and scale up. (One day there will be, but that's for another post.) In the meantime, most websites are building their own scalable clusters using sharding.

 

The idea behind sharding is simple: split the data between multiple machines, and have a way to make sure that you always access data from the right place. For example, consider this simple scheme. Imagine you want to store data about customers, each of whom has a "last name" field in your database. You could create 26 identical databases, and assign each of them a letter of the alphabet. Then, whenever you want to look up data about John Smith, you first connect to the "S" database, and then fetch the data you want. All of a sudden, your single-database solution just got twenty-six times more capacity added to it.
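
As a minimal sketch (the helper name and database-naming convention here are my own, purely for illustration), the lookup is just a few lines:

// Pick the database that holds a given customer, keyed by the
// first letter of the last name: "Smith" -> the "s" database.
function db_name_for_last_name($last_name) {
    $letter = strtolower(substr($last_name, 0, 1));
    return "customers_" . $letter; // one of the 26 identical databases
}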

 

Devotees of sharding will already be groaning, because this is a bad idea. The problem should be familiar to anyone who plays Scrabble: not all letters are equally likely to be used as the first letter of someone's name. The shards for S and R will be heavily loaded, but the shards for Q, X, and Z will probably be idle.

 

A big part of sharding is picking a good partitioning scheme. Many have been proposed, and I won't discuss them in detail here. I really enjoy reading stories of how people have tackled this problem. Two of my favorites are from Flixster and LiveJournal. In the end, they fall into three broad categories:

 

Vertical partitioning. In this scheme, all of the data related to a specific feature of a product are stored on the same machines. For example, Friendster was famously vertically partitioned at one time in its growth curve. A different cluster of machines served each of Profiles, Messages, Testimonials, etc.

 

Key-based partitioning. In this scheme, you use part of the data itself to do the partitioning. The "letters of the alphabet" scheme I presented earlier is one (bad) example of this type. More common is to use a one-way hashing algorithm to map the data to be accessed to one of the shards that store it. Certain data types present natural hashes as well: for numeric keys, you can take the key mod N, where N is the number of shards; dates are easily partitioned by time interval; and so on (a quick sketch follows these three categories).

 

Directory-based partitioning. This scheme maintains a lookup table somewhere in the cluster that simply keeps track of which data is stored on which shard. This has two major drawbacks: the central directory can become a single point of failure, and there is a performance cost for having to consult the directory every time you want to access data anywhere in the cluster.
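
To make key-based partitioning concrete, here is a minimal sketch (the function name is mine, and crc32 stands in for whatever one-way hash you prefer):

// Map an arbitrary key to one of N shards with a one-way hash.
function shard_for_key($key, $num_shards) {
    return abs(crc32((string)$key)) % $num_shards;
}

// Numeric keys can skip the hash entirely: key mod N.
$shard = 1234 % 8; // customer 1234 lives on shard 2 of 8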

 

So far, this is just a summary of what all of us who have attempted to build web-scale architectures consider obvious. For a good in-depth look at each partitioning scheme, I recommend Scalability Strategies Primer: Database Sharding.

-------------------------------------------------------------------------------------------------------------

Supporting a single partitioning scheme is easy, especially if you design for it from the start. But startups rarely have either luxury. It's almost impossible to know ahead of time what kind of data partitioning makes the most sense for any particular part of your application. It also doesn't make sense to design your app for scalability from the start, since you may have to go through many iterations before you find a version that requires any scalability at all. Even worse, it's a rare app that requires only one kind of data partitioning.

 

The solution is to build an architecture that works for the startup condition. To me, the optimal architecture is one that serves four goals simultaneously. I'll talk about those goals, and then propose a system that I believe serves them well.

 

•    The ability to handle increased capacity incrementally

By adding proportional amounts of new hardware. This is the main reason most people undertake sharding in the first place. Note that most key-based partitioning systems do badly on this criterion: most key-hashing schemes have some breaking point where, if you add just one more shard, you have to re-hash all of the data (imagine having to add a 27th database server to the alphabet-hashing scheme discussed above; the short sketch after this list of goals makes the problem concrete).


 

•   Incremental and proportional software complexity

As you increase the amount of data partitioning, application code gets more complex. To cite one common example: when all of your data resides on a single shard, you can run complex joins against it. If that same data is partitioned, you need to process the data in other ways. Just as we want to be able to make proportional investments in hardware to handle increased load, a good architecture allows us to make proportionally sized changes to the code. We want to avoid anything that requires a massive rewrite just to change how data is partitioned.


 

•   Support for multiple sharding schemes

Log-based data is often optimally stored in date-based partitions. Customer data is normally partitioned according to customer id. What about log data about customers? A good architecture allows us to use the right tool for the job, or even combine tools as necessary.


 

•   Easy to understand

Some sharding schemes are themselves hard to follow, or require you to write code that is hard to read. Especially as teams grow, programmers who use a scheme incorrectly can undo its benefits. It's unrealistic to expect everyone on the team to understand scalability in detail. Instead, a good architecture has clear rules that allow them to do the right thing without having to think too hard about it.
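
To see why that re-hashing breaking point hurts, here is a tiny self-contained illustration of how many keys change shards when a mod-N scheme grows from 26 shards to 27:

// Count how many of 100,000 keys move when we go from 26 to 27 shards.
$moved = 0;
for ($id = 0; $id < 100000; $id++) {
    if ($id % 26 != $id % 27) {
        $moved++;
    }
}
echo "$moved of 100000 keys must be re-homed\n"; // roughly 96% of them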

 

With those criteria in mind, let me propose a solution that has served me well in several startups. First, I'll describe how it works, and then I'll lay out how to build it incrementally.

 

-------------------------------------------------------------------------------------------------------------------------------------------------------

Start by thinking of every kind of data in your application as associated with some entity. Give each entity a name. Then, whenever you want to have your system find the shard for a given entity, compose this name into a URL. For example, let's say you have data about customers. A given customer would have an entity URL that looked like this:

customer://1234


Somewhere in your API, you have a way to access data about a given customer. I'm not talking about high-level abstractions (like an object-oriented view of a customer's data). I'm talking about the actual data-fetching operation. At some level in your system, there's probably code that looks something like this:

$connection = new_db_connection("host", $port, "dbname");
$statement = $connection->prepare($sql_statement, $params);
$result = $statement->execute();

 

What I propose is that you make your sharding decisions directly at the "new_db_connection" seam in the code. The new code would look like this:

$connection = new_db_connection("customer://1234");
$statement = $connection->prepare($sql_statement, $params);
$result = $statement->execute();

All of the complexity of determining which shard you need to use (which we'll get to in a moment) is hidden behind that simple API. Once a shard-lookup decision is made, there's no additional overhead; your application is talking directly to the right database.
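
As a minimal sketch of what sits behind that seam (lookup_shard and connect_to_shard are hypothetical helpers, both sketched later in this post):

// The sharding seam: turn an entity URL into a live database connection.
function new_db_connection($entity_url) {
    // "customer://1234" -> "mysql://hostname:port/dbname"
    $shard_url = lookup_shard($entity_url); // the directory lookup, below
    return connect_to_shard($shard_url);    // driver selection, below
}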

 

In some cases, you can make it even simpler. If you're the type of organization that doesn't use db access objects directly, but simply passes SQL statements through a central API, you can try this: consider the entity identifier as part of the statement itself. I've done this before using a structured comment added to each SQL statement, like this:

/*entity customer://1234 */ SELECT name FROM customer WHERE id = 1234
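
Extracting that tag in the data-access layer is only a few lines of parsing; a minimal sketch (the helper name is mine):

// Pull "customer://1234" out of a leading /*entity ... */ comment.
function entity_for_statement($sql) {
    if (preg_match('#/\*\s*entity\s+(\S+)\s*\*/#', $sql, $m)) {
        return $m[1];
    }
    return null; // untagged statements can fall through to the master DB
}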

 

At the expense of some additional parsing overhead in your data-access layer, you get additional metadata associated with every query in your application. You may start out using this data only for sharding, but you may find it has other benefits as well. For example, you might notice that caching gets a lot easier if you have good metadata about which queries are associated with the same entity. It's a very readable approach, and it makes it easy to change the sharding for a query without having to change a lot of intervening function signatures (since the shard info is carried by the statement that's already passed through).

 

Whichever API you use, URL-based sharding is very easy to understand. I've taught everyone from hard-core programmers to scripters to HTML designers to use it properly. And implementing it is easy, too. Every language already has a facility for parsing URLs. I normally use URLs to specify the shards themselves, too. So, at its heart, the sharding system is really a way of mapping entity URLs to shard URLs, which look like this:

customer://1234 -> mysql://hostname:port/dbname

 

The scheme part of the shard URL can allow you to use multiple shard-storage types. For example, in one system I built using this technique, the scheme specified the name of the ADODB driver to use for accessing that database.
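
Here is a minimal sketch of that idea, using PDO rather than ADODB (the driver choices and helper name are my own assumptions, and I assume the shard URL always carries a host, port, and database name):

// Open a connection to a shard; the URL scheme picks the driver.
function connect_to_shard($shard_url, $user = 'app', $pass = '') {
    $p = parse_url($shard_url); // e.g. mysql://hostname:3306/dbname
    $dbname = ltrim($p['path'], '/');
    switch ($p['scheme']) {
        case 'mysql':
            return new PDO("mysql:host={$p['host']};port={$p['port']};dbname={$dbname}", $user, $pass);
        case 'pgsql':
            return new PDO("pgsql:host={$p['host']};port={$p['port']};dbname={$dbname}", $user, $pass);
        default:
            throw new Exception("no driver for scheme '{$p['scheme']}'");
    }
}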

 

No system is entirely without drawbacks, however, and URL-based sharding is no exception. In this case, the big drawback is that it works best if you store the mappings in a central directory. This gives you the big payoff of being able to support all kinds of sharding with just one system. Each kind of shard just maps to an appropriate kind of URL:

customer://1234 (id)

forums://master (vertical)

log://January/2008 (date)

 

You can even do more complex schemes, like having groups of IDs all map to the same URL in a bucket system, although I recommend keeping it simple whenever possible.

 

If you have a solid caching system in your cluster, like memcached, the overhead of a system like this is really quite small. The mappings never change (unless you build a data migration tool, in which case they change rarely) and so are easy to cache. Using memcached's multi-get, which allows the fetching of many keys in parallel, I have written code to aggregate all the shard lookups for a given page and pre-fetch them, reducing the overhead even further.
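
A minimal sketch of that pre-fetch, using the PHP Memcached extension (the cache-key layout and function name are my own assumptions; lookup_shard is the directory lookup sketched below):

// Warm the cache for every entity a page will touch, in one round trip.
function prefetch_shards(Memcached $mc, array $entity_urls) {
    $hits = $mc->getMulti($entity_urls); // entity URL -> shard URL
    foreach ($entity_urls as $url) {
        if (!isset($hits[$url])) {
            // Miss: consult the directory, then cache the mapping.
            $mc->set($url, lookup_shard($url));
        }
    }
}

$mc = new Memcached();
$mc->addServer('127.0.0.1', 11211);
prefetch_shards($mc, array('customer://1234', 'forums://master'));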

 

The actual directory itself is straightforward. You need to store two kinds of information. One is a list of all the available shard types, and the machines available to store those types. Your operations team normally maintains this information: whenever they want to bring new capacity online, they simply add a row to the appropriate table. The second is the mapping itself. The sharding API follows a simple rule: whenever asked to find the shard for a URL, it either pulls it from the directory or, if it's not there, assigns it to one of the available shards for that entity type. (There's some added complexity as shards get "weighted" for more or less capacity, and for cases where data needs to be moved between shards, but these are not too serious to deal with.) I normally recommend you just store this directory on your master database, but you could use a standalone vertical shard (or even a key-based partition!) to store it.
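
Here is a minimal sketch of that rule, assuming a MySQL-hosted directory with two tables (all table, column, and helper names are mine; weighting and migration are left out):

// shard_types(entity_type, shard_url): available capacity, maintained by ops.
// shard_map(entity_url, shard_url): the mapping itself.
function lookup_shard($entity_url) {
    $dir = directory_db(); // PDO handle to wherever the directory lives
    $q = $dir->prepare("SELECT shard_url FROM shard_map WHERE entity_url = ?");
    $q->execute(array($entity_url));
    $shard = $q->fetchColumn();
    if ($shard !== false) {
        return $shard; // already assigned
    }
    // First sighting: assign the entity to one of the available shards
    // for its type (uniformly at random here, ignoring weights).
    $type = parse_url($entity_url, PHP_URL_SCHEME); // "customer://1234" -> "customer"
    $q = $dir->prepare("SELECT shard_url FROM shard_types
                        WHERE entity_type = ? ORDER BY RAND() LIMIT 1");
    $q->execute(array($type));
    $shard = $q->fetchColumn();
    $dir->prepare("INSERT INTO shard_map (entity_url, shard_url) VALUES (?, ?)")
        ->execute(array($entity_url, $shard));
    return $shard;
}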

Incremental build-out

 

The most important attribute of URL-based sharding, though, is that it easily supports an incremental build-out that is perfect for startups. We can start, as most startups do, with a single, central master DB. We can take advantage of the many open source libraries out there that don't scale especially well, and write as much non-scalable code as we want. When we finally have a product that starts to get traction, we can practice Just-in-Time Scalability, as follows:

1. When load on the master DB gets above some agreed-upon threshold, we measure which tables and queries are causing the most read/write traffic.

2. We pick one such query, and decide to shard it. We always try to find queries that are causing load but are also relatively self-contained. For example, most applications usually have a number of highly trafficked tables that are rarely joined with any others; these are excellent candidates for early sharding.

3. Once we've picked a query, we have to identify all of the tables it touches, and all of the queries that access those tables. The better your internal APIs and data architecture, the easier this step will be. But even for total spaghetti code, this doesn't take more than a few hours.

4. We change the code to have all of the queries identified in step 3 tagged with the appropriate entity URL that they affect. If we've uncovered an entity type that hasn't previously been sharded, we'll have to add some new shards and shard-types to the central directory.

 

Since there is already some data about these entities on the master DB, we need to migrate it to the new, sharded locations we picked in step 4. I won't go into detail about how to do this data migration in this post; coordinating it with the code changes in step 4 is the hard part. You can read a little about it in my post on Just-in-Time Scalability.

 

Each time we need to add capacity to our application, we go back to step 1. At IMVU, during our steepest growth curves, we'd have to do this exercise every week. But we could do the whole thing in a week, and be ready to do it again the next week. Each time, our application got a little more complex, and a little more scalable. But at no time did we have to take the site down for a rewrite.

 

If your experience is anything like mine, you may be surprised at how much cruft remains on your master database. One advantage of this system is that it doesn't require you to add scalability to any piece of data that is seldom accessed. All that random junk sitting in your master database's schema may make you feel bad, but it's actually not causing any harm. Only the parts of your app that need to scale will get the sharding treatment.

