Don't use MongoDB
=================


I've kept quiet for a while for various political reasons, but I now feel a kind of social responsibility to deter people from banking their business on MongoDB.

Our team ran serious load on MongoDB on a large (10s of millions of users, high profile company) userbase, expecting, from early good experiences, that the long-term scalability benefits touted by 10gen would pan out. We were wrong, and this rant serves to deter you from believing those benefits and making the same mistake we did. If one person avoids the trap, it will have been worth writing. Hopefully, many more do.

Note that, in our experiences with 10gen, they were nearly always helpful and cordial, and often extremely so. But at the same time, that cannot be reason alone to suppress information about the failings of their product.

Why this matters
----------------

Databases must be right, or as-right-as-possible, b/c database mistakes are so much more severe than almost every other variation of mistake. Not only do they have the largest impact on uptime, performance, expense, and value (the inherent value of the data), but data has *inertia*. Migrating TBs of data on-the-fly is a massive undertaking compared to changing drcses or fixing the average logic error in your code. Recovering TBs of data while down, limited by what spindles can do for you, is a helpless feeling.

Databases are also complex systems that are effectively black boxes to the end developer. By adopting a database system, you place absolute trust in its ability to do the right thing with your data to keep it consistent and available.

Why is MongoDB popular?
-----------------------

To be fair, it must be acknowledged that MongoDB is popular, and that there are valid reasons for its popularity.

 * It is remarkably easy to get running
 * Schema-free models that map to JSON-like structures have great appeal to developers (they fit our brains), and a developer is almost always the individual who makes the platform decisions when a project is in its infancy
 * Maturity and robustness, track record, tested real-world use cases, etc, are typically more important to sysadmin types or operations specialists, who often inherit the platform long after the initial decisions are made
 * Its single-system, low concurrency read performance benchmarks are impressive, and for the inexperienced evaluator, this is often The Most Important Thing

Now, if you're writing a toy site, or a prototype, something where developer productivity trumps all other considerations, it basically doesn't matter *what* you use. Use whatever gets the job done.

But if you're intending to really run a large scale system on Mongo, one that a business might depend on, simply put:

Don't.

Why not?
--------

**1. MongoDB issues writes in unsafe ways *by default* in order to win benchmarks**

If you don't issue getLastError(), MongoDB doesn't wait for any confirmation from the database that the command was processed. This introduces at least two classes of problems:

 * In a concurrent environment (connection pools, etc), you may have a subsequent read fail after a write has "finished"; there is no barrier condition to know at what point the database will recognize a write commitment
 * An unknown number of save operations can be dropped on the floor due to queueing in various places, things outstanding in the TCP buffer, etc, when your connection drops, or if the db were to be KILL'd, segfault, hit a hardware crash, you name it

**2. MongoDB can lose data in many startling ways**

Here is a list of ways we personally experienced records go missing:

 1. They just disappeared sometimes. Cause unknown.
 2. Recovery on a corrupt database was not successful, pre transaction log.
 3. Replication between master and slave had *gaps* in the oplogs, causing slaves to be missing records the master had. Yes, there is no checksum, and yes, the replication status reported the slaves as current.
 4. Replication just stops sometimes, without error. Monitor your replication status!

**3. MongoDB requires a global write lock to issue any write**

Under a write-heavy load, this will kill you. If you run a blog, you maybe don't care b/c your R:W ratio is so high.

**4. MongoDB's sharding doesn't work that well under load**

Adding a shard under heavy load is a nightmare. Mongo either moves chunks between shards so quickly it DOSes the production traffic, or refuses to move chunks altogether.

This pretty much makes it a non-starter for high-traffic sites with heavy write volume.

**5. mongos is unreliable**

The mongod/config server/mongos architecture is actually pretty reasonable and clever. Unfortunately, mongos is complete garbage. Under load, it crashed anywhere from every few hours to every few days. Restart supervision didn't always help b/c sometimes it would throw some assertion that would bail out a critical thread, but the process would stay running. Double fail.

It got so bad the only usable way we found to run mongos was to run haproxy in front of dozens of mongos instances, and to have a job that slowly rotated through them and killed them to keep fresh/live ones in the pool. No joke.

**6. MongoDB actually once deleted the entire dataset**

MongoDB 1.6, in replica set configuration, would sometimes determine the wrong node (often an empty node) was the freshest copy of the data available. It would then DELETE ALL THE DATA ON THE REPLICA (which may have been the 700GB of good data) AND REPLICATE THE EMPTY SET. The database should never never never do this. Faced with a situation like that, the database should throw an error and make the admin disambiguate by wiping/resetting data, or forcing the correct configuration. NEVER DELETE ALL THE DATA.
(This was a bad day.)

They fixed this in 1.8, thank god.

**7. Things were shipped that should have never been shipped**

Things with known, embarrassing bugs that could cause data problems were in "stable" releases--and often we weren't told about these issues until after they bit us, and then only b/c we had a super duper crazy platinum support contract with 10gen.

Their response was to send us a hot patch, which they were calling an RC internally, and have us run that on our data.

**8. Replication was lackluster on busy servers**

Replication would often, again, either DOS the master, or replicate so slowly that it would take far too long and the oplog would be exhausted (even with a 50G oplog).

We had a busy, large dataset that we simply could not replicate b/c of this dynamic. It was a harrowing month or two of finger crossing before we got it onto a different database system.

**But, the real problem:**

You might object: my information is out of date; they've fixed these problems or intend to fix them in the next version; problem X can be mitigated by optional practice Y.

Unfortunately, it doesn't matter.

The real problem is that so many of these problems existed in the first place.

Database developers must be held to a higher standard than your average developer. Namely, your priority list should typically be something like:

 1. Don't lose data, be very deterministic with data
 2. Employ practices to stay available
 3. Multi-node scalability
 4. Minimize latency at 99% and 95%
 5. Raw req/s per resource

10gen's order seems to be #5, then everything else in some order. #1 ain't in the top 3.

These failings, and the implied priorities of the company, indicate a basic cultural problem, irrespective of whatever problems exist in any single release: a lack of the requisite discipline to design database systems businesses should bet on.

Please take this warning seriously.
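
For anyone who wants point 1 made concrete, here is a toy sketch in Python of fire-and-forget writes versus writes that wait for confirmation. It is a model only, not MongoDB or any driver's API: `ToyServer` and all of its methods are made up for illustration, and the "crash" simply discards whatever is still buffered.

```python
from collections import deque


class ToyServer:
    """Hypothetical stand-in for a database (not a real MongoDB API)."""

    def __init__(self):
        self.pending = deque()  # writes received but not yet applied
        self.stored = {}        # writes that were actually applied

    def send(self, key, value):
        # Fire-and-forget: the write is only queued, nothing is confirmed.
        self.pending.append((key, value))

    def apply_pending(self):
        # Drain the buffer into storage.
        while self.pending:
            key, value = self.pending.popleft()
            self.stored[key] = value

    def send_and_confirm(self, key, value):
        # Acknowledged write: return only once the write has been applied,
        # loosely analogous to checking getLastError() after each operation.
        self.send(key, value)
        self.apply_pending()
        return key in self.stored

    def crash(self):
        # Anything still sitting in the buffer is dropped on the floor.
        lost = len(self.pending)
        self.pending.clear()
        return lost


# Unconfirmed writes: the client believes all five are "finished".
unsafe = ToyServer()
for i in range(5):
    unsafe.send(i, "record")
lost_unsafe = unsafe.crash()

# Confirmed writes: each call returns only after the write is stored.
safe = ToyServer()
acks = [safe.send_and_confirm(i, "record") for i in range(5)]
lost_safe = safe.crash()

print(lost_unsafe, len(unsafe.stored))  # 5 0
print(lost_safe, len(safe.stored))      # 0 5
```

The confirmed path pays a round trip per write, which is exactly the cost that skipping confirmation hides in a benchmark.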