The ultimate MySQL high availability solution (condensed version)


By hingo

Clustering suites (using a general-purpose clustering toolkit to implement HA) are just categorically the wrong approach to database high availability. (Clustering itself is still useful.)

Clustering suites = FAIL

The following picture shows how a database high-availability setup would be implemented with any of these clustering solutions:

  

[Figure: one such clustering framework architecture]

So what these clustering frameworks do, is to probe at MySQL from the outside to determine if MySQL still seems to be healthy. In practice you always have a master-slave kind of setup where you are mostly interested in probing the master, but of course the framework is also doing some checks to verify that the slave / standby node is available, in case we should need it later.

If the clustering framework notices that the master MySQL instance is not responding, it initiates failover procedures to the standby node (the slave):

What exactly happens here depends on the setup: for MySQL replication you typically just move the virtual IP (the floating IP must be routable; in effect one machine holds two IPs at once, a fixed IP plus a floating one that moves as required) to point to the other node, where MySQL is already running.
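To make the failure modes discussed below concrete, here is a minimal, purely illustrative sketch of what such an outside probe plus failover amounts to. This is not the code of any real clustering suite; the host name, credentials, probe interval and the initiateFailover() hook are all hypothetical placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    // Hypothetical external health check: peek at MySQL from the outside at a fixed
    // heartbeat interval, and trigger a failover after a few consecutive failures.
    public class NaiveHeartbeat {
        static final String MASTER_URL = "jdbc:mysql://master.example.com:3306/test"; // placeholder

        public static void main(String[] args) throws InterruptedException {
            int consecutiveFailures = 0;
            while (true) {
                try (Connection c = DriverManager.getConnection(MASTER_URL, "monitor", "secret")) {
                    c.createStatement().executeQuery("SELECT 1");   // the small "peek from the outside"
                    consecutiveFailures = 0;
                } catch (SQLException e) {
                    consecutiveFailures++;
                    if (consecutiveFailures >= 3) {
                        // Everything committed since the last successful probe is invisible here;
                        // this check knows nothing about MySQL's internal state or data consistency.
                        initiateFailover();
                        consecutiveFailures = 0;
                    }
                }
                Thread.sleep(10_000);                               // the heartbeat interval
            }
        }

        static void initiateFailover() {
            // Placeholder for "move the virtual IP to the standby node".
            System.out.println("probe failed: moving the virtual IP to the standby");
        }
    }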

On a picture it seems simple enough. What could possibly go wrong?

Commonly the agents are written as shell scripts, similar to your traditional Unix init scripts. And I just don't think a complex thing like a high-availability solution should be implemented in bash. In short: lots of log output, mostly useless information, and painful configuration.

And despite all those log messages, I still managed to make this thing silently fail on startup without any error message. And so forth... High availability is a complex topic - this is not just another text editor someone is developing. The information these tools give you is simply not sufficient. So issues like those generally don't give me the confidence that this thing will actually work in production and do the right thing when the network is down.

If your MySQL instance crashes, it doesn't help if your choice of clustering software just starts a new instance somewhere else, or randomly moves a VIP somewhere else. You actually need to replicate the data to begin with.

Flaws in these clustering solutions:

  • At a specified interval - the heartbeat - the clustering solution will check whether your MySQL instance is still running. Problem: if you have 5000 trx/sec and, say, a 10-second heartbeat, that is up to fifty thousand failed transactions before any attempt to fix the error is made (whatever was committed in that gap can easily be lost).
  • The clustering solution takes small peeks into MySQL from the outside, but other than that MySQL remains a black box (the monitoring is external to MySQL and sees little of its actual errors). In particular, the monitoring does nothing to guarantee that the data on the different MySQL servers stays consistent.
  • Typically the clustering software itself will have some communication going on, which has the benefit of verifying that the network connection between nodes is ok. This is in itself useful, sure. Problem: just like above, if the clustering software detects that the network has failed, it is still mostly unaware of the state of MySQL. The failover decision is made blindly (the monitor may decide the network is bad when it is actually fine, and the master hasn't failed at all, so there was no need to fail over).
  • In a typical setup like the above, the clustering software is actually just checking whether there is a network connection between the two MySQL nodes (it checks the network only, not whether the MySQL servers can actually reach each other). Kind of like the guy in the joke who was searching for his keys under the lamp where there was light, not where he actually lost them.

    In this architecture you typically have one writeable master. When something is wrong with that master, you need to do failover. The failover needs to happen atomically: if you have many application servers, you cannot have a situation where some writes are still headed for the old master while others are already happening on the new one. Typically this is handled by using a virtual IP that is assigned to one node at a time. It sounds like a simple solution, but in practice it's entirely possible (for poorly implemented clustering solutions) to assign the virtual IP to both nodes at the same time! Robert Hodges has a nice blog post about problems with virtual IPs.

    Failovers are often also expensive. This brings in the problem of false positives: these clustering solutions often react to small hiccups in the network and start a failover, even when the situation is then quickly restored (an unnecessary failover).

The committing of transactions, the replication of data, and the detection of failures need to be done together by the same piece of software.

Why Galera has none of these problems

The above novel gives a background to why I have been so interested in Galera lately. Let's look at how you'd implement a MySQL high-availability cluster with Galera:

Galera implements synchronous multi-master replication. So the first benefit is: there is no virtual IP, and I don't need to choose which node is the master and which are the slaves.

A third node is just needed so that the cluster can establish a quorum in error situations - you always need at least 3 nodes in quorum-based clustering.

So what happens when one node fails?

Notice the contrast to the previous list of problems:

  • Thanks to synchronous replication and Galera's quorum mechanism, no commits are lost anywhere. When the failure happens, it will be detected as part of the replication process. The Galera nodes themselves figure out which node is thrown out of the cluster, and the remaining ones - who "have quorum" - proceed to commit the transaction. Application nodes that connected to the failing node will of course receive errors as their commits fail.
  • There is no need to maintain master-slave state, manage a virtual IP, or perform any failover (there is no master/slave distinction; the data is fully synchronized). The application can connect to any of the Galera nodes: if the transaction succeeds, then it succeeds; if it fails, retry it with another node, as sketched in the example after this list. (In the diagram, the solid lines mean any node can be connected to, but each application connects to only one node at a time; replication between the nodes is handled by the database itself.)
  • Since the replication is synchronous, there is no slave lag (no replication delay).
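As a minimal sketch of that "connect to any node, retry on another" pattern - the node host names node1..node3, the schema and the credentials are made up for illustration:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.sql.SQLException;

    // Every Galera node is a writeable master, so the application can simply walk a
    // node list and retry the whole transaction on the next node if one fails.
    public class AnyNodeWrite {
        static final String[] NODES = {                  // hypothetical node addresses
            "jdbc:mysql://node1:3306/app",
            "jdbc:mysql://node2:3306/app",
            "jdbc:mysql://node3:3306/app"
        };

        public static void transfer(int fromId, int toId, long cents) throws SQLException {
            SQLException last = null;
            for (String url : NODES) {
                try (Connection c = DriverManager.getConnection(url, "app", "secret")) {
                    c.setAutoCommit(false);
                    try (PreparedStatement st = c.prepareStatement(
                            "UPDATE accounts SET balance = balance + ? WHERE id = ?")) {
                        st.setLong(1, -cents); st.setInt(2, fromId); st.executeUpdate();
                        st.setLong(1, cents);  st.setInt(2, toId);   st.executeUpdate();
                    }
                    c.commit();                          // synchronous replication: the commit is cluster-wide
                    return;                              // success on this node, done
                } catch (SQLException e) {
                    last = e;                            // node unreachable or commit failed: try the next one
                }
            }
            throw last;                                  // all nodes failed
        }
    }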

But wait, this is not yet the whole story.


The client-side solution:

However, users are not happy, because when a node has failed, the application will keep trying to connect to it and you'll get a lot of connection errors at your application server (the application is single-minded; nothing tells it that it could connect to a different server). So in practice people still put either a virtual IP or some kind of TCP/IP load balancer in front of Galera (but this is unnecessary, and it can reintroduce a single point of failure).

The solution JDBC offers:

It's no wonder, because the needed JDBC parameter is still not really explained anywhere in the MySQL manual. 

When using a Galera cluster, you should use mysql:loadbalance: in front of your JDBC connection string. This allows you to give a list of MySQL nodes, all of which are writeable masters. The JDBC driver will connect to any one of them and commit the transaction. If a node is not available, it will just try another one. (If a transaction was already in progress, it will fail with an exception; you can then retry it and it will just connect to a new node.)

You should read the blog post for details, in particular on how to use the blacklist timeout argument.
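A hedged sketch of what this looks like in practice. The jdbc:mysql:loadbalance:// URL scheme and the loadBalanceBlacklistTimeout connection property come from MySQL Connector/J (see the blog post mentioned above for the authoritative details); the host names, schema and credentials below are placeholders.

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.SQLException;

    public class LoadBalancedWrite {
        // The driver picks one of the listed hosts for each connection; if a host is
        // unavailable it tries another, and a failed host stays blacklisted for the
        // given number of milliseconds before being retried.
        static final String URL =
            "jdbc:mysql:loadbalance://node1:3306,node2:3306,node3:3306/app"
            + "?loadBalanceBlacklistTimeout=5000";

        public static void main(String[] args) throws SQLException {
            try (Connection c = DriverManager.getConnection(URL, "app", "secret")) {
                c.createStatement().executeUpdate("INSERT INTO ping (ts) VALUES (NOW())");
            }
            // If a node dies while a transaction is in progress, the statement throws an
            // SQLException; retrying the transaction gets a connection to a healthy node.
        }
    }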

How much does mysql:loadbalance actually buy you?

How would read/write splitting be implemented? (Hand-coded?) With Galera there is no read/write split to implement, because the databases are fully synchronized: even when you write to one database, you are in effect writing to all of them.

This again proves the principle that the best solutions also tend to be simple solutions. If your solution is complex, you're not doing it right.

The comments below are also interesting - they haven't been condensed fully yet. :-)

 
 
 
EllisGL:

Galera MySQL with a load balancer (HAProxy) has been working well for me.

hingo:

I bet it has! And thank you for sharing - I haven't actually run Galera in production myself, yet.

Even though my post argues you don't need the load balancer (if you use JDBC; for PHP and others this doesn't really exist yet), if you are using a load balancer with Galera - and until now everyone did that - you are still way ahead of what everyone else is doing with MySQL replication and other really suboptimal solutions.

ellisgl:

Yeah - using a load balancer makes it a bit easier to make it plug-n-play so to say. No need to redo DB connection support in the app. You could augment it further if you needed to with read/write splitting and have a couple nodes used for writing only and just put that in your load balancer as another IP or another port (3307 or something).

ellisgl:

I use HAProxy so it's easy to just change one config setting in an app, instead of having to add in the rest of the stuff. Also, HAProxy keeps an eye on each server, so if one is down, you don't have to worry. It also creates a central location for adding/removing servers from the LB. Your solution works, but if you have 10+ apps using the DBs, it can be a pain to update all the configs. The only drawback is that you can't do read/write splitting natively with HAProxy; that would have to be done at the app level.

hingo:

So the MySQL JDBC driver can do all that. Except you are right, if you want to change the configuration - like add more nodes - then of course you have to do it for each app server. On the other hand, centralized configuration is convenient, but there is a risk you create a new single point of failure. I don't know about HAProxy specifically, just that this has been my experience in general.

ellisgl:

I have my HAProxy implementation backed with Corosync/Pacemaker with a simple 2 way rsync on the configuration. So if the main box goes down, the 2nd comes up pretty quickly.

hingo:

That's the right way to do it. (Except, I'm not a fan of corosync/pacemaker either, but at least in theory this is the right way.)


Baron Schwartz:

Nice article -- gently disagree

Henrik,

I support your views, except that there is a large set of people who use traditional replication for its benefits, and want to manage promoting a replica for faster recovery times (not really HA). For their business needs, all things considered, it can be a Very Good Solution. (I mentioned some of these use cases in my blog post, which you linked to.)

All of the failure modes you mentioned are possible, but it's also possible for the clustering software to verify whether it's safe to do a failover, and refuse to do so otherwise. Or to do it manually. This is the approach we're taking with the Percona Replication Manager project, which is still in early development. In practice, I expect this to result in the tool refusing to do anything in a great many cases, which is FINE. When that happens, they should call us (assuming they are Percona customers) and a human can safely resolve the situation in a matter of minutes.

But for real High Availability with capital letters, you will hear absolutely no argument from me. Replication isn't it, and these external clustering systems aren't the way to go.

hingo:

MySQL replication basically runs the web as we know it. So yes, it's not doing that bad. I'm familiar with your argument here.

And yes, part of the problem is that the clustering solutions available don't impress with their design or implementation. I'm sure a solution by you or Yoshinori-san can seriously improve over current state of the art. But even so, your solution will be based on MySQL replication and thus suboptimal.

And of course, until now we didn't have a better solution. Well, MySQL Cluster, but not for InnoDB. Even now, we have something great like Galera but you basically have to patch and compile your own MySQL server. So yeah, no points for ease of use.

At the same time, having tried Galera, I think it is quite easy to use when you get it ready packaged. There are no binlog positions you have to keep track of. So it's not only better for data durability; I argue people will find it has better usability too. And not only is the replication itself easier to use, it enables you to simplify your architecture. This is the state I want to reach.

As long as someone who ships binaries of a MySQL fork integrates it first - hint, hint :-) - so we don't need to patch MySQL ourselves.

Alex:

That's the way to do it.

Baron, the approach you're taking with Percona Replication Manager is nothing short of laudable. Indeed, they don't trust computers to dispatch air traffic or supervise children in the kindergarten, yet insist on fully automatic failover or it is not "highly available". Task automation is indispensable of course, but in many cases only a human can make the decision.

There is a catch there though - in the case of some massive failure (like the recent EC2 zone outage), human operators may not be able to cope with all requests on a timely basis. However there's little doubt that they would still do a better job than a fully automated tool.

As for the High Availability with capital letters - that term should be ditched in the gutter as it embodies something as unrealistic and unattainable as "eternal bliss". Better Availability would be more appropriate.

Baron Schwartz:

HighER Availability

Right, High Availability is always a matter of degree, not an absolute.

This discussion reminds me of something insightful that Josh Berkus once wrote. He asked and answered the question, "what kind of cluster?" He posits that we should focus more on the users, and less on the technology, which I like. I also agree with his three types of cluster users. They match closely the customers I've worked with.

http://it.toolbox.com/blogs/database-soup/the-three-database-clustering-...

Alex:

Bad users?...

Thanks for pointing to a great article. Very insightful indeed. Unfortunately it highlights the same old "mathematical correctness vs. use case" cognitive gap. Take just one requirement from Transactional User:

To do this with zero data loss and 99.999% or better uptime.

According to CAP theorem this is a pretty tough call. Because 99.999% uptime pretty much means A+P, whereas zero data loss is C. So Transactional User is forced to compromise and that makes him unhappy, and he thinks it is because developers are not trying hard enough to see to his needs. I guess developers tried for decades, but you can't jump the laws of logic, not in programming at least.

Online User is just patently insane. Damn, I wish, I _really_ wish I could make a Solution that would work well for him! But that's impossible. No wonder they are unhappy.

I think it is high time that the _users_ mend their ways and make a few steps towards developers (i.e. distributed computing realities). It may be in the form of changing their software, or changing their expectations, but it has to be done - or their expectations will never be met. (And usually, when you're in a dead end, you're on the wrong way anyway.) And this is exactly why I applauded Percona's human approach, which I see as a change of expectations (in this context).

Demetrius Albuquerque:

MySQL-MHA: "MySQL Master High Availability manager and tools"

And what can you say about MySQL-MHA? http://yoshinorimatsunobu.blogspot.com/2011/07/announcing-mysql-mha-mysq...


