[mysql] Resolving a master-slave data inconsistency

MySQL master-slave consistency repair


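Previous post: Re-establishing replication after the master and slave changed IPs
Here is the situation.
Both the master and the slave machine were migrated last night, and replication was only reconnected today. However, the slave's replication offset starts from the moment it was reconnected today. So although replication looks healthy now, part of yesterday's data is missing on the slave, and that hole has to be filled sooner or later.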

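The problem

  • The task is to work out how to detect the inconsistency and then repair it without affecting the business, by filling in the data missing on the slave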

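These are the options I could think of or find:
1 Resync from scratch. This does not affect use of the master, but with that much data the performance and network impact is considerable, while the amount of data actually lost should be small.
2 Dump the master, lock the database, then resync. Not good: it affects business use.
3 Use the tools in percona-toolkit to verify and sync. From the description they fit the current situation, but I still need to learn how to use them.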

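Some reference links:

  • percona-toolkit: official site
  • Checking and repairing data consistency between MySQL master and slave servers: a brief description of the process
  • Verifying data consistency with pt-table-checksum: describes how the tool works
  • Repairing inconsistent data with pt-table-sync: describes how the tool works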

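Procedure

I only explain the process and the pieces I used; some parameters and options still need to be looked up in the documentation. Both machines run CentOS 6.5 with MySQL 5.6. Since this is a production environment, the IPs, passwords and other sensitive details have been altered.

  • Master 192.168.1.100
  • Slave 192.168.1.98
  • Database to repair: radius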

Installing the tools

Install on the master server

<code class="hljs vala has-numbering"># Install the dependencies (run as root)
yum install perl-DBI perl-DBD-MySQL perl-TermReadKey perl-Time-HiRes
# Install the toolkit; the archive and directory names below depend on
# the version downloaded (2.2.14 at the time of writing)
wget percona.com/get/percona-toolkit.tar.gz
tar zxvf percona-toolkit-2.2.14.tar.gz
cd percona-toolkit-2.2.14
perl Makefile.PL && make && make install</code>
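As a quick sanity check after installation, a sketch along these lines can confirm that the two tools used below are on the PATH. The helper name `have_tool` is made up for this example and is not part of percona-toolkit; it relies only on the standard `command -v` shell builtin.

```shell
#!/bin/sh
# have_tool NAME: succeed if NAME resolves to a command on the PATH.
# (hypothetical helper, not part of percona-toolkit)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

# Report on the two tools used later in this post.
for tool in pt-table-checksum pt-table-sync; do
    if have_tool "$tool"; then
        echo "$tool: found"
    else
        echo "$tool: missing"
    fi
done
```

If either tool is reported missing, the `make install` step most likely put it somewhere outside the PATH of the current shell.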

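Verifying data consistency

Create a user and grant privileges

Note that a user with the same name has to exist on both the master and the slave. It must be able to connect to the slave from the master host, and to connect to the master locally from the master host. All the tools are run on the master server; pt-table-checksum is used to verify data consistency.

Run in mysql on the slave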

<code class="hljs sql has-numbering">GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.1.100' IDENTIFIED BY 'slavecheck';
flush privileges;</code>


Run in mysql on the master

<code class="hljs sql has-numbering">GRANT SELECT, PROCESS, SUPER, REPLICATION SLAVE ON *.* TO 'checksums'@'192.168.1.100' IDENTIFIED BY 'slavecheck';
GRANT SELECT, INSERT, UPDATE, DELETE ON radius.checksums TO 'checksums'@'192.168.1.100';
flush privileges;</code>

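The check needs a new table on the master MySQL, so the new user needs read and write privileges on it. Here the checksum table lives in the radius database.

Checking with pt-table-checksum

The check is run on the master server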

<code class="hljs lasso has-numbering">Run in the shell on the master:
pt-table-checksum h='192.168.1.100',u='checksums',p='slavecheck',P=3306 -d radius --nocheck-replication-filters --replicate=radius.checksums

--nocheck-replication-filters : do not check replication filters; recommended. --databases can then be used to pick the databases to check.
--no-check-binlog-format      : do not check the replication binlog format; without this option the tool reports an error if the binlog format is ROW.
--replicate-check-only        : only show the information about out-of-sync tables.
--replicate=                  : write the checksum information to the given table; writing it into the database being checked is recommended.
--databases=                  : databases to check, comma-separated.
--tables=                     : tables to check, comma-separated.
h=192.168.1.100               : address of the master
u=checksums                   : user name
p=slavecheck                  : password
P=3306                        : port</code>

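The script runs on the master machine. It discovers the slave's address automatically, logs in to it with the same user, and then compares.

The --replicate option creates a table that stores the comparison results. This table must be replicated to the slave. If the checksums user has no privilege to create tables, create the table yourself beforehand.

Table definition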

<code class="hljs sql has-numbering">CREATE TABLE IF NOT EXISTS `radius`.`checksums` (
     db             CHAR(64)     NOT NULL,
     tbl            CHAR(64)     NOT NULL,
     chunk          INT          NOT NULL,
     chunk_time     FLOAT            NULL,
     chunk_index    VARCHAR(200)     NULL,
     lower_boundary TEXT             NULL,
     upper_boundary TEXT             NULL,
     this_crc       CHAR(40)     NOT NULL,
     this_cnt       INT          NOT NULL,
     master_crc     CHAR(40)         NULL,
     master_cnt     INT              NULL,
     ts             TIMESTAMP    NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
     PRIMARY KEY (db, tbl, chunk),
     INDEX ts_db_tbl (ts, db, tbl)
  ) ENGINE=INNODB;</code>

After I created the table by hand, the following error appeared

<code class="hljs coffeescript has-numbering">6-16T16:10:48 The --replicate table `radius`.`checksums` exists on the master but but it has problems on these replicas:
Table radius.checksums does not exist on replica localhost.localdomain</code>

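An earlier error had left replication broken, so the table never reached the slave. I checked the replication status on the slave and adjusted things until replication was healthy again.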
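A minimal sketch of the kind of status check meant here, assuming the status is read with `SHOW SLAVE STATUS\G`. The function name `replication_ok` and the sample text are illustrative only; `Slave_IO_Running` and `Slave_SQL_Running` are the standard fields in that output.

```shell
#!/bin/sh
# replication_ok: read `SHOW SLAVE STATUS\G` output on stdin and succeed
# only if both replication threads report "Yes".
# (hypothetical helper written for this example)
replication_ok() {
    awk -F': ' '
        $1 ~ /Slave_IO_Running$/  { io  = $2 }
        $1 ~ /Slave_SQL_Running$/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Made-up sample in which the SQL thread has stopped.
sample='             Slave_IO_Running: Yes
            Slave_SQL_Running: No'

if printf '%s\n' "$sample" | replication_ok; then
    echo "replication healthy"
else
    echo "replication broken"
fi
```

In practice the input would come from running `SHOW SLAVE STATUS\G` through the mysql client on the slave instead of the sample string.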

With the error fixed, run the check again (output abridged)

<code class="hljs r has-numbering">Continue the check in the shell on the master:
[root@localhost portal]# pt-table-checksum h='192.168.1.100',u='checksums',p='slavecheck',P=3306 -d radius --nocheck-replication-filters --replicate=radius.checksums
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
06-16T16:50:21      0      1     8379       4       0   0.322 radius.account_account
06-16T16:50:21      0      1    11429       1       0   0.278 radius.account_mac
06-16T16:50:21      0      1    63747       1       0   0.329 radius.account_smslog
06-16T16:50:21      0      0        0       1       0   0.016 radius.auth_group
06-16T16:50:21      0      0        0       1       0   0.013 radius.auth_group_permissions
06-16T16:50:22      0      0       27       1       0   0.265 radius.auth_permission
06-16T16:50:22      0      1     8384       1       0   0.273 radius.auth_user
......</code>
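With many tables it can help to pull out only the ones that differ. The sketch below filters the report by its DIFFS column; the function name `tables_with_diffs` is made up here, and it assumes the eight-column layout shown above.

```shell
#!/bin/sh
# tables_with_diffs: read pt-table-checksum output on stdin and print the
# name of every table whose DIFFS column (3rd field) is greater than zero.
# (hypothetical helper; assumes the 8-column report layout)
tables_with_diffs() {
    awk 'NF == 8 && $3 ~ /^[0-9]+$/ && $3 > 0 { print $8 }'
}

# A few rows taken from the report above.
tables_with_diffs <<'EOF'
            TS ERRORS  DIFFS     ROWS  CHUNKS SKIPPED    TIME TABLE
06-16T16:50:21      0      1     8379       4       0   0.322 radius.account_account
06-16T16:50:21      0      0        0       1       0   0.016 radius.auth_group
06-16T16:50:22      0      1     8384       1       0   0.273 radius.auth_user
EOF
```

For these rows it prints radius.account_account and radius.auth_user; the header line is skipped because its third field is not a number.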

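This output means the check has run. A non-zero value in the DIFFS column means that table's data is inconsistent. Now log in to mysql on the slave and run the following statement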

<code class="hljs markdown has-numbering">mysql> select * from radius.checksums where master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc) \G
*************************** 1. row ***************************
            db: radius
           tbl: account_account
         chunk: 2
    chunk_time: 0.028065
   chunk_index: PRIMARY
lower_boundary: 1847
upper_boundary: 9225
      this_crc: 4f43a2
      this_cnt: 7336
    master_crc: 9235f7a2
    master_cnt: 7379
            ts: 2015-06-16 17:00:31</code>

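There are 8 rows in total, so these 8 tables are inconsistent. The counts give a rough idea of how much data is missing: in the chunk shown above, the slave has 7336 rows against 7379 on the master.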
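The gap for a chunk is simply master_cnt minus this_cnt. A small sketch that reads one row in the `\G` format and prints that difference; the function name `chunk_gap` is invented for this example, and it handles a single row only.

```shell
#!/bin/sh
# chunk_gap: read one row of the checksums table in \G format on stdin
# and print master_cnt - this_cnt, i.e. rows missing on the slave.
# (hypothetical helper; single row only)
chunk_gap() {
    awk -F': ' '
        $1 ~ /this_cnt$/   { slave_cnt  = $2 }
        $1 ~ /master_cnt$/ { master_cnt = $2 }
        END { print master_cnt - slave_cnt }
    '
}

# The counts from the row shown above.
chunk_gap <<'EOF'
      this_crc: 4f43a2
      this_cnt: 7336
    master_crc: 9235f7a2
    master_cnt: 7379
EOF
```

For this row it prints 43. Equal counts do not prove a chunk is identical, which is why the query also compares the CRC columns.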

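Repairing the inconsistent data

The repair is done with pt-table-sync, which works from the results of pt-table-checksum. There are still a few pitfalls, though. Before repairing, it is best to back up the master's data, because the tool performs some writes on the master and that carries some risk.

Run on the master server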

<code class="hljs r has-numbering">[root@localhost portal]# pt-table-sync --execute --replicate radius.checksums --sync-to-master h="192.168.1.98",P=3306,u="checksums",p="slavecheck" --ignore-tables radacct,django_session
DBI connect(';host=124.88.52.100;port=3306;mysql_read_default_group=client','checksums',...) failed: Access denied for user 'checksums'@'124.88.52.100' (using password: YES) at /usr/local/bin/pt-table-sync line 2220</code>
Yet connecting directly with the mysql client works fine.

After checking the documentation, it turned out to be a user privilege problem again.
Run on the slave

<code class="hljs lasso has-numbering">mysql> GRANT all ON radius.* TO 'checksums'@'192.168.1.100';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)</code>

Run on the master

<code class="hljs lasso has-numbering">mysql> GRANT all ON radius.* TO 'checksums'@'192.168.1.100';
Query OK, 0 rows affected (0.00 sec)

mysql> flush privileges;
Query OK, 0 rows affected (0.00 sec)</code>

Granting just SELECT, INSERT, UPDATE and DELETE would actually have been enough; I took the lazy route here.
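For reference, the narrower grant could look like the statement printed by the sketch below. This is not the statement that was actually run above, and whether these four privileges are sufficient on a given setup is an assumption to verify. The helper `print_grant` is invented for this example and only prints SQL; it does not connect to MySQL.

```shell
#!/bin/sh
# print_grant USER HOST DB: print a GRANT statement limited to
# SELECT, INSERT, UPDATE and DELETE on every table of DB.
# (hypothetical helper; prints SQL only)
print_grant() {
    user=$1 host=$2 db=$3
    printf "GRANT SELECT, INSERT, UPDATE, DELETE ON %s.* TO '%s'@'%s';\n" \
        "$db" "$user" "$host"
}

# Same user, host and database as in the GRANT ALL statements above.
print_grant checksums 192.168.1.100 radius
```

The printed statement can be pasted into the mysql client on both the master and the slave in place of the `GRANT all` version.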

That takes care of most of the errors.

Repairing the data

First repair an unimportant table as a trial (run on the master)

<code class="hljs brainfuck has-numbering">pt-table-sync --execute --replicate radius.checksums --sync-to-master h=192.168.1.98,P=3306,u=checksums,p="slavecheck" --tables account_smslog,radcheck --print</code>
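A more cautious variant is to run with --print alone first, which shows the SQL that would be applied without changing anything, and only then rerun with --execute. The sketch below just assembles the two command lines; the function name `sync_cmd` is made up, and the connection details are the ones used in this post.

```shell
#!/bin/sh
# sync_cmd MODE TABLES: print the pt-table-sync command line for MODE,
# where "print" only shows the SQL and "execute" applies it.
# (hypothetical helper; it builds the command but does not run it)
sync_cmd() {
    mode=$1 tables=$2
    case $mode in
        print|execute) ;;
        *) echo "mode must be print or execute" >&2; return 1 ;;
    esac
    echo "pt-table-sync --$mode --replicate radius.checksums --sync-to-master" \
         "h=192.168.1.98,P=3306,u=checksums,p=slavecheck --tables $tables"
}

# Step 1: review the statements. Step 2: apply them.
sync_cmd print account_smslog,radcheck
sync_cmd execute account_smslog,radcheck
```

Reviewing the printed statements before applying them fits the earlier advice to back up the master, since the changes are written on the master and reach the slave through replication.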

After the repair, run the check once more (on the master)

<code class="hljs lasso has-numbering">pt-table-checksum h='192.168.1.100',u='checksums',p='slavecheck',P=3306 -d radius --nocheck-replication-filters --replicate=radius.checksums</code>

Then check in mysql on the slave

<code class="hljs vbnet has-numbering">mysql> select * from radius.checksums where master_cnt <> this_cnt OR master_crc <> this_crc OR ISNULL(master_crc) <> ISNULL(this_crc) \G</code>

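The result now lists two fewer tables, which shows that those two have been repaired.

After that, repair the remaining tables and check again that nothing is wrong.

Summary

The main questions here were
1 Where to run the scripts (always on the master server; the slave is only used to inspect the results)
2 How to create the user, and which privileges it should be given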

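Notice:
This article comes from the "orangleliu笔记本" blog. When reposting, please keep this source: http://blog.csdn.net/orangleliu/article/details/46532215 The author, orangleliu, licenses it under Attribution-NonCommercial-ShareAlike.

Copyright notice: this is an original article by orangleliu (http://blog.csdn.net/orangleliu/). Please credit the source when reposting.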
