MySQL手动注册binlog文件造成主从异常的原因

吾爱主题 阅读:230 2024-04-01 23:50:24 评论:0

一、问题来源

有一个朋友@水米田 问我,基于POSITION的主从。他做了如下的操作

将备份的一些binlog文件加入到了目录中
修改index文件,加入了这些binlog文件
flush binary logs
然后整个主从环境大量延迟。

二、朋友的测试

下面是另外一个朋友@徐晨亮的测试:

?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128 129 130 131 132 133 134 master上: (root:db1@xucl:10:30:22)[(none)]> show binary logs; + ---------------------+-----------+ | Log_name      | File_size | + ---------------------+-----------+ | mysql-binlog.000031 |   1019 | | mysql-binlog.000032 |    424 | | mysql-binlog.000033 |    244 | | mysql-binlog.000034 |   2332 | | mysql-binlog.000035 |   2134 | | mysql-binlog.000036 |  845915 | | mysql-binlog.000037 |   11735 | | mysql-binlog.000038 |    284 | | mysql-binlog.000039 |    284 | | mysql-binlog.000040 |    284 | | mysql-binlog.000041 |    284 | | mysql-binlog.000042 |    234 | + ---------------------+-----------+ 12 rows in set (0.00 sec) (root:db1@xucl:10:30:34)[(none)]> purge binary logs to 'mysql-binlog.000039' ; Query OK, 0 rows affected (0.00 sec) (root:db1@xucl:10:30:49)[(none)]> show binary logs; + ---------------------+-----------+ | Log_name      | File_size | + ---------------------+-----------+ | mysql-binlog.000039 |    284 | | mysql-binlog.000040 |    284 | | mysql-binlog.000041 |    284 | | mysql-binlog.000042 |    234 | + ---------------------+-----------+ 4 rows in set (0.00 sec) 执行插入数据 (root:db1@xucl:10:31:23)[xucl]> insert into t values (9,9); 将备份的binlog拷贝回原先的目录并修改 index 文件进行注册 [root@izbp12nspj47ypto9t6vyez logs]# ll 总用量 884 -rw-r ----- 1 mysql mysql  1019 5月 20 22:03 mysql-binlog.000031 -rw-r ----- 1 mysql mysql  424 5月 20 22:03 mysql-binlog.000032 -rw-r ----- 1 mysql mysql  244 5月 20 22:03 mysql-binlog.000033 -rw-r ----- 1 mysql mysql  2332 5月 20 22:03 mysql-binlog.000034 -rw-r ----- 1 mysql mysql  2134 5月 20 22:03 mysql-binlog.000035 -rw-r ----- 1 mysql mysql 845915 5月 20 22:03 mysql-binlog.000036 -rw-r ----- 1 mysql mysql 11735 5月 20 22:05 mysql-binlog.000037 -rw-r ----- 1 mysql mysql  284 5月 20 22:06 mysql-binlog.000038 -rw-r ----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000039 -rw-r ----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000040 -rw-r ----- 1 mysql mysql  284 5月 21 10:28 mysql-binlog.000041 -rw-r ----- 1 mysql mysql  491 5月 21 10:31 mysql-binlog.000042 -rw-r ----- 1 mysql mysql  204 5月 21 10:30 mysql-binlog.index 主库flush binary logs (root:db1@xucl:10:32:51)[(none)]> flush binary logs; Query OK, 0 rows affected (0.01 sec) (root:db1@xucl:10:32:57)[(none)]> show binary logs; + ---------------------+-----------+ | Log_name      | File_size | + ---------------------+-----------+ | mysql-binlog.000031 |   1019 | | mysql-binlog.000032 |    424 | | mysql-binlog.000033 |    244 | | mysql-binlog.000034 |   2332 | | mysql-binlog.000035 |   2134 | | mysql-binlog.000036 |  845915 | | mysql-binlog.000037 |   11735 | | mysql-binlog.000038 |    284 | | mysql-binlog.000039 |    284 | | mysql-binlog.000040 |    284 | | mysql-binlog.000041 |    284 | | mysql-binlog.000042 |    541 | | mysql-binlog.000043 |    234 | + ---------------------+-----------+ 13 rows in set (0.00 sec) 此时,slave报错如下: (root:db1@xucl:10:31:05)[(none)]> show slave status\G *************************** 1. row ***************************          Slave_IO_State:           Master_Host: 127.0.0.1           Master_User: repl           Master_Port: 3306          Connect_Retry: 60         Master_Log_File: mysql-binlog.000035       Read_Master_Log_Pos: 194          Relay_Log_File: izbp12nspj47ypto9t6vyez-relay-bin.000011          Relay_Log_Pos: 373      Relay_Master_Log_File: mysql-binlog.000035         Slave_IO_Running: No        Slave_SQL_Running: Yes         Replicate_Do_DB:       Replicate_Ignore_DB:        Replicate_Do_Table:      Replicate_Ignore_Table:     Replicate_Wild_Do_Table:   Replicate_Wild_Ignore_Table:            Last_Errno: 0            Last_Error:           Skip_Counter: 0       Exec_Master_Log_Pos: 194         Relay_Log_Space: 648         Until_Condition: None          Until_Log_File:          Until_Log_Pos: 0        Master_SSL_Allowed: No        Master_SSL_CA_File:        Master_SSL_CA_Path:         Master_SSL_Cert:        Master_SSL_Cipher:          Master_SSL_Key:      Seconds_Behind_Master: NULL Master_SSL_Verify_Server_Cert: No          Last_IO_Errno: 1236          Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'Cannot replicate GTID-transaction when @@GLOBAL.GTID_MODE = OFF, at file /storage/single/mysql3306/logs/mysql-binlog.000035, position 194.; the first event ' mysql-binlog.000039 ' at 234, the last event read from ' /storage/single/mysql3306/logs/mysql-binlog.000035 ' at 259, the last byte read from ' /storage/single/mysql33'          Last_SQL_Errno: 0          Last_SQL_Error:   Replicate_Ignore_Server_Ids:         Master_Server_Id: 3306           Master_UUID: e8bdf01a-c79b-11e8-82b3-00163e088352         Master_Info_File: mysql.slave_master_info            SQL_Delay: 0       SQL_Remaining_Delay: NULL     Slave_SQL_Running_State: Slave has read all relay log; waiting for more updates        Master_Retry_Count: 86400           Master_Bind:     Last_IO_Error_Timestamp: 190521 10:32:57     Last_SQL_Error_Timestamp:          Master_SSL_Crl:        Master_SSL_Crlpath:        Retrieved_Gtid_Set:        Executed_Gtid_Set: 4c423515-6661-11e9-b767-00163e088352:1-7, e8bdf01a-c79b-11e8-82b3-00163e088352:1-57192          Auto_Position: 0       Replicate_Rewrite_DB:           Channel_Name:        Master_TLS_Version: 1 row in set (0.00 sec) 从slave上的报错来看,在主库flush binary logs后,从库又读取注册的binlog并且又apply了

三、现象说明

从整个测试来看,MySQL视乎将手动注册的文件进行了传输和应用。报错是因为这个库以前是GITD_MODE=ON的,但是测试的时候已经关闭了GTID_MODE,改为了POSITION的模式,这个报错是因为DUMP线程会进行检测:

这个图来自我新写的一个系列(暂时还没发布,大概年底才能写好)。不管怎么说,DUMP线程已经在开始传输老的binlog文件了。那么原因是什么呢?下面我们将进行解释。

四、flush binary logs对binlog的操作

flush binary logs 将包含如下操作:

  • 获取新的binlog文件名字
  • 关闭旧的binlog
  • 关闭index file
  • 打开index file
  • 打开新的binlog
  • 将新的binlog加入到indexfile

具体可以参考函数MYSQL_BIN_LOG::new_file_impl。其中要说明一下获取新的binlog文件名字是通过函数find_uniq_filename实现的,其中包含如下代码:

?
1 2 3 4 5 6 7 8 9 10 11 file_info= dir_info->dir_entry;   for (i= dir_info->number_off_files ; i-- ; file_info++)   {    if (strncmp(file_info->name, start, length) == 0 &&    is_number(file_info->name+length, &number,0))    {     set_if_bigger(max_found, number);    }   } ...   *next= (need_next || max_found == 0) ? max_found + 1 : max_found;

大概意思就是扫描index file文件里面的binlog文件,获取其序号最高的一个,然后加1。栈帧如下:

?
1 2 3 4 5 6 7 8 9 10 11 12 #0 find_uniq_filename (name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", next=0x7fffec5ec678, need_next=true)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:3679 #1 0x000000000187d208 in generate_new_log_name (new_name=0x7fffec5ec6d0 "/mysqldata/mysql3340/log/binlog", new_ext=0x0,    log_name=0x7ffedc011520 "/mysqldata/mysql3340/log/binlog" , is_binlog= true ) at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:3767 #2 0x0000000001884fdb in MYSQL_BIN_LOG::new_file_impl (this=0x2e83640, need_lock_log=false, extra_description_event=0x0)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:7175 #3 0x0000000001884cbb in MYSQL_BIN_LOG::new_file_without_locking (this=0x2e83640, extra_description_event=0x0)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:7099 #4 0x0000000001886b6b in MYSQL_BIN_LOG::rotate (this=0x2e83640, force_rotate=true, check_purge=0x7fffec5ecbfb)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:7775 #5 0x0000000001886d53 in MYSQL_BIN_LOG::rotate_and_purge (this=0x2e83640, thd=0x7ffedc000b90, force_rotate=true)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:7846

因此即便我们手动修改了index file,flush binary logs却不会有问题,因为它实际扫描了index file文件。
同时我们也看到flush binary logs重新加载了index file,这个时候手动修改的index file就生效了,使用show binary logs就能查看到你加入的文件了。

五、主从问题的产生

binlog切换后,DUMP线程也需要读取下一个binlog文件。我们来看看它是怎么确认读取哪一个文件的。

在函数sender.run()中可以找到循环每个binlog文件的代码。下面一句是寻找下一个binlog文件:

?
1 int error= mysql_bin_log.find_next_log(&m_linfo, 0);

mysql_bin_log.find_next_log 包含的代码:

?
1 2 3 4 5 6 7 8 9 my_b_seek(&index_file, linfo->index_file_offset); //进行偏移量偏移     linfo->index_file_start_offset= linfo->index_file_offset;   length=my_b_gets(&index_file, fname, FN_REFLEN)); //读取文件名字 ...    if (normalize_binlog_name(full_fname, fname, is_relay_log)) ...     linfo->index_file_offset= my_b_tell(&index_file); //偏移量从新记录以备下一次使用

我们能够看到DUMP线程并没有实际扫描整个index文件,而是通过一个index文件的偏移量进行读取。如果手动修改index文件那么偏移量就出现了错乱。因此DUMP发送的下一个文件将是不确定的。因此出现了发送手动注册的binlog文件给从库的现象,这种情况下可能出现下面情况:

  1. 如果是GTID_MODE 关闭,使用POSITION那么这种情况一定报错,比如重复的行。
  2. 如果是GTID_MODE 和AUTO_POSITION=1,那么DUMP线程会进行GTID的过滤不发送,因为Event不发送所以延迟会持续一段时间为0。
  3. 如果是GTID_MODE 和AUTO_POSITION=0,那么SQL线程应用GITD_EVENT的时候会进行过滤,延迟可能出现很大的情况。

尽管GTID可能可以屏蔽这种问题,但是DUMP线程已经在考虑发送老的binlog文件了,这是不合适的。

六、purge binary logs能够维护这个偏移量

为什么purge binary logs不会出现问题呢,因为在purge binary logs的语句下,会维护这个偏移量如下:

?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 virtual void operator()(THD *thd) {   LOG_INFO* linfo;   mysql_mutex_lock(&thd->LOCK_thd_data);   if ((linfo= thd->current_linfo)) //b binlog.cc:2829   {    /*     Index file offset can be less that purge offset only if     we just started reading the index file. In that case     we have nothing to adjust.    */    if (linfo->index_file_offset < m_purge_offset)     linfo->fatal = (linfo->index_file_offset != 0);    else     linfo->index_file_offset -= m_purge_offset;   }   mysql_mutex_unlock(&thd->LOCK_thd_data);

我们可以看到linfo->index_file_offset -= m_purge_offset;这样一个语句。下面是栈帧:

?
1 2 3 4 5 6 7 8 9 10 11 12 13 #0 Adjust_offset::operator() (this=0x7fffec5ec720, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2831 #1 0x0000000000eef320 in Do_THD::operator() (this=0x7fffec5ec6a0, thd=0x7ffedc000be0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/mysqld_thd_manager.cc:46 #2 0x0000000000eefa0f in std::for_each<THD**, Do_THD> (__first=0x3003358, __last=0x3003368, __f=...)    at /usr/lib/gcc/x86_64-redhat-linux/4 .4.7/../../../.. /include/c ++ /4 .4.7 /bits/stl_algo .h:4200 #3 0x0000000000eeefc0 in Global_THD_manager::do_for_all_thd (this=0x3003340, func=0x7fffec5ec720)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/mysqld_thd_manager .cc:273 #4 0x000000000187ae0a in adjust_linfo_offsets (purge_offset=0) at /mysqldata/percona-server-locks-detail-5.7.22/sql/binlog.cc:2873 #5 0x0000000001883239 in MYSQL_BIN_LOG::remove_logs_from_index (this=0x2e83640, log_info=0x7fffec5ec7d0, need_update_threads=true)    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:6352 #6 0x0000000001883676 in MYSQL_BIN_LOG::purge_logs (this=0x2e83640, to_log=0x7fffec5eca80 "/mysqldata/mysql3340/log/binlog.000001", included=false,    need_lock_index= true , need_update_threads= true , decrease_log_space=0x0, auto_purge= false ) at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:6469 #7 0x000000000187b4b5 in purge_master_logs (thd=0x7ffee0000c00, to_log=0x7ffee0006600 "binlog.000001")    at /mysqldata/percona-server-locks-detail-5 .7.22 /sql/binlog .cc:3127

七、测试POSITION模式下的报错

1、 环境

?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 mysql> show binary logs; + ---------------+-----------+ | Log_name   | File_size | + ---------------+-----------+ | binlog.000001 |    198 | | binlog.000002 |    154 | + ---------------+-----------+ 2 rows in set (0.00 sec)   mysql> show create table testcp \G; *************************** 1. row ***************************      Table : testcp Create Table : CREATE TABLE `testcp` (   `id` int (11) NOT NULL ,   PRIMARY KEY (`id`) ) ENGINE=InnoDB DEFAULT CHARSET=utf8 1 row in set (0.00 sec)   ERROR: No query specified

2、执行一个语句

主库:

?
1 2 3 4 5 6 7 8 9 10 11 mysql> insert into testcp values (20); Query OK, 1 row affected (0.43 sec)   mysql> select * from testcp; + ----+ | id | + ----+ | 10 | | 20 | + ----+ 2 rows in set (0.01 sec)

从库:

?
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 mysql> show slave status \G; *************************** 1. row ***************************          Slave_IO_State: Waiting for master to send event           Master_Host: 192.168.99.41           Master_User: repl           Master_Port: 3340          Connect_Retry: 60         Master_Log_File: binlog.000002       Read_Master_Log_Pos: 417          Relay_Log_File: relay.000004          Relay_Log_Pos: 624      Relay_Master_Log_File: binlog.000002         Slave_IO_Running: Yes        Slave_SQL_Running: Yes ... mysql> select * from testcp; + ----+ | id | + ----+ | 10 | | 20 | + ----+

这个时候DUMP线程index file偏移指针如下:

3、主库执行purge binary logs

做之前拷贝原有binlog.000001为binlog.000001bak因为等会要拷贝回去

?
1 2 3 4 5 6 7 8 9 10 mysql> purge binary logs to 'binlog.000002' ; Query OK, 0 rows affected (3.14 sec)   mysql> show binary logs; + ---------------+-----------+ | Log_name   | File_size | + ---------------+-----------+ | binlog.000002 |    417 | + ---------------+-----------+ 1 row in set (0.00 sec)

因为purge binary logs命令会维护偏移指针,这个时候DUMP线程的index file偏移指针如下:

4、手动更改

将binlog.000001bak拷贝为binlog.000001,然后修改index file将binlog.000001加入回去,然后flush binary logs如下:

?
1 2 3 4 5 6 7 8 9 10 11 mysql> flush binary logs; Query OK, 0 rows affected (0.15 sec)   mysql> show binary logs; + ---------------+-----------+ | Log_name   | File_size | + ---------------+-----------+ | binlog.000001 |     198 | | binlog.000002 |    461 | + ---------------+-----------+ 2 rows in set (0.00 sec)

手动完成这个工作并不会维护DUMP线程的index file偏移指针,因此如下:

就这样DUMP线程重新发送了一次binlog.000002的内容,POSITION的从库只能报错了如下:

?
1 2 3 4 5 6 7 8 9 10 mysql> select * from replication_applier_status_by_worker \G *************************** 1. row ***************************       CHANNEL_NAME:        WORKER_ID: 1        THREAD_ID: NULL      SERVICE_STATE: OFF LAST_SEEN_TRANSACTION: ANONYMOUS    LAST_ERROR_NUMBER: 1062    LAST_ERROR_MESSAGE: Worker 1 failed executing transaction 'ANONYMOUS' at master log binlog.000002, end_log_pos 386; Could not execute Write_rows event on table testmts.testcp; Duplicate entry '20' for key 'PRIMARY' , Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log binlog.000002, end_log_pos 386   LAST_ERROR_TIMESTAMP: 2019-05-21 14:46:26

八、总结

这种操作非常不规范,如果实在要这么做考虑如下步骤:

  • 关闭全部的从库
  • 手动注册binlog文件,修改index 文件
  • flush binary logs
  • 启动从库

这种情况下将会重新初始化DUMP的偏移量指针,应该是没有问题的。还是尽量不要考虑这么做。

以上就是MySQL手动注册binlog文件造成主从异常的原因的详细内容,更多关于MySQL 主从异常的资料请关注服务器之家其它相关文章!

原文链接:https://www.jianshu.com/p/7fa3d0f7458c?utm_campaign=maleskine&utm_content=note&utm_medium=seo_notes&utm_source=recommendation

可以去百度分享获取分享代码输入这里。
声明

1.本站遵循行业规范,任何转载的稿件都会明确标注作者和来源;2.本站的原创文章,请转载时务必注明文章作者和来源,不尊重原创的行为我们将追究责任;3.作者投稿可能会经我们编辑修改或补充。

【腾讯云】云服务器产品特惠热卖中
搜索
标签列表
    关注我们

    了解等多精彩内容