索引问题导致mysql复制lock timeout

遇到一个非常郁闷的问题。MySQL复制备机SQL线程执行一个语句报错:
Error ‘Lock wait timeout exceeded; try restarting transaction’ on query. Default database: ‘brmms2’. Query: ‘UPDATE brmms_user SET gmt_modified = now() , last_login_ip = ‘58.20.77.164’ , last_login_time = now() WHERE member_id = ‘yanyazhang”
这个语句就算我单独提出来执行也会超时。show innodb status的状态如下:
root@localhost : (none) 08:12:46> show innodb status\G
*************************** 1. row ***************************
Type: InnoDB
Name:
Status:
=====================================
100501 20:12:51 INNODB MONITOR OUTPUT
=====================================
Per second averages calculated from the last 8 seconds
———-
SEMAPHORES
———-
OS WAIT ARRAY INFO: reservation count 22, signal count 22
Mutex spin waits 0, rounds 239, OS waits 8
RW-shared spins 20, OS waits 10; RW-excl spins 3, OS waits 3
————
TRANSACTIONS
————
Trx id counter 0 1835507099
Purge done for trx’s n:o < 0 1835505963 undo n:o < 0 0
History list length 44
LIST OF TRANSACTIONS FOR EACH SESSION:
—TRANSACTION 0 0, not started, OS thread id 12
MySQL thread id 801, query id 3585 localhost root
show innodb status
—TRANSACTION 0 1835507098, ACTIVE 64 sec, OS thread id 17 starting index read
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1216, 2 row lock(s)
MySQL thread id 1214, query id 3536 Updating
UPDATE brmms_user SET gmt_modified = now() , last_login_ip = ‘58.20.77.164’ , last_login_time = now() WHERE member_id = ‘yanyazhang’
——- TRX HAS BEEN WAITING 64 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 10344 page no 3281 n bits 192 index PRIMARY of table brmms2.brmms_user trx id 0 1835507098 lock_mode X locks rec but not gap waiting
Record lock, heap no 121 PHYSICAL RECORD: n_fields 17; compact format; info bits 0
0: len 8; hex 0000000005fd14a7; asc ;; 1: len 6; hex 00006d6795f9; asc mg ;; 2: len 7; hex 00000000c30c2f; asc /;; 3: len 10; hex 79616e79617a68616e67; asc yanyazhang;; 4: len 4; hex 00000000; asc ;; 5: len 4; hex 00000000; asc ;; 6: len 4; hex 00000000; asc ;; 7: len 4; hex 00000000; asc ;; 8: len 3; hex 8fb48e; asc ;; 9: len 12; hex 35382e32302e37372e313634; asc 58.20.77.164;; 10: len 4; hex 4bc59f9c; asc K ;; 11: len 4; hex 4bc59f8f; asc K ;; 12: len 4; hex 4bc59f9c; asc K ;; 13: len 4; hex 80000000; asc ;; 14: len 4; hex 80000000; asc ;; 15: len 4; hex 80000000; asc ;; 16: SQL NULL;

——————
—TRANSACTION 0 1835505145, ACTIVE (PREPARED) 5527 sec, OS thread id 0
1 lock struct(s), heap size 368, 1 row lock(s), undo log entries 1
——–
FILE I/O
——–
I/O thread 0 state: waiting for i/o request (insert buffer thread)
I/O thread 1 state: waiting for i/o request (log thread)
I/O thread 2 state: waiting for i/o request (read thread)
I/O thread 3 state: waiting for i/o request (write thread)
Pending normal aio reads: 0, aio writes: 0,
ibuf aio reads: 0, log i/o’s: 0, sync i/o’s: 0
Pending flushes (fsync) log: 0; buffer pool: 0
16300 OS file reads, 336 OS file writes, 268 OS fsyncs
0.00 reads/s, 0 avg bytes/read, 0.00 writes/s, 0.00 fsyncs/s
————————————-
INSERT BUFFER AND ADAPTIVE HASH INDEX
————————————-
Ibuf: size 1, free list len 104, seg size 106,
27 inserts, 27 merged recs, 27 merges
Hash table size 35401603, node heap has 7 buffer(s)
0.00 hash searches/s, 0.00 non-hash searches/s

LOG

Log sequence number 102 3346391724
Log flushed up to 102 3346391724
Last checkpoint at 102 3346391724
0 pending log writes, 0 pending chkp writes
198 log i/o’s done, 0.00 log i/o’s/second
———————-
BUFFER POOL AND MEMORY
———————-
Total memory allocated 18653842407; in additional pool allocated 16776704
Dictionary memory allocated 4059160
Buffer pool size 1048576
Free buffers 1034004
Database pages 14565
Modified db pages 0
Pending reads 0
Pending writes: LRU 0, flush list 0, single page 0
Pages read 14565, created 0, written 146
0.00 reads/s, 0.00 creates/s, 0.00 writes/s
Buffer pool hit rate 1000 / 1000
————–
ROW OPERATIONS
————–
0 queries inside InnoDB, 0 queries in queue
1 read views open inside InnoDB
Main thread id 9, state: waiting for server activity
Number of rows inserted 52, updated 36, deleted 3, read 364
0.00 inserts/s, 0.00 updates/s, 0.00 deletes/s, 0.00 reads/s
—————————-
END OF INNODB MONITOR OUTPUT
============================

1 row in set, 1 warning (0.00 sec)
这里比较奇怪的地方在于:这个备机上就一个SQL线程在跑,其他的任何东西都没有跑,结果它还被锁超时杀掉了。上面的show innodb status显示
——————
—TRANSACTION 0 1835505145, ACTIVE (PREPARED) 5527 sec, OS thread id 0
1 lock struct(s), heap size 368, 1 row lock(s), undo log entries 1
这个线程是干吗的,难道是它lock住了update语句?

我先后试过如下的方法想去恢复:
1、重启数据库。SQL slave自己执行不过去。手工执行不过去。
2、重启数据库并修改为skip-slave-start模式,手工执行未果。
3、alter table brmms_user drop index idx_brmms_user_member_id, add index idx_brmms_user_Mem_id (member_id);报错lock timeout.
4、stop slave;想利用mysqldump -uroot -p –opt –add-drop-table –default-character-set=utf8 –master-data=2 –single-transaction –complete-insert –log-error=/tmp/mysqldump20100501.log brmms2 brmms_user>brmms_user.sql 或者mysqldump -uroot -p –default-character-set=utf8 brmms2 brmms_user>brmms_user.sql导出数据,然后重新导入。结果在mysqldump的时候就导不出来数据(能够导出表结构,错误日志里面也没有任何信息)
5、创建临时表tmp_pickup_brmms_user,然后insert into temp_brmms_user_pickup select * from brmms_user;。报错lock timeout.

最后通过select into outfile的方式解决了这个问题。步骤如下:
1、确认slave是停掉的。 并且修改SQL_LOG_BIN来避免下面的语句记录到binlog中(如果对应的主机没有这个问题,下面的这些操作完全没有必要通过replication传递到主机执行)。
stop slave;
SET SQL_LOG_BIN=0;
2、导出brmms_user表的所有数据。注意/tmp目录需要有足够的空间。
select * from brmms_user into outfile ‘/tmp/brmms_user.txt’;
3、创建临时表。该表的结构要求跟brmms_user表结构一样。
CREATE TABLE tmp_pickup_brmms_user ( … );
4、将brmms_user的数据导入到tmp_pickup_brmms_user。
load data local infile ‘/tmp/brmms_user.txt’ into table tmp_pickup_brmms_user;
5、改名。将brmms_user改到另外的名字,将临时表名字改为brmms_user
alter table brmms_user rename tmp_pickup_brmms_user_old;
alter table tmp_pickup_brmms_user rename brmms_user;
6、重新启动复制。
start slave;
7、收尾工作:必要的话,删除tmp_pickup_brmms_user_old表。SET SQL_LOG_BIN=1;来恢复session的binlog记录。
drop table tmp_pickup_brmms_user_old;
SET SQL_LOG_BIN=1;

may you success.

今天我们又检查了一下那个有问题的表。发现通过主键更新对应记录也是locktimeout。更新它之前和之后的主键都没有问题。突然回想起这台机器是通过另外一台机器的solaris的snapshot生产的快照,然后直接传到这台机器的。之后就直接启动起来用了。假设

这里在生成snapshot的时候,MySQL的数据(索引)在物理层面上是不一致的。这样的不一致是否是导致MySQL的这个问题的原因列。有待进一步的确认。

这里我们也知道了solaris的snapshot还是有风险的,如果以后还有备机的搭建,可以考虑先停止MySQL,然后才做snapshot。保证数据的物理一致性。

may you success

14 Comments

  1. Kasey Mcconnel说道:

    When I basically like to research more information of this nature I generally have a look at article banks and key in my search, congrats on your fantastic article.

  2. Great information over again! Thanks;)

  3. MP3 Downloads说道:

    I just couldnt leave your website before telling you that we really enjoyed the quality information you offer to your visitors… Will be back often to check up on new stuff you post!

  4. Cherise Comans说道:

    Nice share. I really like your blog. :)

  5. Digna Ocejo说道:

    Great share. I really adore your blog. :)

  6. Diuretic foods说道:

    Such kind of this article is our need, it can give us help from different aspect for different people, thx a alot and good luck.

  7. sim kort说道:

    I wish I had the same enthusiasm to make a web site like this. You should be proud of yourself for making this!

  8. Elinor Orndorff说道:

    Nice share. I really like your blog. :)

  9. sinhala films说道:

    Thank you for taking the time to write. Good to know info. Thanks for the interesting read, which helps to keep me busy at work

  10. Gratis Sim Kort说道:

    My buddy was talking about this web site today and I gotta say it was really very good things. Comming online here makes me agree with him, keep up the fantastic work!

  11. blackhat说道:

    I really like your web page

  12. Lee Lovorn说道:

    Thank you so much for sharing. It is becoming increasingly rare to find quality material. Appears lots of web blogs are putting out nothing unique – just copied content or rehashed rss feeds. Your work is appreciated.

  13. Ronna Rusboldt说道:

    Just desire to say your article is as surprising. The clearness in your post is simply cool and i could assume you’re an expert on this subject. Fine with your permission let me to grab your feed to keep up to date with forthcoming post. Thanks a million and please carry on the rewarding work.

Leave a Reply