《MySQL Waiting for table metadata lock故障分析》要點:
本文介紹了MySQL Waiting for table metadata lock故障分析,希望對您有用。如果有疑問,可以聯系我們。
源起
線上執行ALTER時,通過show processlist查看到出現Waiting for table metadata lock?,導致后面的查詢都無法執行.
5217122 | create_table_04 | 172.100.207.148:31291 | finance | Query | 1829 | Waiting for table metadata lock | ALTER TABLE `Pay` MODIFY COLUMN `pay` smallint(6) NULL DEFAULT 0 COMMENT '付??' 5217155 | bx_live_dml | 172.100.210.4:34730 | finance | Prepare | 1714 | Waiting for table metadata lock | select * from Pay where (pay =4 or pay =8) and projectId ='CSZY0'
由于當時直接把ALTER給kill了,所以基本沒有現場,所以下面模擬什么情況MySQL產生這種事故.
MySQL元數據鎖
MySQL DBA對于Waiting for table metadata lock肯定不會陌生,一般都是進行alter操作時被堵住了,導致了我們在show processlist 時,看到線程的狀態是在等metadata lock.
為了在并發環境下維護表元數據的數據一致性,在表上有活動事務(顯式或隱式)的時候,不可以對元數據進行寫入操作.因此從MySQL5.5版本開始引入了MDL鎖(metadata lock),來保護表的元數據信息,用于解決或者保證DDL操作與DML操作之間的一致性.對于引入MDL,其主要解決了2個問題,一個是事務隔離問題,比如在可重復隔離級別下,會話A在2次查詢期間,會話B對表結構做了修改,兩次查詢結果就會不一致,無法滿足可重復讀的要求;另外一個是數據復制的問題,比如會話A執行了多條更新語句期間,另外一個會話B做了表結構變更并且先提交,就會導致slave在重做時,先重做alter,再重做update時就會出現復制錯誤的現象.
所以在對表進行上述操作時,如果表上有活動事務(未提交或回滾),請求寫入的會話會等待在Metadata lock wait .例如下面的這種情形:
若沒有MDL鎖的保護,則事務2可以直接執行DDL操作,并且導致事務1出錯,5.1版本即是如此.5.5版本加入MDL鎖就在于保護這種情況的發生,由于事務1開啟了查詢,那么獲得了MDL鎖,鎖的模式為SHARED_READ,事務2要執行DDL,則需獲得EXCLUSIVE鎖,兩者互斥,所以事務2需要等待.
注:支持事務的InnoDB引擎表和不支持事務的MyISAM引擎表,都會出現Metadata Lock Wait等待現象.一旦出現Metadata Lock Wait等待現象,后續所有對該表的訪問都會阻塞在該等待上,導致連接堆積,業務受影響.
引起Metadata lock wait出現的場景:Waiting for table metadata lock
場景一:當前有執行DML操作時執行ALTRE操作.
# SESSION A mysql> insert into sbtest2 select * from sbtest1; # SESSION B mysql> alter table sbtest2 add test1 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 7 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 3 rows in set (0.00 sec) # SESSION D mysql> select * from sbtest2 limit 10; //等待元數據鎖; # SESSION E mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ | 267 | root | localhost | sbtest | Query | 20 | Sending data | insert into sbtest2 select * from sbtest1 | | 271 | root | localhost | sbtest | Query | 13 | Waiting for table metadata lock | alter table sbtest2 add test1 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 308 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | select * from sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+-------------------------------------------+ 4 rows in set (0.00 sec)
從上述例子可以看出,我們在執行DDL語句的時候得事先看一下,進程中是否已經存在某些DML語句占用了表的元數據鎖,這樣會導致DDL語句處于鎖等待狀態.一旦出現Waiting for table metadata lock等待現象,后續所有對該表的訪問都會阻塞在該等待上,包括讀操作,導致連接堆積,業務受影響.
場景二:當前有對表的長時間查詢或使用mysqldump/mysqlpump時,使用alter會被堵住.
# SESSION A mysql> select *,sleep(10) from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ | 267 | root | localhost | sbtest | Query | 12 | User sleep | select *,sleep(10) from sbtest2 | | 271 | root | localhost | sbtest | Query | 8 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 10 | +-----+------+-----------+--------+---------+------+---------------------------------+---------------------------------------+ 4 rows in set (0.00 sec)
場景三:顯示或者隱式開啟事務后未提交或回滾,比如查詢完成后未提交或者回滾,使用alter會被堵住.
# SESSION A mysql> begin; mysql> select * from sbtest2; # SESSION B mysql> alter table sbtest2 add test2 int; //等待SESSION A執行完; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 36 | | NULL | | 271 | root | localhost | sbtest | Query | 30 | Waiting for table metadata lock | alter table sbtest2 add test2 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 3 rows in set (0.00 sec)
場景四:表上有失敗的查詢事務,比如查詢不存在的列,語句失敗返回,但是事務沒有提交,此時alter仍然會被堵住.
# SESSION A mysql> begin; mysql> select error from sbtest2; ERROR 1054 (42S22): Unknown column 'error' in 'field list' # SESSION B mysql> alter table sbtest2 add test3 int; //等待SESSION A提交或回滾; # SESSION C mysql> show processlist; +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ | 267 | root | localhost | sbtest | Sleep | 7 | | NULL | | 271 | root | localhost | sbtest | Query | 3 | Waiting for table metadata lock | alter table sbtest2 add test3 int | | 272 | root | localhost | NULL | Query | 0 | starting | show processlist | | 311 | root | localhost | NULL | Sleep | 413 | | NULL | +-----+------+-----------+--------+---------+------+---------------------------------+-----------------------------------+ 4 rows in set (0.00 sec) # SESSION D mysql> select * from information_schema.innodb_trx; Empty set (0.00 sec)
其實SESSION A中的事務并未開啟,但是由于select獲取表元數據的語句,語法上是有效的,雖然執行失敗了,但是任然不會釋放元數據鎖,故而導致SESSION B的alter動作被阻塞.
通過SESSION D查看當前打開事務時,你會發現沒有,從而找不到原因.所以當出現這種場景時,如何判斷是哪個進程導致的呢,我們可以嘗試查看表performance_schema. events_statements_current,分析進程狀態來進行判斷.
mysql> select * from performance_schema. events_statements_current\G *************************** 1. row *************************** THREAD_ID: 293 EVENT_ID: 32 END_EVENT_ID: 32 EVENT_NAME: statement/sql/select SOURCE: socket_connection.cc:101 TIMER_START: 212721717099954000 TIMER_END: 212721717213807000 TIMER_WAIT: 113853000 LOCK_TIME: 0 SQL_TEXT: select error from sbtest2 DIGEST: 0bbb2d5d1be45e77debea68111264885 DIGEST_TEXT: SELECT ERROR FROM `sbtest2` CURRENT_SCHEMA: sbtest OBJECT_TYPE: NULL OBJECT_SCHEMA: NULL OBJECT_NAME: NULL OBJECT_INSTANCE_BEGIN: NULL MYSQL_ERRNO: 1054 RETURNED_SQLSTATE: 42S22 MESSAGE_TEXT: Unknown column 'error' in 'field list' ERRORS: 1
然后找到其sid, kill掉該session,也可以kill掉DDL所在的session解決可以解決此問題.
另外,測試時SESSION A要顯式開啟一個事務,否則查詢會隱式回滾結束,無法重現上面的場景.SESSION B執行alter后,沒有立即阻塞住,而是立馬開始copy to tmp table,這個過程結束后,才進行了MDL鎖等待.這怎么解釋呢,應該是執行alter操作主要分為創建臨時新表->插入老表的數據->臨時新表rename to老表三個步驟,在這種情況下,到最后一步才需要MDL鎖,所以copy過程中不會阻塞.由于沒有查詢在進行,而且查詢也沒有進入innodb層 (失敗返回),所以show processlist和information_schema.innodb_trx沒有可以參考的信息.
出現以上幾種情況時,這個時候如果進行如下操作就會引起MDL:
使用Profile分析場景三:顯示或者隱式開啟事務后未提交或回滾,比如查詢完成后未提交或者回滾,使用alter會被堵住
# SESSION A mysql> set profiling=on; mysql> begin; mysql> select * from sbtest.sbtest2 limit 1; # SESSION B mysql> set profiling=on; mysql> alter table sbtest.sbtest5 add test2 int; //等待SESSION A執行完; # SESSION C mysql> set profiling=on; mysql> select * from sbtest.sbtest2 limit 1; # SESSION D mysql> set profiling=on; mysql> show processlist; +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | Id | User | Host | db | Command | Time | State | Info | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ | 325 | root | localhost | NULL | Query | 25 | Waiting for table metadata lock | alter table sbtest.sbtest2 add test5 int | | 326 | root | localhost | NULL | Query | 3 | Waiting for table metadata lock | select * from sbtest.sbtest2 limit 1 | | 327 | root | localhost | NULL | Query | 0 | starting | show processlist | | 328 | root | localhost | NULL | Sleep | 50 | | NULL | +-----+------+-----------+------+---------+------+---------------------------------+------------------------------------------+ 4 rows in set (0.00 sec)
然后回滾SESSION A,等待SESSION B和SESSION C執行完,查看profile.
查看SESSION A
# SESSION A mysql> show profiles; +----------+-------------+------------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+------------------------------------------+ | 1 | 42.81646375 | alter table sbtest.sbtest2 add test5 int | +----------+-------------+------------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+----------+ | Status | Duration | +--------------------------------+----------+ | starting | 0.000069 | | checking permissions | 0.000004 | | checking permissions | 0.000004 | | init | 0.000005 | | Opening tables | 0.000128 | | setup | 0.000032 | | creating table | 0.000654 | | After create | 0.000053 | | Waiting for table metadata loc | 1.000069 | ..................... | After create | 0.000013 | | Waiting for table metadata loc | 0.871435 | | After create | 0.000042 | | System lock | 0.000013 | | preparing for alter table | 0.002475 | | altering table | 9.752928 | | committing alter table to stor | 0.185624 | | end | 0.000021 | | query end | 0.000010 | | closing tables | 0.000007 | | freeing items | 0.000020 | | cleaning up | 0.000011 | +--------------------------------+----------+ 83 rows in set, 1 warning (0.00 sec)
查看SESSION C
mysql> show profiles; +----------+-------------+--------------------------------------+ | Query_ID | Duration | Query | +----------+-------------+--------------------------------------+ | 1 | 10.75216050 | select * from sbtest.sbtest2 limit 1 | +----------+-------------+--------------------------------------+ 1 row in set, 1 warning (0.00 sec) mysql> show profile for query 1; +--------------------------------+-----------+ | Status | Duration | +--------------------------------+-----------+ | starting | 0.000080 | | checking permissions | 0.000007 | | Opening tables | 0.000012 | | Waiting for table metadata loc | 10.751829 | | Opening tables | 0.000094 | | init | 0.000019 | | System lock | 0.000010 | | optimizing | 0.000004 | | statistics | 0.000011 | | preparing | 0.000009 | | executing | 0.000003 | | Sending data | 0.000040 | | end | 0.000006 | | query end | 0.000008 | | closing tables | 0.000008 | | freeing items | 0.000014 | | cleaning up | 0.000009 | +--------------------------------+-----------+ 17 rows in set, 1 warning (0.00 sec)
從上述測試可以看出,SESSION C需要打開表時碰到了元數據鎖.MySQL不論SESSION A執行的是select還是delete,此時alter table語句無法獲取到metadata獨占鎖,會進行等待;所以會影響SESSION C的讀取.
這是最基本的一種情形,這個和MySQL 5.6中的online ddl并不沖突.一般alter table的操作過程中,在after create步驟會獲取metadata獨占鎖,當進行到altering table的過程時(通常是最花時間的步驟),對該表的讀寫都可以正常進行,這就是online ddl的表現,并不會像之前在整個alter table過程中阻塞寫入,當然并不是所有ALTER語句都支持online ddl.
總之,alter table的語句是很危險的(其實他的危險其實是未提交事物或者長事務導致的),在操作之前最好確認對要操作的表沒有任何進行中的操作、沒有未提交事務、也沒有顯式事務中的報錯語句.如果有alter table的維護任務,在無人監管的時候運行,最好通過lock_wait_timeout設置好超時時間,避免長時間的metedata鎖等待.
原文來自微信公眾號:運維那點事
轉載請注明本頁網址:
http://www.snjht.com/jiaocheng/2733.html