This article analyzes a deadlock that select for update can cause in MySQL InnoDB under the Repeatable Read isolation level.
1. Business case
The business needs to number entities of various types. For an entity of type x, the numbers might look like x201712120001, x201712120002, x201712120003. Each number has two parts: a prefix made of x plus the date, and a serial number (four digits here).
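The numbering scheme just described can be sketched as follows. This is a minimal illustration only: the class name SerialNumberFormat and its methods are hypothetical and do not come from the original code.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper illustrating the "type code + date + 4-digit sequence" layout.
class SerialNumberFormat {
    // Date part of the prefix, e.g. 20171212
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyyMMdd");

    /** Builds the prefix: entity type code followed by the date. */
    static String prefix(String typeCode, LocalDate date) {
        return typeCode + date.format(DATE);
    }

    /** Appends the sequence, zero-padded to four digits. */
    static String format(String prefix, long sequence) {
        if (sequence < 1 || sequence > 9999) {
            throw new IllegalArgumentException("sequence out of 4-digit range: " + sequence);
        }
        return String.format("%s%04d", prefix, sequence);
    }

    public static void main(String[] args) {
        String p = prefix("x", LocalDate.of(2017, 12, 12));
        System.out.println(format(p, 1));
    }
}
```

The prefix is what the rest of the article uses as the lookup key, and the sequence is the value that has to be allocated safely under concurrency.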
To allocate serial numbers with a database table, a table like the following is enough:
CREATE TABLE number (
  prefix VARCHAR(20) NOT NULL DEFAULT '' COMMENT 'prefix code',
  value BIGINT NOT NULL DEFAULT 0 COMMENT 'serial number',
  UNIQUE KEY uk_prefix (prefix)
);
In the business layer, once the business rules yield the prefix, for example x20171212, the code can open a transaction and use select for update as follows.
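@Transactional
long acquire(String prefix) {
    // select ... for update on the prefix; null when no row matches
    SerialNumber current = dao.selectAndLock(prefix);
    if (current == null) {
        // No row yet: start the sequence at 1
        dao.insert(new SerialNumber(prefix, 1));
        return 1;
    } else {
        // Row exists and is locked: increment the sequence
        current.number++;
        dao.update(current);
        return current.number;
    }
}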
This code locks and selects the row, updates it if it exists, and inserts it otherwise. Under the Repeatable Read isolation level, however, it has a potential deadlock. (A second issue, related to transaction propagation behavior, is also covered later.)
2. Analysis and solution
When the where condition of select for update matches a row, the code above cannot deadlock. When it matches no row and several threads run the acquire method concurrently, a deadlock is possible.
2.1 A simple reproduction scenario
The following fairly simple example reproduces the scenario. First, initialize the table with three rows.
insert into number select 'bbb', 2;
insert into number select 'hhh', 8;
insert into number select 'yyy', 25;
Then run the following steps in order:
session 1 | session 2 |
---|---|
begin; | |
 | begin; |
select * from number where prefix='ddd' for update; | |
 | select * from number where prefix='fff' for update; |
insert into number select 'ddd',1; | |
lock wait | insert into number select 'fff',1; |
lock wait released | deadlock; session 2's transaction is rolled back |
2.2 Analyzing the deadlock
Using the output of show engine innodb status, we look at the state after each step:
2.2.1 session 1 runs select for update
------------
TRANSACTIONS
------------
Trx id counter 238435
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table `test`.`number` trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
Transaction 238434 holds the gap lock before hhh, that is, the gap lock on ('bbb', 'hhh').
2.2.2 session 2 runs select for update
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 3 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table `test`.`number` trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 30 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69153 localhost root
TABLE LOCK table `test`.`number` trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
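Transaction 238435 also holds the gap lock before hhh.
Excerpt from InnoDB's lock_rec_has_to_wait implementation: a LOCK_GAP lock that does not carry the insert intention flag does not have to wait for other locks (table locks excepted).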
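The waiting rule described above can be modeled in a few lines. The sketch below is a simplified Java rendering written for illustration, not InnoDB's C++ source: the class name LockWaitRule is hypothetical, the flag values are assumed to match MySQL 5.7's lock0lock.h, and the supremum-record case is left out.

```java
// Simplified model of InnoDB's record-lock wait decision (illustrative only).
class LockWaitRule {
    // Assumed flag values (MySQL 5.7 lock0lock.h); verify against your version.
    static final int LOCK_S = 2;
    static final int LOCK_X = 3;
    static final int LOCK_MODE_MASK = 0xF;
    static final int LOCK_GAP = 512;
    static final int LOCK_REC_NOT_GAP = 1024;
    static final int LOCK_INSERT_INTENTION = 2048;

    /** Record locks are S or X here; only S with S is compatible. */
    static boolean modesCompatible(int a, int b) {
        return (a & LOCK_MODE_MASK) == LOCK_S && (b & LOCK_MODE_MASK) == LOCK_S;
    }

    /** Does a request with type_mode `requested` have to wait for a lock `held` by another trx? */
    static boolean hasToWait(boolean sameTrx, int requested, int held) {
        if (sameTrx || modesCompatible(requested, held)) {
            return false;
        }
        boolean reqGap = (requested & LOCK_GAP) != 0;
        boolean reqInsertIntention = (requested & LOCK_INSERT_INTENTION) != 0;
        // A gap lock without insert intention never waits: it only blocks inserts.
        if (reqGap && !reqInsertIntention) {
            return false;
        }
        // A non-insert-intention request does not wait for a gap-only lock.
        if (!reqInsertIntention && (held & LOCK_GAP) != 0) {
            return false;
        }
        // A gap-type request does not wait for a record-only lock.
        if (reqGap && (held & LOCK_REC_NOT_GAP) != 0) {
            return false;
        }
        // Nothing waits for an insert intention lock.
        if ((held & LOCK_INSERT_INTENTION) != 0) {
            return false;
        }
        return true;
    }

    public static void main(String[] args) {
        int gapX = LOCK_X | LOCK_GAP;
        int insertIntention = LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION;
        System.out.println("gap vs gap waits: " + hasToWait(false, gapX, gapX));
        System.out.println("insert intention vs gap waits: " + hasToWait(false, insertIntention, gapX));
    }
}
```

The two calls in main correspond to the two situations in this article: a second select for update taking the same gap lock, and an insert running into another transaction's gap lock.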
2.2.3 session 1 attempts the insert
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238435, ACTIVE 28 sec
2 lock struct(s), heap size 1136, 1 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69155 localhost root
TABLE LOCK table `test`.`number` trx id 238435 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
---TRANSACTION 238434, ACTIVE 55 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
------- TRX HAS BEEN WAITING 2 SEC FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;

TABLE LOCK table `test`.`number` trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
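At this point, when transaction 238434 tries to insert ('ddd', 1), it finds that another transaction (238435) already holds a gap lock on this interval. InnoDB therefore gives transaction 238434 an insert intention lock with mode LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION, which waits for transaction 238435 to release its gap lock.
Excerpt from InnoDB's lock_rec_insert_check_and_lock implementation.
2.2.4 session 2 attempts the insert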
------------------------
LATEST DETECTED DEADLOCK
------------------------
2017-12-21 2240 0x70001028a000
*** (1) TRANSACTION:
TRANSACTION 238434, ACTIVE 81 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root executing
insert into number select 'ddd',1
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
*** (2) TRANSACTION:
TRANSACTION 238435, ACTIVE 54 sec inserting
mysql tables in use 1, locked 1
3 lock struct(s), heap size 1136, 2 row lock(s)
MySQL thread id 161, OS thread handle 123145573408768, query id 69159 localhost root executing
insert into number select 'fff',1
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238435 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238435 lock_mode X locks gap before rec insert intention waiting
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
*** WE ROLL BACK TRANSACTION (2)
------------
TRANSACTIONS
------------
Trx id counter 238436
Purge done for trx's n:o < 238430 undo n:o < 0 state: running but idle
History list length 13
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 281479459589696, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 281479459588792, not started
0 lock struct(s), heap size 1136, 0 row lock(s)
---TRANSACTION 238434, ACTIVE 84 sec
3 lock struct(s), heap size 1136, 3 row lock(s), undo log entries 1
MySQL thread id 160, OS thread handle 123145573965824, query id 69157 localhost root
TABLE LOCK table `test`.`number` trx id 238434 lock mode IX
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
Record lock, heap no 7 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 646464; asc ddd;;
 1: len 6; hex 00000003a362; asc b;;
 2: len 7; hex de000001e60110; asc ;;
 3: len 8; hex 8000000000000001; asc ;;
RECORD LOCKS space id 1506 page no 3 n bits 80 index uk_prefix of table `test`.`number` trx id 238434 lock_mode X locks gap before rec insert intention
Record lock, heap no 3 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 3; hex 686868; asc hhh;;
 1: len 6; hex 00000003a350; asc P;;
 2: len 7; hex d2000001ff0110; asc ;;
 3: len 8; hex 8000000000000008; asc ;;
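From the deadlock information we can see that transaction 238435, on inserting, also found transaction 238434's gap lock, so it too took an insert intention lock and waits for transaction 238434 to release its gap lock. The result is a deadlock.
2.3 debug it!
Next, we reproduce the scenario above by debugging the MySQL source code. In this debugging run the two transactions have ids 4444 (session 1) and 4445 (session 2).
Here, session 2's transaction 4445 requests a lock with type_mode 515, that is, (LOCK_X | LOCK_GAP). Its lock_mode is incompatible with that of the gap lock held by session 1's transaction 4444, lock2->type_mode=547 (LOCK_X | LOCK_REC | LOCK_GAP), since both are LOCK_X. However, because type_mode has LOCK_GAP set and does not carry the LOCK_INSERT_INTENTION flag, the check concludes that no wait is needed. The select for update in the second session therefore also acquires the gap lock.
Here, when session 1's transaction 4444 executes the insert, type_mode is 2563 (LOCK_X | LOCK_GAP | LOCK_INSERT_INTENTION). Because the LOCK_INSERT_INTENTION flag is set, it has to wait for session 2's transaction 4445 to release its gap lock. Transaction 4444 is then given an insert intention lock and waits for transaction 4445 to release the gap lock.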
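The type_mode values seen in the debugger (515, 547, 2563) are bit fields. A small decoder makes them easier to read; this is a sketch under the assumption that the constants match MySQL 5.7's lock0lock.h, which is consistent with the three values quoted above. The class name TypeModeDecoder is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative decoder for InnoDB lock type_mode bit fields (assumed 5.7 values).
class TypeModeDecoder {
    static final int LOCK_MODE_MASK = 0xF;
    // Low 4 bits: the basic lock mode, indexed by value
    static final String[] MODES = {"LOCK_IS", "LOCK_IX", "LOCK_S", "LOCK_X", "LOCK_AUTO_INC"};
    // Remaining bits: type and precise-mode flags
    static final int[] FLAG_BITS = {16, 32, 256, 512, 1024, 2048};
    static final String[] FLAG_NAMES = {
        "LOCK_TABLE", "LOCK_REC", "LOCK_WAIT", "LOCK_GAP", "LOCK_REC_NOT_GAP", "LOCK_INSERT_INTENTION"
    };

    /** Renders a type_mode as "MODE | FLAG | FLAG ...". */
    static String decode(int typeMode) {
        List<String> parts = new ArrayList<>();
        int mode = typeMode & LOCK_MODE_MASK;
        parts.add(mode < MODES.length ? MODES[mode] : "MODE_" + mode);
        for (int i = 0; i < FLAG_BITS.length; i++) {
            if ((typeMode & FLAG_BITS[i]) != 0) {
                parts.add(FLAG_NAMES[i]);
            }
        }
        return String.join(" | ", parts);
    }

    public static void main(String[] args) {
        for (int typeMode : new int[] {515, 547, 2563}) {
            System.out.println(typeMode + " = " + decode(typeMode));
        }
    }
}
```

Decoding 2563 shows the LOCK_INSERT_INTENTION bit that makes the insert wait, while 515 and 547 lack it.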
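Here, session 2's transaction 4445 also executes its insert, and its insert intention lock has to wait for session 1's transaction 4444 to release its gap lock. Deadlock detection finds that the waits form a cycle, so InnoDB picks one transaction as the victim and rolls it back. The process is roughly as follows:
session 2 tries to acquire an insert intention lock and has to wait for session 1's gap lock
session 1's insert intention lock is itself waiting
session 1's insert intention lock is waiting for session 2's gap lock
a cycle is formed and the deadlock is detected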
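The cycle check in the steps above can be illustrated with a tiny wait-for graph. This is a generic sketch, not InnoDB's actual deadlock checker: the class WaitForGraph is hypothetical, and it assumes each transaction waits for at most one other transaction.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Minimal wait-for graph: one outgoing edge per waiting transaction (simplification).
class WaitForGraph {
    // waiter trx id -> trx id it is blocked by
    private final Map<Long, Long> waitsFor = new HashMap<>();

    void addWait(long waiter, long holder) {
        waitsFor.put(waiter, holder);
    }

    /** Returns true if following wait edges from `start` leads back to `start`. */
    boolean closesCycle(long start) {
        Set<Long> seen = new HashSet<>();
        Long current = waitsFor.get(start);
        // Stop when the chain ends or revisits a node that is not `start`
        while (current != null && seen.add(current)) {
            if (current == start) {
                return true;
            }
            current = waitsFor.get(current);
        }
        return false;
    }

    public static void main(String[] args) {
        WaitForGraph graph = new WaitForGraph();
        graph.addWait(238434L, 238435L); // session 1's insert waits for session 2's gap lock
        System.out.println("after first insert: " + graph.closesCycle(238434L));
        graph.addWait(238435L, 238434L); // session 2's insert waits for session 1's gap lock
        System.out.println("after second insert: " + graph.closesCycle(238435L));
    }
}
```

The first insert only creates a wait; the second insert adds the edge that closes the loop, which is why the deadlock is reported at that moment.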
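2.4 How to avoid this deadlock
As we now know, the cause is this: when two sessions run select for update at the same time and neither matches a row, both can obtain a lock on the same gap. Whether they do depends on whether the where conditions fall in the same interval; in the example above, if one session were about to insert 'ddd' and the other 'kkk', there would be no conflict because the gaps differ. When the two sessions then insert concurrently, the first goes into a lock wait, and the second's insert produces the deadlock. MySQL picks one transaction to roll back based on transaction weight.
How can this be avoided? One solution is to lower the transaction isolation level to Read Committed, where InnoDB does not take gap locks for searches and index scans. In the scenario above, if the where conditions differ, that is, the keys finally inserted differ, there is no problem. If different threads in the business code may run select for update on the same key, the business code can catch the unique index conflict exception and retry. Also worth noting in the code example above is the propagation behavior of the @Transactional annotation: for a method like this, which has little to do with the main flow's transaction, the propagation behavior should be changed to REQUIRES_NEW, for two reasons:
Because the solution here lowers the isolation level, keeping the default propagation behavior means that, when the outer transaction's isolation level is not RC, an IllegalTransactionStateException is thrown (provided validateExistingTransaction is enabled on your TransactionManager).
If the method joins the outer transaction, a thread acquiring a serial number may block because another thread's transaction code, unrelated to serial numbers, has not finished.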
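The "catch the index conflict and retry" idea can be sketched in plain Java as below. Everything here is illustrative: DuplicateKeyRetry and its nested DuplicateKeyException are hypothetical stand-ins for whatever unique-index violation your data access layer actually throws, and each attempt is assumed to run in its own transaction (for example a REQUIRES_NEW method at Read Committed).

```java
import java.util.function.Supplier;

// Hypothetical retry helper for the "insert lost the race" case under Read Committed.
class DuplicateKeyRetry {
    /** Stand-in for the real unique-index violation exception of your framework or driver. */
    static class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(String message) {
            super(message);
        }
    }

    /**
     * Runs `attempt` up to `maxAttempts` times, retrying only on a duplicate key.
     * Each call of `attempt` is assumed to open and commit its own transaction.
     */
    static <T> T withRetry(int maxAttempts, Supplier<T> attempt) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        DuplicateKeyException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.get();
            } catch (DuplicateKeyException e) {
                // Another thread inserted the same prefix first; the next attempt
                // will find the row with select for update and take the update path.
                last = e;
            }
        }
        throw last;
    }
}
```

A caller would wrap the acquire call, for example withRetry(3, () -> service.acquire(prefix)), where service is whatever bean exposes the method. The retry has to sit outside the transactional method so that a failed attempt is rolled back before the next one starts.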
Editor: xj
Original title: Analysis of a deadlock caused by select for update, a close call
Source: WeChat official account 數據分析與開發 (WeChat ID: DBDevs). Please credit the source when reposting.