MySQL死锁案例 – Learn. Write. Repeat.

19-04-26 banq
                   

无论你是DBA还是开发人员,你都会对死锁感到不耐烦,一些死锁需要几天的时间来修复,它们很难重现,其中一些只能在生产prod机器上重现。在不知道发生了什么情况下盲目修复并不罕见,你只能假设问题出在哪里,然后在这里添加更多详细日志,最后创建一个补丁并将其投入生产,希望获得更多信息,这最近发生在我身上。在这篇文章中,我将尝试分享关于如何处理mysql中的数据库死锁的。

拉一个mysql docker镜像并启动它:

docker pull mysql
 
docker run --name local-mysql -e MYSQL_ROOT_PASSWORD=pass -d mysql:latest

让我们创建一个结构并导致死锁。首先,我们需要连接到我们的mysql实例。获取容器ID

docker ps -a

然后连接:

docker exec -ti 94300e36a1d0 /bin/bash

接下来连接到mysql实例:

mysql -p
Enter password: pass

我们创建一个数据库:

CREATE SCHEMA test;
USE test;
 
CREATE TABLE `child` (
  `id` int(11) NOT NULL,
  `child_name` varchar(45) DEFAULT NULL,
  PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
 
CREATE TABLE `parent` (
  `id` int(11) NOT NULL,
  `parent_name` varchar(45) DEFAULT NULL,
  `child_id` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `parent_child_fk_idx` (`child_id`),
  CONSTRAINT `parent_child_fk` FOREIGN KEY (`child_id`) REFERENCES `child` (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_0900_ai_ci;
 
INSERT INTO `test`.`child` (`id`, `child_name`) VALUES ('1', 'child1');
INSERT INTO `test`.`child` (`id`, `child_name`) VALUES ('2', 'child2');
INSERT INTO `test`.`child` (`id`, `child_name`) VALUES ('3', 'child3');
 
INSERT INTO `test`.`parent` (`id`, `parent_name`, `child_id`) VALUES ('1', 'parent1', '1');

现在我们已经插入了数据表结构和一些数据,我们就可以开始了。我们需要启动两个数据库连接,以便我们可以产生死锁。

第一个:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
 
mysql> update child set child_name='child10' where id=1;
Query OK, 1 row affected (0.00 sec)
Rows matched: 1  Changed: 1  Warnings: 0

第二个:

mysql> start transaction;
Query OK, 0 rows affected (0.00 sec)
 
mysql> insert into parent values(4,'parent2',1);

我们回到第一个并写入:

mysql> delete from parent;
Query OK, 3 rows affected (0.00 sec)

即看到第二阶段出现死锁:

ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction

好,让我们一步一步地进行调查。首先我们得到innodb引擎的状态。

SHOW ENGINE INNODB STATUS;

显然我们对输出的死锁部分感兴趣:

------------------------
LATEST DETECTED DEADLOCK
------------------------
2019-04-23 12:16:31 0x7fddcc0e7700
*** (1) TRANSACTION:
TRANSACTION 2523, ACTIVE 7 sec inserting
mysql tables in use 1, locked 1
LOCK WAIT 4 lock struct(s), heap size 1136, 2 row lock(s), undo log entries 1
MySQL thread id 109, OS thread handle 140590591276800, query id 374587 localhost root update
insert into parent values(4,'parent2',1)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 12 page no 4 n bits 72 index PRIMARY of table `test`.`child` trx id 2523 lock mode S locks rec but not gap waiting
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 0000000009d5; asc       ;;
 2: len 7; hex 010000012101ca; asc     !  ;;
 3: len 1; hex 61; asc a;;
 
*** (2) TRANSACTION:
TRANSACTION 2522, ACTIVE 9 sec fetching rows
mysql tables in use 1, locked 1
5 lock struct(s), heap size 1136, 5 row lock(s), undo log entries 3
MySQL thread id 108, OS thread handle 140590588000000, query id 374588 localhost root updating
delete from parent
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 12 page no 4 n bits 72 index PRIMARY of table `test`.`child` trx id 2522 lock_mode X locks rec but not gap
Record lock, heap no 2 PHYSICAL RECORD: n_fields 4; compact format; info bits 0
 0: len 4; hex 80000001; asc     ;;
 1: len 6; hex 0000000009d5; asc       ;;
 2: len 7; hex 010000012101ca; asc     !  ;;
 3: len 1; hex 61; asc a;;
 
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13 page no 4 n bits 72 index PRIMARY of table `test`.`parent` trx id 2522 lock_mode X waiting
Record lock, heap no 5 PHYSICAL RECORD: n_fields 5; compact format; info bits 0
 0: len 4; hex 80000004; asc     ;;
 1: len 6; hex 0000000009db; asc       ;;
 2: len 7; hex 81000000940110; asc        ;;
 3: len 1; hex 32; asc 2;;
 4: len 4; hex 80000001; asc     ;;
 
*** WE ROLL BACK TRANSACTION (1)

我们看到事务一正在等待锁被授权,有问题的锁是在子表的主键上,它是一个共享(S)锁。

insert into parent values(4,'parent2',1)
*** (1) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 12 page no 4 n bits 72 index PRIMARY of table `test`.`child` trx id 2523 lock mode S locks rec but not gap waiting

再看到下一条信息:

delete from parent
*** (2) HOLDS THE LOCK(S):
RECORD LOCKS space id 12 page no 4 n bits 72 index PRIMARY of table `test`.`child` trx id 2522 lock_mode X locks rec but not gap
*** (2) WAITING FOR THIS LOCK TO BE GRANTED:
RECORD LOCKS space id 13 page no 4 n bits 72 index PRIMARY of table `test`.`parent` trx id 2522 lock_mode X waiting

第二段事务操作有两个锁。它在子表的主键上有一个独占(X)锁,它等待父表的主键上的独占(X)锁。这样,第一段事务等待第二段事务释放锁,而第二段等待第一段释放,相互等待,死锁。

回顾我们的查询,这非常有意义。首先我们这样做的:

update child set child_name='child10' where id=1;

这意味着它将在子表记录上创建一个独占(X)锁。我们可以通过运行验证

mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+-----------+-------------+
| object_name | lock_type | lock_mode     | lock_data | lock_status |
+-------------+-----------+---------------+-----------+-------------+
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
+-------------+-----------+---------------+-----------+-------------+
2 rows in set (0.00 sec)

现在,当我们尝试插入带有child 1的新记录:

insert into parent values(4,'parent2',1);

我们被先前的独占(X)锁定阻止。由于外键约束, Insert需要子表记录上的共享(S)锁:

mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+-----------+-------------+
| object_name | lock_type | lock_mode     | lock_data | lock_status |
+-------------+-----------+---------------+-----------+-------------+
| parent      | TABLE     | IX            | NULL      | GRANTED     |
| child       | TABLE     | IS            | NULL      | GRANTED     |
| child       | RECORD    | S,REC_NOT_GAP | 1         | WAITING     |
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
+-------------+-----------+---------------+-----------+-------------+
5 rows in set (0.00 sec)

当我们这样做时,这是正常的:

mysql> delete from parent;

等待一个排它(X)锁的父表。但插入它还在等待共享(S)锁。此时第二段事务被选为受害者(将被回滚),因此事务会话1获胜。

mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+------------------------+-------------+
| object_name | lock_type | lock_mode     | lock_data              | lock_status |
+-------------+-----------+---------------+------------------------+-------------+
| child       | TABLE     | IX            | NULL                   | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1                      | GRANTED     |
| parent      | TABLE     | IX            | NULL                   | GRANTED     |
| parent      | RECORD    | X             | supremum pseudo-record | GRANTED     |
| parent      | RECORD    | X             | 1                      | GRANTED     |
| parent      | RECORD    | X             | 2                      | GRANTED     |
| parent      | RECORD    | X             | 3                      | GRANTED     |
+-------------+-----------+---------------+------------------------+-------------+
7 rows in set (0.00 sec)

所以,我们从这一切中学到了什么。外键和死锁并不总是相互理解。插入/更新将通过其外键使用共享锁(S)锁定子实体。这意味着只能读取。

记录锁是对索引记录的锁定,即使定义了没有索引的表,记录锁也始终锁定索引记录。对于这种情况,InnoDB创建一个隐藏的聚簇索引并使用此索引进行记录锁定。从MySQL 8.0.1开始,还有一些很好的功能,我将在下面介绍。

跳过锁:

mysql> insert into parent select 4,'parent2',id from child where id=1 for update skip locked;
Query OK, 0 rows affected (0.00 sec)
Records: 0  Duplicates: 0  Warnings: 0

上面修饰符可用于从表中非确定性地读取行,同时跳过锁定的行。我们看到我们没有等待锁,并且父进程没有锁定独占(X)锁。 这是第二段事务。

mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+-----------+-------------+
| object_name | lock_type | lock_mode     | lock_data | lock_status |
+-------------+-----------+---------------+-----------+-------------+
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | RECORD    | X,GAP         | 2         | GRANTED     |
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
+-------------+-----------+---------------+-----------+-------------+
4 rows in set (0.00 sec)

我们在第一段事务会话中进行删除后得到:

mysql> delete from parent;                                                                        
Query OK, 3 rows affected (0.00 sec)
 
mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+------------------------+-------------+
| object_name | lock_type | lock_mode     | lock_data              | lock_status |
+-------------+-----------+---------------+------------------------+-------------+
| child       | TABLE     | IX            | NULL                   | GRANTED     |
| child       | RECORD    | X,GAP         | 2                      | GRANTED     |
| child       | TABLE     | IX            | NULL                   | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1                      | GRANTED     |
| parent      | TABLE     | IX            | NULL                   | GRANTED     |
| parent      | RECORD    | X             | supremum pseudo-record | GRANTED     |
| parent      | RECORD    | X             | 1                      | GRANTED     |
| parent      | RECORD    | X             | 2                      | GRANTED     |
| parent      | RECORD    | X             | 3                      | GRANTED     |
+-------------+-----------+---------------+------------------------+-------------+
9 rows in set (0.00 sec)

很简约。

此外,如果我们不想等待整个50秒(innodb_lock_wait_timeout)释放锁定,我们现在可以使用NOWAIT修饰符,它将立即引发错误。这是第二段事务会话。

mysql> insert into parent select 4,'parent2',id from child where id=1 for update nowait;
ERROR 3572 (HY000): Statement aborted because lock(s) could not be acquired immediately and NOWAIT is set.
 
mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+-----------+-------------+
| object_name | lock_type | lock_mode     | lock_data | lock_status |
+-------------+-----------+---------------+-----------+-------------+
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | TABLE     | IX            | NULL      | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1         | GRANTED     |
+-------------+-----------+---------------+-----------+-------------+
3 rows in set (0.00 sec)

在会话中删除后我们得到了:

mysql> delete from parent;                                                                        
Query OK, 3 rows affected (0.00 sec)
 
mysql> SELECT object_name, lock_type, lock_mode, lock_data, lock_status FROM performance_schema.data_locks;
+-------------+-----------+---------------+------------------------+-------------+
| object_name | lock_type | lock_mode     | lock_data              | lock_status |
+-------------+-----------+---------------+------------------------+-------------+
| child       | TABLE     | IX            | NULL                   | GRANTED     |
| child       | TABLE     | IX            | NULL                   | GRANTED     |
| child       | RECORD    | X,REC_NOT_GAP | 1                      | GRANTED     |
| parent      | TABLE     | IX            | NULL                   | GRANTED     |
| parent      | RECORD    | X             | supremum pseudo-record | GRANTED     |
| parent      | RECORD    | X             | 1                      | GRANTED     |
| parent      | RECORD    | X             | 2                      | GRANTED     |
| parent      | RECORD    | X             | 3                      | GRANTED     |
+-------------+-----------+---------------+------------------------+-------------+
8 rows in set (0.00 sec)

再没有死锁!