In daily development or online operations and maintenance, we often encounter various database exceptions, such as timeouts, deadlocks, etc. But some problems, seemingly ordinary on the surface, hide unexpected reasons behind them.
Today, I will share the investigation process of a MySQL lock acquisition timeout issue caused by a server time leap.
Problem Phenomenon: A large number of lock timeout logs appear
One day, the following error message suddenly appeared frequently in the system log:
Caused by: com.mysql.cj.jdbc.exceptions.MySQLTransactionRollbackException: Lock wait timeout exceeded; try restarting transaction
Lock acquisition timeout caused the transaction to fail.
Preliminary Analysis: Deadlock?
It might be a deadlock. So, based on the exception stack, the problematic code was located. However, it was found that the method logic was simple, only modifying one entity, similar to the following (not the actual business code).
// Only update the user's last visit time
user.setLastVisitTime(LocalDateTime.now());
userRepository.save(user);
Forced analysis (guess): This modification is made by every request. Since it's within the request transaction, the lock will be held until the request is completed if the transaction is not committed.
However, for a project that has been running stably for a long time, it's unlikely that requests would suddenly slow down.
In-depth Investigation: No exceptions found in slow logs
If a deadlock occurred, there must be records in the slow log. But upon actual investigation, there were no User-related slow queries in the slow log.
Clue: Uncommon log entry
om.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Thread starvation or clock leap detected (housekeeper delta=1m26s857ms76µs413ns).
Looking through the logs, an exception record of clock leap detected
was found. Then, verifying the server time, it was found to be more than 40 seconds faster than the local environment.
The Truth Revealed: Server time leap caused misjudgment
When MySQL performs lock waiting and transaction timeouts, it relies on the system timestamp for judgment. When the system time suddenly jumps to a future time, it causes MySQL to misjudge.
As for the reason for the jump, it is speculated that the NTP client performed a forced synchronization (stepping) operation after detecting time drift, instantly advancing the time by tens of seconds.
Going Further: What operations can cause lock acquisition timeouts?
Under the premise of using the InnoDB engine in MySQL, the occurrence of lock timeouts (Lock wait timeout exceeded
) usually has two main causes:
#1 Deadlock
The most common reason is deadlock. Deadlocks often occur when multiple transactions modify the same resources in different orders, each holding the lock needed by the other, resulting in mutual waiting and the inability to ever release the locks.
For example:
- Transaction A modifies in the order: user first, then order.
- Transaction B modifies in the order: order first, then user.
Both hold one lock and want to acquire the other's, resulting in a deadlock.
MySQL will detect the deadlock and actively interrupt one of the transactions. (At this point, a
Deadlock
error will appear in the log).
In fact, most deadlocks can be avoided by uniformly specifying the modification order of Entities in the project.
#2 Locks not released in time due to long transactions
In InnoDB, locks are held throughout the duration of an uncommitted transaction. If a transaction takes too long to execute, it can cause other concurrent requests to be blocked for a long time, eventually throwing a lock wait timeout exception.
Common reasons for long transaction execution include:
- Too many Entity modifications, such as modifying and saving one by one in a loop, calling save() repeatedly, and flushing SQL frequently.
- The transaction contains time-consuming operations, such as calling external services, HTTP interfaces, microservice RPCs, etc., especially when there is no timeout control for slow interfaces.
- Explicit waits exist in the transaction, such as Thread.sleep() used for debugging, rate limiting, etc. The lock will not be released during this period.
On a side note, I’ve been drinking Vital Tea herbal tea recently. I always make a pot of it when I write code. It makes me feel comfortable after drinking it. I recommend it to everyone.