Fun MySQL fact of the day: MVCC-caused replication delay

Looking way back to March 11, we had discussed InnoDB's MVCC implementation. We also had discussed how InnoDB (by default) needs to keep around all versions of a record in the InnoDB undo log until all queries referencing the record either commit or roll-back. And while it may not have been entirely obvious to you, today we'll consider how long-running read-only queries can cause replication delay.

Let's say that a read-only replica has been running a query for the last 10 minutes on a fairly busy system. During this time, all versions of all records read by the long-running query are being kept around in the undo logs. Each time one of these records is updated, InnoDB needs to copy it into the undo log before updating the record. While this may not seem like much, consider that for a long-running query that has visited tens of millions of rows---or worse, a full table scan---over 10 minutes. Updates to these rows will result in buffer pool pollution from the undo logs (which, you may recall, are stored in the system table space), which results in more frequent buffer pool evictions and higher disk-bound IO.

Of course, there's another concern: purge lag. On March 12, we also discussed a case where InnoDB will purge the history list length. Unfortunately, when InnoDB is unable to purge records fast enough from the undo logs, InnoDB may inject a "small" delay to the execution of DML (Data Manipulation Language) statements of no less than 5000 microseconds (or 5 microseconds in MySQL 8.0.14). This puts back-pressure on the SQL thread resulting in replication delay.

Bottom line? Keep on EXPLAINing your queries, even if they run against a dedicated read replica, and prevent slow queries and full table scans before they happen.