Fun MySQL fact of the day: seconds behind master

Yesterday, we started considering some causes of replication delay. Given L. Peter Deutsch's assertions about the fallacies of distributed computing, I think we can safely agree that replication delay is going to happen. And, not only will it happen, it will happen frequently and unpredictably. So while we absolutely need to make sure our applications can handle some level of replication delay (likely on the order of 10s to 100s of seconds) without completely destroying our user experience or usability, it would also be nice to monitor it.

Out-of-the box, the MySQL replication mechanism comes with mostly-useful views into the status of the replication engine. To see the most useful and most-commonly-used information, you can run SHOW SLAVE STATUS. Depending on your client, you may wish to enable column-based output using \G instead of ;. This will make this output much easier to read. One of the somewhat useful outputs of SHOW SLAVE STATUS\G is Seconds_Behind_Master, which shows a second-granularity of best-case delay of replication. What do I mean by "best-case"? Unfortunately, the Seconds_Behind_Master metric can lie to you pretty severely.

First, what is this metric? Simply, Seconds_Behind_Master measures the difference, in seconds, between where the IO thread thinks it is after downloading the binary logs and where the SQL thread thinks it is in applying the relay logs. For example, if the last relay log downloaded has a timestamp of 2019-05-30 00:00:00 but the SQL thread is currently processing a relay log entry dated 2019-05-29 23:58:00, then we can expect Seconds_Behind_Master to be 120 (or 2 minutes). Normally, this is good, and this is an example of where we can probably trust the value in Seconds_Behind_Master. But what would happen if the IO thread effectively stalled? For example, what if we observed packet loss between the replica and its master?

Well, in a case where the IO thread has not yet downloaded the latest binary log form the master for any reason, as far as the replication engine knows, there are no new binary logs to download. This makes sense. As such, there are no new relay logs and the SQL thread and IO thread can agree that the replica is "caught-up". At this point, Seconds_Behind_Master will be 0, even though the real replication delay may, in fact, be not 0. If you've had the opportunity to maintain a master-replica MySQL topology before, you may have seen the Seconds_Behind_Master counter act sporadically during times of packet loss, jumping between 0 and a large number creating a type of saw-tooth wave. And if you have, now you have a better idea why.

Finally, you may also be wondering: what about a topology wherein a replica is a grand-child of the master database? Or great-grand-child? Well, unfortunately, things get even more desperate: Seconds_Behind_Master only reports the difference in replication lag from the replica's immediate master. That is, it's a relative metric. As such, even if an intermediate master itself is lagging, a replica of the intermediate master may report it has no replication lag at all. You may see this come up when your cluster spans multiple physical locations (such as availability zones or regions). Now, not all is lost and tomorrow, we'll find a better way.