2019-05-31 08:00 UTC

Fun MySQL fact of the day: absolute delay

Yesterday, we established an understanding of Seconds_Behind_Master and identified some of its shortcomings. While it's not a completely useless metric, it's also not a very helpful one. What we probably really want is an absolute measure of replication latency, and we'll look at one way of accomplishing that today.

Out of the box, no existing or planned (to the best of my own investigative abilities) version of MySQL is capable of measuring absolute replication lag. In MySQL 5.5, a new feature, MASTER_HEARTBEAT_PERIOD was added. This feature has a replica request the master periodically send a "dummy" event to ensure that in the absence of any new binary logs, the IO thread is still alive and able to communicate with the replica's master. Unfortunately, this "dummy" event injected into the master's binary log response is not written into the relay log meaning that it will not be "passed down" to any replicas of this replica. As such, using MASTER_HEARTBEAT_PERIOD can help us approximate an absolute replication delay on a "basic" two-level topology. However, even a moderately complex deployment will likely have no less than 3 levels, with read-servicing replicas being grand-children of the master node. This would seem to make MASTER_HEARTBEAT_PERIOD not a great option. As such, we need to look outward for solutions.

Thankfully, absolute replication delay is a fairly "solved" problem and off-the-shelf tooling exists whether or not you choose to roll-your-own (which, as a default, I tend to advise against). One such tool is Percona's pt-heartbeat, and provided you eventually converge on a configuration that works for you, it can work pretty well. But, regardless of the build-or-buy decision you come to, today, we will consider the algorithm and not the implementation.

Most or all implementations rely on a process or daemon running on the master node. This process would, periodically, insert the server's current timestamp into a table. As with any normal insert, this will be written into the master's binary log and subsequently downloaded and applied to all of its replicas. From here, each replica would perform the same insert and write it into their own binary log. Then, the next level of replicas will do the same. And so on. The periodic-yet-continual nature of these timestamp inserts, can be thought of as a "heartbeat" that can be measured throughout a cluster.

And how do we measure it? Simply, on each replica, the absolute replication delay can be calculated by subtracting the current timestamp from the latest one written by the heartbeat process running on the master. For example, if at 2019-05-31 00:01:00 a replica had the latest heartbeat row of 2019-05-31 00:00:00, we would know with absolute certainty that the replica is delayed by 60 seconds (plus or minus the clock and task scheduling granularity). This holds true for any replica in the cluster, and should any intermediary master be the source of replication lag, it, along with all of its children, will report the same amount of replication lag.