Fun MySQL fact of the day: replication delay

Now that we have a basic understanding of how MySQL's replication mechanism works, let's poke it with a stick and see what happens. That is, of course, fun.

Let's first start running some commands, like STOP SLAVE SQL_THREAD. Based on what we learned yesterday, we know a couple things:

Now, let's START SLAVE SQL_THREAD and then STOP SLAVE IO_THREAD. Based on what we learned on Monday, we know a couple things:

You'll probably agree: we likely don't want to be stopping either replication thread too often on a replica that is currently serving requests. Intentionally stopping either thread will eventually have impact on our services. Unfortunately, our environment often has other priorities. Let's consider an example wherein a replica's connection to its master is degraded. Let's say it's subject to packet loss. In this case, we already know that the SQL thread, which reads off the local disk and writes to the local disk, will be unaffected. However, this is not true for the IO thread: it will be unable to download a master's binary logs quickly or efficiently, creating replication delay on the replica causing it to be stuck in time. Similarly, if we consider a situation wherein the replica database's IO subsystem or CPU become too heavily saturated, the SQL thread will begin slowing down. When the SQL thread takes longer to execute each change, we will, again, start to observe some amount of replication delay on the replica causing it to be stuck in time.

Thankfully, we do have a couple ways to measure (and hopefully monitor!) this type of replication lag, and we'll look at those as we finish up the week.