Fun MySQL fact of the day: PITR without GTIDs
When we isolated yesterday's bad transaction using GTIDs, life was pretty simple. But in some cases, GTIDs won't be an option for us. So, today, we'll take a slightly closer look at the binary log and figure out an alternative.
Without a GTID, a transaction can be specified to the replication engine using two pieces of information: a log file (
RELAY_LOG_FILE) and the log file position (
RELAY_LOG_POS). The position is, essentially, a byte offset allowing MySQL to
seek to the correct position in the binary or relay log file. Today, we're going to focus on
RELAY_LOG_POS for simplicity, but this implies that our replication chain continues to be healthy and that we are not required to move the delayed replica under a different master. Now, starting simple, much like yesterday, we can use
mysqlbinlog to find our bad query; except, today, because we're using
RELAY_LOG_POS, we'll want to search for it in the relay logs on our delayed host:
$ mysqlbinlog relaylog.0000024 # at 239 #190606 12:21:37 server id 1 end_log_pos 318 CRC32 0xb5aa47b1 Query thread_id=28 exec_time=0 error_code=0 BEGIN; # at 318 #190606 12:21:37 server id 1 end_log_pos 445 CRC32 0xd6f803c1 Query thread_id=28 exec_time=0 error_code=0 UPDATE user SET full_name = 'Mary Mary' WHERE id = 33; # at 445 #190606 12:21:40 server id 1 end_log_pos 476 CRC32 0xdc463c6f Xid = 12826 COMMIT; # at 476 #190606 12:21:46 server id 1 end_log_pos 555 CRC32 0xfb1bdef3 Query thread_id=28 exec_time=201 error_code=0 BEGIN; # at 555 #190606 12:21:46 server id 1 end_log_pos 674 CRC32 0x58358895 Query thread_id=28 exec_time=201 error_code=0 UPDATE user SET full_name = 'Charlie Charlie'; # at 674 #190606 12:29:40 server id 1 end_log_pos 705 CRC32 0x41800509 Xid = 12915 COMMIT;
I have taken the liberty of cleaning-up this output to make it a bit simpler to see, and I'd like to draw your attention to the lines starting with
# at, which indicate the log position (i.e. byte offset). These are the values we may pass into
mysqlbinlog also prints out the end position of an entry, such as
end_log_pos. In our case, we know that the change
WHERE id = 33 is legitimate, and we can also see that the transaction completes at log position 476, which conveniently happens to be the start of the bad transaction. With this information, we can
STOP SLAVE SQL_THREAD; CHANGE MASTER TO MASTER_DELAY = 0; START SLAVE SQ_THREAD UNTIL RELAY_LOG_FILE='relaylog.0000024', RELAY_LOG_POS=476;.
At this point you may, again, go grab a coffee while the SQL thread catches up from its delay until stopping just before the bad transaction. Now unlike yesterday where we could inject a "fake" transaction, because we aren't using GTIDs, we need another way of skipping the bad transaction. This is where the
sql_slave_skip_counter counter comes in: it can be used to skip over transactions (or single changes for non-transactional engines). So, if we
SET GLOBAL sql_slave_skip_counter = 1 and then
START SLAVE SQL_THREAD, MySQL will skip over the 1 transaction (the bad transaction) and then resume running the remaining relay logs normally until the SQL thread has caught up completely.
Now, I feel it's worth pointing out that, so far, all of our binary log Fun MySQL Facts Of The Day have used
STATEMENT-based replication solely for purposes of making our examples simpler to understand. However, if you can remember way back to February 28, you may recall we discussed one of several cases where
STATEMENT-based replication can be extremely unsafe. So, tomorrow, we'll start to look at why and then move into next week looking at some safer alternatives.