2019-06-10 12:56 UTC

Fun MySQL fact of the day: ROW format

Over the last couple weeks, all of the examples we've considered have used the STATEMENT binary log format. We've also discussed, no less than two times, that the STATEMENT format is prone to incorrect results for non-deterministic DML (Data Manipulation Language) statements, and last Friday, we considered the ROW binary log format to be a meaningful, safer alternative.

So, why is the ROW binary log format safer? Well, we'll start by taking a look at what a ROW-format binary log entry looks like when we UPDATE employee SET name = 'Charlie Charlson' WHERE id = 2462107:

#190610 12:11:58 server id 57  end_log_pos 455956 CRC32 0xedd776c6      Update_rows: table id 110 flags: STMT_END_F
### UPDATE `test`.`employee`
### WHERE
###   @1=2462107
###   @2='Charlie Charlie'
###   @3='2000-04-07 13:09:46'
###   @4=9
### SET
###   @1=2462107
###   @2='Charlie Charlson'
###   @3='2000-04-07 13:09:46'
###   @4=9

What may immediately stand out to you is that the binary log stores both the old version of the row and the new version of the row, the old version as a predicate and the new version as the SET. This is important because, when trying to replay a binary log in the ROW-based format, MySQL requires that a matching row is found. If no row is found, the change will be rolled-back, and if this happens through replication, the SQL thread will halt with an error, 1032 - HA_ERR_KEY_NOT_FOUND. Now, what's important here is that with the ROW format, all changes are guaranteed to be safe: this means a replica and master cannot diverge in a vacuum, unlike with the STATEMENT format.

Of course, this does come with two non-negligible downsides:

The binary log may be substantially larger, as it has to store two versions of the row. This means larger disk space usage and higher network bandwidth.
A replica that has somehow, no matter how slightly, diverged from its master, may be unable to replicate reliably without operator intervention.

Still, you have to ask yourself if mitigating these are more important to you than your databases' data integrity.