Fun MySQL fact of the day: binlog row image

Considering yesterday's downsides to the ROW binary log format, you may have spent the night sad and upset, maybe even unable to sleep. Likely, the narrative was along the lines of, "why, oh why, MySQL, must you make me choose between binary log size and safety?!" Well, rest-assured, tonight, you'll sleep better after you learn about binlog_row_image.

Introduced in MySQL 5.6, the binlog_row_image variable instructs MySQL how it should write a binary log event when using the ROW format. Yesterday, we considered the default value, full, which maintains backward compatibility with previous versions of MySQL. In addition to full, two other options exist:

Today, we'll focus on the minimal setting, which is much more fun than noblob. If, like yesterday, we UPDATE employee SET name = 'Charlie Charlson' WHERE id = 2462107 when binlog_row_image=minimal and run mysqlbinlog on the binary log, we'll see an entry such as this:

#190611  8:28:26 server id 57  end_log_pos 407 CRC32 0x501823fe         Update_rows: table id 108 flags: STMT_END_F
### UPDATE `test`.`employee`
### WHERE
###   @1=2462107
### SET
###   @2='Charlie Charlson'

Unlike yesterday, however, you'll notice that we're only binary logging the affected row's primary key (@1) and name (@2). This is a log event reduction by about 30% (or 75% field-wise), but keep in mind that the actual reduction you'll see will vary on a table-by-table basis and is highly dependent on your update patterns. For example, some ORMs like to update all fields in a row even if they're not being changed (Hibernate and ActiveRecord to name a few). In such cases, a minimal format will be closer to a 50% reduction field-wise (by reducing the predicate) instead of something substantially higher. Still, 50% is nothing to scoff at.

Of course, using binlog_row_image=minimal doesn't come without downsides. First, a minimal format will only work correctly on tables with a primary key (or a single unique key). Further, when the binary log is applied, the destination table must have the same columns in the same order as the source table. In the absence of either requirement, changes will be non-deterministic, exhibiting undefined behaviour. And, well, really, why wouldn't we want primary keys and consistent tables? Anything else would be asking for trouble.