Fun MySQL fact of the day: collation filtering
Yesterday, we considered some cases where MySQL may indicate
Using where; Using index in the
Extra field of a query plan. Today, we're going to look at one more case that existed up until MySQL 5.7, and while I haven't made a habit of digging up too many bugs as fun facts, this one was a shocker to me, personally, and is something you'll want to keep in mind.
In MySQL (in general, really), a collation is a set of rules that governs the way character sets are evaluated. For example, a character set can have a case-insensitive collation that considers
'A' as equal (which is the out-of-the-box default) or one that considers
'OE' equal. Another type of collation is padding:
PAD SPACE vs.
NO PAD. These properties specify the equality of 2 string types by either ignoring or not ignoring the white-space trailing a string. For example, in a
PAD SPACE collation,
'a ', but not in a
NO PAD collation. Until MySQL 8.0.0, all collations were
PAD SPACE, and you can demonstrating it by running
SELECT 'a ' = 'a';.
Over the last couple days, we've seen a couple situations where the MySQL query optimiser may or may not decide to do additional filtering on a query's results. This decision is based on a set of rules dictating when MySQL can "eliminate" a specific condition that MySQL knows is guaranteed to be true. For example, if a query with
WHERE a=1 AND b=1 is run against a covering index in InnoDB,
(a,b), MySQL knows that it doesn't have to do any extra filtering (InnoDB won't lie) and can drop both conditions on
b=1 thereby preventing any supplemental filtering. Now, consider a similar
SELECT ... WHERE a='a ': in a collation with
PAD SPACE, we expect all records starting with
a and ending zero or more spaces. But, until MySQL 5.7, the optimiser assumed otherwise and excluded all string types from condition elimination and always filtered them (bug #66983).
Now, when I said this was a bug, well, it was when all you have are
PAD SPACE collations. But for
NOPAD collations, this isn't a bug: when you use
NOPAD on a
CHAR type field, you will continue to get supplemental filtering due to the way storage engines may choose to optimise
CHAR storage. Thankfully, it's all fixed up the best it can be and you just have to remember the trade-offs of collation when designing your table.