Fun MySQL fact of the day:
We started discussing secondary indices yesterday, and we considered how we might create two separate indices on our
user table. We also started to build an understanding of when MySQL can and shouldn't use an index for a query, but we omitted one fun fact.
In MySQL 5.0, a new feature called Index Merge Optimisation was released without incredible fanfare. It's a quaint feature that can optimize your
SELECT queries to use multiple indices when MySQL guesses that it's advantageous. Let's consider yesterday's example with a query such as this:
SELECT full_name, date_of_birth FROM users WHERE full_name = 'My McSqlface' OR date_of_birth = '2000-03-14 15:59';
Based on what we know from yesterday, the
OR is going to trip us up: no matter what index MySQL chooses, InnoDB is going to have to do an index scan across every record in an index. It may choose to scan the entire
PRIMARY index, the
fullname_idx index, or the
dateofbirth_fullname_idx, but in all cases, at least one side of the
OR has to evaluated out-of-order, which results in an index scan for the same reasons we discussed yesterday. This is sub-optimal. But wait! MySQL might maybe possibly try just a little bit to help you out here with the aforementioned Index Merge Optimsation! Let's take a look:
percona firstname.lastname@example.org:test> EXPLAIN SELECT * FROM user WHERE full_name = 'My McSqlface' OR date_of_birth = '2000-03-14 15:59'\G ***************************[ 1. row ]*************************** type | index_merge key | fullname_idx,dateofbirth_fullname_idx key_len | 102,5 Extra | Using sort_union(fullname_idx,dateofbirth_fullname_idx); Using where
What do we see here? We see a
Using sort_union... on our two secondary indices. And what does this mean? Well, it means that MySQL is going to run 2 separate queries on 2 separate indices, merge the results, sort them, and then return you the final result set. Seems like what we want, right? Well, maybe, but probably not. In particular, Index Merge Optimisation
- is a best guess based on small sample sizes of each index. The guess can be wildly wrong and you may end up with sub-optimal results.
- cannot be used on any queries that might be be optimised to use a range scan.
- requires the
WHEREclause to structured in a way that is obvious to the optimiser.
- uses the secondary indices only to find the primary key to store in the sort buffer, meaning the result set must always come from the
PRIMARYindex in a tertiary (or more!) read.
What this really all means is that queries that use
OR on two separate fields are quite expensive, even when they're optimized. This is one reason why some people choose to disable merge optimisations. In many cases, you're best to use
UNION, and that's only if you can't completely change your query. But that's a topic for another day.