Fun MySQL fact of the day: OR not

We started discussing secondary indices yesterday, and we considered how we might create two separate indices on our user table. We also started to build an understanding of when MySQL can and shouldn't use an index for a query, but we omitted one fun fact.

In MySQL 5.0, a new feature called Index Merge Optimisation was released without incredible fanfare. It's a quaint feature that can optimize your SELECT queries to use multiple indices when MySQL guesses that it's advantageous. Let's consider yesterday's example with a query such as this:

SELECT full_name, date_of_birth FROM users WHERE full_name = 'My McSqlface' OR date_of_birth = '2000-03-14 15:59';

Based on what we know from yesterday, the OR is going to trip us up: no matter what index MySQL chooses, InnoDB is going to have to do an index scan across every record in an index. It may choose to scan the entire PRIMARY index, the fullname_idx index, or the dateofbirth_fullname_idx, but in all cases, at least one side of the OR has to evaluated out-of-order, which results in an index scan for the same reasons we discussed yesterday. This is sub-optimal. But wait! MySQL might maybe possibly try just a little bit to help you out here with the aforementioned Index Merge Optimsation! Let's take a look:

percona me@> EXPLAIN SELECT * FROM user WHERE full_name = 'My McSqlface' OR date_of_birth = '2000-03-14 15:59'\G
***************************[ 1. row ]***************************
type          | index_merge
key           | fullname_idx,dateofbirth_fullname_idx
key_len       | 102,5
Extra         | Using sort_union(fullname_idx,dateofbirth_fullname_idx); Using where

What do we see here? We see a Using sort_union... on our two secondary indices. And what does this mean? Well, it means that MySQL is going to run 2 separate queries on 2 separate indices, merge the results, sort them, and then return you the final result set. Seems like what we want, right? Well, maybe, but probably not. In particular, Index Merge Optimisation

What this really all means is that queries that use OR on two separate fields are quite expensive, even when they're optimized. This is one reason why some people choose to disable merge optimisations. In many cases, you're best to use UNION, and that's only if you can't completely change your query. But that's a topic for another day.