Fun MySQL fact of the day: LIMIT your expectations

Before we take a close look at OFFSETs next week, we're going to take a quick peek inside MySQL to understand how a simple SELECT ... LIMIT works, and while we're going to consider a simple case, we're also going to fold in some information we've already covered.

Let's consider an example query of SELECT b FROM my_table WHERE b = 10 LIMIT 10, against a table, (id,a,b,c,d) with 1 secondary index, (a,b,c). We know that this query can leverage covering index optimisation (i.e. Using index), since b is present in (a,b,c), but we also know that supplemental filtering will happen (i.e. Using where) because MySQL is not able to eliminate WHERE b = 7 since we are not evaluating in order b. If you've followed Fun MySQL Facts Of The Day, this won't be news to you: the query will require an index scan on (a,b,c), but what's different is that we have a LIMIT 10.

As usual, to answer the query, MySQL will ask my_table's storage engine for the first "matching" row. In this case, because we're using an index scan, the first "matching" row is the first row in order, a. For example, a=0,b=0. Also as usual, the storage engine will return this row, a=0,b=0, to MySQL for supplemental filtering. MySQL will evaluate the record, and if it is not a match for b=10 (and it's not), MySQL will drop it and ask the storage engine for the next "matching" row. This will continue happening until MySQL has seen at most 10 rows where b=10.

Now, if you're thinking "this seems pretty obvious", I don't necessarily disagree. However, what is not always immediately apparent is that while our query has LIMIT 10, MySQL may be forced to evaluate hundreds, thousands, or millions of rows (if not more!) before finding 10 rows WHERE b = 10. So, even if you think, "I'm only selecting 10 rows", you need to consider that while the result set is will be 10 rows or less, MySQL (and the storage engine) may need to evaluate many more, if not all the rows.

What can we take away from this? If nothing else: even a simple-yet-poorly-designed query with a LIMIT can be just as expensive as the same query without a LIMIT.