Fun MySQL fact of the day: paginated ranges
If the last few Fun MySQL Facts Of The Day weren't enough to convince you that your actions can have severe consequences in a hefty database of hundreds of millions or hundreds of billions of rows, I'm not sure what else can. But they absolutely can, and when the stakes are high, we have to be thoughtful. And now that we have a better understanding about
OFFSETting in MySQL, I think we'll agree that we'll want to avoid them as much as we can (whenever it makes sense to, anyways). While neither
OFFSET are necessarily bad implementations, they may result in work we can entirely avoid with some critically thought-out user work flows supporting well-designed queries and schema designs. Over the next few days, we'll combine the information we've been collecting over the last couple months to consider a few creative-yet-efficient flows for data pagination.
Today, we'll start very simple and go back to our closing thoughts from last Friday: in a paginated result set of rows
id=1..id=10, MySQL isn't "smart" enough to jump straight to
id=11 to serve page 2 using an
OFFSET. As we discussed, if we use an
OFFSET 10, MySQL will still evaluate all rows between
id=20, meaning 100% overhead. Or, worse, 200% overhead for page 3 and 300% overhead for page 4, and so on and so on. This is even worse if we use a
filesort, which may be sorting records we'll likely never see. For today, let's imagine a simple scenario where we wish to paginate a set of results by a primary key or other indexed, unique key. Perhaps this is an inventory table paginated by
part_number. Let's go-ahead and query for page 1:
SELECT part_number, name, description FROM inventory ORDER BY part_number LIMIT 10
part_number is indexed, the execution plan for this query will be simple: if
part_number is a primary key, the query will likely do a scan of
PRIMARY for the first 10 records. We know that the data is stored in the same index, and we have no further reads. On the other hand, if
part_number is a unique, non-primary key, depending on your schema, the query may or may not be eligible for covering index optimisation. Either way, this query will be about as fast as you can get as it's querying data in index order. Now, let's pretend this query returns 10 rows where
part_number is between
33 (i.e. there are 23
part_numbers that have been deleted). Then, when the user requests page 2 or above, we'll need to change the query just a little bit:
SELECT part_number, name, description FROM inventory WHERE part_number > 33 ORDER BY part_number LIMIT 10
The only change here is adding a
WHERE part_number > 33, wherein
33 references the highest
part_number from page 1. When MySQL executes this query, it is able to "skip" to the records where
part_number > 33 in
part_number order all without any supplemental filtering or sorting.
Now, before you ask: no. Using such a technique will, indeed, prevent you from skipping "directly" to a specific page. And if you absolutely need to, then I challenge you to refactor your user experience to eliminate such a need. Consider, for example, replacing deep pagination with the ability to search or aggregate, instead. Really, if a user already knows they need to go to "page 4", perhaps "page 4" really should be "page 1".
Either way, I said, we started out simple today. Tomorrow, we'll think our way through a slightly more complex situation where our sort key isn't unique.