Fun MySQL fact of the day: paginated ranges

If the last few Fun MySQL Facts Of The Day weren't enough to convince you that your actions can have severe consequences in a hefty database of hundreds of millions or hundreds of billions of rows, I'm not sure what else will. When the stakes are high, we have to be thoughtful. And now that we have a better understanding of filesorting and OFFSETting in MySQL, I think we'll agree that we want to avoid them as much as we can (whenever it makes sense to, anyway). While neither filesort nor OFFSET is necessarily a bad implementation, they may result in work we can avoid entirely with some critically thought-out user workflows supporting well-designed queries and schemas. Over the next few days, we'll combine the information we've been collecting over the last couple of months to consider a few creative-yet-efficient flows for data pagination.

Today, we'll start very simple and go back to our closing thoughts from last Friday: in a paginated result set of rows id=1..id=10, MySQL isn't "smart" enough to jump straight to id=11 to serve page 2 using an OFFSET. As we discussed, if we use OFFSET 10, MySQL will still evaluate all rows between id=1 and id=20, meaning 100% overhead. Or, worse, 200% overhead for page 3, 300% overhead for page 4, and so on. This is even worse if we use a filesort, which may be sorting records we'll likely never see. For today, let's imagine a simple scenario where we wish to paginate a set of results by a primary key or another indexed, unique key. Perhaps this is an inventory table paginated by part_number. Let's go ahead and query for page 1:

SELECT part_number, name, description FROM inventory ORDER BY part_number LIMIT 10

Because part_number is indexed, the execution plan for this query will be simple: if part_number is the primary key, the query will likely scan PRIMARY for the first 10 records. Since the row data is stored in that same index, no further reads are required. On the other hand, if part_number is a unique, non-primary key, the query may or may not be eligible for the covering index optimisation, depending on your schema. Either way, this query will be about as fast as you can get, as it's reading data in index order. Now, let's pretend this query returns 10 rows where part_number is between 1 and 33 (i.e. 23 part_numbers in that range have been deleted). Then, when the user requests page 2 or above, we'll need to change the query just a little bit:

SELECT part_number, name, description FROM inventory WHERE part_number > 33 ORDER BY part_number LIMIT 10

The only change here is the added WHERE part_number > 33, wherein 33 references the highest part_number from page 1. When MySQL executes this query, it is able to "skip" directly to the records where part_number > 33 and read them in part_number order, all without any supplemental filtering or sorting.
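To make the flow concrete, here's a minimal sketch of this "remember the last key" pagination loop. It uses Python's built-in sqlite3 module as a stand-in for MySQL (the SQL is identical in shape), and the fetch_page helper and sample data are illustrative assumptions, not something from the inventory table above:

```python
import sqlite3

# In-memory stand-in for the inventory table described above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE inventory (part_number INTEGER PRIMARY KEY, name TEXT)"
)
# Insert part_numbers with gaps, as if many rows had been deleted.
conn.executemany(
    "INSERT INTO inventory (part_number, name) VALUES (?, ?)",
    [(n, f"part-{n}") for n in range(1, 100, 3)],  # 1, 4, 7, ... 97
)

def fetch_page(last_seen=None, page_size=10):
    """Fetch the next page of parts strictly after last_seen.

    last_seen is the highest part_number the client has already
    rendered; None means "give me page 1".
    """
    if last_seen is None:
        cursor = conn.execute(
            "SELECT part_number, name FROM inventory "
            "ORDER BY part_number LIMIT ?",
            (page_size,),
        )
    else:
        # The keyset predicate: seek past everything already served.
        cursor = conn.execute(
            "SELECT part_number, name FROM inventory "
            "WHERE part_number > ? ORDER BY part_number LIMIT ?",
            (last_seen, page_size),
        )
    return cursor.fetchall()

page1 = fetch_page()
# The client passes back the last part_number it saw as the cursor.
page2 = fetch_page(last_seen=page1[-1][0])
```

The important design point is that the "cursor" handed to the client is just the last part_number of the previous page, so each page is an index seek plus 10 reads, regardless of how deep the user has paged.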

Now, before you ask: no. Using such a technique will, indeed, prevent you from skipping "directly" to a specific page. And if you absolutely need to, then I challenge you to refactor your user experience to eliminate such a need. Consider, for example, replacing deep pagination with the ability to search or aggregate, instead. Really, if a user already knows they need to go to "page 4", perhaps "page 4" really should be "page 1".

Either way, as I said, we started out simple today. Tomorrow, we'll think our way through a slightly more complex situation where our sort key isn't unique.