Fun MySQL fact of the day: in defence of auto_increment

Monday's introduction to auto_increment maybe wasn't as glowing as it should have been, but we really shouldn't let one little bug well-documented issue stop us from using it. You see, auto_increment really is an invaluable tool, and today we'll consider why.

As we discussed yesterday, in InnoDB, a "table" is actually a b+tree index (the PRIMARY index, or just PRIMARY) in which "rows" are actually "records". In the PRIMARY index, records are sorted in ascending order of the primary key. While designing a schema, deciding on a primary key is as much a science as it is an art, and the field(s) you eventually choose is critical to the performance of your database. Today, we're just going to focus on an OLTP (Online Transaction Processing) use case, which primarily optimises for write throughput and read latency of small data sets. Most web-based systems fit this category.

I'll make a bold proclamation: in any system that optimises for write throughput, a surrogate primary key (that is, one that's unrelated to the data) consisting of an auto_increment value will nearly always win-out in terms of inserts per second. To explain why, let's consider that we currently have a PRIMARY index with the following keys: [ 1, 3, 5, 6, 8 ].

Good. Now, let's insert a new record with a key value of 9. Simple: we stick it right at the end. Then 10? Again, right at the end. What about 522 or 33977344? Right smack at the end, again. It's "almost" as "simple" as that (read: come along with me on my imaginary optimism trip into make-believe database land!).

Now, what happens if we instead need to insert 4? It goes after 3 but before 5. And what about 7? You get the point: unordered inserts require extra work, which slows down the process of inserts, which, as we recall from the last few weeks, slows everything down.

Of course, there's a whole lot more complexity, but one thing remain constant: inserting (increasing) monotonic records is a heck of a lot more efficient than random inserts. So, should you use an auto_increment column (which provides monotonic, sequential values) for your primary key? It highly depends on your scenario, but you'd be crazy not to at least consider it.