Fun MySQL fact of the day: hypothetical buffer pool size

Yesterday, we imagined a hypothetical 200GB database and an InnoDB buffer pool of 204.8GB. We also ignored some details to decide that we can hold something like 92% of the database in memory. Today, I propose we continue being ignorant of details and further explore this hypothetical situation.

So, if we say we can hold 92% of our database in memory, that means we're reading the other 8% from disk on-demand. It also means that 8% is at the cost of another 8%. However, we also need to consider one very important question: do we even care about that 8%? With obvious exception, many databases hold data that's infrequently, if ever, used.

For example, in a database for a SaaS vendor, customer attrition may result in unused, if not completely inaccessible, data. For example, let's pretend you run a SaaS company with an annual customer attrition rate of 2.5%. In such a case, unless you enforce a data retention policy (you do enforce a data retention policy, right?) and aggressively purge unused data, 2.5% of your customer data, per year, would become unused but still stored. Assuming each customer had roughly the same amount of data, this would mean we stop using 5GB of our database each year. Still, this assumes you have no index fragmentation, so it's never going to be just this simple.

Alternatively, consider the same hypothetical database where data ages quickly. For example, imagine if your database held time-sensitive information that became pretty much useless after, say, 6 months. Maybe it is accessed less than 0.1% of the time. And, well, given the same set of assumptions and a linear data growth rate over your last 5 years (40GB/year or 20GB/6months), we'd only need to hold 20GB of "important" data in memory 99.9% of the time, leaving 184.8GB free for less-frequently-used pages---quite possibly a big waste of money. Sure, though, this comes with other caveats we'll get to over the next little while, but it's a thought worth considering when sizing your buffer pool.

Still, you know you're pretty far into hypothetical territory when you're making assumptions about your hypotheticals. But the point remains: your database will have a certain amount of unused data, and the money you spend (or don't spend) to keep it in the InnoDB buffer pool might be better spent somewhere else. Or maybe not. But you're the only one that can make that decision.