Fun MySQL fact of the day: block nested-loop joins

Yesterday, we introduced MySQL's Nested-Loop Join algorithm, which it uses to join tables. In many cases, this is a fairly simple and powerful implementation, but as you may appreciate, sometimes this simplicity may end up being rather inefficient.

For example, let's consider an INNER JOIN of two tables, t1 and t2: for each record in t1, a Nested-Loop Join will look for matching records in t2. Now, if t2 has no indexes or MySQL decides to perform an index scan or range scan on t2, MySQL will need to perform t1.rows * t2.rows reads from t2. If t2 is small, that may not be a problem, but if the range being scanned on t2 is large, we'll be reading and re-reading a lot of rows unnecessarily. To solve this problem, MySQL will use a Block Nested-Loop Join algorithm that makes use of a join buffer of size join_buffer_size bytes. You can see when MySQL chooses BNL when you see Using join buffer (Block Nested Loop) in the Extra column of your EXPLAIN.

When using a Block Nested-Loop Join, MySQL will, instead of automatically joining t2, insert as many rows from t1 that it can into a join buffer and then scan the appropriate range of t2 once, matching each record in t2 to the join buffer. From here, each matched row is then sent to the next join, which, as previously discussed, may be another table, t3, or, if t2 is the last table in the query, the rows may be sent to the network.

Of course, this optimisation isn't without downsides: a join buffer is created for each join in a query, and with a default of 256KB, 2000 concurrent queries each with 3 joins would require just under 1.5GB of RAM. That's quite a lot, especially on a busy system that will already have >80% of its available memory reserved for a storage engine cache (like the InnoDB Buffer Pool). Indeed, a better solution than increasing join_buffer_size is finding ways to optimise the join.