Fun MySQL fact of the day: duplicate weedout
Last week, we started looking at some of MySQL's semi-join optimisations for subqueries. We've already covered materialisiation,
LooseScan, and table pullout optimisations, but there's still one more to clover: the
DuplicateWeedout optimisation, and then thistle be the last topic we discuss about about subqueries.
If you remember from last week, one of the guarantees that MySQL makes when resolving queries with subqueries is that you will not receive duplicate rows from the outer table as a result of multiple matches in a subquery. But because, as we've discussed, MySQL uses a nested loop join (or some variant thereof) to resolve a subquery, this takes extra work. With both
FirstMatch, MySQL is able to "short-circuit" the nested loop join; however, this may not always be the cheapest approach. And this is when MySQL may choose to use the
When MySQL deems that the
DuplicateWeedout semi-join optimisation is the "cheapest" way to answer a query with a subquery, MySQL will create a temporary "weedout" table with a single, unique (hence indexed)
rowids, using the default storage engine. Then, after all table joins for a given row in the outer table(s) have been resolved, MySQL will calculate a unique hash of the row IDs for the outer table and check for its presence in the temporary "weedout" table. If the row has already been seen before, the current row is dropped and execution continues. Otherwise, the new row ID is inserted into the "weedout" table and the row is returned (either to the client or to the next applicable join). You'll know when MySQL chooses the
DuplicateWeedout strategy when you see
Start temporary and
End temporary in the
Extra of your
Now, if you spent the weekend like me, pulling out weeds, after plantain even more ideas about weeds in your head, you may, in fact, be going nutsedge. But don't worry: I hear you thinking, "kudzu please stop?". And I will. Until tomorrow.