Fun MySQL fact of the day: table pullouts

As noted yesterday, MySQL 5.6 was released with a large number of optimisations to the subquery engine. In particular, a number of changes were made to optimise for non-scalar, uncorrelated subqueries, allowing them to be run as... you guess it! Joins.

Except, these aren't any ordinary joins. No. They're semi-joins. "Why," you may ask? Well, unlike a "normal" join, a subquery has the following properties:

And so it follows that these aren't "complete" joins but are, instead, semi-joins (which sounds better than "kinda-like-joins"). But how does this look? Today, we'll briefly discuss the "table pullout" semi-join strategy, which is used when MySQL determines it is able to "pull" the table out of a subquery and make it a join for you. Let's consider a company->employee relation and a query wherein we wish to select all employees for an company by date of company incorporation:

EXPLAIN SELECT * FROM employee WHERE company_id IN (SELECT id FROM company WHERE date_of_incorporation = 1996);

If you run this EXPLAIN in MySQL 5.5, you will likely see the company table is resolved with a DEPENDENT SUBQUERY, meaning the subquery will be executed once per employee row. And, because this is a full table scan of employee, that's a lot of rows! But, in MySQL 5.6, if a table pullout strategy is chosen, you will likely see that company is being queried as a SIMPLE type first, and then being joined to employee, also using a SIMPLE query type. In this case, MySQL will only need to evaluate the matching employee * company rows, as opposed to an index scan over company once per row in the employee table.

When MySQL uses a table pullout strategy, you may not immediately identify it based on the EXPLAIN plan. But, once you run the EXPLAIN, what's actually happening will become completely apparent once you SHOW WARNINGS:

***************************[ 1. row ]***************************
Level   | Note
Code    | 1003
Message | /* select#1 */ select `test`.`employee`.`id` AS `id`,`test`.`employee`.`company_id` AS `company_id` from `test`.`company` = `test`.`company`.`id`) and (`test`.`company`.`date_of_incorporation` = 1996))

Of course, if this still isn't enough, you can always SELECT * FROM INFORMATION_SCHEMA.OPTIMIZER_TRACE, which, when table pullout strategy is chosen, will explain that select#2 is undergoing a transformation to semi-join, and lists company in pulled_out_semijoin_tables.