Fun MySQL fact of the day: tenant lock-out

A couple weeks ago, we very briefly touched on the subject about MySQL/InnoDB locking only single records. And since we're on the topic of multi-tenant schemas, I think it's a good time to focus on this subject. For now, we'll focus on using MySQL's default isolation, REPEATABLE READ. In MySQL, REPEATABLE READ prevents phantom reads, even though it is not a defined as a property of the "well-known" REPEATABLE READ. In simple terms, a phantom read happens when the same query run twice within a transaction sees different results. To see why this all matters, let's look at a fairly common multi-tenant table:

CREATE TABLE list_of_things(
    id INT AUTO_INCREMENT PRIMARY KEY,
    account_id INT NOT NULL, # REFERENCES account(id)
    modified DATETIME NOT NULL DEFAULT now(),
    thing VARCHAR(18) NOT NULL
);
INSERT INTO list_of_things(account_id, thing) VALUES(1, 'account 1''s thing');
INSERT INTO list_of_things(account_id, thing) VALUES(5, 'account 5''s thing');
INSERT INTO list_of_things(account_id, thing) VALUES(9, 'account 9''s thing');

In this table, each account_id may have multiple things, and each thing has a timestamp when it was last modified. Now, imagine several transactions from different accounts try to update their own rows:

account 5> BEGIN; UPDATE list_of_things SET modified = now() WHERE account_id = 5; # ...
account 9> BEGIN; UPDATE list_of_things SET modified = now() WHERE account_id = 9; # ...
account 1> BEGIN; UPDATE list_of_things SET modified = now() WHERE account_id = 1; # ...

Naturally, we'd expect that these UPDATEs wouldn't get in each other's way, but what actually happens? After account_id=5 runs its UPDATE, the entire table will be locked. Why? Simply, to prohibit phantom reads, an InnoDB transaction must lock each record it encounters, and because the only index InnoDB can use on this table is the PRIMARY key, InnoDB is forced to perform a full table scan, starting from before the first record to after the last record, locking each and every record as it goes, sequentially searching for all records where account_id=5.

Can we fix this? Yes, in several ways, but we'll start very simple today:

CREATE INDEX listofthings_accountid_idx ON list_of_things(account_id);

How does this help? Instead of doing a full index scan, this index will probabilistically be used to find the necessary records. Does this completely fix the problem? Nope, but the index is still extremely helpful to prevent locking the entire table, effectively serializing all changes.