Fun MySQL fact of the day:
ON DUPLICATE KEY LEAK DATA
As developers, part of our job is ensuring system stability long after we are finished writing code. It is our responsibility to ensure the next developer in our place doesn't inherit a system littered with landmines and that our customers' data integrity and privacy is always a top priority. Today's Fun MySQL fact of the day is going change gears and focus on a question that I am frequently asked: should I use
INSERT ... ON DUPLICATE KEY UPDATE. In general, my answer is "no," and this is especially true in any multi-tenant database. To find out why, let's look at a simple example that is somewhat representative of a not-terribly-uncommon table:
CREATE TABLE secrets( id INT AUTO_INCREMENT PRIMARY KEY, global_secret_number INT NOT NULL UNIQUE, account_id INT NOT NULL UNIQUE, secret VARCHAR(50) NOT NULL ); INSERT INTO secrets VALUES(0, 1, 10, 'account 10''s secret'); INSERT INTO secrets VALUES(0, 2, 20, 'account 20''s secret');
In this example,
account_id both have separate
UNIQUE secondary indices. This means no two rows in the entire table may have the same value for either column. In InnoDB, indexes are created in the order they are specified, and, when inserting data, InnoDB inserts into the
PRIMARY index first (where the data is stored), followed by the rest of the indices in the order in which they were created.
Sometimes, we'll want to insert a row that might already exists; and, in some of those cases, we may just want to update it to bring it up to date. Enter
INSERT ... ON DUPLICATE KEY UPDATE, which does exactly that on either the primary key or the first
UNIQUE index that matches. Let's try:
INSERT INTO secrets(global_secret_number, account_id, secret) VALUES(1, 20, 'account 20''s new secret') ON DUPLICATE KEY UPDATE secret = 'account 20''s new secret';
When we run this, we may expect a couple things:
- an error because there is already a secret for
- an error because there is already a
global_secret_id=1, or most likely
secretto change iff there is a
global_secret_id=1,account_id=20, otherwise nothing.
But what will happen is probably not what we want:
account_id=10 gets the new secret and we have a data leakage incident. Worse,
account_id=20 likely won't ever know! That is not fun.
So, even if your schema doesn't allow for this behaviour today, it might in the future. So, keep the next person in your thoughts and, instead of using an
ON DUPLICATE KEY UPDATE, do a
SELECT (preferably on a replica) and
UPDATE by primary key accordingly, safely handling the rare duplicate key error in your code.