Fun MySQL fact of the day: ON DUPLICATE KEY LEAK DATA

As developers, part of our job is ensuring system stability long after we are finished writing code. It is our responsibility to ensure the next developer in our place doesn't inherit a system littered with landmines and that our customers' data integrity and privacy is always a top priority. Today's Fun MySQL fact of the day is going change gears and focus on a question that I am frequently asked: should I use INSERT ... ON DUPLICATE KEY UPDATE. In general, my answer is "no," and this is especially true in any multi-tenant database. To find out why, let's look at a simple example that is somewhat representative of a not-terribly-uncommon table:

CREATE TABLE secrets(
    id INT AUTO_INCREMENT PRIMARY KEY,
    global_secret_number INT NOT NULL UNIQUE,
    account_id INT NOT NULL UNIQUE,
    secret VARCHAR(50) NOT NULL
);
INSERT INTO secrets VALUES(0, 1, 10, 'account 10''s secret');
INSERT INTO secrets VALUES(0, 2, 20, 'account 20''s secret');

In this example, global_secret_number and account_id both have separate UNIQUE secondary indices. This means no two rows in the entire table may have the same value for either column. In InnoDB, indexes are created in the order they are specified, and, when inserting data, InnoDB inserts into the PRIMARY index first (where the data is stored), followed by the rest of the indices in the order in which they were created.

Sometimes, we'll want to insert a row that might already exists; and, in some of those cases, we may just want to update it to bring it up to date. Enter INSERT ... ON DUPLICATE KEY UPDATE, which does exactly that on either the primary key or the first UNIQUE index that matches. Let's try:

INSERT INTO secrets(global_secret_number, account_id, secret)
VALUES(1, 20, 'account 20''s new secret')
ON DUPLICATE KEY UPDATE secret = 'account 20''s new secret';

When we run this, we may expect a couple things:

But what will happen is probably not what we want: account_id=10 gets the new secret and we have a data leakage incident. Worse, account_id=20 likely won't ever know! That is not fun.

So, even if your schema doesn't allow for this behaviour today, it might in the future. So, keep the next person in your thoughts and, instead of using an ON DUPLICATE KEY UPDATE, do a SELECT (preferably on a replica) and INSERT or UPDATE by primary key accordingly, safely handling the rare duplicate key error in your code.