Fun MySQL fact of the day: group commit

We now know that each MySQL thread has its own binary log cache to which it writes the binary log events for a single transaction. We also briefly discussed that MySQL will, upon commit of a transaction, write the thread's binary log cache to the actual binary log file. And we've even discussed that, to improve durability, MySQL may sync the binary log file after each write. This, you see, is where it all gets fun.

Let's begin by imagining an implementation in which, each time a transaction commits, the committing thread is responsible for serializing its own binary log cache into the binary log file. Seems simple enough, right? Well, let's think about it: each time a thread needs to write its binary log cache, it would need to safely acquire a handle to the binary log file. And it would need to ensure that its writes didn't conflict with, or happen out of order relative to, any other thread's writes. And, finally, it would need to sync the binary log file to ensure durability. And, well, that's a simple problem, too, right? We could stick a big, fat lock in the code around the critical section. Problem solved, yeah? How well do you suspect this would scale across, say, 1000 concurrent transactions all committing at the same time, all needing to sync to disk? Pretty poorly, I'd say, since that's a lot of lock contention around necessarily expensive CPU/memory operations and even more I/O. Maybe, then, you'd be surprised to know this was MySQL's implementation until MySQL 5.6. Fun!
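To make the cost of that big, fat lock concrete, here is a minimal Python sketch of the naive approach. It is a toy model and not MySQL's code: `NaiveBinlog`, `commit`, and `run_naive` are hypothetical names, a list stands in for the binary log file, and a counter stands in for the disk sync.

```python
import threading


class NaiveBinlog:
    """Toy stand-in for a binary log file guarded by one big lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.events = []      # stand-in for the binary log file contents
        self.sync_count = 0   # stand-in for the number of disk syncs

    def commit(self, cache):
        # Every committing thread serializes on the same lock and pays
        # for its own "sync" while holding it.
        with self.lock:
            self.events.extend(cache)
            self.sync_count += 1


def run_naive(n_transactions):
    """Commit n_transactions concurrently, one thread per transaction."""
    log = NaiveBinlog()
    threads = [
        threading.Thread(target=log.commit, args=([f"txn-{i}"],))
        for i in range(n_transactions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

Calling `run_naive(1000)` leaves `sync_count` at 1000: one sync per transaction, every one of them performed while the other committers wait their turn.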

In MySQL 5.6 (and in some earlier value-added forks such as Percona Server 5.5.18-23 and MariaDB 5.3), a change called "group commit" was made to the binary log serialization implementation. The main goal of group commit is to reduce both the number of disk syncs and the amount of lock contention as much as possible. For example, if the same 1000 transactions all commit at the same time, wouldn't it make sense to write all 1000 of the transactions' binary log events at the same time from one thread, syncing the file only once at the end? And that's exactly what group commit tries to accomplish.

If we look at the high-level algorithm for a group commit implementation (using the Percona Server 5.5.18-23 implementation as an example), when a transaction commits, the committing thread will attempt to acquire a lock guarding a "group commit queue". When the thread acquires the lock, the thread adds its binary log cache events to the queue and then releases the lock right away. Next, the thread will determine if it was the first thread to enqueue its binary log events into the group commit queue, and, if so, the thread will become the self-appointed "group commit leader"; otherwise, the thread will just stop and wait. Patiently. We'll come back to this later, but at this point, no locks are held and no threads are stuck waiting on a lock. That's good!
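The enqueue-and-elect step can be sketched in a few lines of Python. This is an illustration of the idea only, not Percona Server's code: `GroupCommitQueue`, `enqueue`, and `elect` are hypothetical names, and "first to enqueue" is modelled as "enqueued into an empty queue".

```python
import threading


class GroupCommitQueue:
    """Toy group commit queue guarded by a short-lived lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.pending = []   # binary log caches waiting to be written

    def enqueue(self, cache):
        """Add one transaction's cache; return True if the caller leads."""
        with self.lock:
            # Assumption: the thread that finds the queue empty is the
            # first of its group and appoints itself leader.
            is_leader = len(self.pending) == 0
            self.pending.append(cache)
        # The lock is already released by the time the role is known.
        return is_leader


def elect(n_threads):
    """Have n_threads enqueue concurrently and record each thread's role."""
    queue = GroupCommitQueue()
    roles = [None] * n_threads

    def worker(i):
        roles[i] = queue.enqueue([f"event-from-txn-{i}"])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return roles
```

Since nothing drains the queue in this sketch, exactly one of the threads in `elect(50)` sees `True`, no matter how the scheduler interleaves them.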

Now, if the thread is the group commit leader, the next step for it is to acquire another, coarser lock guarding the binary log file itself. This is a low-contention lock with, at most, 1 thread waiting for it. When the leader thread acquires this binary log file lock, it will re-acquire the group commit queue lock, drain the group commit queue into a thread-local queue, and then release the group commit queue lock. At this instant, the leader thread holds the binary log file lock, the shared group commit queue is empty and unlocked, and the non-leader threads are still just waiting. Any new group commit enqueues "just happen", and the first thread to do it, again, appoints itself as the "group commit leader" for its own, now-forming group while it waits to acquire the binary log file lock.
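Here is a small Python sketch of that drain step and of the state it leaves behind. Again, this is a toy under stated assumptions rather than the real implementation: `GroupCommitQueue`, `queue_lock`, `binlog_lock`, `enqueue`, and `drain` are hypothetical names.

```python
import threading


class GroupCommitQueue:
    """Toy model with two locks: one for the queue, one for the file."""

    def __init__(self):
        self.queue_lock = threading.Lock()    # short-lived, guards `pending`
        self.binlog_lock = threading.Lock()   # coarser, guards the "file"
        self.pending = []

    def enqueue(self, cache):
        """Add a cache; the caller leads if the queue was empty."""
        with self.queue_lock:
            is_leader = len(self.pending) == 0
            self.pending.append(cache)
        return is_leader

    def drain(self):
        """Swap the shared queue for an empty one.

        Intended to be called by a leader that already holds binlog_lock.
        """
        with self.queue_lock:
            group, self.pending = self.pending, []
        return group   # now private to the leader
```

A short usage example: after two enqueues, a leader holding `binlog_lock` drains both caches, and the very next enqueue appoints a new leader even though the file lock is still held.

```python
q = GroupCommitQueue()
q.enqueue(["a"])                     # this caller would be the leader
q.enqueue(["b"])                     # this caller would wait
with q.binlog_lock:
    group = q.drain()                # leader's private copy of the group
    next_is_leader = q.enqueue(["c"])  # a new group starts forming
```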

At this point, the leader thread with the binary log file lock reads from its thread-local copy of the group commit queue, whose events it is now responsible for writing into the binary log file. And, after writing the events into the binary log file, the leader thread syncs the binary log according to sync_binlog (which is, in fact, a counter!), unlocks the binary log file lock, and then proceeds to "notify" the threads that had been waiting for their enqueued events to be written. At this point, another group commit leader may already be handling its own group. And so on. And so on. And on and on.
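Putting the whole flow together, a minimal end-to-end sketch in Python might look like the following. It is a simplified model and not MySQL's or Percona Server's code: `GroupCommitLog`, `commit`, and `run_concurrent` are hypothetical names, a list stands in for the binary log file, a counter stands in for the disk sync, and treating sync_binlog as "sync once every N group writes" (with 0 meaning "never sync explicitly") is an assumption of this sketch.

```python
import threading


class GroupCommitLog:
    """Toy group commit: one leader writes and syncs for a whole group."""

    def __init__(self, sync_binlog=1):
        self.queue_lock = threading.Lock()
        self.binlog_lock = threading.Lock()
        self.pending = []            # shared queue of (cache, done_event)
        self.binlog = []             # stand-in for the binary log file
        self.sync_binlog = sync_binlog
        self.writes_since_sync = 0   # the counter behind sync_binlog
        self.sync_count = 0          # stand-in for the number of disk syncs
        self.group_count = 0

    def commit(self, cache):
        done = threading.Event()
        with self.queue_lock:
            is_leader = len(self.pending) == 0
            self.pending.append((cache, done))
        if not is_leader:
            done.wait()              # wait, patiently, for a leader
            return
        with self.binlog_lock:
            with self.queue_lock:    # drain into a thread-local list
                group, self.pending = self.pending, []
            for events, _ in group:
                self.binlog.extend(events)
            self.group_count += 1
            self.writes_since_sync += 1
            # Assumed semantics: sync every `sync_binlog` group writes.
            if self.sync_binlog and self.writes_since_sync >= self.sync_binlog:
                self.sync_count += 1
                self.writes_since_sync = 0
        for _, waiter in group:      # notify after releasing the file lock
            waiter.set()


def run_concurrent(n_transactions, sync_binlog=1):
    """Commit n_transactions concurrently, one thread per transaction."""
    log = GroupCommitLog(sync_binlog)
    threads = [
        threading.Thread(target=log.commit, args=([f"txn-{i}"],))
        for i in range(n_transactions)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return log
```

How many groups form in `run_concurrent(200)` depends on thread timing, so the sketch makes no promise about the exact number. What it does guarantee is that every transaction's events land in `binlog` exactly once and that, with `sync_binlog=1`, there is one sync per group rather than one per transaction.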

That's the gist of group commit, anyways, but there's still more fun to cover next week.