The Concurrency Of ConcurrentHashMap
ConcurrentHashMap is an excellent point of study when learning about Java's Memory Model. It employs numerous techniques to provide lock-free reads while still guaranteeing safe publication of objects. This article represents my attempt to take the magic out of the
This article covers the
java.util.concurrent.ConcurrentHashMap as implemented in JDK7. This implementation differs slightly from previous versions and is radically different from the new implementation in JDK8 (perhaps the topic of a future article). I assume you have a basic understanding of how the hash table data structure works, as this is out of scope. Finally, you may find this article to greatest effect if you review the source code yourself. I do not include code excerpts below, as most methods require contextual knowledge of the overall class.
In a nutshell, the
ConcurrentHashMap contains an array,
segments, of "segments" (
ConcurrentHashMap.Segment). Each segment is, essentially, a hash table that contains an array,
table, of "entries" (
ConcurrentHashMap.HashEntry). Each entry holds a key, a value, and a pointer to it's next adjacent entry (effectively creating a linked list for the hash of a given key). You may consider a
ConcurrentHashMap a hash table of hash tables: each
Segment is indexed by the high-order bits of a given key and each
HashEntry is indexed by the full hash of a key. This will be discussed in later sections.
Noticing that the
ConcurrentHashMap implementation makes extensive use of arrays, I think to remind you that when reading or writing elements of an array, the per-element operation will never have volatile semantics. This is true even if the array itself is flagged as
volatile. As of JDK5, the
Atomic(Integer|Long|Reference)Array types were introduced to work around this fact. Interestingly, however, Doug Lea opted to use the less-portable
sun.misc.Unsafe HotSpot-specific class to accomplish the same goal. This was done attempting to "reduce the levels of indirection." I will attempt to bridge the gap when applicable.
What may seem to be the biggest magic of the
ConcurrentHashMap is the Javadoc: developers, when not paying close attention, may think that the
ConcurrentHashMap is a completely non-locking data structure. This is not true: modifications to the each segment will entail locking, and some reads (namely,
containsValue()) may also require a lock. This topic will be discussed throughout the following subsections.
When creating a new
ConcurrentHashMap, several items are considered. Primarily, a
ConcurrentHashMap is sized according to a
concurrencyLevel (provided by the constructor or defaulting to 16). This attribute represents the "estimated number of concurrently updating threads" (from the
ConcurrentHashMap source). The
concurrencyLevel is particularly important to correctly estimate: when under-estimating, the data structure will be under extra contention, causing threads to block when attempting to write to a currently-locked
Segment. If, instead, the concurrency level over-estimated, you encounter excessive bloat due to an unnecessary number of segments; such bloat may lead to degraded performance due to high numbers of cache misses.
The actual number of segments a
ConcurrentHashMap instance will maintain is equal to the next power of two of the specified
concurrencyLevel. For example, a
concurrencyLevel of 6 results in 8 segments; similarly, a
concurrencyLevel of 65 results in 128 segments. As mentioned previously, each
Segment is indexed by the high-order bits of a given key. The exact number of bits,
segmentShift, is determined at construction time and is a function of
concurrencyLevel: 4 bytes less the next n-th power of
concurrencyLevel (for example, the upper 25 bits will be used with a
concurrencyLevel of 128, and the upper 32bits with a
concurrencyLevel of 1).
As of JDK6,
ConcurrentHashMap lazily creates segments as necessary. Only one segment is created at construction time, and this creation is the first use of
Unsafe magic. Within the constructor, a single
s0, is created; however, this creation is slightly irregular: when
s0 is put into
ConcurrentHashMap.segments, it is done through
Unsafe.putOrderedObject(). Noticing, however, that
segments is an immutable (
final) instance field, which already guarantees transient visibility of
s0 after the final field freeze, this explicit ordering is unexpected. This may simply be chalked up to defensive programming, style consistency, or simply a result of an over-caffeinated Doug Lea. The JSR166 mailing list seems to agree this use of
putOrderedObject is not strictly necessary for this write.
Because only a single segment is created upon instantiation, each subsequent modification of the
ConcurrentHashMap implementation must ensure the target
Segment (as determined by the high-order bits of a key's hash) exists. This check is performed in the
ensureSegment(int index) method. As
ConcurrentHashMap attempts to be lock-free whenever possible, the
ensureSegment() method makes use of several techniques of concurrent programming to remain lock-free. The next several subsections attempt to explain these techniques.
segments is marked final (and thus will never change), Doug Lea prudently creates a method-local copy,
segments upon which to operate. Such defensive programming allows a programmer to not worry about otherwise-volatile instance member references changing during execution of a method (i.e. inconsistent reads). Of course, this is simply a new reference and does not prevent your method from seeing changes to the referent.
Remembering that arrays, even when
volatile, do not provide volatile semantics when reading or writing elements, accessing the k-th element of
segments requires an explicit volatile read. This volatile read is performed through the
Unsafe.getObjectVolatile() method within
ConcurrentHashMap.segmentAt() method. Again, the
ConcurrentHashMap implementation uses
Unsafe methods instead of an
AtomicReferenceArray, of which the
get method would also accomplish this task. If already created, the k-th
Segment is returned to the invoker; otherwise, a
null value is returned.
Creating Missing Segments
Segment for index
k is not found (i.e.
ensureSegment() attempts to create one. Because this algorithm is lock-free, this implementation must allow for multiple contended writers without risking lost updates. Essentially, this implementation uses the technique of optimistic locking: each writer checks to see if it won and keeps trying until it finally wins (or definitely loses).
When creating a new
segment is used as a prototype. This allows for sizing according to the current table size and
loadFactor. Once the prototype has been recorded, the implementation double checks to ensure that the k-th
Segment is still
null; if so, the new
ss, is created from the prototype. The method next enters a
while loop; this loop is, effectively, a check to determine if the k-th segment has been created by another thread.
If, upon the loop condition, the k-th segment is still
null, insertion into
segments[k] is finally attempted. This write is performed using the
Unsafe.compareAndSwapObject method, which employs a compare-and-swap (CAS) algorithm. This algorithm performs hardware-level operations to guarantee that, when multiple contended threads attempt to write to memory, a value is only written iff the current value (
null, in this case) matches the expected value (also
null). This means that when threads
t2 write to
segments[k], only one will win. The other, or both, will fail---retrying at the next loop. For example,
t1.cas(seg=s,expect=null), seeing that
segments[k] == null; meanwhile,
t2.cas(seg=s',expect=null) seeing that
segments[k] == s. In this case,
t1 wins and
t2 sees, upon next check, that it has definitely lost. This is the same implementation used in
ConcurrentHashMap.put() is invoked, it first must find the
Segment into which the entry should be written. When doing so,
put() attempts to decrease the access latency associated with volatile reads by leveraging the fact that segments, once written, do not get destroyed. As such, a memory fence can potentially be avoided by performing a normal read on
Unsafe.getObject(). The return will either be the correct
null. In the case a
null value is returned, one of two reasons may hold true:
- The specified
Segmenthas never been created (and truly is null).
- The specified
Segmenthas been created but not yet flushed from writing thread's write buffer (or loaded into the reading thread's load buffer).
Segment results in a call to
ensureSegment(), which, as previously mentioned, performs a volatile read on the necessary array element. When performing the volatile read, a
null read positively identifies a missing segment. The existing, or newly created segment (if necessary), is returned. Interestingly,
AtomicReferenceArray has no comparable method to perform a lazy read. As this type of optimization is tightly coupled with the overall design, any attempt to mimic this behaviour in your own systems should be thoroughly understood and documented. The
segment.put method is invoked to perform the actual write.
Segment.table is, itself, a hash table, each
HashEntry sits in a specific index of
table determined by the key's hash. Each element of
table represents the head node of a linked-list of
HashEntry objects. When putting a value into the hash table, the put is, effectively, an insert-or-update operation: existing nodes for the specified key are updated, and unknown entries are added.
As previously indicated, the
put operation is performed under a lock. Interestingly, the
Segment class actually extends from
ReentrantLock "... to simplify some locking and avoid separate construction." This type of lock offers the same locking semantics as a
synchronized block but offers more developer-focused capabilities (like non-blocking acquisitions with
tryAcquire()). Importantly, the locking provides a memory fence on entry and exit: all changes to a specific segment will be sequentially consistent.
When attempting to acquire the segment lock, the
put method first uses the non-blocking
tryAcquire(). If the lock cannot be acquired, the method
scanAndLockForPut() is executed. This method is quite interesting, as, instead of simply blocking (as with
lock()), it attempts to optimistically acquire the lock and incrementally determine if the specified key is already present in the map. If not, a new entry is speculatively created. This approach has two primary advantages:
- It increases the likelihood that all members of the linked list will be retained in a CPU cache (probably L1 or L2), preventing the need to do a subsequent, more costly read from L3 or memory once the lock has been acquired. This is effectively an obtuse pre-fetch.
- It decreases the amount of time (however small) required to create a new
HashEntrywhile holding a lock on the segment.
scanAndLockForPut() method uses a while loop to optimistically acquire the segment lock. Each time it is unable to acquire the lock, the applicable linked list is traversed, 1 element per lock acquisition failure. If the current element of traversal has the same key as the one being added, the traversal ends and the loop becomes a basic spin lock. Alternatively, if the key is not found in the linked list (after a complete traversal), a new
HashEntry is speculatively created on behalf of
put. At this time, the loop becomes a basic spin lock.
Once the loop becomes a spin lock, the applicable table element is interrogated to ensure the head element reference has not changed since the loop began. This would be indicative of another put, removal, or rehashing. In such a case, the traversal is restarted. Assuming, however, the head node does not change for a specified number of retries (defined as 64 for multi-core machines), the method gives up and blocks for the lock. Indeed, it is quite possible (and harmless) that the method will return prior to establishing the presence of a key.
put has full ownership of the segment lock, it traverses the applicable linked list looking for an existing
HashEntry that matches the key being inserted. If a matching entry is found, the
HashEntry volatile member,
value, is updated. Alternatively, if no element is found, a new
HashEntry is created (unless
scanAndLockForPut() returned one), setting itself as the referent of the linked list's current head node's
next member (i.e.
HashEntry.setNext(). Finally, the
HashEntry is written into the table using
setNext() perform ordered writes (rather than volatile writes) with
Unsafe.putOrderedObject(). This is done to eliminate the memory fence provided after a volatile write, because, upon release of the segment lock, a memory fence will already be inserted. As such, the volatile write semantics are unnecessary and the ordered write is sufficient. The same behaviour can be accomplished with
AtomicReferenceArray.lazySet(). Such optimizations should be heavily documented, as trivial changes could lead to race conditions.
Finally, note that
setNext() must always be performed before
setEntryAt(). If, instead, the two writes are reversed, there is a possibility that the entry may be read prior to its
next node being written. Such ordering may result in corruption of the table. This behaviour would certainly fit within the Java Memory Model, and may even be desired (though, I cannot contrive an example).
ConcurrentHashMap implementation of
remove delegates the actual removal implementation to the respective
Segment. Similarly, the segment removal implementation requires a lock to be taken out on the segment. Whereas
put acquires a lock through
scanAndLockForPut(), remove acquires a lock through
scanAndLock(), which simply attempts to keep the elements in cache to prevent cache misses once it acquires the lock. When removing, the specified element is simply orphaned by linking its two adjacent nodes using an ordered write.
ConcurrentHashMap.containsValue() have different purposes, they share a similar algorithm. Both methods require full-map traversal and come at the potential cost of locking and eager segment creation. Both methods try to perform lock-free operations; however, to prevent on-going contentions with writes, both implementations will degrade to a pessimistic lock after several failed attempts to establish a conclusive return value. Indeed, these methods may be very expensive if they have to lock the entire structure.
To establish a conclusive value, both methods attempt to perform the same operation twice; when the results of the second attempt match the results of the first attempt, the results are considered correct and returned to the invoker. As an example, the
size() implementation runs over each segment, summing up each
Segment.count instance member, which retains an up-to-date count of its
HashEntrys. Once summed, this value is retained in a method-local,
last. Subsequently, the segment counts are once again summed. If both sums are equal, the value is deemed conclusive and is returned to the invoker.
Otherwise, the method retries two more times (as defined by
ConcurrentHashMap.RETRIES_BEFORE_LOCK) to establish a conclusive value. Failure to do so results in a whole-map lock. This lock is achieved by iterating through each element in
ensureSegment(), which creates non-existent segments as necessary (hence eager creation). Each segment is locked with
Segment.lock(), and the lock is held until the method completes.
Of interest, the
Segment.count member is neither volatile nor accessed through any sort of
Unsafe magic. While this often could result in a race condition, the
ConcurrentHashMap implementation prevents such a scenario. Consider, for example, thread
size() while thread
put() executes in-between the first and second reads in
Segment.count value is still up-to-date for
t1 (hence different between reads). This is accomplished without locks by means of memory fencing (and, again, should be subject to heavy documentation).
Remembering that when
t2 unlocks the segment lock at the end of
put(), its write buffers are flushed and all writes performed within the monitor become visible to other threads. This includes the incremented
count member. Next, when
size() loads each
Segment, it uses
segmentAt(); recalling that
segmentAt() loads the specified segment with volatile read semantics, a memory fence is introduced. This results in the invalidation of
t1's load buffer and guarantees that
t1 will see the up-to-date write of
t2 (assuming no other operations occurred).
In comparisons to the previous methods, the
containsKey() methods are fairly mundane. Also unlike the previous methods, both are entirely lock-free. First, a
Segment is retrieved from the
segments array using the applicable high-order bits of the key's hash. The retrieval is performed using
Unsafe.getObjectVolatile(). Next, a
HashEntry of the segment's
table array is retrieved using the key's hash. This retrieval is also performed using
Unsafe.getObjectVolatile. From this head node, the linked list of
HashEntry objects is traversed until the specified key is found (or not found) and the applicable value is returned.
ConcurrentHashMap implementation offers numerous techniques from which Java developers can learn to write efficient, lock-free concurrent code. While the
ConcurrentHashMap source code provides fairly comprehensive documentation, I hope this article has provided you with some additional insight into the overall design.
While Doug Lea made use of numerous undocumented APIs, most instances have public API counterparts. I would encourage you to use these public APIs, as they are less likely to change across releases. Finally, regardless which APIs you use, I also encourage you to heavily document any assumptions or preconditions you make in your code. As seen in several instances above, seeming-trivial changes may cause unexpected, difficult-to-detect behaviour within your system.