Order-preserving CommonDictionaryStorage#1908
Conversation
I don't think I altered the locking strategy that was currently in place, however now that I look at it I think switching Anyway, I need to review this myself since I wrote it two years ago and am a bit foggy on the details. But I would still appreciate a review if you feel like providing one. Side note, |
BCSharp
left a comment
There was a problem hiding this comment.
Well, unfortunately, I think the way it currently is, it is not thread-safe. The main challenge is because of the change from one structure containing data (Bucket[] _buckets) to two (int[] _indices and List<Bucket> _buckets). What was before an atomic operation on _buckets, now is sometimes split into two that are not atomic.
For instance, the call TryGetValue(_buckets, key, out value) would atomically grab field _buckets, and as long as operations that may disrupt readers were done on a new array, which then would be atomically assigned to the field, things were fine.
Now, the call TryGetValue(_indices, _buckets, key, out value) grabs two fields: _indices and _buckets in indeterminate order, that may not be in sync witch each other.
There are various ways of handling it; in my edit suggestions I propose the way based on the following rules:
- There is never a situation that
_indicespoint to some bucket that does not exist. The opposite is OK - there may be a bucket that is not indexed yet.- If something is added, it is first added to
_buckets, and only after to_indices. - If something is removed, it is first removed from
_indices, only then from_buckets, and not really removed but marked asDUMMY/Bucket.Removed. - List
_bucketsnever gets shorter (except forClear()- special case).
- If something is added, it is first added to
- If both
_indicesand_bucketshave to be replaced by new objects (e.g. to shorten them like inClear()), it is done in a predictable order to ensure Rule 1 is always satisfied.- If they are reset,
_indicesare reset first, then_buckets. - If they are read,
_bucketsare read first, then_indices.
- If they are reset,
Rule 2 is to ensure that we never get _indices that refer to _buckets that are not (or no longer) around. In other words, if we get a torn read of _indices and _buckets for the lookup, it is always new _indices and old _buckets.
To ensure proper ordering, Thread.MemoryBarrier() was needed in more places than before.
In my code suggestions I tried to demonstrate what I mean by all this. I hope I got all the relevant places, but don't take my word for it.
| => TryGetValue(_indices, _buckets, key, out value); | ||
|
|
||
| /// <summary> | ||
| /// Static helper to try and get the value from the dictionary. |
| /// Used so the value lookup can run against a buckets while a writer | ||
| /// replaces the buckets. |
There was a problem hiding this comment.
This description seems obsolete.
| // we need to clone the buckets so any lock-free readers will only see | ||
| // the old buckets which are homogeneous | ||
| _buckets = (Bucket[])_buckets.Clone(); | ||
| _indices = (int[])_indices.Clone(); |
There was a problem hiding this comment.
| _indices = (int[])_indices.Clone(); | |
| _buckets = new List<Bucket>(_buckets); |
| _indices = new int[(int)(size / Load) + 1]; | ||
| _indices.AsSpan().Fill(FREE); |
There was a problem hiding this comment.
| _indices = new int[(int)(size / Load) + 1]; | |
| _indices.AsSpan().Fill(FREE); | |
| var newIndices = new int[(int)(size / Load) + 1]; | |
| newIndices.AsSpan().Fill(FREE); | |
| _indices = newIndices; |
| indices[pair.Key] = buckets.Count; | ||
| buckets.Add(bucket); | ||
| _count++; |
There was a problem hiding this comment.
| indices[pair.Key] = buckets.Count; | |
| buckets.Add(bucket); | |
| _count++; | |
| _count++; | |
| buckets.Add(bucket); | |
| Thread.MemoryBarrier(); | |
| indices[pair.Key] = buckets.Count; |
| _indices[pair.Key] = DUMMY; | ||
| _buckets[pair.Value] = Bucket.Removed; | ||
| Thread.MemoryBarrier(); |
There was a problem hiding this comment.
| _indices[pair.Key] = DUMMY; | |
| _buckets[pair.Value] = Bucket.Removed; | |
| Thread.MemoryBarrier(); | |
| _indices[pair.Key] = DUMMY; | |
| Thread.MemoryBarrier(); | |
| _buckets[pair.Value] = Bucket.Removed; |
| public override bool TryGetValue(object key, out object value) | ||
| => TryGetValue(_indices, _buckets, key, out value); |
There was a problem hiding this comment.
| public override bool TryGetValue(object key, out object value) | |
| => TryGetValue(_indices, _buckets, key, out value); | |
| public override bool TryGetValue(object key, out object value) { | |
| var buckets = _buckets; | |
| Thread.MemoryBarrier(); | |
| var indices = _indices; | |
| return TryGetValue(indices, buckets, key, out value); | |
| } |
| _indices = new int[8]; | ||
| _indices.AsSpan().Fill(FREE); | ||
| _buckets.Clear(); |
There was a problem hiding this comment.
_buckets.Clear() is not enough. A new list object is needed not to mess up readers in progress.
| _indices = new int[8]; | |
| _indices.AsSpan().Fill(FREE); | |
| _buckets.Clear(); | |
| var newIndices = new int[InitialBucketSize]; | |
| newIndices.AsSpan().Fill(FREE); | |
| _indices = newIndices; | |
| Thread.MemoryBarrier(); | |
| _buckets = new List<Bucket>(); |

Cherry-picked my PR for 3.6. It's a nice feature to have so why not include it in 3.4...