CSHARP-5894: Prevent deadlock during multi-threaded BsonClassMap serializer resolution #1890
damieng wants to merge 2 commits into mongodb:main
Pull request overview
Addresses a deadlock regression in v3.x where concurrent BsonClassMap.LookupClassMap calls can deadlock during nested serializer/class map resolution by removing blocking Lazy<T>.Value contention.
Changes:
- Switches several serializer-internal Lazy<IBsonSerializer<...>> fields (whose factories call serializerRegistry.GetSerializer(...)) to LazyThreadSafetyMode.PublicationOnly.
- Introduces an internal Lazy.CreatePublicationOnly helper to standardize creating PublicationOnly lazies.
- Adds a concurrency regression test intended to reproduce/guard against the deadlock scenario.
Reviewed changes
Copilot reviewed 13 out of 13 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| tests/MongoDB.Bson.Tests/Serialization/BsonClassMapConcurrencyTests.cs | Adds a targeted concurrency regression test for the deadlock scenario. |
| src/MongoDB.Bson/Serialization/Serializers/Lazy.cs | Adds helper factory for Lazy<T> configured as PublicationOnly. |
| src/MongoDB.Bson/Serialization/Serializers/TupleSerializers.cs | Uses PublicationOnly for tuple item serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/ValueTupleSerializers.cs | Uses PublicationOnly for valuetuple item serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/DictionarySerializerBase.cs | Uses PublicationOnly for key/value serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/KeyValuePairSerializer.cs | Uses PublicationOnly for key/value serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/EnumerableSerializerBase.cs | Uses PublicationOnly for item serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/IEnumerableDeserializingAsCollectionSerializer.cs | Uses PublicationOnly for item serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/NullableSerializer.cs | Uses PublicationOnly for underlying serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/ImpliedImplementationInterfaceSerializer.cs | Uses PublicationOnly for implementation serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/TwoDimensionalArraySerializer.cs | Uses PublicationOnly for item serializer lazy resolution (both provided serializer and registry paths). |
| src/MongoDB.Bson/Serialization/Serializers/ThreeDimensionalArraySerializer.cs | Uses PublicationOnly for item serializer lazy resolution via registry. |
| src/MongoDB.Bson/Serialization/Serializers/SerializeAsNominalTypeSerializer.cs | Uses PublicationOnly for nominal serializer lazy resolution (both provided serializer and registry paths). |
Pull request overview
Copilot reviewed 13 out of 13 changed files in this pull request and generated 2 comments.
    mre2.Set(); // Release taskB

    var completed = Task.WhenAll(taskA, taskB).Wait(TimeSpan.FromSeconds(10));
If the regression returns, Task.WhenAll(...).Wait(...) timing out will leave taskA/taskB potentially permanently deadlocked in the background while holding global locks (notably BsonSerializer.ConfigLock), which can hang the remainder of the test run. Consider restructuring so a timeout leads to a process-level fail-fast (or running the repro in an isolated subprocess) rather than letting deadlocked tasks linger after the assertion fails.
Suggested change:

    if (!completed)
    {
        Environment.FailFast("LookupClassMap has deadlocked.");
    }

    var taskB = Task.Run(() => BsonClassMap.LookupClassMap(typeof(ClassB)));

    mre1.Wait(5000); // Wait until taskB acquires the lock on Lazy<IBsonSerializer<ClassA>>._state
mre1.Wait(5000) returns a bool that indicates whether the signal was observed. Right now the return value is ignored, so if taskB never reaches the expected point (timing/scheduling regression) the test can continue and produce a false pass/fail without actually exercising the intended interleaving. Capture the return value and fail the test with a clear message when the wait times out (and consider using a TimeSpan overload for consistency).
Suggested change:

    - mre1.Wait(5000); // Wait until taskB acquires the lock on Lazy<IBsonSerializer<ClassA>>._state
    + var taskBReachedExpectedPoint = mre1.Wait(TimeSpan.FromSeconds(5)); // Wait until taskB acquires the lock on Lazy<IBsonSerializer<ClassA>>._state
    + Assert.True(taskBReachedExpectedPoint, "Timed out waiting for taskB to reach the expected synchronization point before starting taskA.");

Fixes CSHARP-5894
v3.x introduces a deadlock that occurs when two threads concurrently call BsonClassMap.LookupClassMap for related types. It involves two independent locks:
- Thread A holds ConfigLock WRITE (inside Freeze) -> accesses a Lazy<IBsonSerializer>.Value on a cached DictionarySerializer -> blocks on the Lazy's internal lock
- Thread B holds the Lazy's internal lock -> needs ConfigLock WRITE to complete LookupClassMap for a nested type -> blocks

The shared Lazy instance comes from the serializer registry's ConcurrentDictionary cache: both threads resolve the same Dictionary<string, T> type and get back the same serializer object with the same Lazy fields.

Regression from v2.x
This did not happen in 2.x because AutoMap() ran inside the ConfigLock WRITE lock, which forced all resolution onto the same thread.
https://github.com/mongodb/mongo-csharp-driver/blob/v2.x/src/MongoDB.Bson/Serialization/BsonClassMap.cs#L350

ConfigLock is a ReaderWriterLockSlim with SupportsRecursion, which allows recursive lock acquisition on the same thread to succeed without contention. No second thread could interleave because it would be blocked on ConfigLock.

v3.x moved AutoMap() outside the lock to solve a [different deadlock problem](https://github.com//pull/1436/files).
This allows threads to do expensive AutoMap work in parallel, but it opened a window where Thread B can be mid-AutoMap (holding a Lazy internal lock, no ConfigLock) while Thread A acquires ConfigLock and reaches the same Lazy through a different path (Freeze -> LookupClassMap(baseType) -> AutoMap -> same cached serializer -> same Lazy).

The fix
Change Lazy instances on serializers whose factories call serializerRegistry.GetSerializer to use LazyThreadSafetyMode.PublicationOnly instead of the default ExecutionAndPublication.

The default mode acquires an internal lock so that only one thread executes the factory -- all other threads block on .Value until the first thread completes. This blocking is what creates the deadlock cycle.

PublicationOnly removes that internal lock entirely. Multiple threads can execute the factory concurrently, and the first result to complete is published as the value. The rest are discarded. This is safe here because serializerRegistry.GetSerializer is idempotent and the serializer instances are functionally equivalent.

Lazy instances wrapping already-resolved serializers (e.g. new Lazy<T>(() => itemSerializer)) are unchanged -- their factories return immediately and can never participate in a lock cycle.

This is achieved with a new "Lazy" helper that not only applies PublicationOnly but also uses generic method type inference to avoid having to specify the type arguments to "new".

A test to cover it is included, but it does not cover all possible factory resolvers.
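
For readers unfamiliar with PublicationOnly, here is a minimal standalone sketch of the behavior the fix relies on. It is not the code from this PR: LazySketch, its CreatePublicationOnly signature, and FakeSerializer are hypothetical names made up for illustration, and the real helper may look different. A Barrier holds both threads inside the factory at the same time, which is only possible because PublicationOnly takes no lock around the factory.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for a resolved serializer; not a driver type.
internal sealed class FakeSerializer { }

// Hypothetical helper (assumed shape, not the PR's actual code).
internal static class LazySketch
{
    // Generic type inference lets callers omit the type argument:
    // LazySketch.CreatePublicationOnly(() => ...) instead of
    // new Lazy<T>(..., LazyThreadSafetyMode.PublicationOnly).
    public static Lazy<T> CreatePublicationOnly<T>(Func<T> valueFactory)
        => new Lazy<T>(valueFactory, LazyThreadSafetyMode.PublicationOnly);
}

internal static class Program
{
    private static void Main()
    {
        var factoryCalls = 0;

        // Both threads must be inside the factory before either may return.
        using var bothInFactory = new Barrier(participantCount: 2);

        var lazy = LazySketch.CreatePublicationOnly(() =>
        {
            Interlocked.Increment(ref factoryCalls);
            // Bounded wait so the sketch cannot hang if scheduling goes wrong.
            bothInFactory.SignalAndWait(TimeSpan.FromSeconds(5));
            return new FakeSerializer();
        });

        var taskA = Task.Run(() => lazy.Value);
        var taskB = Task.Run(() => lazy.Value);
        Task.WaitAll(taskA, taskB);

        // The factory ran on both threads, but only one result was published.
        Console.WriteLine($"Factory calls: {factoryCalls}");
        Console.WriteLine($"Same instance: {ReferenceEquals(taskA.Result, taskB.Result)}");
    }
}
```

Under these assumptions the program should report two factory calls and a single shared instance. With the default ExecutionAndPublication mode the second thread would instead block on .Value while the first sits in the factory, so the barrier would only release after its timeout; that blocking is the half of the lock cycle the fix removes.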