[4.0.1] Critical thread safety regression? #625

@hrstoyanov

Environment Details

  • EclipseStore Version: 4.0.1
  • JDK version: 25.0.2
  • OS: MacOS Tahoe

Describe the bug

I see data corruption issues in a highly multi-threaded environment (a Simulator app):

  1. Zombie objects.
Storage GC marking encountered zombie ObjectId 1000000000000016368
Storage GC marking encountered zombie ObjectId 1000000000000021954
Storage GC marking encountered zombie ObjectId 1000000000000016373
Storage GC marking encountered zombie ObjectId 1000000000000021950

  2. JVM crashes (MacOS-specific "Trace/BPT trap: 5") with no traces left behind.

  3. Exceptions.

        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.Binary.validatePostIterationState(Binary.java:2099)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.Binary.storeKeyValuesAsEntries(Binary.java:1911)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.Binary.storeKeyValuesAsEntries(Binary.java:502)
        at org.eclipse.store.gigamap@5.0.0-SNAPSHOT/org.eclipse.store.gigamap.types.AbstractBinaryHandlerAbstractBitmapIndexHashing.internalStore(AbstractBinaryHandlerAbstractBitmapIndexHashing.java:81)
        at org.eclipse.store.gigamap@5.0.0-SNAPSHOT/org.eclipse.store.gigamap.types.AbstractBinaryHandlerAbstractBitmapIndexHashing.internalStore(AbstractBinaryHandlerAbstractBitmapIndexHashing.java:34)
        at org.eclipse.store.gigamap@5.0.0-SNAPSHOT/org.eclipse.store.gigamap.types.AbstractBinaryHandlerStateChangeFlagged.store(AbstractBinaryHandlerStateChangeFlagged.java:71)
        at org.eclipse.store.gigamap@5.0.0-SNAPSHOT/org.eclipse.store.gigamap.types.AbstractBinaryHandlerStateChangeFlagged.store(AbstractBinaryHandlerStateChangeFlagged.java:37)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.BinaryStorer$Default.storeItem(BinaryStorer.java:493)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.BinaryStorer$Default.processItems(BinaryStorer.java:478)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.BinaryStorer$Default.internalStore(BinaryStorer.java:461)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.BinaryStorer$Default.store(BinaryStorer.java:500)
        at org.eclipse.serializer.persistence@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.types.PersistenceManager$Default.store(PersistenceManager.java:305)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageConnection.store(StorageConnection.java:413)
        at peruncs.eclipsestore.core/peruncs.eclipsestore.core.EclipseStoreContext.storeEach(EclipseStoreContext.java:123)
        at peruncs.eclipsestore.core/peruncs.eclipsestore.core.EclipseStoreContext.lambda$attemptCommit$0(EclipseStoreContext.java:152)
        at org.eclipse.serializer.base@5.0.0-SNAPSHOT/org.eclipse.serializer.concurrency.LockedExecutor$Default.write(LockedExecutor.java:214)
        at org.eclipse.serializer.base@5.0.0-SNAPSHOT/org.eclipse.serializer.concurrency.LockScope.write(LockScope.java:99)
        at peruncs.eclipsestore.core/peruncs.eclipsestore.core.EclipseStoreContext.attemptCommit(EclipseStoreContext.java:148)
        at peruncs.eclipsestore.core/peruncs.eclipsestore.core.EclipseStoreContext.runAndCommit(EclipseStoreContext.java:195)
        at peruncs.eclipsestore.core/peruncs.eclipsestore.core.EclipseStoreContext.runWithCommit(EclipseStoreContext.java:207)
        at tourbiz.webapp/tourbiz.webapp.simulator.SimulationBehavior.run(SimulationBehavior.java:184)
        at tourbiz.webapp/tourbiz.webapp.simulator.Simulator.lambda$run$5(Simulator.java:414)
        at java.base/java.util.concurrent.StructuredTaskScopeImpl$SubtaskImpl.run(StructuredTaskScopeImpl.java:325)
        at java.base/java.lang.VirtualThread.run(VirtualThread.java:456)
....


Error occurred in storage channel#1
org.eclipse.store.storage.exceptions.StorageExceptionConsistency: No entity found for objectId 1000000000000025337
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageEntityCollector$EntityCollectorByOid.accept(StorageEntityCollector.java:119)
        at org.eclipse.serializer.persistence.binary@5.0.0-SNAPSHOT/org.eclipse.serializer.persistence.binary.types.LoadItemsChain$ChannelHashing$ChainItemObjectIdSet.iterate(LoadItemsChain.java:324)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageChannel$Default.collectLoadByOids(StorageChannel.java:640)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageRequestTaskLoadByOids$Default.internalProcessBy(StorageRequestTaskLoadByOids.java:52)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageRequestTaskLoadByOids$Default.internalProcessBy(StorageRequestTaskLoadByOids.java:22)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageChannelTask$Abstract.processBy(StorageChannelTask.java:244)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageChannel$Default.work(StorageChannel.java:458)
        at org.eclipse.store.storage@5.0.0-SNAPSHOT/org.eclipse.store.storage.types.StorageChannel$Default.run(StorageChannel.java:541)
        at java.base/java.lang.Thread.run(Thread.java:1474)

To Reproduce

This is a large codebase. I may be able to extract a reproducible case from it, but that will take time. In the meantime, I encourage the ES team to develop a "real-world" demo, similar to Book Store, but with:

  • massively parallel operations.
  • random/chaotic timing.
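
As a rough illustration of what such a demo could be driven by, here is a minimal sketch of a stress harness. Everything in it (`ChaoticStressHarness`, `run`, the parameter names) is hypothetical and not part of EclipseStore; the `operation` argument stands in for whatever store or query call is being exercised. It uses a plain fixed thread pool rather than virtual threads so that it also runs on older JDKs.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.LockSupport;

// Hypothetical stress harness; none of these names come from EclipseStore.
final class ChaoticStressHarness {

    // Calls `operation` from `workers` threads, `iterations` times per thread,
    // pausing a random 0..maxPauseMicros microseconds after each call.
    // Returns every exception the operation threw.
    static List<Throwable> run(int workers, int iterations,
                               int maxPauseMicros, Runnable operation) {
        List<Throwable> failures = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        for (int w = 0; w < workers; w++) {
            pool.submit(() -> {
                for (int i = 0; i < iterations; i++) {
                    try {
                        operation.run();
                    } catch (Throwable t) {
                        failures.add(t); // record and keep going
                    }
                    long pause = ThreadLocalRandom.current().nextLong(maxPauseMicros + 1L);
                    LockSupport.parkNanos(TimeUnit.MICROSECONDS.toNanos(pause));
                }
            });
        }
        pool.shutdown();
        try {
            if (!pool.awaitTermination(1, TimeUnit.MINUTES)) {
                pool.shutdownNow();
                throw new IllegalStateException("stress run did not finish in time");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return failures;
    }

    public static void main(String[] args) {
        // Stand-in operation; a real reproducer would call the storage layer here.
        AtomicInteger calls = new AtomicInteger();
        List<Throwable> failures = run(20, 100, 50, calls::incrementAndGet);
        System.out.println("calls=" + calls.get() + ", failures=" + failures.size());
    }
}
```

Collecting the exceptions instead of failing on the first one makes it easier to see whether several distinct failure modes show up in the same run.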

Expected behavior

I think this used to work fine with 3.0.1, and it was discussed in #559 and #558. I am not sure whether this is a regression in 4.0.1 or a misunderstanding on my part.

Additional context

Here is some AI analysis, which suggests that PersistenceStoring.store(..) is not thread-safe (and never was, by design?):

All three failures share one root cause: the EmbeddedStorageManager uses a singleton StorageConnection (line 116 of EmbeddedStorageManager.java), and its BinaryStorer is explicitly documented as not thread-safe (lines 120-121 of BinaryStorer.java):

"A storer instance is never meant to be used in a mutating fashion by more than one thread"

In the Simulator, every behavior cycle creates its own EclipseStoreContext, but all contexts share the same EmbeddedStorageManager → same singletonConnection() → same PersistenceManager. When VersionedBaseEntity.Store.add() calls storeAll(entity, gigaMap), it triggers immediate serialization through this shared, non-thread-safe path.

With 20+ concurrent virtual threads in the Simulator, multiple threads call persistenceStoring.store() simultaneously, causing all three observed failures.
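
If that analysis is right, one possible workaround is to funnel every store call through a single shared lock. The sketch below is only an illustration of that idea under that assumption: `SerializedStore` and `demo` are hypothetical names, and the `delegate` stands in for the real call (for example a lambda invoking `store(...)` on the shared storage manager). The demo uses a deliberately non-thread-safe `ArrayList` as the sink to show that the lock keeps concurrent callers from corrupting it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Consumer;

// Hypothetical wrapper; not an EclipseStore class.
final class SerializedStore {

    private final ReentrantLock lock = new ReentrantLock();
    private final Consumer<Object> delegate; // stand-in for the real store call

    SerializedStore(Consumer<Object> delegate) {
        this.delegate = delegate;
    }

    // Only one thread at a time can be inside the delegate.
    void store(Object instance) {
        lock.lock();
        try {
            delegate.accept(instance);
        } finally {
            lock.unlock();
        }
    }

    // Hammers one shared SerializedStore from several threads and returns
    // how many items reached the (non-thread-safe) sink.
    static int demo(int threads, int storesPerThread) {
        List<Object> sink = new ArrayList<>(); // deliberately not thread-safe
        SerializedStore store = new SerializedStore(sink::add);
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < storesPerThread; i++) {
                    store.store(new Object());
                }
            });
            workers[t].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return sink.size();
    }

    public static void main(String[] args) {
        System.out.println("stored items: " + demo(8, 1000));
    }
}
```

The lock instance has to be the same one for every caller; a lock created per context would serialize nothing. This sketch also only serializes the store calls themselves: if other threads mutate the object graph while a store is serializing it, those mutations would need to take the same lock.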

See the complete AI analysis here: OPUS-FIXES.md

Also, there is a contradiction with this FAQ section. Maybe it refers to internal parallelization (channels for throughput), not to concurrent application threads sharing the same connection?

The Real FAQ Answer should perhaps be:

  • ✅ Multi-channel storage - supported internally for parallel I/O
  • ❌ Concurrent store() calls from multiple app threads on same connection - NOT supported
    as per the javadocs in the BinaryStorer code.

So, the EclipseStore documentation seems contradictory:

  1. FAQ says "multi-threaded"
  2. BinaryStorer says "not thread-safe for mutating operations"
  3. Locking page says "application must handle concurrency"

In fact, I could not find anything in the ES docs that clearly states whether store operations are thread-safe or need to be guarded.
