GO Feature Flag provider IN_PROCESS mode fails under concurrent access #1775

@ciaran-mcquade

Description

The WASM evaluation engine used by IN_PROCESS mode shares a single WASM instance across all threads. When multiple threads evaluate flags at the same time, the instance's linear memory gets corrupted and evaluations fail. In the reproduction below, all 2000 concurrent evaluations failed.

Single-threaded evaluation works. REMOTE evaluation works regardless of concurrency.

Environment

  • go-feature-flag provider 1.1.1
  • OpenFeature SDK 1.16.0
  • Relay proxy v1.52.1
  • Java 11 (Temurin 11.0.30)
  • Linux amd64

Suspected Cause

EvaluationWasm.evaluate() calls malloc, memory.write, evaluate, memory.readString, and free on a single shared Instance with no synchronization:

https://github.com/open-feature/java-sdk-contrib/blob/main/providers/go-feature-flag/src/main/java/dev/openfeature/contrib/providers/gofeatureflag/wasm/EvaluationWasm.java#L106-L141

When multiple threads call evaluate() concurrently, their malloc/write/evaluate/read/free sequences interleave on the same linear memory and corrupt each other.
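One way the provider could avoid this without serializing every evaluation is sketched below. This is an assumption, not the maintainers' design. Each evaluation borrows one engine from a fixed pool for its entire malloc/write/evaluate/read/free sequence, and each pooled engine is a separate instance with its own linear memory. WasmEngine and EnginePool are hypothetical names standing in for EvaluationWasm and its Instance.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Supplier;

// Hypothetical stand-in for one WASM instance. Each instance owns its own
// linear memory, so it is safe as long as only one thread uses it at a time.
interface WasmEngine {
    String evaluate(String input);
}

// Hypothetical pool: every evaluation borrows one engine exclusively for its
// whole malloc/write/evaluate/read/free sequence, then hands it back.
final class EnginePool {
    private final BlockingQueue<WasmEngine> idle;

    EnginePool(int size, Supplier<WasmEngine> factory) {
        idle = new ArrayBlockingQueue<>(size);
        for (int i = 0; i < size; i++) {
            idle.add(factory.get()); // one independent instance per slot
        }
    }

    String evaluate(String input) throws InterruptedException {
        WasmEngine engine = idle.take(); // blocks until some engine is free
        try {
            return engine.evaluate(input);
        } finally {
            idle.add(engine); // always return the engine, even if evaluation throws
        }
    }
}
```

The pool size bounds both parallelism and memory use, because every extra instance carries its own linear memory. A ThreadLocal instance per thread would give the same isolation, at the cost of one instance per evaluating thread.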

Workaround

We synchronize access to the client on our side. This works, but it limits evaluation to one thread at a time.
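A minimal sketch of this kind of workaround follows. It assumes the flag lookup can be wrapped as a plain function, for example a lambda around the client call. SerializedEvaluator is a hypothetical name and is not part of the provider or the SDK. All calls go through one lock, which is exactly why throughput drops: every thread queues behind the single in-flight evaluation.

```java
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Hypothetical wrapper illustrating the workaround: every call to the wrapped
// evaluation function runs while holding one lock, so at most one thread is
// inside the in-process engine at any moment.
final class SerializedEvaluator<I, O> {
    private final ReentrantLock lock = new ReentrantLock();
    private final Function<I, O> delegate;

    SerializedEvaluator(Function<I, O> delegate) {
        this.delegate = delegate;
    }

    O evaluate(I input) {
        lock.lock();
        try {
            return delegate.apply(input);
        } finally {
            lock.unlock(); // release even if the delegate throws
        }
    }
}
```

For example, new SerializedEvaluator<String, Boolean>(key -> client.getBooleanValue(key, false)) would route boolean lookups through one lock.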

Reproduction

Start a relay proxy with any flag:

# flags.yaml
test-flag:
  variations:
    enabled: true
    disabled: false
  defaultRule:
    variation: enabled

# docker-compose.yaml
services:
  relay-proxy:
    image: gofeatureflag/go-feature-flag:v1.52.1
    ports:
      - "1031:1031"
    volumes:
      - ./goff-config.yaml:/config/goff-proxy.yaml
      - ./flags.yaml:/config/flags.yaml
    command: ["/go-feature-flag", "--config", "/config/goff-proxy.yaml"]

# goff-config.yaml
listen: 1031
pollingInterval: 5000
retriever:
  kind: file
  path: /config/flags.yaml

Then run this (dependencies: dev.openfeature:sdk:1.16.0, dev.openfeature.contrib.providers:go-feature-flag:1.1.1):

import dev.openfeature.contrib.providers.gofeatureflag.GoFeatureFlagProvider;
import dev.openfeature.contrib.providers.gofeatureflag.GoFeatureFlagProviderOptions;
import dev.openfeature.contrib.providers.gofeatureflag.bean.EvaluationType;
import dev.openfeature.sdk.*;

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

public class ReproInProcess {
    public static void main(String[] args) throws Exception {
        String endpoint = args.length > 0 ? args[0] : "http://localhost:1031";

        GoFeatureFlagProvider provider = new GoFeatureFlagProvider(
                GoFeatureFlagProviderOptions.builder()
                        .endpoint(endpoint)
                        .evaluationType(EvaluationType.IN_PROCESS)
                        .disableDataCollection(true)
                        .build());

        OpenFeatureAPI api = OpenFeatureAPI.getInstance();
        api.setProviderAndWait("test", provider);
        Client client = api.getClient("test");

        // Single-threaded works fine
        FlagEvaluationDetails<Boolean> d = client.getBooleanDetails(
                "test-flag", false, new MutableContext("user-1"));
        System.out.println("Single-threaded: value=" + d.getValue()
                + " reason=" + d.getReason() + " error=" + d.getErrorMessage());

        // Concurrent evaluation fails
        int threads = 20, evalsPerThread = 100;
        AtomicInteger errors = new AtomicInteger();
        CountDownLatch gate = new CountDownLatch(1);
        CountDownLatch done = new CountDownLatch(threads);
        ExecutorService exec = Executors.newFixedThreadPool(threads);

        for (int t = 0; t < threads; t++) {
            int tid = t;
            exec.submit(() -> {
                try {
                    gate.await();
                    for (int i = 0; i < evalsPerThread; i++) {
                        FlagEvaluationDetails<Boolean> r = client.getBooleanDetails(
                                "test-flag", false,
                                new MutableContext("user-" + tid + "-" + i));
                        if (r.getErrorMessage() != null) {
                            if (errors.incrementAndGet() <= 3)
                                System.out.println("ERROR: " + r.getErrorMessage());
                        }
                    }
                } catch (Exception e) { errors.incrementAndGet(); }
                finally { done.countDown(); }
            });
        }

        gate.countDown();
        done.await();
        exec.shutdown();

        int total = threads * evalsPerThread;
        System.out.println("Concurrent: " + errors.get() + "/" + total + " failed");
        api.shutdown();
    }
}

Output

Single-threaded: value=true reason=STATIC error=null
panic: runtime error: type assert failed
panic: free: invalid pointer
ERROR: out of bounds memory access: attempted to access address: -192 but limit is: 1073741824 and size: 4
ERROR: arraycopy: last destination index 33554432 out of bounds for byte[16777216]
ERROR: Trapped on unreachable instruction
Concurrent: 2000/2000 failed

The provider reports PROVIDER_READY and syncs config successfully. There is no indication anything is wrong other than flags returning the SDK default value.
