User/jp/feat parallel simulation#358

Closed
jp-fizzbee wants to merge 2 commits into main from user/jp/feat-parallel-simulation

@jp-fizzbee
Collaborator

No description provided.

jayaprabhakar and others added 2 commits May 19, 2026 12:34
ProtoPath.filesMap is a global cache populated lazily by every Processor
through GetProtoFieldByPath. The map was written without synchronization
(only the values are immutable; the map structure mutates on insert).
Today this is benign because all callers are sequential; once parallel
simulation workers exist, two concurrent first-touches of the same file
or path will trip Go's "concurrent map writes" runtime check.

Wrap reads in a fast RLock path and writes in a Lock path with a double-
check after acquiring the write lock (in case another goroutine populated
the entry while we were computing). The expensive GetFieldByPath/
convertToProto work runs outside the write lock so workers don't serialize
on it.
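
The locking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: `pathCache`, `get`, and the `compute` callback are hypothetical names standing in for `ProtoPath.filesMap` and the `GetFieldByPath`/`convertToProto` work.

```go
package main

import (
	"fmt"
	"sync"
)

// pathCache is a hypothetical stand-in for a global, lazily populated cache.
type pathCache struct {
	mu      sync.RWMutex
	entries map[string]string
}

func newPathCache() *pathCache {
	return &pathCache{entries: make(map[string]string)}
}

// get returns the cached value for key, inserting it on first touch.
// compute stands in for the expensive lookup; under contention it may run
// in more than one goroutine, but only the first stored result is kept.
func (c *pathCache) get(key string, compute func(string) string) string {
	// Fast path: shared read lock, no contention between readers.
	c.mu.RLock()
	v, ok := c.entries[key]
	c.mu.RUnlock()
	if ok {
		return v
	}

	// Expensive work runs outside any lock so callers do not serialize on it.
	computed := compute(key)

	c.mu.Lock()
	defer c.mu.Unlock()
	// Double-check: another goroutine may have populated the entry meanwhile.
	if existing, ok := c.entries[key]; ok {
		return existing
	}
	c.entries[key] = computed
	return computed
}

func main() {
	c := newPathCache()
	resolve := func(k string) string { return "resolved:" + k }
	fmt.Println(c.get("a.b.c", resolve)) // first touch computes and stores
	fmt.Println(c.get("a.b.c", resolve)) // second call hits the read path
}
```

The trade-off in this shape is that duplicate computation is possible on a cold key, which is acceptable when the computed values are immutable and interchangeable.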

Adds a concurrent test (16 goroutines × 200 iterations) to make the
locking contract explicit. The test currently passes with or without the
locks: Go's runtime check is timing-sensitive, and the rules_go race
build setup here did not appear to wire through. It will start catching
regressions once we wire up race detection or extend the iteration
count.
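
A stress test of this kind might look roughly like the sketch below. The cache type, key set, and `hammer` helper are assumptions made for illustration; only the 16 × 200 shape comes from the commit message.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedCache is a hypothetical minimal cache guarded by an RWMutex.
type lockedCache struct {
	mu sync.RWMutex
	m  map[string]int
}

// getOrInsert reads under RLock and inserts under Lock with a double-check.
func (c *lockedCache) getOrInsert(key string, val int) int {
	c.mu.RLock()
	v, ok := c.m[key]
	c.mu.RUnlock()
	if ok {
		return v
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.m[key]; ok {
		return v
	}
	c.m[key] = val
	return val
}

// hammer starts the given number of goroutines, each doing `iterations`
// lookups over a shared set of 8 keys, and returns how many entries exist
// once all goroutines have finished.
func hammer(goroutines, iterations int) int {
	c := &lockedCache{m: make(map[string]int)}
	var wg sync.WaitGroup
	for g := 0; g < goroutines; g++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for i := 0; i < iterations; i++ {
				c.getOrInsert(fmt.Sprintf("path-%d", i%8), i%8)
			}
		}()
	}
	wg.Wait()
	return len(c.m)
}

func main() {
	fmt.Println(hammer(16, 200)) // prints 8: one entry per distinct key
}
```

Outside Bazel, running such a program with `go run -race` (or a test with `go test -race`) is what makes unsynchronized map access fail deterministically rather than depending on timing.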

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each worker is its own fizzbee subprocess, which sidesteps the
in-process shared-state hazards (roleRefs, nextChannelId, and others
not yet audited) that block goroutine-based parallelism. Tradeoff: N
process-startup overheads per run (~50ms each, negligible vs typical
sim wall time) and per-worker output dirs instead of interleaved
stdout.

Activates only when all three conditions hold: -x is set, no fixed
--seed is given (a seed forces a single run anyway), and --parallel > 1.
Otherwise falls through to the existing single-process invocation
unchanged.

Behavior:

- Splits --max_runs evenly across workers (0 -> unlimited per worker).
- Each worker writes to <base>/parallel_<ts>/worker_<N>/ + worker_<N>.log.
- Polling loop checks every 500ms: if a worker writes the failure
  sentinel, kill surviving workers; otherwise wait for all to finish.
- Ctrl-C kills all live workers and exits 130.
- On success: aggregate "PASSED: N total runs across W workers".
- On failure: print the first failing worker's full log + output dir.
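
The run-splitting and output layout in the list above can be illustrated with a short sketch. The launcher in this commit is a bash script, so the Go below only illustrates the arithmetic and path layout. Rounding up on a remainder, zero-based worker numbering, the timestamp format, and placing the log file next to the worker directory are all assumptions, and `runsPerWorker` and `workerPaths` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"path/filepath"
)

// runsPerWorker divides maxRuns across workers (workers must be > 0).
// A maxRuns of 0 means unlimited and is passed through unchanged.
// Rounding up when there is a remainder is an assumption of this sketch.
func runsPerWorker(maxRuns, workers int) int {
	if maxRuns == 0 {
		return 0
	}
	return (maxRuns + workers - 1) / workers
}

// workerPaths returns the output directory and log file for worker n,
// following the <base>/parallel_<ts>/worker_<N>/ layout. Placing the log
// beside the directory is an assumption.
func workerPaths(base, ts string, n int) (dir, logFile string) {
	root := filepath.Join(base, "parallel_"+ts)
	dir = filepath.Join(root, fmt.Sprintf("worker_%d", n))
	logFile = filepath.Join(root, fmt.Sprintf("worker_%d.log", n))
	return dir, logFile
}

func main() {
	fmt.Println(runsPerWorker(1000, 4)) // 250 runs per worker
	dir, logFile := workerPaths("out", "20260519_123400", 0)
	fmt.Println(dir)
	fmt.Println(logFile)
}
```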

Measured: 13-02-01 two-phase-commit, 1000 sims, sequential 52.7s vs
parallel=4 19.8s on macOS arm64 (about 2.66x speedup).

Compatible with macOS bash 3.2 — uses only basic array, arithmetic,
and process-control features. set -e is disabled inside the block so
expected non-zero exits (kill of dead worker, grep miss) don't abort
the script; we exit explicitly at the end.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@jp-fizzbee jp-fizzbee closed this May 19, 2026