feat(reeval): cap per-trial POVs and add reeval merge script by acorn421 · Pull Request #211 · sslab-gatech/CRSBench

acorn421 · 2026-04-27T04:51:00Z

Summary

Bound per-trial POV verification load via discovery-time stratified sampling (CRSBENCH_REEVAL_POV_SAMPLE_SIZE, default 100; 0 disables).
Apply the cap on both sync (_reeval_bug_finding) and async (_enqueue_trial_povs) verification paths so cloud workers see the same sampled subset.
Add scripts/merge_reeval_into_source.py to overlay a reeval result tree onto the source experiment tree with a surgical pov_store.json merge that preserves original CRS timing while adopting reeval verdicts.
Update docs/design/distributed/cloud-reeval.md to document the per-trial POV cap.
Add tests/test_reeval.py covering sampling/env parsing behavior.
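The sampling cap described above (default 100, `0` disables, same subset on every worker) can be illustrated with a minimal sketch. This is not the PR's code. `parse_sample_size`, `collect_povs`, and `stratified_sample` are hypothetical helpers, and POVs are modeled as `(name, mtime)` tuples. The sketch reads "equal time buckets" as equal-width mtime intervals, so empty buckets contribute nothing and bursty trials may yield fewer than the cap. Falling back to the default on malformed values and clamping negative values to 0 are also assumptions.

```python
import os
import random
from pathlib import Path

# Default cap from the PR description; 0 disables sampling.
DEFAULT_POV_SAMPLE_SIZE = 100
ENV_VAR = "CRSBENCH_REEVAL_POV_SAMPLE_SIZE"


def parse_sample_size(env=None):
    """Read the per-trial cap from the environment (hypothetical helper).

    Assumption: unset or malformed values fall back to the default, and
    negative values are clamped to 0 (sampling disabled).
    """
    env = os.environ if env is None else env
    raw = env.get(ENV_VAR, "").strip()
    if not raw:
        return DEFAULT_POV_SAMPLE_SIZE
    try:
        return max(int(raw), 0)
    except ValueError:
        return DEFAULT_POV_SAMPLE_SIZE


def collect_povs(pov_dir):
    """List (name, mtime) pairs; mtime stands in for CRS discovery time."""
    return [(p.name, p.stat().st_mtime) for p in Path(pov_dir).iterdir() if p.is_file()]


def stratified_sample(povs, sample_size, trial_path):
    """Pick at most one POV per equal-width mtime bucket (hypothetical helper).

    Seeding the RNG with the trial path makes every worker that handles the
    same trial pick the same subset.
    """
    # Sort by (mtime, name) so ties break identically on every worker.
    ordered = sorted(povs, key=lambda p: (p[1], p[0]))
    if sample_size <= 0 or len(ordered) <= sample_size:
        return ordered  # cap disabled, or the trial is already under budget
    # random.Random converts a str seed to an int deterministically (it does
    # not use hash()), so the seed is stable across processes and hosts.
    rng = random.Random(str(trial_path))
    t_min, t_max = ordered[0][1], ordered[-1][1]
    span = t_max - t_min
    if span == 0:
        # All POVs share one timestamp: fall back to a plain seeded sample.
        return sorted(rng.sample(ordered, sample_size), key=lambda p: (p[1], p[0]))
    buckets = [[] for _ in range(sample_size)]
    for pov in ordered:
        idx = min(int((pov[1] - t_min) / span * sample_size), sample_size - 1)
        buckets[idx].append(pov)
    # One seeded pick per non-empty bucket, returned in discovery order.
    return [rng.choice(bucket) for bucket in buckets if bucket]
```

In this sketch, the sync (`_reeval_bug_finding`) and async (`_enqueue_trial_povs`) paths agree only because both would call one helper with the same trial-path seed. The input is sorted before sampling, so directory listing order on a given worker does not affect the result.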

Description

Re-evaluation of large CRS runs was bottlenecked by trials with thousands of POVs. The cap enforces a deterministic per-trial budget: POVs are sorted by mtime (a proxy for CRS discovery time), partitioned into equal time buckets, and one POV is chosen at random from each bucket. The RNG is seeded by the trial path so that distributed workers agree on the subset. The merge script folds reeval results back into the original experiment tree without losing original-run timing and without downgrading POVs that reeval intentionally skipped (because of sampling) or never saw.
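The merge rule described above can be illustrated with a minimal sketch. The PR does not show the `pov_store.json` format, so the sketch assumes a hypothetical schema that maps each POV ID to a record with a timing field (`discovered_at`) and a verdict field (`verdict`). `merge_pov_store` and `merge_pov_store_files` are hypothetical helpers, not the functions in `scripts/merge_reeval_into_source.py`. The sketch follows three rules:

- A reeval verdict is adopted only where reeval actually produced one.
- Timing fields are never copied from reeval.
- POVs that reeval skipped or never saw keep their source records unchanged.

```python
import json
from pathlib import Path

# Hypothetical pov_store.json schema used only for illustration:
#   {"<pov_id>": {"discovered_at": <number>, "verdict": <str or null>, ...}}
VERDICT_KEYS = ("verdict",)  # assumption: fields owned by (re)verification


def merge_pov_store(source, reeval):
    """Overlay reeval verdicts onto the source store (hypothetical helper)."""
    # Shallow-copy each record so the caller's source dict is never mutated.
    merged = {pov_id: dict(record) for pov_id, record in source.items()}
    for pov_id, re_record in reeval.items():
        if pov_id not in merged:
            # Assumption: reeval only re-verifies POVs from the source run,
            # so unknown IDs are ignored rather than added without timing.
            continue
        for key in VERDICT_KEYS:
            value = re_record.get(key)
            if value is not None:
                # Adopt the reeval verdict. Timing fields (e.g. discovered_at)
                # are never copied, so original CRS timing is preserved.
                merged[pov_id][key] = value
    # POVs absent from reeval (sampled out or never seen) keep source records.
    return merged


def merge_pov_store_files(source_path, reeval_path, out_path):
    """Read two stores, merge them, and write the result (hypothetical helper)."""
    source = json.loads(Path(source_path).read_text())
    reeval = json.loads(Path(reeval_path).read_text())
    merged = merge_pov_store(source, reeval)
    Path(out_path).write_text(json.dumps(merged, indent=2, sort_keys=True))
    return merged
```

Treating a missing or `null` reeval verdict as "no information" prevents sampled-out POVs from being downgraded. Ignoring reeval-only IDs is a conservative choice for this sketch; the real script may handle them differently.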

Commits:

d4e23beb feat(reeval): cap per-trial POVs via stratified discovery-time sampling
05e4c563 chore(reeval): lower default POV sample cap to 100
8b3068db feat: add reeval merge script

acorn421 added 3 commits April 26, 2026 05:26

acorn421 merged commit 43ae2c8 into main Apr 27, 2026
5 checks passed

acorn421 deleted the feat/reeval-sampling branch April 27, 2026 05:46

acorn421 merged 3 commits into main from feat/reeval-sampling

acorn421 commented Apr 27, 2026