fix(distributed): Harden evaluator claim recovery and warmup gating#208

Merged
fuyu0425 merged 54 commits into main from fix/evalautor-error
Apr 25, 2026

@fuyu0425

Summary

  • Synchronize evaluator claim reservations and active-claim tracking so warmup only feeds durable claimed work.
  • Re-enqueue or refresh terminal verify jobs, clean up canceled verify-job residue, and fail reclaim when stale opaque wrappers remain.
  • Tighten distributed regression coverage across warmup, claim worker, queue cleanup, and run-experiment paths.
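
As a rough illustration of the first bullet, here is a minimal sketch of a lock-guarded claim tracker in which warmup only ever sees claims that have been durably confirmed. The `ClaimTracker` class and its `reserve`, `confirm`, `release`, and `warmup_candidates` methods are hypothetical names chosen for this example; they are not taken from this PR's code, and the real implementation may differ substantially.

```python
import threading


class ClaimTracker:
    """Hypothetical tracker: a sketch, not the project's actual class."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._reserved: set[str] = set()  # tentatively claimed, not yet durable
        self._active: set[str] = set()    # durably claimed work

    def reserve(self, job_id: str) -> bool:
        """Tentatively reserve a job; refuse if it is already reserved or active."""
        with self._lock:
            if job_id in self._reserved or job_id in self._active:
                return False
            self._reserved.add(job_id)
            return True

    def confirm(self, job_id: str) -> bool:
        """Promote a reservation to an active claim once the claim is durable."""
        with self._lock:
            if job_id not in self._reserved:
                return False
            self._reserved.discard(job_id)
            self._active.add(job_id)
            return True

    def release(self, job_id: str) -> None:
        """Drop a job from both sets, e.g. after completion or cancellation."""
        with self._lock:
            self._reserved.discard(job_id)
            self._active.discard(job_id)

    def warmup_candidates(self) -> list[str]:
        """Return only durably claimed jobs; bare reservations are excluded."""
        with self._lock:
            return sorted(self._active)
```

Holding a single lock across both sets keeps the reservation and active-claim views consistent with each other, so a warmup pass cannot observe a job that is reserved but not yet durably claimed.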

Testing

  • uv run pytest -q tests/test_evaluator_claim_worker.py tests/test_queue_cleanup.py tests/test_run_experiment_distributed.py
  • scripts/ci-tests/run-local.sh checks

Notes

  • Includes the fix/evaluator-warning branch plus a follow-up fix for stale verify-wrapper reclaim behavior on opaque queues.

@fuyu0425 force-pushed the fix/evalautor-error branch from 77ada6c to bf51c56 on April 25, 2026 12:30
@fuyu0425 added 25 commits on April 25, 2026 09:15
@fuyu0425 force-pushed the fix/evalautor-error branch from bf51c56 to 2456a20 on April 25, 2026 13:45
@fuyu0425 merged commit 8e23f18 into main on Apr 25, 2026
5 checks passed
@fuyu0425 deleted the fix/evalautor-error branch on April 25, 2026 13:59