This repository was archived by the owner on May 19, 2026. It is now read-only.
luhenry commented on May 15, 2026
A workflow_job sometimes stays status=queued on GitHub forever after
its parent workflow run terminates -- e.g. when a sibling fails fast
or the run is cancelled before scheduling reaches the job. The
scheduler used to keep trying to provision a runner for those jobs
indefinitely.
sync_jobs now probes the parent run when GitHub still reports the job
queued. If the run is completed, mark the job failed with a v1 record
that captures the run's conclusion. The probe is gated on
JobStuckQueuedMinAge (10 min in internal/constants.go) so the normal
freshly-queued window is not charged for an extra GitHub API call
every reconcile cycle.
Side fixes hit while wiring this up:
- internal.FailureInfo gained a Message field so v1 rows
  ({version:1, message:"..."}) match the legacy on-disk shape
  templates_test.go has been pinning all along. The two existing v1
  call sites in sync_jobs (installation 404, job 404) were stuffing
  the human message into the typed Reason field; switch them to
  Message. Document the v1 vs v2 split in the struct doc.
- Extend internal.GHJob with the run_id field GitHub already returns,
  and add GHRun + GitHubClient.GetRunInfo for the new probe.
- Wire OnGetRunInfo through FakeGH for tests.
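For orientation, the shapes touched by these side fixes might look something like the sketch below. The type names FailureInfo, GHJob, GHRun, FakeGH, and the OnGetRunInfo hook come from the description; every field layout, JSON tag, and signature here is an assumption, as are the helpers v1Failure and decodeJob. In particular, treating Reason as a v2-only field is a guess based on the "typed Reason" wording.

```go
package main

import (
	"encoding/json"
	"errors"
)

// FailureInfo is an assumed layout: v1 rows carry a free-form Message,
// while the typed Reason is assumed to belong to v2 rows.
type FailureInfo struct {
	Version int    `json:"version"`
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

// GHJob keeps only the fields relevant here; run_id links a job to its run.
type GHJob struct {
	ID     int64  `json:"id"`
	RunID  int64  `json:"run_id"`
	Status string `json:"status"`
}

// GHRun is an assumed minimal view of a workflow run.
type GHRun struct {
	Status     string `json:"status"`
	Conclusion string `json:"conclusion"`
}

// FakeGH is a test double; OnGetRunInfo lets each test script the response.
type FakeGH struct {
	OnGetRunInfo func(runID int64) (GHRun, error)
}

// GetRunInfo delegates to the hook and fails loudly when no hook is set.
func (f *FakeGH) GetRunInfo(runID int64) (GHRun, error) {
	if f.OnGetRunInfo == nil {
		return GHRun{}, errors.New("FakeGH: OnGetRunInfo not set")
	}
	return f.OnGetRunInfo(runID)
}

// v1Failure (hypothetical helper) encodes a v1 row with the message in
// Message rather than Reason.
func v1Failure(message string) ([]byte, error) {
	return json.Marshal(FailureInfo{Version: 1, Message: message})
}

// decodeJob (hypothetical helper) parses a job payload, including run_id.
func decodeJob(payload []byte) (GHJob, error) {
	var j GHJob
	err := json.Unmarshal(payload, &j)
	return j, err
}
```

With omitempty on Reason, a v1 row serializes with only version and message, which is the shape the description says templates_test.go pins.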
https://claude.ai/code/session_01Vda2TpwJnGYRYuw1Xg46Da