fix/invalid-state-transitiondevelop#1288

Closed
lbeckman314 wants to merge 1 commit into develop from fix/invalid-state-transition

@lbeckman314 lbeckman314 commented Dec 22, 2025

Overview 🌀

This PR resolves the Invalid State Transition error that occurs when tasks are resubmitted via an external retry mechanism (e.g. K8s BackoffLimit).

Current Behavior ⚠️

s3-invalid.json

{
  "name": "S3 Storage example (invalid)",
  "description": "Task inputs and outputs can be Cloud Storage URLs (Invalid Test)",
  "executors": [
    {
      "image": "ubuntu",
      "command": ["md5sum", "/tmp/README.md"]
    }
  ],
  "inputs": [
    {
      "name": "input",
      "description": "Download a file from S3 Storage",
      "url": "s3://funnel-testing-east/ERROR",    <----- Non-existent object set here
      "path": "/tmp/README.md"
    }
  ]
}

➜ funnel task create examples/s3-invalid.json
<TASK ID>

➜ funnel task get <TASK ID> --view MINIMAL
{
  "id":  "<TASK ID>",
  "state":  "SYSTEM_ERROR"
}

➜ kubectl get jobs/<TASK ID>
NAME          COMPLETIONS   DURATION
<TASK ID>     0/1           13m     <---- Worker

➜ kubectl logs jobs/<TASK ID>
{"error":"invalid state transition from SYSTEM_ERROR to INITIALIZING","msg":"error writing event"}
{"error":"genericS3: stat object ERROR in bucket funnel-testing-east: The specified key does not exist."}
{"msg":"TASK_STATE","ns":"worker","state":"SYSTEM_ERROR","taskID":"<TASK ID>"}
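
The first log line above is what a terminal-state guard would produce when a restarted worker tries to re-initialize a task that has already failed. The following is a minimal sketch of such a guard, not Funnel's actual implementation: the `State` type, the `terminal` set, and `ValidateTransition` are hypothetical names chosen for illustration, and the set of terminal states is an assumption.

```go
package main

import "fmt"

// State is a hypothetical stand-in for a task state (not Funnel's own type).
type State string

const (
	Queued       State = "QUEUED"
	Initializing State = "INITIALIZING"
	Running      State = "RUNNING"
	Complete     State = "COMPLETE"
	SystemError  State = "SYSTEM_ERROR"
)

// terminal lists the states that, in this sketch, accept no further transitions.
// Which states are terminal is an assumption made for the example.
var terminal = map[State]bool{Complete: true, SystemError: true}

// ValidateTransition returns an error when a task that is already in a
// terminal state is asked to move to a different state.
func ValidateTransition(current, target State) error {
	if terminal[current] && current != target {
		return fmt.Errorf("invalid state transition from %s to %s", current, target)
	}
	return nil
}

func main() {
	// A worker restarted by an external retry mechanism starts over at
	// INITIALIZING, but the task was already recorded as SYSTEM_ERROR.
	fmt.Println(ValidateTransition(SystemError, Initializing))

	// A first attempt follows the normal path and is accepted.
	fmt.Println(ValidateTransition(Queued, Initializing))
}
```

Under these assumptions, every restart triggered by the external retry mechanism hits the same guard, because the stored state is already terminal.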

Note

Tested with Helm Charts 0.1.71 (2025-12-14) and 0.1.75 (2025-12-22):

➜ helm repo update ohsu
Update Complete. ⎈Happy Helming!⎈

➜ helm search repo funnel --versions
NAME            CHART VERSION   APP VERSION     DESCRIPTION
ohsu/funnel     0.1.75          2025-12-22      A toolkit for distributed task execution ⚙️
...
ohsu/funnel     0.1.71          2025-12-14      A toolkit for distributed task execution ⚙️

➜ helm upgrade --install funnel ohsu/funnel -f values.yaml --version 0.1.71
Release "funnel" has been upgraded. Happy Helming!

Copilot AI review requested due to automatic review settings December 22, 2025 23:01
@lbeckman314 lbeckman314 self-assigned this Dec 22, 2025
netlify Bot commented Dec 22, 2025

Deploy Preview for funnel-dev ready!

Name Link
🔨 Latest commit b09cdc0
🔍 Latest deploy log https://app.netlify.com/projects/funnel-dev/deploys/6949cdc94f782800080bffaa
😎 Deploy Preview https://deploy-preview-1288--funnel-dev.netlify.app

@lbeckman314 lbeckman314 added the blocker blocks user interactions label Dec 22, 2025
@lbeckman314 lbeckman314 moved this to In Progress in Funnel Dec 22, 2025

Copilot AI left a comment

Pull request overview

This PR adds TODO comments to investigate invalid state transition errors that occur after valid worker restarts in Kubernetes environments. The changes are purely documentation-focused, adding context for future investigation without modifying any code logic.

  • Added TODO comments identifying potential sources of invalid state transition errors during K8s worker restarts
  • Documented concerns about retry policy interactions with K8s BackoffLimit settings
  • Added note about removing K8s artifacts in favor of Helm configurations

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| tes/states.go | Added TODO noting that the TransitionError is thrown after valid worker restarts |
| database/mongodb/events.go | Added TODO suggesting this retrier logic may be the source of invalid state transition errors and noting interaction with K8s BackoffLimit |
| database/boltdb/events.go | Added comment explaining executor restart behavior and terminal state transitions |
| config/kubernetes/executor-job.yaml | Added TODO to remove K8s artifacts in favor of Helm configs |

Comment thread database/boltdb/events.go
Comment on lines +176 to +178
// In case where restarts of executor are allowed, this line should not be encountered
// e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
// but if a retry mechanism would work (e.g. unavailable for a few seconds), then Task Worker should allow Executor to restart as expected.

Copilot AI Dec 22, 2025

This comment is difficult to parse and unclear. Consider breaking it into multiple sentences or restructuring for clarity. The relationship between "restarts of executor", "invalid object", "SYSTEM_ERROR", and "retry mechanism" is confusing as currently written.

Suggested change
- // In case where restarts of executor are allowed, this line should not be encountered
- // e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
- // but if a retry mechanism would work (e.g. unavailable for a few seconds), then Task Worker should allow Executor to restart as expected.
+ // When executor restarts are allowed, this branch should not be reached.
+ // For example, repeatedly accessing an invalid object should always result in SYSTEM_ERROR (a terminal state) with no retries.
+ // In contrast, transient errors that can be retried (e.g. a service being unavailable for a few seconds) should be handled before the task reaches a terminal state, allowing the executor to restart as expected.
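
The distinction drawn in the suggested comment, permanent errors going straight to a terminal state and transient errors being retried first, can be sketched as follows. This is an illustrative sketch only and does not reflect how Funnel's worker or retrier is written: `ErrTransient`, `ErrNotFound`, and `RunWithRetry` are hypothetical names, and classifying errors with a wrapped sentinel is an assumption made for the example.

```go
package main

import (
	"errors"
	"fmt"
)

// Hypothetical sentinels used to classify failures in this sketch.
var (
	ErrTransient = errors.New("transient failure")
	ErrNotFound  = errors.New("object not found")
)

// RunWithRetry calls op up to maxAttempts times (at least once).
// It retries only when the returned error wraps ErrTransient; any other
// error is treated as permanent and returned immediately.
// It reports how many attempts were made and the last error seen.
func RunWithRetry(maxAttempts int, op func(attempt int) error) (int, error) {
	if maxAttempts < 1 {
		maxAttempts = 1
	}
	var err error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		err = op(attempt)
		if err == nil {
			return attempt, nil
		}
		if !errors.Is(err, ErrTransient) {
			// Permanent failure: the caller would record a terminal state here.
			return attempt, err
		}
	}
	// Retries exhausted while the failure was still transient.
	return maxAttempts, err
}

func main() {
	// A missing object fails the same way every time, so it is not retried.
	n, err := RunWithRetry(3, func(int) error { return ErrNotFound })
	fmt.Println(n, err)

	// A service that is briefly unavailable succeeds on the second attempt.
	n, err = RunWithRetry(3, func(attempt int) error {
		if attempt == 1 {
			return fmt.Errorf("storage unavailable: %w", ErrTransient)
		}
		return nil
	})
	fmt.Println(n, err)
}
```

With this shape, a terminal state would only be written once retries are exhausted or the error is known to be permanent, so a later restart has no earlier terminal state to collide with.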

Comment thread database/boltdb/events.go
current, target)

// In case where restarts of executor are allowed, this line should not be encountered
// e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,

Copilot AI Dec 22, 2025

Spelling error: "shpuld" should be "should"

Suggested change
- // e.g. Accessing invalid object shpuld be SYSTEM_ERROR every time,
+ // e.g. Accessing invalid object should be SYSTEM_ERROR every time,

@lbeckman314 lbeckman314 changed the title from "chore: Add TODOs to investigate invalid state transition errors" to "fix/invalid-state-transition -> develop" Dec 23, 2025
@lbeckman314 lbeckman314 changed the title from "fix/invalid-state-transition -> develop" to "fix/invalid-state-transitiondevelop" Dec 23, 2025
@lbeckman314

Closing in favor of #1392, but this can be reopened if needed!

@lbeckman314 lbeckman314 closed this May 4, 2026
@lbeckman314 lbeckman314 deleted the fix/invalid-state-transition branch June 11, 2026 18:16
Labels

blocker blocks user interactions bug
