Skip to content

Conversation

@darkmoon-o
Copy link

@darkmoon-o darkmoon-o commented Feb 6, 2026

Description

Fixes “task lock not found” errors by making lock acquisition/release idempotent and improving retry handling. This prevents failed runs when a lock record is missing or already released.

What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Copy link
Collaborator

@bytecii bytecii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can also add some unit tests for this. Thanks!

question: tempMessageContent,
task_id: nextTaskId,
attaches: improveAttaches,
});
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we do something like this?

if (res?.code === 410) {
  // Re-establish SSE + task lock via POST /chat
  await chatStore.startTask(nextTaskId, ...);
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need that because when the improve endpoint returns 410, we fall back to startTask() which reestablishes both the SSE consumer and the task lock.

@darkmoon-o darkmoon-o force-pushed the fix/1158-task-lock-not-found branch from ea25ee1 to 4cc1802 Compare February 7, 2026 11:59
@darkmoon-o
Copy link
Author

Maybe we can also add some unit tests for this. Thanks!

Yep, for sure. I added test cases for this.

Comment on lines 377 to 379
def supplement(id: str, data: SupplementChat):
chat_logger.info("Chat supplement requested", extra={"task_id": id})
task_lock = get_task_lock(id)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we still use get_task_lock in supplement, stop, add_task

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We keep get_task_lock() on supplement/stop/add_task because they only make sense with an existing active session; creating a new lock there would silently drop the action (no SSE consumer). improve is user-facing, so we return 410 and let the client reconnect via POST /chat to re-create both lock + SSE.

@Wendong-Fan Wendong-Fan added this to the Sprint 14 milestone Feb 8, 2026
@darkmoon-o darkmoon-o force-pushed the fix/1158-task-lock-not-found branch from 504dea0 to 479c5e1 Compare February 8, 2026 21:09
nitpicker55555

This comment was marked as outdated.

Comment on lines 257 to +270
"Chat improvement requested",
extra={"task_id": id, "question_length": len(data.question)},
)
task_lock = get_task_lock(id)
task_lock = get_task_lock_if_exists(id)

if task_lock is None:
# SSE session no longer exists (disconnected, timed out, or stopped).
# Return 410 Gone so the frontend can fall back to POST /chat which
# creates both a new task lock AND a new SSE consumer.
chat_logger.warning(
"Task lock not found for improve request, "
"returning 410 so client reconnects via POST /chat",
extra={"project_id": id},
)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

258 line extra={"task_id": id
but 269 line extra={"project_id": id

this id should be task_id right?

Comment on lines 591 to 595
fetchPost(`/chat/${projectStore.activeProjectId}`, {
question: tempMessageContent,
task_id: nextTaskId,
attaches: improveAttaches,
});
Copy link
Collaborator

@nitpicker55555 nitpicker55555 Feb 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The backend now returns 410 when the task lock is missing, but currently there's no frontend handling for it. The existing fetchPost call in ChatBox/index.tsx is fire-and-forget (no await, no .then()/.catch()), and handleResponse in src/api/http.ts doesn't check HTTP status codes — so the 410 response gets silently discarded. Could we add corresponding frontend handling (e.g. await the response, detect 410, and show a reconnect prompt to the user) to make this fix actually take effect?

@darkmoon-o darkmoon-o force-pushed the fix/1158-task-lock-not-found branch from b9b44d5 to 343b11c Compare February 9, 2026 22:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants