Conversation
Pull request overview
This PR updates the legacy LLM and VLM servables to react to client disconnects by signaling disconnection in the execution context and attempting to cancel streaming generation via the OpenVINO GenAI TextStreamer callback.
Changes:
- Added a `signalDisconnection()` helper on the legacy execution contexts to set a disconnect flag and notify waiters.
- Updated the legacy streaming callbacks to return `ov::genai::StreamingStatus::CANCEL` when a disconnect is signaled.
- Wired the disconnection callbacks (and an immediate disconnect check) to call `signalDisconnection()`.
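The helper described in the first bullet can be sketched as follows. The struct and field names mirror those quoted later in this review, but this is a hypothetical stand-in for the actual OVMS execution-context headers, not the real implementation.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>

// Hypothetical mirror of the legacy execution context fields touched by the PR.
// Field names follow the review text; the real types live in the OVMS sources.
struct LegacyExecutionContext {
    std::atomic<bool> clientDisconnected{false};
    std::mutex mutex;
    std::condition_variable executionInProgress;

    // Sets the disconnect flag and wakes any thread waiting on executionInProgress,
    // so both the streamer callback and the waiting request thread observe the disconnect.
    void signalDisconnection() {
        clientDisconnected.store(true);
        executionInProgress.notify_all();
    }
};
```

Using an atomic flag plus `notify_all()` keeps the helper safe to call from the gRPC/HTTP disconnection callback thread while the generation thread is mid-stream.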
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `src/llm/visual_language_model/legacy/servable.hpp` | Adds `signalDisconnection()` helper to the VLM legacy execution context. |
| `src/llm/visual_language_model/legacy/servable.cpp` | Cancels streaming via the TextStreamer callback on disconnect; calls `signalDisconnection()` when a disconnect occurs. |
| `src/llm/language_model/legacy/servable.hpp` | Adds `signalDisconnection()` helper to the LLM legacy execution context. |
| `src/llm/language_model/legacy/servable.cpp` | Cancels streaming via the TextStreamer callback on disconnect; calls `signalDisconnection()` when a disconnect occurs. |
Comments suppressed due to low confidence (2)
src/llm/language_model/legacy/servable.cpp:98
- There are unit tests for legacy servable behavior in `src/test/llm/llmnode_test.cpp`, but nothing covers the new disconnect-driven cancellation behavior. Please add a test that exercises the disconnect path (e.g., ensuring the streamer callback returns CANCEL when `clientDisconnected` is set, and/or that a registered disconnection callback stops an in-flight streaming request).
```cpp
if (legacyExecutionContext->apiHandler->isStream()) {
    legacyExecutionContext->lastStreamerCallbackOutput = "";  // initialize with empty string
    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress,
                     &mutex = legacyExecutionContext->mutex,
                     &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput,
                     &clientDisconnected = legacyExecutionContext->clientDisconnected](std::string text) {
        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Streamer callback executed with text: [{}]", text);
        if (clientDisconnected.load()) {
            executionInProgress.notify_one();
            return ov::genai::StreamingStatus::CANCEL;
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            lastStreamerCallbackOutput += text;
            executionInProgress.notify_one();
        }
        return ov::genai::StreamingStatus::RUNNING;
    };
```
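A disconnect-path test of the kind the comment requests could take roughly the following shape. This sketch stubs `ov::genai::StreamingStatus` and the context fields with plain standard-library types and uses bare assertions rather than the repo's gtest fixtures, so every name here is a stand-in, not the actual test code.

```cpp
#include <atomic>
#include <condition_variable>
#include <mutex>
#include <string>

// Stand-in for ov::genai::StreamingStatus; the real enum lives in OpenVINO GenAI.
enum class StreamingStatus { RUNNING, CANCEL };

// Builds a callback with the same shape as the one quoted above, over stubbed fields.
auto makeStreamerCallback(std::atomic<bool>& clientDisconnected,
                          std::mutex& mutex,
                          std::condition_variable& executionInProgress,
                          std::string& lastStreamerCallbackOutput) {
    return [&](std::string text) {
        if (clientDisconnected.load()) {
            executionInProgress.notify_one();
            return StreamingStatus::CANCEL;
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            lastStreamerCallbackOutput += text;
            executionInProgress.notify_one();
        }
        return StreamingStatus::RUNNING;
    };
}
```

A test would drive the callback once with the flag clear (expecting RUNNING and accumulated output) and once with the flag set (expecting CANCEL and no further accumulation).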
src/llm/visual_language_model/legacy/servable.cpp:109
- Please add test coverage for the new disconnect cancellation behavior for the VLM legacy servable as well (streamer callback returning CANCEL / disconnect callback stopping generation). Existing LLM-node tests indicate this repo expects gtest coverage for servable logic changes.
```cpp
if (legacyExecutionContext->apiHandler->isStream()) {
    legacyExecutionContext->lastStreamerCallbackOutput = "";  // initialize with empty string
    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress,
                     &mutex = legacyExecutionContext->mutex,
                     &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput,
                     &clientDisconnected = legacyExecutionContext->clientDisconnected](std::string text) {
        SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Streamer callback executed with text: [{}]", text);
        if (clientDisconnected.load()) {
            executionInProgress.notify_one();
            return ov::genai::StreamingStatus::CANCEL;
        }
        {
            std::lock_guard<std::mutex> lock(mutex);
            lastStreamerCallbackOutput += text;
            executionInProgress.notify_one();
        }
        return ov::genai::StreamingStatus::RUNNING;
    };
    legacyExecutionContext->textStreamer = std::make_shared<ov::genai::TextStreamer>(getProperties()->tokenizer, callback);
}
```
```diff
 if (legacyExecutionContext->apiHandler->isStream()) {
     legacyExecutionContext->lastStreamerCallbackOutput = "";  // initialize with empty string
-    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress, &mutex = legacyExecutionContext->mutex, &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput](std::string text) {
+    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress, &mutex = legacyExecutionContext->mutex, &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput, &clientDisconnected = legacyExecutionContext->clientDisconnected](std::string text) {
     SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Streamer callback executed with text: [{}]", text);
+    if (clientDisconnected.load()) {
```
The new cancellation path is only active for streaming requests (where textStreamer/callback exist). For non-streaming requests textStreamer stays null, so a mid-generation disconnect will still let pipe->generate(...) run to completion and consume compute. If the intent is to stop generation on disconnection in all cases, consider wiring a cancellation mechanism for unary requests too (e.g., create a lightweight streamer callback that only checks clientDisconnected, or use a stop-handle approach like the continuous batching servable).
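One way to realize the suggestion above is a minimal streamer callback that never buffers text and exists only to propagate the disconnect flag into `pipe->generate(...)`. This is a hedged sketch, not the repo's code: `StreamingStatus` is stubbed here (the real enum is `ov::genai::StreamingStatus`), and `makeUnaryCancellationCallback` is a hypothetical helper name.

```cpp
#include <atomic>
#include <string>

// Stand-in for ov::genai::StreamingStatus; the real enum lives in OpenVINO GenAI.
enum class StreamingStatus { RUNNING, CANCEL };

// Hypothetical minimal callback for unary (non-streaming) requests: it ignores
// the generated text and only checks the disconnect flag, so a mid-generation
// client disconnect can stop generation early instead of running to completion.
inline auto makeUnaryCancellationCallback(std::atomic<bool>& clientDisconnected) {
    return [&clientDisconnected](const std::string& /*text*/) {
        return clientDisconnected.load() ? StreamingStatus::CANCEL
                                         : StreamingStatus::RUNNING;
    };
}
```

Passing such a callback to the generate call in unary mode would reuse the same `clientDisconnected` flag the streaming path already sets, without touching `lastStreamerCallbackOutput` or the condition variable.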
```diff
 if (legacyExecutionContext->apiHandler->isStream()) {
     legacyExecutionContext->lastStreamerCallbackOutput = "";  // initialize with empty string
-    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress, &mutex = legacyExecutionContext->mutex, &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput](std::string text) {
+    auto callback = [&executionInProgress = legacyExecutionContext->executionInProgress, &mutex = legacyExecutionContext->mutex, &lastStreamerCallbackOutput = legacyExecutionContext->lastStreamerCallbackOutput, &clientDisconnected = legacyExecutionContext->clientDisconnected](std::string text) {
     SPDLOG_LOGGER_TRACE(llm_calculator_logger, "Streamer callback executed with text: [{}]", text);
+    if (clientDisconnected.load()) {
+        executionInProgress.notify_one();
+        return ov::genai::StreamingStatus::CANCEL;
+    }
```
Same as LLM legacy servable: cancellation is only wired for streaming requests. In unary mode textStreamer is not created, so a mid-generation disconnect will not stop pipe->generate(...) early and can waste accelerator/CPU time.
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.
No description provided.