Skip to content

[aw][code health] _interactive_loop daemon fallback-reader thread outlives the function call — concurrency guidelines violatio [Content truncated due to length] #1062

@microsasa

Description

@microsasa

Root Cause

_interactive_loop (in cli.py) lazily starts a daemon thread via _start_input_reader_thread() when _read_line_nonblocking raises ValueError/OSError. The finally block only tears down the watchdog observer:

finally:
    _stop_observer(observer)

The fallback_queue and its backing daemon thread receive no cleanup. The thread blocks in input() (i.e. sys.stdin.readline()) after _interactive_loop returns — violating the concurrency rule in CODING_GUIDELINES.md:

"Queues, daemon threads, file handles, sockets, and other I/O resources must not outlive the function call that uses them."
"The litmus test: if two calls to the public entry point happen in the same process, do they share any I/O handle, queue, or daemon thread? If yes, the design is wrong."

The guidelines explicitly call out sys.stdin.readline() daemon threads as the canonical failure mode that caused PR #1015 / issue #1012. The fix in that PR put the queue and thread inside the function, which prevents stale-sentinel leakage between calls. However, it did not address the thread's lifetime: after the loop exits, the old thread continues to compete with any new fallback thread for sys.stdin bytes. In a test suite that calls _interactive_loop repeatedly with a monkeypatched stdin, back-to-back calls can see input consumed by a zombie thread from the previous call.


Resolution

Replace the current input()-based daemon thread with an approach that supports a clean stop signal so the thread can be joined in the finally block.

Recommended approach (matches the guidelines' preferred pattern):

  1. Extend _start_input_reader_thread (or replace it) to accept a threading.Event stop sentinel.
  2. Inside the thread, replace the blocking input() call with a select()-based read loop (matching _read_line_nonblocking) that checks the stop event between polls. On platforms where select() raises ValueError/OSError, fall back to a very short input() attempt inside a try/except and check the stop event via the queue.
  3. In _interactive_loop's finally block, set the stop event and join the thread with a timeout:
finally:
    _stop_observer(observer)
    if fallback_stop is not None:
        fallback_stop.set()
    if fallback_thread is not None:
        fallback_thread.join(timeout=1.0)

Scope constraint: the fix must not re-introduce shared module-level state (queues, threads, or events that persist between calls).


Testing Requirement

Add a test that calls _interactive_loop twice in the same process (with a monkeypatched _read_line_nonblocking that raises ValueError to force the fallback path) and asserts:

  1. No daemon threads whose name matches "input-fallback" are alive after the first call returns.
  2. The second call completes without consuming input that was intended for it from a zombie thread started during the first call.

This prevents regression of the concurrency guideline violation documented above.

Generated by Code Health Analysis · ● 2.3M ·

Metadata

Metadata

Assignees

No one assigned

    Labels

    awCreated by agentic workflowaw-dispatchedIssue has been dispatched to implementercode-healthCode cleanup and maintenance

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions