feat(mcp): expose connector LOG / stderr capture from read_source_stream_records and friends #1019

@devin-ai-integration

Summary

The local-MCP read tools in airbyte/mcp/local.py (read_source_stream_records, get_stream_previews, sync_source_to_cache) only return records (or an error string on failure). The connector's LOG/TRACE messages — and any raw stderr from the connector subprocess — are not surfaced to the MCP caller. For most analytics use cases that's fine, but it makes these tools unusable for a class of repro / debugging scenarios where the contents of the connector's logs are the actual signal being inspected.

Use case

Same context as airbytehq/PyAirbyte#1018 — building a CONTRIBUTING.md-blessed CDC repro harness for source-mssql (airbytehq/airbyte#77775) and recommending coral-mcp / these MCP tools as the connector-side runner in the !slack_connector_issue_repro playbook (airbytehq/ai-skills#302).

One of the worked examples (airbytehq/oncall#12094 — "CDC schema history grows unbounded") is verified by counting Debezium TRACE log lines in the read output:

Snapshot step 2 - Determining captured tables
Adding table CdcTest.dbo.users to the list of capture schema tables
Adding table CdcTest.dbo.noise_1 to the list of capture schema tables
... (one line per table in the database, even though the configured catalog has one stream)

A successful reproduction emits records and logs; the bug is in the log shape (32 "Adding table" lines for 1 configured stream, where there should be 1 + however-many-CDC-enabled-tables). To assert that, the harness needs to count lines matching ^Adding table CdcTest\.dbo\..* to the list of capture schema tables$ in the connector's stderr.
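For illustration, the assertion could look roughly like the sketch below. The regex is the one quoted above; the function name and the sample text are hypothetical. It assumes each log line reaches the harness without a timestamp or level prefix, otherwise the `^` anchor would need relaxing.

```python
import re

# Pattern quoted in the issue text; anchored, so it matches whole lines only.
ADDING_TABLE = re.compile(
    r"^Adding table CdcTest\.dbo\..* to the list of capture schema tables$"
)


def count_adding_table_lines(log_text: str) -> int:
    """Count lines in log_text that match the 'Adding table' pattern."""
    return sum(
        1 for line in log_text.splitlines() if ADDING_TABLE.match(line.strip())
    )


# Sample built from the log excerpt shown earlier in this issue.
sample_log = (
    "Snapshot step 2 - Determining captured tables\n"
    "Adding table CdcTest.dbo.users to the list of capture schema tables\n"
    "Adding table CdcTest.dbo.noise_1 to the list of capture schema tables\n"
)
adding_table_count = count_adding_table_lines(sample_log)
```

A harness would then compare the count against the number of tables it expects to be captured.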

Today there is no MCP-surfaced path to those lines. read_source_stream_records returns list[dict] | str — records or error message. PyAirbyte's Executor.execute(...) does pass the connector's stderr through to the parent process's stderr (via subprocess.Popen(..., stderr=None) when suppress_stderr=False), but in MCP-server mode the parent process's stderr is the MCP server's stderr — not visible to the caller.
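The difference between inheriting stderr and redirecting it can be shown with the standard library alone. This is a sketch, not PyAirbyte code: the child process is a stand-in for a connector, and `run_with_stderr_file` is a hypothetical helper name.

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for a connector: one record on stdout, one log line on stderr.
CHILD_CODE = """
import sys
print('{"type": "RECORD"}')
print("Adding table CdcTest.dbo.users to the list of capture schema tables", file=sys.stderr)
"""


def run_with_stderr_file(log_path: Path) -> str:
    """Run the child, write its stderr to log_path, and return its stdout."""
    with log_path.open("w", encoding="utf-8") as log_file:
        proc = subprocess.Popen(
            [sys.executable, "-c", CHILD_CODE],
            stdout=subprocess.PIPE,
            stderr=log_file,  # stderr=None would inherit the parent's stderr instead
            text=True,
        )
        stdout, _ = proc.communicate()
    return stdout


with tempfile.TemporaryDirectory() as tmp:
    log_path = Path(tmp) / "connector.log"
    records_out = run_with_stderr_file(log_path)
    captured_logs = log_path.read_text(encoding="utf-8")
```

With `stderr=None` the log line would land on whatever stderr the parent has, which in MCP-server mode is the server's own stderr rather than anything returned to the caller.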

Proposal

Add a log_file_path: str | Path | None = None parameter to the read tools (read_source_stream_records, get_stream_previews, sync_source_to_cache). When set, the tool passes a corresponding log_file= (or equivalent) into _stream_from_subprocess(...) so the connector subprocess's stderr is written to that file. The caller can then read the file post-hoc.

Optionally, for short logs, also accept include_logs_in_response: bool = False — when set, the tool collects the log file contents and returns them alongside records (e.g., return shape becomes {"records": [...], "logs": "..."} instead of list[dict]).

Either shape makes the harness use case viable. The log_file_path option is more flexible (large log files, structured grep); include_logs_in_response is more ergonomic for short repros.
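A rough sketch of how the two parameters could combine at the tool level. Everything here is hypothetical: `read_records_sketch` and the `run_connector` callable are stand-ins, not existing PyAirbyte functions, and falling back to a temporary file when only include_logs_in_response is set is an assumption the proposal does not spell out.

```python
import tempfile
from pathlib import Path
from typing import Any, Callable, Optional, Union

# Hypothetical stand-in for the connector run: receives the log path (or None),
# writes stderr-style lines there, and returns the records.
ConnectorRunner = Callable[[Optional[Path]], list[dict[str, Any]]]


def read_records_sketch(
    run_connector: ConnectorRunner,
    log_file_path: Union[str, Path, None] = None,
    include_logs_in_response: bool = False,
) -> Union[list[dict[str, Any]], dict[str, Any]]:
    """Return records only, or records plus log text when requested."""
    if log_file_path is None and not include_logs_in_response:
        return run_connector(None)  # current behavior: records only

    with tempfile.TemporaryDirectory() as tmp:
        # Assumption: use a throwaway file when the caller gave no path.
        log_path = (
            Path(log_file_path)
            if log_file_path is not None
            else Path(tmp) / "connector.log"
        )
        records = run_connector(log_path)
        if not include_logs_in_response:
            return records
        logs = log_path.read_text(encoding="utf-8") if log_path.exists() else ""
        return {"records": records, "logs": logs}


def fake_runner(log_path: Optional[Path]) -> list[dict[str, Any]]:
    """Fake connector used only to exercise the sketch."""
    if log_path is not None:
        log_path.write_text(
            "Snapshot step 2 - Determining captured tables\n", encoding="utf-8"
        )
    return [{"id": 1}]


plain = read_records_sketch(fake_runner)
with_logs = read_records_sketch(fake_runner, include_logs_in_response=True)
```

Keeping the default return shape as a plain list means existing callers would be unaffected unless they opt in.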

Executor.execute(...) already accepts suppress_stderr=False, and _stream_from_subprocess(...) already accepts log_file=..., so the underlying capture exists; the gap is in passing it through to the MCP layer. The plumbing change is:

  • airbyte/mcp/local.py: read_source_stream_records / get_stream_previews / sync_source_to_cache accept log_file_path (and optional include_logs_in_response)
  • airbyte/sources/base.py: Source.get_records(...) (and friends) accept and pass through a log_file kwarg into _execute(...)
  • airbyte/_executors/base.py: Executor.execute(...) already takes suppress_stderr; needs an additional log_file kwarg that flows into _stream_from_subprocess(log_file=...)

(Listed for reviewer convenience; happy to take this on.)

Related

  • airbytehq/PyAirbyte#1018 — companion feature request: let the same MCP tools accept a docker image / version identifier instead of only registry-name lookup. The two together unblock language-agnostic connector repro harnesses driven from MCP / coral-mcp.
