Summary
The local-MCP read tools in `airbyte/mcp/local.py` (`read_source_stream_records`, `get_stream_previews`, `sync_source_to_cache`) only return records (or an error string on failure). The connector's `LOG`/`TRACE` messages — and any raw stderr from the connector subprocess — are not surfaced to the MCP caller. For most analytics use cases that's fine, but it makes these tools unusable for a class of repro / debugging scenarios where the contents of the connector's logs are the actual signal being inspected.
Use case
Same context as airbytehq/PyAirbyte#1018 — building a CONTRIBUTING.md-blessed CDC repro harness for `source-mssql` (airbytehq/airbyte#77775) and recommending coral-mcp / these MCP tools as the connector-side runner in the `!slack_connector_issue_repro` playbook (airbytehq/ai-skills#302).
One of the worked examples (`airbytehq/oncall#12094` — "CDC schema history grows unbounded") is verified by counting Debezium TRACE log lines in the read output:
```
Snapshot step 2 - Determining captured tables
Adding table CdcTest.dbo.users to the list of capture schema tables
Adding table CdcTest.dbo.noise_1 to the list of capture schema tables
... (one line per table in the database, even though the configured catalog has one stream)
```
A successful reproduction emits records and logs; the bug is in the log shape (32 "Adding table" lines for 1 configured stream, where there should be 1 + however-many-CDC-enabled-tables). To assert that, the harness needs to count lines matching `^Adding table CdcTest\.dbo\..* to the list of capture schema tables$` in the connector's stderr.
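The harness-side assertion can be sketched with the stdlib alone (the sample log text below is an illustrative excerpt, not real connector output):

```python
import re

# One line per CDC-captured table that Debezium registers (the signal being asserted on).
PATTERN = re.compile(r"^Adding table CdcTest\.dbo\..* to the list of capture schema tables$")


def count_capture_table_lines(log_text: str) -> int:
    """Count Debezium 'Adding table' lines in the connector's stderr output."""
    return sum(1 for line in log_text.splitlines() if PATTERN.match(line))


# Illustrative stderr excerpt: two matching lines, one unrelated line.
sample = """Snapshot step 2 - Determining captured tables
Adding table CdcTest.dbo.users to the list of capture schema tables
Adding table CdcTest.dbo.noise_1 to the list of capture schema tables
"""
assert count_capture_table_lines(sample) == 2
```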
Today there is no MCP-surfaced path to those lines. `read_source_stream_records` returns `list[dict] | str` — records or error message. PyAirbyte's `Executor.execute(...)` does pass the connector's stderr through to the parent process's stderr (via `subprocess.Popen(..., stderr=None)` when `suppress_stderr=False`), but in MCP-server mode the parent process's stderr is the MCP server's stderr — not visible to the caller.
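That inheritance behavior is easy to demonstrate with the stdlib alone (a standalone sketch, not PyAirbyte code): with `stderr=None` the child's log lines go to the parent's own stderr and are never available as data, while `stderr=subprocess.PIPE` turns the same output into something a tool could return.

```python
import subprocess
import sys

# A stand-in "connector" that emits one log line on stderr.
child = [sys.executable, "-c", "import sys; sys.stderr.write('connector log line\\n')"]

# stderr=None: the child inherits the parent's stderr; nothing is captured here.
inherited = subprocess.run(child, stderr=None)
assert inherited.stderr is None  # the log line went to *our* stderr, invisible to a remote caller

# stderr=PIPE: the same output becomes data the parent (e.g. an MCP tool) could return.
piped = subprocess.run(child, stderr=subprocess.PIPE, text=True)
assert piped.stderr == "connector log line\n"
```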
Proposal
Add a `log_file_path: str | Path | None = None` parameter to the read tools (`read_source_stream_records`, `get_stream_previews`, `sync_source_to_cache`). When set, the tool passes a corresponding `log_file=` (or equivalent) into `_stream_from_subprocess(...)` so the connector subprocess's stderr is written to that file. The caller can then read the file post-hoc.
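A minimal sketch of what that could look like (the `log_file_path` name is from this proposal; the body is a toy stand-in that tees a subprocess's stderr to a file, not the real `_stream_from_subprocess`):

```python
import subprocess
import sys
from pathlib import Path


def read_source_stream_records_sketch(
    cmd: list[str],
    log_file_path: "str | Path | None" = None,
) -> list[dict]:
    """Toy stand-in: run a 'connector', treat stdout lines as records,
    and write the subprocess's stderr to log_file_path when given."""
    stderr_target = open(log_file_path, "w") if log_file_path else None
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=stderr_target, text=True)
    finally:
        if stderr_target:
            stderr_target.close()
    return [{"line": ln} for ln in result.stdout.splitlines()]


# Fake connector: one record on stdout, one TRACE-style line on stderr.
fake = [sys.executable, "-c",
        "import sys; print('rec1'); sys.stderr.write('Adding table CdcTest.dbo.users ...\\n')"]
records = read_source_stream_records_sketch(fake, log_file_path="connector.log")
assert records == [{"line": "rec1"}]
assert "Adding table" in Path("connector.log").read_text()
```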
Optionally, for short logs, also accept `include_logs_in_response: bool = False` — when set, the tool collects the log file contents and returns them alongside records (e.g., return shape becomes `{"records": [...], "logs": "..."}` instead of `list[dict]`).
Either shape makes the harness use case viable. The file-path option is more flexible (large log files, structured grep); the inline option is more ergonomic for short repros.
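On the caller side, a harness would normalize the two proposed return shapes like this (the return values shown are hypothetical, matching the shapes above):

```python
def handle_response(resp):
    """Normalize the two proposed return shapes into (records, logs)."""
    if isinstance(resp, dict):  # include_logs_in_response=True
        return resp["records"], resp["logs"]
    return resp, None  # plain list[dict]; logs must be read from log_file_path


records, logs = handle_response({"records": [{"id": 1}], "logs": "Adding table ...\n"})
assert records == [{"id": 1}] and "Adding table" in logs

records, logs = handle_response([{"id": 1}])
assert logs is None
```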
`Executor.execute(..., suppress_stderr=False, log_file=...)` already supports the underlying capture — the gap is purely in the MCP layer not exposing it. The plumbing change is:
- `airbyte/mcp/local.py` — `read_source_stream_records` / `get_stream_previews` / `sync_source_to_cache` accept `log_file_path` (and the optional `include_logs_in_response`)
- `airbyte/sources/base.py` — `Source.get_records(...)` (and friends) accept and pass through a `log_file` kwarg into `_execute(...)`
- `airbyte/_executors/base.py` — `Executor.execute(...)` already takes `suppress_stderr`; needs an additional `log_file` kwarg that flows into `_stream_from_subprocess(log_file=...)`
(Listed for reviewer convenience; happy to take this on.)
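The three-layer pass-through can be sketched end to end (the function names loosely mirror the files listed above, but the bodies are toy stand-ins, not the real implementations):

```python
from pathlib import Path


# airbyte/_executors/base.py layer (toy): the executor owns the subprocess; the new
# kwarg would flow into _stream_from_subprocess(log_file=...). Here we just fake a log.
def executor_execute(args, *, suppress_stderr=False, log_file=None):
    if log_file is not None:
        Path(log_file).write_text("TRACE Adding table CdcTest.dbo.users ...\n")
    yield '{"record": 1}'


# airbyte/sources/base.py layer (toy): get_records passes log_file through to the executor.
def source_get_records(stream, *, log_file=None):
    return list(executor_execute([stream], log_file=log_file))


# airbyte/mcp/local.py layer (toy): the MCP tool exposes log_file_path to the caller.
def mcp_read_source_stream_records(stream, log_file_path=None):
    return source_get_records(stream, log_file=log_file_path)


out = mcp_read_source_stream_records("users", log_file_path="mcp_connector.log")
assert out == ['{"record": 1}']
assert "Adding table" in Path("mcp_connector.log").read_text()
```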
Related
- airbytehq/PyAirbyte#1018 — companion feature request: let the same MCP tools accept a docker image / version identifier instead of only registry-name lookup. The two together unblock language-agnostic connector repro harnesses driven from MCP / coral-mcp.
Devin session