Summary
The local-MCP read tools in `airbyte/mcp/local.py` (`read_source_stream_records`, `get_stream_previews`, `sync_source_to_cache`) only return records (or an error string on failure). The connector's `LOG`/`TRACE` messages — and any raw stderr from the connector subprocess — are not surfaced to the MCP caller. For most analytics use cases that's fine, but it makes these tools unusable for a class of repro / debugging scenarios where the contents of the connector's logs are the actual signal being inspected.
Use case
Same context as airbytehq/PyAirbyte#1018 — building a CONTRIBUTING.md-blessed CDC repro harness for `source-mssql` (airbytehq/airbyte#77775) and recommending coral-mcp / these MCP tools as the connector-side runner in the `!slack_connector_issue_repro` playbook (airbytehq/ai-skills#302).
One of the worked examples (`airbytehq/oncall#12094` — "CDC schema history grows unbounded") is verified by counting Debezium TRACE log lines in the read output:
```
Snapshot step 2 - Determining captured tables
Adding table CdcTest.dbo.users to the list of capture schema tables
Adding table CdcTest.dbo.noise_1 to the list of capture schema tables
... (one line per table in the database, even though the configured catalog has one stream)
```
A successful reproduction emits records and logs; the bug is in the log shape (32 "Adding table" lines for 1 configured stream, where there should be 1 + however-many-CDC-enabled-tables). To assert that, the harness needs to count lines matching `^Adding table CdcTest\.dbo\..* to the list of capture schema tables$` in the connector's stderr.
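The harness-side assertion can be sketched with the stdlib alone (the sample log text below is an illustrative excerpt, not real connector output):

```python
import re

# One line per CDC-captured table that Debezium registers (the signal being asserted on).
PATTERN = re.compile(r"^Adding table CdcTest\.dbo\..* to the list of capture schema tables$")


def count_capture_table_lines(log_text: str) -> int:
    """Count Debezium 'Adding table' lines in the connector's stderr output."""
    return sum(1 for line in log_text.splitlines() if PATTERN.match(line))


# Illustrative stderr excerpt: two matching lines, one unrelated line.
sample = """Snapshot step 2 - Determining captured tables
Adding table CdcTest.dbo.users to the list of capture schema tables
Adding table CdcTest.dbo.noise_1 to the list of capture schema tables
"""
assert count_capture_table_lines(sample) == 2
```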
Today there is no MCP-surfaced path to those lines. `read_source_stream_records` returns `list[dict] | str` — records or error message. PyAirbyte's `Executor.execute(...)` does pass the connector's stderr through to the parent process's stderr (via `subprocess.Popen(..., stderr=None)` when `suppress_stderr=False`), but in MCP-server mode the parent process's stderr is the MCP server's stderr — not visible to the caller.
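That inheritance behavior is easy to demonstrate with the stdlib alone (a standalone sketch, not PyAirbyte code): with `stderr=None` the child's log lines go to the parent's own stderr and are never available as data, while `stderr=subprocess.PIPE` turns the same output into something a tool could return.

```python
import subprocess
import sys

# A stand-in "connector" that emits one log line on stderr.
child = [sys.executable, "-c", "import sys; sys.stderr.write('connector log line\\n')"]

# stderr=None: the child inherits the parent's stderr; nothing is captured here.
inherited = subprocess.run(child, stderr=None)
assert inherited.stderr is None  # the log line went to *our* stderr, invisible to a remote caller

# stderr=PIPE: the same output becomes data the parent (e.g. an MCP tool) could return.
piped = subprocess.run(child, stderr=subprocess.PIPE, text=True)
assert piped.stderr == "connector log line\n"
```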
Proposal
Add a `log_file_path: str | Path | None = None` parameter to the read tools (`read_source_stream_records`, `get_stream_previews`, `sync_source_to_cache`). When set, the tool passes a corresponding `log_file=` (or equivalent) into `_stream_from_subprocess(...)` so the connector subprocess's stderr is written to that file. The caller can then read the file post-hoc.
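A minimal sketch of what that could look like (the `log_file_path` name is from this proposal; the body is a toy stand-in that tees a subprocess's stderr to a file, not the real `_stream_from_subprocess`):

```python
import subprocess
import sys
from pathlib import Path


def read_source_stream_records_sketch(
    cmd: list[str],
    log_file_path: "str | Path | None" = None,
) -> list[dict]:
    """Toy stand-in: run a 'connector', treat stdout lines as records,
    and write the subprocess's stderr to log_file_path when given."""
    stderr_target = open(log_file_path, "w") if log_file_path else None
    try:
        result = subprocess.run(cmd, stdout=subprocess.PIPE, stderr=stderr_target, text=True)
    finally:
        if stderr_target:
            stderr_target.close()
    return [{"line": ln} for ln in result.stdout.splitlines()]


# Fake connector: one record on stdout, one TRACE-style line on stderr.
fake = [sys.executable, "-c",
        "import sys; print('rec1'); sys.stderr.write('Adding table CdcTest.dbo.users ...\\n')"]
records = read_source_stream_records_sketch(fake, log_file_path="connector.log")
assert records == [{"line": "rec1"}]
assert "Adding table" in Path("connector.log").read_text()
```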
Optionally, for short logs, also accept `include_logs_in_response: bool = False` — when set, the tool collects the log file contents and returns them alongside records (e.g., return shape becomes `{"records": [...], "logs": "..."}` instead of `list[dict]`).
Either shape makes the harness use case viable. The file-path option is more flexible (large log files, structured grep); the inline option is more ergonomic for short repros.
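On the caller side, a harness would normalize the two proposed return shapes like this (the return values shown are hypothetical, matching the shapes above):

```python
def handle_response(resp):
    """Normalize the two proposed return shapes into (records, logs)."""
    if isinstance(resp, dict):  # include_logs_in_response=True
        return resp["records"], resp["logs"]
    return resp, None  # plain list[dict]; logs must be read from log_file_path


records, logs = handle_response({"records": [{"id": 1}], "logs": "Adding table ...\n"})
assert records == [{"id": 1}] and "Adding table" in logs

records, logs = handle_response([{"id": 1}])
assert logs is None
```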
`Executor.execute(..., suppress_stderr=False, log_file=...)` already supports the underlying capture — the gap is purely in the MCP layer not exposing it. The plumbing change is:
- `airbyte/mcp/local.py` — `read_source_stream_records` / `get_stream_previews` / `sync_source_to_cache` accept `log_file_path` (and the optional `include_logs_in_response`)
- `airbyte/sources/base.py` — `Source.get_records(...)` (and friends) accept and pass through a `log_file` kwarg into `_execute(...)`
- `airbyte/_executors/base.py` — `Executor.execute(...)` already takes `suppress_stderr`; needs an additional `log_file` kwarg that flows into `_stream_from_subprocess(log_file=...)`
(Listed for reviewer convenience; happy to take this on.)
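The three-layer pass-through can be sketched end to end (the function names loosely mirror the files listed above, but the bodies are toy stand-ins, not the real implementations):

```python
from pathlib import Path


# airbyte/_executors/base.py layer (toy): the executor owns the subprocess; the new
# kwarg would flow into _stream_from_subprocess(log_file=...). Here we just fake a log.
def executor_execute(args, *, suppress_stderr=False, log_file=None):
    if log_file is not None:
        Path(log_file).write_text("TRACE Adding table CdcTest.dbo.users ...\n")
    yield '{"record": 1}'


# airbyte/sources/base.py layer (toy): get_records passes log_file through to the executor.
def source_get_records(stream, *, log_file=None):
    return list(executor_execute([stream], log_file=log_file))


# airbyte/mcp/local.py layer (toy): the MCP tool exposes log_file_path to the caller.
def mcp_read_source_stream_records(stream, log_file_path=None):
    return source_get_records(stream, log_file=log_file_path)


out = mcp_read_source_stream_records("users", log_file_path="mcp_connector.log")
assert out == ['{"record": 1}']
assert "Adding table" in Path("mcp_connector.log").read_text()
```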
Related
- airbytehq/PyAirbyte#1018 — companion feature request: let the same MCP tools accept a docker image / version identifier instead of only registry-name lookup. The two together unblock language-agnostic connector repro harnesses driven from MCP / coral-mcp.
Devin session