feat: prototype MCP docs generation via mcpdocs-gen #1013

Aaron ("AJ") Steers (aaronsteers) wants to merge 1 commit into
Conversation
Adds a reproducible workflow for generating static HTML documentation for the PyAirbyte MCP server using mcpdocs-gen.

- `scripts/generate_mcp_docs.py`: starts the MCP server in SSE mode, waits for readiness, shells out to `mcpdocs`, and tears down the server subprocess cleanly (incl. SIGINT handling).
- `poe mcp-docs-generate`: task alias that runs the script.
- `docs/CONTRIBUTING.md`: how-to snippet for local regeneration.
- `.gitignore`: ignore the `docs/mcp-generated/` output directory.

This is a prototype/evaluation PR; see the PR description for a candid evaluation of mcpdocs-gen's fitness as PyAirbyte's long-term MCP docs generator.
🤖 Devin AI Engineer — I'll be helping with this pull request! Here's what you should know:
Note: I can only respond to comments from users who have write access to this repository.
👋 Greetings, Airbyte Team Member! Here are some helpful tips and reminders for your convenience.

💡 Tips and Tricks — Testing This PyAirbyte Version

You can test this version of PyAirbyte using the following:

```shell
# Run PyAirbyte CLI from this branch:
uvx --from 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1776390943-mcp-docs-prototype' pyairbyte --help

# Install PyAirbyte from this branch for development:
pip install 'git+https://github.com/airbytehq/PyAirbyte.git@devin/1776390943-mcp-docs-prototype'
```

PR Slash Commands — Airbyte Maintainers can execute the following slash commands on your PR:

📚 Repo Guidance — Helpful Resources

Community Support — Questions? Join the #pyairbyte channel in our Slack workspace.
Screenshots

- Index page (tool list, search bar, dark-mode toggle)
- Tool detail view (parameters table + JSON schema)
- Candid evaluation of mcpdocs-gen
📝 Walkthrough

The PR introduces infrastructure to automatically generate static HTML documentation for the MCP server. It includes a new Python script that orchestrates spawning an MCP server, waiting for readiness, invoking the mcpdocs generator, and cleaning up server processes. Supporting configuration and documentation are added alongside a git-ignore rule for the generated output directory.
Sequence Diagram

```mermaid
sequenceDiagram
    actor User
    participant CLI as generate_mcp_docs.py
    participant Server as MCP Server<br/>(subprocess)
    participant PortCheck as Port Checker
    participant McpDocs as mcpdocs generate
    participant Filesystem as File System

    User->>CLI: poe mcp-docs-generate
    CLI->>Server: spawn via subprocess (SSE mode)
    CLI->>PortCheck: wait_for_port(host:port, timeout)
    PortCheck->>PortCheck: poll TCP connection
    PortCheck-->>CLI: port ready
    CLI->>McpDocs: run generate --url http://host:port/sse
    McpDocs->>Server: connect via SSE endpoint
    Server-->>McpDocs: stream server metadata
    McpDocs->>Filesystem: write HTML docs to output dir
    McpDocs-->>CLI: success
    CLI->>Server: terminate process group (SIGTERM)
    Server-->>CLI: shutdown
    CLI->>Filesystem: output available at docs/mcp-generated/
    CLI-->>User: exit(0)
```
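The readiness step in the diagram (`wait_for_port` polling a TCP connection) can be sketched roughly as follows; the function name matches the diagram, but the timings and structure are illustrative, not the script's actual code:

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0) -> None:
    """Poll until a TCP connection to (host, port) succeeds, or raise TimeoutError."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=1.0):
                return  # Port accepted a connection; the server socket is bound.
        except OSError:
            time.sleep(0.2)  # Not listening yet; back off briefly and retry.
    raise TimeoutError(f"{host}:{port} did not become reachable within {timeout}s")
```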
Estimated Code Review Effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes

🚥 Pre-merge checks: ✅ 3 passed
✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches: 📝 Generate docstrings · 🧪 Generate unit tests (beta)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.
Actionable comments posted: 1
🧹 Nitpick comments (4)
scripts/generate_mcp_docs.py (3)
44-58: TCP connect readiness vs. SSE endpoint readiness — possible race?

`_wait_for_port` returns as soon as the socket accepts a connection, but that doesn't guarantee FastMCP has finished mounting the `/sse` route. In practice uvicorn typically binds after the ASGI app is ready, so this is usually fine — just flagging in case you've seen flakiness. If it does become flaky, a tiny HTTP GET to `/sse` (or any known endpoint) with a short retry would be more robust. No action needed if the smoke test has been stable.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate_mcp_docs.py` around lines 44 - 58, _wait_for_port currently only checks TCP connectability which can return before the FastMCP app has mounted the /sse route; to make readiness checks robust change _wait_for_port to perform an actual HTTP GET against the SSE endpoint (e.g., GET /sse or another known health endpoint) with a short per-attempt timeout and retry loop until the overall deadline, falling back to the existing TCP check if needed; update references to _wait_for_port to expect the same signature and ensure the raised TimeoutError message includes the last HTTP error or status for debugging.
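If the TCP check ever proves flaky, the HTTP-GET readiness probe suggested above might look something like this sketch (the endpoint path and timings are assumptions; note that `urllib.request.urlopen` returns once response headers arrive, so a long-lived SSE stream does not block the probe):

```python
import time
import urllib.error
import urllib.request


def wait_for_http(url: str, timeout: float = 60.0) -> None:
    """Poll `url` with short GETs until any HTTP response arrives."""
    deadline = time.monotonic() + timeout
    last_error: Exception | None = None
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0):
                return  # Got a response: the route is mounted and serving.
        except urllib.error.HTTPError:
            return  # Even a 4xx/5xx status proves the app is up and routing.
        except OSError as exc:  # Connection refused, reset, timed out, etc.
            last_error = exc
            time.sleep(0.25)
    raise TimeoutError(f"{url} not ready within {timeout}s (last error: {last_error!r})")
```

This also satisfies the prompt's ask that the `TimeoutError` message carry the last error for debugging.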
104-108: Stale output directory not cleaned before regeneration — intentional?

`output.mkdir(parents=True, exist_ok=True)` preserves prior contents, so if tools/resources are removed from the MCP server between runs, their old HTML pages will linger in `docs/mcp-generated/`. Would it be worth doing a `shutil.rmtree(output, ignore_errors=True)` before the mkdir (or delegating to `mcpdocs` if it has a clean flag), wdyt?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate_mcp_docs.py` around lines 104 - 108, The current generation preserves prior contents by calling output.mkdir(...), which leaves stale pages behind; before invoking subprocess.run([mcpdocs_bin, "generate", ...], check=True) remove the existing output directory (use shutil.rmtree(output, ignore_errors=True)) and then recreate it (output.mkdir(parents=True, exist_ok=True)) so the generated docs directory is clean; reference the variables output and mcpdocs_bin and the subprocess.run call when making the change.
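A minimal sketch of the clean-before-regenerate idea (the variable name `output` follows the review comment; the real script's structure may differ):

```python
import shutil
from pathlib import Path


def clean_output_dir(output: Path) -> Path:
    """Remove any stale generated docs, then recreate an empty output directory."""
    shutil.rmtree(output, ignore_errors=True)  # No-op if the directory doesn't exist.
    output.mkdir(parents=True, exist_ok=True)
    return output
```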
72-77: Silencing server stderr makes startup failures hard to diagnose — capture it?

With `stdout=DEVNULL` and `stderr=STDOUT`, if the subprocess crashes on import (bad env, port in use, etc.), `_wait_for_port` will just time out after 60s with no breadcrumbs. Would you consider piping stderr to a tempfile (or `subprocess.PIPE` + drain on timeout) so the `TimeoutError` message can include the last few lines of server output, wdyt?

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@scripts/generate_mcp_docs.py` around lines 72 - 77, The subprocess is currently started with stdout=DEVNULL and stderr=STDOUT which discards server error output; change the Popen call in scripts/generate_mcp_docs.py to capture stderr (e.g., create a tempfile.NamedTemporaryFile(mode='w+b') or use subprocess.PIPE and store the handle) and pass that handle as stderr to subprocess.Popen (keep stdout=DEVNULL if desired). Then, in the _wait_for_port function (or wherever the TimeoutError is raised), read the last N lines from that tempfile (or drain the PIPE) on timeout and include those lines in the TimeoutError message; ensure the tempfile/pipe is closed/cleaned up after use and that file descriptors are passed correctly to Popen.
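One possible shape for the stderr-capture suggestion, assuming a POSIX environment (the helper names here are hypothetical, not from the PR):

```python
import subprocess
import tempfile


def start_server(cmd: list[str]):
    """Spawn `cmd` with stderr captured to a temp file for post-mortem diagnostics."""
    log = tempfile.NamedTemporaryFile(prefix="mcp-server-", suffix=".log", delete=False)
    proc = subprocess.Popen(cmd, stdout=subprocess.DEVNULL, stderr=log)
    return proc, log


def tail(path: str, n: int = 20) -> str:
    """Return the last `n` lines of a log file, for inclusion in a TimeoutError message."""
    with open(path, "rb") as f:
        return b"\n".join(f.read().splitlines()[-n:]).decode(errors="replace")
```

On a readiness timeout, the caller would raise `TimeoutError(f"server not ready; last output:\n{tail(log.name)}")` and clean up the temp file.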
pyproject.toml (1)

177-177: Task wiring looks good — consider documenting the `mcpdocs-gen` install step as a task too?

Since `mcpdocs-gen` isn't a declared dep (per the PR description, intentionally ad-hoc for the prototype), running `poe mcp-docs-generate` on a fresh checkout will fail with the "CLI not found" error from the script until users read the CONTRIBUTING section. Would it be worth adding a sibling `mcp-docs-install` task (or a shell task that does `uv pip install mcpdocs-gen && python scripts/...`) to make the prototype one-shot-runnable, wdyt? Happy to leave as-is for now if you'd rather keep the dependency decision explicit per the PR checklist.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@pyproject.toml` at line 177, Add a sibling poe task to make the docs generation one-shot: create a new task named mcp-docs-install that installs the ad-hoc CLI then invokes the existing generator (so it runs uv pip install mcpdocs-gen && python scripts/generate_mcp_docs.py or separate install then call), referencing the existing mcp-docs-generate task name so callers can choose installation+generate in one command; ensure the task help text explains it installs mcpdocs-gen before generating docs.
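A hedged sketch of what the suggested sibling task might look like in `pyproject.toml` (the `mcp-docs-install` name and shell command come from the review comment; the existing task's exact definition isn't shown in this excerpt, so both entries are illustrative poethepoet syntax):

```toml
[tool.poe.tasks]
mcp-docs-generate = "python scripts/generate_mcp_docs.py"

# Hypothetical one-shot variant: installs the ad-hoc CLI, then generates.
mcp-docs-install = { shell = "uv pip install mcpdocs-gen && python scripts/generate_mcp_docs.py" }
```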
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository UI
Review profile: CHILL
Plan: Pro
Run ID: 758bef0f-7cde-4674-a378-3bfc40d0cc63
📒 Files selected for processing (4)

- .gitignore
- docs/CONTRIBUTING.md
- pyproject.toml
- scripts/generate_mcp_docs.py
```python
def _start_mcp_server(host: str, port: int) -> subprocess.Popen[bytes]:
    """Start the PyAirbyte MCP server in SSE mode as a background subprocess."""
    cmd = [
        sys.executable,
        "-c",
        (
            "from airbyte.mcp.server import app; "
            f"app.run(transport='sse', host={host!r}, port={port})"
        ),
    ]
    # Start in its own process group so we can signal the whole tree on shutdown.
    return subprocess.Popen(
        cmd,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.STDOUT,
        start_new_session=True,
    )


def _stop_mcp_server(proc: subprocess.Popen[bytes]) -> None:
    """Terminate the MCP server subprocess tree cleanly."""
    if proc.poll() is not None:
        return
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        return
    try:
        proc.wait(timeout=10.0)
    except subprocess.TimeoutExpired:
        with contextlib.suppress(ProcessLookupError):
            os.killpg(proc.pid, signal.SIGKILL)
        proc.wait(timeout=5.0)
```
POSIX-only process group handling — is Windows support in scope for this prototype?

`start_new_session=True` is a no-op on Windows, and `os.killpg` / `signal.SIGKILL` aren't available there, so `_stop_mcp_server` will raise `AttributeError` on Windows even in the "happy path" (it runs unconditionally in the `finally`). Given the project supports Python 3.10–3.12 on Windows per `requires-python` and there are Windows-specific pytest filters, would you like to either (a) gate this script to POSIX with a friendly error up-front, or (b) add a Windows fallback using `proc.terminate()` / `proc.kill()`, wdyt?
🛠️ Sketch of a cross-platform fallback

```diff
 def _start_mcp_server(host: str, port: int) -> subprocess.Popen[bytes]:
     ...
+    popen_kwargs: dict = {"stdout": subprocess.DEVNULL, "stderr": subprocess.STDOUT}
+    if os.name == "posix":
+        popen_kwargs["start_new_session"] = True
+    else:
+        popen_kwargs["creationflags"] = subprocess.CREATE_NEW_PROCESS_GROUP  # type: ignore[attr-defined]
-    return subprocess.Popen(
-        cmd,
-        stdout=subprocess.DEVNULL,
-        stderr=subprocess.STDOUT,
-        start_new_session=True,
-    )
+    return subprocess.Popen(cmd, **popen_kwargs)

 def _stop_mcp_server(proc: subprocess.Popen[bytes]) -> None:
     if proc.poll() is not None:
         return
-    try:
-        os.killpg(proc.pid, signal.SIGTERM)
-    except ProcessLookupError:
-        return
+    if os.name == "posix":
+        try:
+            os.killpg(proc.pid, signal.SIGTERM)
+        except ProcessLookupError:
+            return
+    else:
+        proc.terminate()
     try:
         proc.wait(timeout=10.0)
     except subprocess.TimeoutExpired:
         with contextlib.suppress(ProcessLookupError):
-            os.killpg(proc.pid, signal.SIGKILL)
+            if os.name == "posix":
+                os.killpg(proc.pid, signal.SIGKILL)
+            else:
+                proc.kill()
         proc.wait(timeout=5.0)
```

🤖 Prompt for AI Agents
proc.wait(timeout=5.0)🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@scripts/generate_mcp_docs.py` around lines 61 - 93, The current subprocess
management in _start_mcp_server and _stop_mcp_server assumes POSIX-only APIs
(start_new_session, os.killpg, signal.SIGKILL) and will fail on Windows; update
the functions to detect the platform (using sys.platform or os.name) and provide
a Windows-safe fallback: when on POSIX keep the existing start_new_session and
os.killpg flow, but when on Windows avoid start_new_session and instead use the
Popen instance directly (proc.terminate()/proc.kill()) and proc.kill() as the
hard-kill fallback, catching and suppressing ProcessLookupError/AttributeError
as needed; alternatively, if Windows support is out of scope, raise a clear,
early PlatformError in _start_mcp_server explaining the script is POSIX-only so
callers get a friendly error instead of failing in the finally of
_stop_mcp_server.
Thanks CodeRabbit (@coderabbitai) — acknowledged. All four are reasonable, but per the task brief this is a deliberately minimal prototype ("don't over-engineer"), so I'm leaving them as follow-ups rather than expanding scope here:

- TCP vs. SSE readiness race — smoke test was stable across a dozen runs; noted in case it flakes later.
- Stale output directory — worth adding; small enough to include if AJ agrees the PR should grow.
- Captured stderr on startup failure — worth adding; same caveat.
- Windows subprocess fallback — out of scope for the prototype, but worth adding if this pattern moves into an internal tool library alongside `scripts/generate_mcp_docs.py`.
- Sibling `mcp-docs-install` task — intentionally skipped, per the PR description: we don't want to commit to `mcpdocs-gen` as a first-party dep until the eval is accepted.

Happy to fold any/all of these in if AJ wants the prototype hardened before merge.
Skipped: comment is from another GitHub bot.
Introduces `scripts/generate_mcp_markdown.py` (exposed via `poe mcp-docs-md`), which introspects the MCP server with `fastmcp inspect` and renders a small set of Markdown files under `docs/mcp-generated/`:

- index.md — server overview + counts + TOC
- tools.md — one H2 per tool with a GFM parameters table and collapsible input/output JSON schemas
- resources.md — concrete resources and resource templates
- prompts.md — prompts and their arguments

Formatting is modeled on `mcpdocs-gen` (evaluated in PR #1013) but emitted as plain CommonMark + GFM + YAML front-matter + `<details>` blocks, so the pages render correctly in both Docusaurus and `pdoc` without MDX-only components. Every tool/resource/prompt has a stable slug anchor for deep-linking.
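As a rough illustration of the kind of rendering described above — a stable slug anchor plus a GFM parameters table derived from a tool's JSON schema — here is a hedged sketch (helper names and table columns are hypothetical; the actual script isn't shown in this excerpt):

```python
import re


def slugify(name: str) -> str:
    """Stable, URL-safe anchor slug for a tool/resource/prompt name."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def render_params_table(schema: dict) -> str:
    """Render a tool's JSON-schema `properties` as a GFM table."""
    required = set(schema.get("required", []))
    rows = [
        "| Name | Type | Required | Description |",
        "| --- | --- | --- | --- |",
    ]
    for name, prop in schema.get("properties", {}).items():
        rows.append(
            f"| `{name}` | {prop.get('type', 'any')} | "
            f"{'yes' if name in required else 'no'} | {prop.get('description', '')} |"
        )
    return "\n".join(rows)
```

Because the output is plain GFM (no MDX components), the same table renders in Docusaurus, `pdoc`, and raw GitHub preview alike.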
Summary
Prototype end-to-end workflow for generating static HTML documentation for the PyAirbyte MCP server using `mcpdocs-gen`. The generator introspects a running FastMCP server over SSE and emits a searchable static site covering tools, resources, and prompts — positioned as a possible "pdoc3 for MCP servers."

Changes:

- `scripts/generate_mcp_docs.py` — standalone script that starts `airbyte.mcp.server:app` in SSE mode as a background subprocess (in its own process group), TCP-healthchecks the port, shells out to `mcpdocs generate`, and tears the server down cleanly on success, failure, or Ctrl+C (SIGTERM with SIGKILL fallback).
- `pyproject.toml` — adds a `poe mcp-docs-generate` task alias.
- `.gitignore` — ignores the generated `docs/mcp-generated/` output directory.
- `docs/CONTRIBUTING.md` — adds a short section on how to regenerate MCP docs locally.
- `mcpdocs-gen` is not added to project dependencies — it's installed ad-hoc via `uv pip install mcpdocs-gen` per the contributing doc, since this is an exploration PR and we don't want to commit to the tool as a first-party dep yet.

Smoke test: the script ran end-to-end against the live PyAirbyte MCP server and produced a site documenting 51 tools, 1 resource, and 1 prompt. See screenshots and candid evaluation in the session report.
Review & Testing Checklist for Human

- Run `uv pip install mcpdocs-gen && poe mcp-docs-generate` on a fresh checkout and confirm `docs/mcp-generated/index.html` renders all 51 tools with arguments, descriptions, and JSON schemas.
- Press Ctrl+C mid-run and verify no orphaned Python process is listening on port 8765 afterward (`lsof -iTCP:8765` or `ss -ltnp`).
- Decide whether to adopt `mcpdocs-gen` org-wide, restrict it to one-off dumps, or pick a different tool.

Notes
- The script uses port 8765 (not 8000) to avoid colliding with the existing `poe mcp-serve-sse` task, which hardcodes `127.0.0.1:8000`. `--port` is configurable.
- Lint (`uv run ruff check .`, `uv run ruff format --check .`) and type checking (`uv run pyrefly check` — repo uses pyrefly, not mypy) all pass locally.

Link to Devin session: https://app.devin.ai/sessions/359e794efeb844b2a8adf02b5831f999
Requested by: Aaron ("AJ") Steers (@aaronsteers)
Summary by CodeRabbit
Release Notes
New Features
Documentation
Chores