Skip to content

docs-convert-gdoc: Add extracted Google Docs comments as footnotes to converted md file#113

Open
bburt-rh wants to merge 4 commits intoredhat-documentation:mainfrom
bburt-rh:gdoc-comment-extraction
Open

docs-convert-gdoc: Add extracted Google Docs comments as footnotes to converted md file#113
bburt-rh wants to merge 4 commits intoredhat-documentation:mainfrom
bburt-rh:gdoc-comment-extraction

Conversation

@bburt-rh
Copy link
Copy Markdown
Collaborator

@bburt-rh bburt-rh commented Apr 24, 2026

Summary

Pull comment threads from the Drive v3 API and insert them as Markdown footnotes when --comments is passed on the command line.

Resolved threads are excluded by default; --include-resolved brings them back.

Anchor matching uses whitespace-normalized fuzzy search with word-boundary snapping so footnote references never split a word.

Argument parsing migrated from sys.argv to argparse; backward compatible.

Changes

  • `gdoc2md.py`: Migrate arg parsing to `argparse`; add `fetch_comments()` using Drive v3 API; add `insert_comment_footnotes()` with whitespace-normalized fuzzy anchor matching and word-boundary snapping; wire into `fetch()` for doc mode
  • `SKILL.md`: Document `--comments` and `--include-resolved` flags with usage examples

Test plan

  • Run `gdoc2md.py` without flags -- verify backward compatibility
  • Run with `--comments` on a doc with open comments -- verify footnotes appear
  • Run with `--comments --include-resolved` -- verify resolved comments included
  • Run with `--comments` on a Slides URL -- verify warning and graceful skip

Closes #112

Made-with: Cursor

Summary by CodeRabbit

  • New Features
    • CLI now supports --comments and --include-resolved flags; comments import as numbered Markdown footnotes (anchored inline, unanchored appended) with nested replies.
    • Resolved threads excluded by default; Slides/Sheets ignore comments and emit a warning.
  • Documentation
    • Usage docs updated with examples and flag behavior.
  • Chores
    • Plugin version bumped; dependency attribution updated for workflow scripts.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

  • @coderabbitai resume to resume automatic reviews.
  • @coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

  • ▶️ Resume reviews
  • 🔍 Trigger review

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: e715fa89-9297-43a6-97a3-894458ee48f7

📥 Commits

Reviewing files that changed from the base of the PR and between eb4837a and e00a833.

📒 Files selected for processing (1)
  • scripts/deps.json

Walkthrough

Adds argparse CLI flags --comments and --include-resolved to gdoc2md, fetches Drive v3 comment threads for Google Docs exports, fuzzy-matches anchors to insert numbered footnote references (or endnotes for unanchored comments), appends footnote definitions with nested replies, and warns/ignores flags for non-Docs exports.

Changes

Cohort / File(s) Summary
Documentation
plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
Documented new CLI flags --comments and --include-resolved, anchored vs unanchored comment behavior, reply nesting, resolved-thread exclusion by default, and that flags apply only to Google Docs.
gdoc2md script (CLI, export, comments)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Replaced manual argv parsing with argparse; parse_and_validate_args() returns (file_id, output, mode, comments, include_resolved); fetch() signature extended to accept include_comments and include_resolved; added fetch_comments(...) (Drive v3 paging, optional resolved filtering), insert_comment_footnotes(...) (whitespace-normalized fuzzy anchor matching, inline [^N] insertion, endnotes for unanchored), helper normalization/snap functions, and stderr warning when --comments used for non-Docs exports.
Plugin manifest
plugins/docs-tools/.claude-plugin/plugin.json
Bumped plugin version from 0.0.52 to 0.0.53.
Dependency manifest
scripts/deps.json
Updated dependency/source attribution for create_mr.py (now lists PyGithub and python-gitlab imports); removed create_mr.sh tools usage entries for gh, glab, jq.

Sequence Diagram

sequenceDiagram
    participant User
    participant Script as gdoc2md.py
    participant DriveAPI as Google Drive API
    participant Exporter as Markdown Exporter

    User->>Script: Run with --comments [--include-resolved]
    Script->>DriveAPI: Export file content (Docs/Slides/Sheets)
    DriveAPI-->>Script: Exported content
    alt mode == Docs and include_comments
        Script->>DriveAPI: GET /drive/v3/files/{fileId}/comments (paged, include_resolved?)
        DriveAPI-->>Script: Comment threads (anchors, authors, replies)
        Script->>Script: Normalize text & fuzzy-match anchors
        Script->>Exporter: Insert inline [^N] refs and append footnote defs
    else non-Docs or no comments
        Script->>Script: Skip comment fetch (emit stderr warning if --comments)
    end
    Script-->>User: Final Markdown (with appended footnotes)
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🚥 Pre-merge checks | ✅ 10
✅ Passed checks (10 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: adding Google Docs comment extraction as footnotes to the converted Markdown file.
Linked Issues check ✅ Passed All requirements from issue #112 are met: argparse migration [#112], --comments flag with Drive v3 API integration [#112], --include-resolved for resolved threads [#112], whitespace-normalized fuzzy anchor matching with word-boundary snapping [#112], inline footnote placement and unanchored footnotes [#112], Google Docs-only application [#112], and documentation updates [#112].
Out of Scope Changes check ✅ Passed All changes are within scope: gdoc2md.py enhancements for comment extraction, SKILL.md documentation, plugin version bump for release tracking, and deps.json updates reflecting workflow script changes from upstream rebase.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
No Real People Names In Style References ✅ Passed Pull request does not violate custom check. No real people names used as style references or examples in documentation, scripts, or plugin configuration.
Git Safety Rules ✅ Passed The pull request contains no git operations, git push commands, force push operations, or hardcoded git remote names in any modified files.
No Untrusted Mcp Servers ✅ Passed No MCP server installations from untrusted or any sources detected. PR contains only Python stdlib changes and documentation updates.
Skill And Script Conventions ✅ Passed The pull request adheres to skill and script naming conventions. SKILL.md uses the new flag format without prohibited 'plugin:' prefix or old slash-command syntax. gdoc2md.py contains no cross-skill calls or external script invocations requiring '${CLAUDE_PLUGIN_ROOT}' patterns.
Plugin Registry Consistency ✅ Passed The plugin registry is properly maintained with matching entries and correct version management in plugin.json only.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 359-381: The code only checks the first occurrence of norm_anchor
via norm_md.find(norm_anchor) and if that pos is already in used_positions it
falls back to unanchored footnote; change this to iterate/find subsequent
occurrences (e.g., use norm_md.find(norm_anchor, start) in a loop) until you
find a pos not in used_positions or pos == -1, then proceed to call
_find_original_end, _snap_to_word_boundary and insert label into markdown,
update norm_md, footnotes and fn_index; if no unused occurrence is found keep
the existing fallback behavior. Ensure you reference the existing symbols
norm_md, norm_anchor, used_positions, _find_original_end,
_snap_to_word_boundary, footnotes and fn_index.
- Around line 54-78: The CLI currently allows --include-resolved without
enabling --comments, which results in resolved threads being ignored; update the
argument handling after parser.parse_args() to enforce the dependency by setting
args.comments = True when args.include_resolved is true (or alternatively call
parser.error when args.include_resolved and not args.comments); modify the block
that currently checks args.comments (referencing args.include_resolved,
args.comments, MODE_MAP, and output calculation) so that include_resolved always
implies comments before returning file_id, output, mode, args.comments,
args.include_resolved and before fetch() / fetch_comments() are used.
- Around line 285-316: The comments fetch currently requests only one page
(pageSize=100) and ignores nextPageToken, causing loss of comments; update the
logic around fields, api_url, download(), and the loop that builds comments to
implement pagination: include nextPageToken in the fields parameter, make
repeated requests using pageToken until no nextPageToken is returned, accumulate
comment objects into the comments list across pages (preserving the existing
extraction of author, content, quotedFileContent/value, resolved, and replies),
and ensure includeDeleted/pageSize params are kept while passing the current
pageToken in subsequent api_url requests.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 2aa8670f-d9a8-4e23-8acf-397ede53a892

📥 Commits

Reviewing files that changed from the base of the PR and between a67154d and 52c0c96.

📒 Files selected for processing (2)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py

Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py Outdated
@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch from 52c0c96 to 68e08af Compare April 24, 2026 17:26
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)

8-8: Use skill-relative script path in the module usage example.

Line 8 currently shows python gdoc2md.py ...; align it with skill conventions (python3 scripts/gdoc2md.py ...) so docs and script usage stay consistent.

As per coding guidelines, **/skills/**/scripts/**/*.{sh,py,yaml,yml}: For same skill script references: use paths relative to the skill directory (e.g., python3 scripts/foo.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` at line 8,
Update the usage example in gdoc2md.py to use the skill-relative path and
python3: replace the existing "python gdoc2md.py [--comments]
[--include-resolved] <url> [output]" invocation with "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]" so the example matches the
repository convention; locate the usage/help string in the gdoc2md.py module
(the top-level usage text or the function that prints/returns the usage) and
update it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 375-377: anchor may be non-empty but normalize to an empty string
causing accidental matches; after calling _normalize(anchor) (result stored in
norm_anchor) check whether norm_anchor is truthy and only then set search_from
or proceed with anchor-based matching—if norm_anchor is empty, treat it as no
anchor (skip anchor handling) so labels won't be placed using an empty
normalized anchor.
- Around line 367-373: The footnote builder injects raw comment/reply text
(comment['content'] and reply['content']) into footnote_def which breaks
Markdown if they contain newlines; before appending to body_parts, normalize
each comment/reply string by collapsing all internal newlines and excessive
whitespace into single spaces and trimming surrounding whitespace (e.g., via a
small helper like sanitize_comment(text)), then use the sanitized values when
formatting f"**{comment['author']}{status}:** ..." and the reply f-strings so
the generated footnote_def remains a single-line, well-formed Markdown footnote.

---

Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Line 8: Update the usage example in gdoc2md.py to use the skill-relative path
and python3: replace the existing "python gdoc2md.py [--comments]
[--include-resolved] <url> [output]" invocation with "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]" so the example matches the
repository convention; locate the usage/help string in the gdoc2md.py module
(the top-level usage text or the function that prints/returns the usage) and
update it accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: ac09697a-cfc3-4a5c-8a58-bbf8104fc6d2

📥 Commits

Reviewing files that changed from the base of the PR and between 52c0c96 and 68e08af.

📒 Files selected for processing (2)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md

Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py Outdated
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py Outdated
@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch 3 times, most recently from b183621 to a004e05 Compare April 24, 2026 18:44
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (2)

367-373: ⚠️ Potential issue | 🟠 Major

Sanitize multiline comment/reply text before composing footnote definitions.

Line 367 and Line 371 currently inject raw content; embedded newlines can break footnote formatting and render malformed Markdown.

Suggested fix
-        body_parts = []
+        body_parts = []
+        def _flat(text: str) -> str:
+            return re.sub(r"\s+", " ", (text or "")).strip()
         status = " (resolved)" if comment["resolved"] else ""
         body_parts.append(
-            f"**{comment['author']}{status}:** {comment['content']}"
+            f"**{comment['author']}{status}:** {_flat(comment['content'])}"
         )
         for reply in comment["replies"]:
             body_parts.append(
-                f"    **{reply['author']}:** {reply['content']}"
+                f"    **{reply['author']}:** {_flat(reply['content'])}"
             )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around
lines 367 - 373, Sanitize comment and reply text before composing footnote_def
to prevent embedded newlines from breaking footnote formatting: in the block
that builds body_parts (where comment['author'], comment['content'],
reply['author'], reply['content'] are used) normalize or escape newlines (e.g.,
replace newline characters with spaces or " \\n" escapes) and trim
trailing/leading whitespace, then join body_parts into footnote_def using the
sanitized strings (referenced symbols: body_parts, comment['content'],
reply['content'], footnote_def, label).

375-377: ⚠️ Potential issue | 🟡 Minor

Treat normalized-empty anchors as unanchored comments.

Line 375 allows whitespace-only anchors through; _normalize(anchor) can become empty and produce accidental matches.

Suggested fix
-        if anchor:
-            norm_anchor = _normalize(anchor)
+        norm_anchor = _normalize(anchor) if anchor else ""
+        if norm_anchor:
             search_from = 0
             pos = -1
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around
lines 375 - 377, The code currently accepts anchors even when _normalize(anchor)
returns an empty string, causing accidental matches; after computing norm_anchor
= _normalize(anchor) in the anchor handling block, check whether norm_anchor is
non-empty (truthy) and only proceed to set search_from/use the anchor if it is;
otherwise treat the comment as unanchored (skip anchor-specific logic). Update
the logic around norm_anchor and any subsequent matching/search variables (e.g.,
search_from) in the same block so whitespace-only anchors are ignored.
🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)

8-8: Use a skill-relative python3 invocation in the usage string.

The usage example should align with skill script path conventions to avoid ambiguity in execution context.

Suggested update
-python gdoc2md.py [--comments] [--include-resolved] <url> [output]
+python3 scripts/gdoc2md.py [--comments] [--include-resolved] <url> [output]

As per coding guidelines, **/skills/**/scripts/**/*.{sh,py,yaml,yml} requires same-skill script references to use paths relative to the skill directory (e.g., python3 scripts/foo.py).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` at line 8,
Update the usage string in gdoc2md.py to use a skill-relative python3 invocation
instead of "python gdoc2md.py"; specifically change the example usage line
("python gdoc2md.py [--comments] [--include-resolved] <url> [output]") to use
"python3" and a skill-relative path (e.g., "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]") so it follows the same-skill
script reference convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Duplicate comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 367-373: Sanitize comment and reply text before composing
footnote_def to prevent embedded newlines from breaking footnote formatting: in
the block that builds body_parts (where comment['author'], comment['content'],
reply['author'], reply['content'] are used) normalize or escape newlines (e.g.,
replace newline characters with spaces or " \\n" escapes) and trim
trailing/leading whitespace, then join body_parts into footnote_def using the
sanitized strings (referenced symbols: body_parts, comment['content'],
reply['content'], footnote_def, label).
- Around line 375-377: The code currently accepts anchors even when
_normalize(anchor) returns an empty string, causing accidental matches; after
computing norm_anchor = _normalize(anchor) in the anchor handling block, check
whether norm_anchor is non-empty (truthy) and only proceed to set
search_from/use the anchor if it is; otherwise treat the comment as unanchored
(skip anchor-specific logic). Update the logic around norm_anchor and any
subsequent matching/search variables (e.g., search_from) in the same block so
whitespace-only anchors are ignored.

---

Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Line 8: Update the usage string in gdoc2md.py to use a skill-relative python3
invocation instead of "python gdoc2md.py"; specifically change the example usage
line ("python gdoc2md.py [--comments] [--include-resolved] <url> [output]") to
use "python3" and a skill-relative path (e.g., "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]") so it follows the same-skill
script reference convention.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 331153d2-04d7-4d0d-a109-103a688675d0

📥 Commits

Reviewing files that changed from the base of the PR and between 68e08af and 2b61fa6.

📒 Files selected for processing (2)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py

@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch from a004e05 to b6537ea Compare April 24, 2026 18:48
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 50-59: The CLI currently defines only --no-comments and
--include-resolved, so passing --comments errors; change argument parsing to
accept both --comments and --no-comments as mutually exclusive flags and set a
single boolean attribute (e.g., args.comments) that drives downstream logic;
update creation of the arguments around parser.add_argument for comments (and
the similar block at the other occurrence) to use
parser.add_mutually_exclusive_group(), add both --comments (store_true) and
--no-comments (store_false) mapped to the same dest ('comments'), and ensure
code that checks comment behavior and the existing --include-resolved handling
uses args.comments (and treats include_resolved as ignored when args.comments is
False).
- Around line 429-446: The loop in _find_original_end currently treats
whitespace as the literal set " \t\n\r", causing mismatches with _normalize
which uses \s+; update both whitespace checks inside _find_original_end (the if
original[i] in " \t\n\r" branches) to use original[i].isspace() so all Unicode
whitespace is handled consistently when computing consumed/chars_left and
in_space; adjust both the initial consumed loop and the chars_left loop
references to the same isspace() check to align behavior with _normalize().
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: fc1384bd-653c-4f65-8ff1-ca0fc7124b1f

📥 Commits

Reviewing files that changed from the base of the PR and between 2b61fa6 and a004e05.

📒 Files selected for processing (2)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py

Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py Outdated
@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch 3 times, most recently from f8fa9e9 to 16cbc66 Compare April 24, 2026 19:17
@bburt-rh
Copy link
Copy Markdown
Collaborator Author

@coderabbitai resume

@bburt-rh
Copy link
Copy Markdown
Collaborator Author

@coderabbitai review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Review triggered.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented Apr 24, 2026

✅ Actions performed

Reviews resumed.

@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch 2 times, most recently from 75a2196 to b8d8d6d Compare April 24, 2026 19:56
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 379-394: The stored positions in used_positions (from norm_md)
become stale after inserting a label into markdown; change the tracking to
record offsets in the current markdown string (or a stable marker) instead of
raw norm_md positions: after finding pos in norm_md and computing end_of_anchor
via _find_original_end/_snap_to_word_boundary, map that location to the
corresponding character offset in markdown (or insert a temporary placeholder
token and track its index) and add that markdown offset to used_positions;
update the collision check logic that currently compares against norm_md
positions to compare against these markdown offsets so subsequent iterations see
the updated text and avoid re-matching anchors, and ensure fn_index/footnotes
handling remains the same.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: f6405d70-d423-46f5-8db5-65c6299ba37b

📥 Commits

Reviewing files that changed from the base of the PR and between 75a2196 and b8d8d6d.

📒 Files selected for processing (3)
  • plugins/docs-tools/.claude-plugin/plugin.json
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
✅ Files skipped from review due to trivial changes (2)
  • plugins/docs-tools/.claude-plugin/plugin.json
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md

Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 365-393: The loop currently searches norm_anchor inside norm_md
and mutates markdown as it finds matches, which invalidates later searches;
instead, first scan the original markdown (use _normalize, norm_md,
_find_original_end and _snap_to_word_boundary) to collect all valid
end_of_anchor offsets and their corresponding label/footnote_def entries without
modifying markdown or used_offsets, then sort those offsets descending and apply
insertions into markdown in reverse order (update used_offsets, append to
footnotes and increment fn_index as you insert) so earlier insertions don't
shift later match locations.
- Line 8: Update the usage example string in gdoc2md.py to use a skill-relative
path instead of the environment variable: replace the current example text
"python3 ${CLAUDE_SKILL_DIR}/scripts/gdoc2md.py [--comments]
[--include-resolved] <url> [output]" with "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]" so the docstring follows the
same-skill reference guideline; edit the top-level usage/README string in the
gdoc2md.py file where the example appears.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 5c87909b-3927-45b2-b625-1c438ace99c5

📥 Commits

Reviewing files that changed from the base of the PR and between b8d8d6d and 1ae9e4d.

📒 Files selected for processing (1)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py

Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Comment thread plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Pull comment threads from the Drive v3 API and insert them as Markdown
footnotes when --comments is passed.  Resolved threads are excluded by
default; --include-resolved brings them back.  Anchor matching uses
whitespace-normalized fuzzy search with word-boundary snapping so
footnote references never split a word.

Argument parsing migrated from sys.argv to argparse; backward compatible.

Closes redhat-documentation#112

Made-with: Cursor
…nchors

used_positions stored norm_md offsets that became stale after each
footnote insertion shifted the normalized string. Switch to tracking
original markdown offsets (end_of_anchor) which remain stable across
insertions. Re-normalize per iteration instead of after insertion.

Also restores ${CLAUDE_SKILL_DIR} in SKILL.md and docstring paths.
Refactor insert_comment_footnotes to a two-pass approach: first resolve
all anchor positions against the unmodified markdown, then apply
insertions from end to start so earlier labels cannot invalidate later
anchor matches.
@bburt-rh bburt-rh force-pushed the gdoc-comment-extraction branch from 1ae9e4d to eb4837a Compare April 27, 2026 15:40
Refactor insert_comment_footnotes to resolve all anchor positions
against the unmodified markdown first, then apply insertions in reverse
order so earlier labels cannot invalidate later matches.

Regenerate deps.json after rebase picked up create_mr.sh -> create_mr.py
migration from upstream.
Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)

378-386: Avoid redundant _find_original_end call.

When the loop finds an unused offset at line 378, the same _find_original_end call is repeated at line 384 with identical arguments (pos == candidate). Consider caching the result.

Suggested optimization
             while True:
                 candidate = norm_md.find(norm_anchor, search_from)
                 if candidate == -1:
                     break
                 orig_end = _find_original_end(markdown, norm_md, candidate, len(norm_anchor))
                 if orig_end not in used_offsets:
                     pos = candidate
+                    end_of_anchor = orig_end
                     break
                 search_from = candidate + 1
             if pos != -1:
-                end_of_anchor = _find_original_end(
-                    markdown,
-                    norm_md,
-                    pos,
-                    len(norm_anchor),
-                )
                 end_of_anchor = _snap_to_word_boundary(markdown, end_of_anchor)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around
lines 378 - 386, The code calls _find_original_end(markdown, norm_md, candidate,
len(norm_anchor)) twice for the same candidate; cache the first result (e.g.,
orig_end) when you detect orig_end not in used_offsets and set pos=candidate,
then reuse that cached value instead of calling _find_original_end again before
passing it to _snap_to_word_boundary and adding to used_offsets; update uses of
pos/candidate so the second branch uses the cached end_of_anchor value to avoid
redundant computation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 378-386: The code calls _find_original_end(markdown, norm_md,
candidate, len(norm_anchor)) twice for the same candidate; cache the first
result (e.g., orig_end) when you detect orig_end not in used_offsets and set
pos=candidate, then reuse that cached value instead of calling
_find_original_end again before passing it to _snap_to_word_boundary and adding
to used_offsets; update uses of pos/candidate so the second branch uses the
cached end_of_anchor value to avoid redundant computation.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Enterprise

Run ID: 57d36b67-15d1-4f61-8ff8-65403b1570cf

📥 Commits

Reviewing files that changed from the base of the PR and between 1ae9e4d and eb4837a.

📒 Files selected for processing (3)
  • plugins/docs-tools/.claude-plugin/plugin.json
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
  • plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
✅ Files skipped from review due to trivial changes (1)
  • plugins/docs-tools/.claude-plugin/plugin.json
🚧 Files skipped from review as they are similar to previous changes (1)
  • plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md

@aireilly
Copy link
Copy Markdown
Member

aireilly commented Apr 27, 2026

/lgtm provided this was tested and verified locally.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

docs-convert-gdoc: Add Google Docs comment extraction as Markdown footnotes

2 participants