docs-convert-gdoc: Add extracted Google Docs comments as footnotes to converted md file#113
bburt-rh wants to merge 4 commits into redhat-documentation:main from
Conversation
Note: Reviews paused. It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior in the review settings.

No actionable comments were generated in the recent review. 🎉
Walkthrough: Adds argparse CLI flags --comments and --include-resolved for fetching Google Docs comment threads and appending them to the converted Markdown as footnotes.

Changes
Sequence Diagram

```mermaid
sequenceDiagram
    participant User
    participant Script as gdoc2md.py
    participant DriveAPI as Google Drive API
    participant Exporter as Markdown Exporter
    User->>Script: Run with --comments [--include-resolved]
    Script->>DriveAPI: Export file content (Docs/Slides/Sheets)
    DriveAPI-->>Script: Exported content
    alt mode == Docs and include_comments
        Script->>DriveAPI: GET /drive/v3/files/{fileId}/comments (paged, include_resolved?)
        DriveAPI-->>Script: Comment threads (anchors, authors, replies)
        Script->>Script: Normalize text & fuzzy-match anchors
        Script->>Exporter: Insert inline [^N] refs and append footnote defs
    else non-Docs or no comments
        Script->>Script: Skip comment fetch (emit stderr warning if --comments)
    end
    Script-->>User: Final Markdown (with appended footnotes)
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes. Pre-merge checks: ✅ 10 passed.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 359-381: The code only checks the first occurrence of norm_anchor
via norm_md.find(norm_anchor) and if that pos is already in used_positions it
falls back to unanchored footnote; change this to iterate/find subsequent
occurrences (e.g., use norm_md.find(norm_anchor, start) in a loop) until you
find a pos not in used_positions or pos == -1, then proceed to call
_find_original_end, _snap_to_word_boundary and insert label into markdown,
update norm_md, footnotes and fn_index; if no unused occurrence is found keep
the existing fallback behavior. Ensure you reference the existing symbols
norm_md, norm_anchor, used_positions, _find_original_end,
_snap_to_word_boundary, footnotes and fn_index.
- Around line 54-78: The CLI currently allows --include-resolved without
enabling --comments, which results in resolved threads being ignored; update the
argument handling after parser.parse_args() to enforce the dependency by setting
args.comments = True when args.include_resolved is true (or alternatively call
parser.error when args.include_resolved and not args.comments); modify the block
that currently checks args.comments (referencing args.include_resolved,
args.comments, MODE_MAP, and output calculation) so that include_resolved always
implies comments before returning file_id, output, mode, args.comments,
args.include_resolved and before fetch() / fetch_comments() are used.
- Around line 285-316: The comments fetch currently requests only one page
(pageSize=100) and ignores nextPageToken, causing loss of comments; update the
logic around fields, api_url, download(), and the loop that builds comments to
implement pagination: include nextPageToken in the fields parameter, make
repeated requests using pageToken until no nextPageToken is returned, accumulate
comment objects into the comments list across pages (preserving the existing
extraction of author, content, quotedFileContent/value, resolved, and replies),
and ensure includeDeleted/pageSize params are kept while passing the current
pageToken in subsequent api_url requests.
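The second comment's two remedies can be sketched with a minimal argparse setup. The flag names come from the review; everything else here is illustrative, not the script's actual code:

```python
import argparse

parser = argparse.ArgumentParser()
parser.add_argument("--comments", action="store_true")
parser.add_argument("--include-resolved", action="store_true")
args = parser.parse_args(["--include-resolved"])

# Remedy A: --include-resolved silently implies --comments.
if args.include_resolved:
    args.comments = True

# Remedy B (alternative): reject the combination instead.
# if args.include_resolved and not args.comments:
#     parser.error("--include-resolved requires --comments")
```

Remedy A is the more forgiving choice; Remedy B surfaces the mistake to the user explicitly.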
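The pagination fix in the third comment might look like the following sketch. Here `download` stands in for the script's HTTP helper (an assumption; the real helper may differ) and is taken to return the decoded JSON response:

```python
import urllib.parse

def fetch_all_comments(file_id: str, download, include_resolved: bool = False) -> list:
    """Page through /drive/v3/files/{id}/comments until nextPageToken
    is absent, accumulating comment objects across pages."""
    fields = "nextPageToken,comments(author,content,quotedFileContent,resolved,replies)"
    comments = []
    page_token = None
    while True:
        params = {"fields": fields, "pageSize": 100, "includeDeleted": "false"}
        if page_token:
            params["pageToken"] = page_token
        api_url = (
            f"https://www.googleapis.com/drive/v3/files/{file_id}/comments?"
            + urllib.parse.urlencode(params)
        )
        data = download(api_url)
        for c in data.get("comments", []):
            # Preserve the existing resolved-thread filtering across pages.
            if c.get("resolved") and not include_resolved:
                continue
            comments.append(c)
        page_token = data.get("nextPageToken")
        if not page_token:
            break
    return comments
```

The key points match the review: nextPageToken is requested in `fields`, and pageToken is passed on every subsequent request until the API stops returning one.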
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 2aa8670f-d9a8-4e23-8acf-397ede53a892
📒 Files selected for processing (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
52c0c96 to 68e08af (Compare)
Actionable comments posted: 2
🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)
8-8: Use skill-relative script path in the module usage example. Line 8 currently shows `python gdoc2md.py ...`; align it with skill conventions (`python3 scripts/gdoc2md.py ...`) so docs and script usage stay consistent. As per coding guidelines, `**/skills/**/scripts/**/*.{sh,py,yaml,yml}`: for same-skill script references, use paths relative to the skill directory (e.g., `python3 scripts/foo.py`). 🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` at line 8: update the usage example in gdoc2md.py to use the skill-relative path and python3. Replace the existing "python gdoc2md.py [--comments] [--include-resolved] <url> [output]" invocation with "python3 scripts/gdoc2md.py [--comments] [--include-resolved] <url> [output]" so the example matches the repository convention; locate the usage/help string in the gdoc2md.py module (the top-level usage text or the function that prints/returns the usage) and update it accordingly.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 375-377: anchor may be non-empty but normalize to an empty string
causing accidental matches; after calling _normalize(anchor) (result stored in
norm_anchor) check whether norm_anchor is truthy and only then set search_from
or proceed with anchor-based matching—if norm_anchor is empty, treat it as no
anchor (skip anchor handling) so labels won't be placed using an empty
normalized anchor.
- Around line 367-373: The footnote builder injects raw comment/reply text
(comment['content'] and reply['content']) into footnote_def which breaks
Markdown if they contain newlines; before appending to body_parts, normalize
each comment/reply string by collapsing all internal newlines and excessive
whitespace into single spaces and trimming surrounding whitespace (e.g., via a
small helper like sanitize_comment(text)), then use the sanitized values when
formatting f"**{comment['author']}{status}:** ..." and the reply f-strings so
the generated footnote_def remains a single-line, well-formed Markdown footnote.
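The `sanitize_comment` helper named above is the review's suggested (hypothetical) name; a plausible one-liner implementation:

```python
import re

def sanitize_comment(text: str) -> str:
    """Collapse newlines and runs of whitespace into single spaces and
    trim, so a footnote definition stays on one well-formed line."""
    return re.sub(r"\s+", " ", text or "").strip()
```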
---
Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Line 8: Update the usage example in gdoc2md.py to use the skill-relative path
and python3: replace the existing "python gdoc2md.py [--comments]
[--include-resolved] <url> [output]" invocation with "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]" so the example matches the
repository convention; locate the usage/help string in the gdoc2md.py module
(the top-level usage text or the function that prints/returns the usage) and
update it accordingly.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: ac09697a-cfc3-4a5c-8a58-bbf8104fc6d2
📒 Files selected for processing (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
b183621 to a004e05 (Compare)
♻️ Duplicate comments (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (2)
367-373: ⚠️ Potential issue | 🟠 Major: Sanitize multiline comment/reply text before composing footnote definitions.
Line 367 and Line 371 currently inject raw content; embedded newlines can break footnote formatting and render malformed Markdown.
Suggested fix

```diff
  body_parts = []
+ def _flat(text: str) -> str:
+     return re.sub(r"\s+", " ", (text or "")).strip()
  status = " (resolved)" if comment["resolved"] else ""
  body_parts.append(
-     f"**{comment['author']}{status}:** {comment['content']}"
+     f"**{comment['author']}{status}:** {_flat(comment['content'])}"
  )
  for reply in comment["replies"]:
      body_parts.append(
-         f"  **{reply['author']}:** {reply['content']}"
+         f"  **{reply['author']}:** {_flat(reply['content'])}"
      )
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around lines 367 - 373, Sanitize comment and reply text before composing footnote_def to prevent embedded newlines from breaking footnote formatting: in the block that builds body_parts (where comment['author'], comment['content'], reply['author'], reply['content'] are used) normalize or escape newlines (e.g., replace newline characters with spaces or " \\n" escapes) and trim trailing/leading whitespace, then join body_parts into footnote_def using the sanitized strings (referenced symbols: body_parts, comment['content'], reply['content'], footnote_def, label).
375-377: ⚠️ Potential issue | 🟡 Minor: Treat normalized-empty anchors as unanchored comments.
Line 375 allows whitespace-only anchors through; `_normalize(anchor)` can become empty and produce accidental matches.
Suggested fix

```diff
- if anchor:
-     norm_anchor = _normalize(anchor)
+ norm_anchor = _normalize(anchor) if anchor else ""
+ if norm_anchor:
      search_from = 0
      pos = -1
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around lines 375 - 377, The code currently accepts anchors even when _normalize(anchor) returns an empty string, causing accidental matches; after computing norm_anchor = _normalize(anchor) in the anchor handling block, check whether norm_anchor is non-empty (truthy) and only proceed to set search_from/use the anchor if it is; otherwise treat the comment as unanchored (skip anchor-specific logic). Update the logic around norm_anchor and any subsequent matching/search variables (e.g., search_from) in the same block so whitespace-only anchors are ignored.
🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)
8-8: Use a skill-relative `python3` invocation in the usage string. The usage example should align with skill script path conventions to avoid ambiguity in execution context.
Suggested update

```diff
-python gdoc2md.py [--comments] [--include-resolved] <url> [output]
+python3 scripts/gdoc2md.py [--comments] [--include-resolved] <url> [output]
```

As per coding guidelines, `**/skills/**/scripts/**/*.{sh,py,yaml,yml}` requires same-skill script references to use paths relative to the skill directory (e.g., `python3 scripts/foo.py`). 🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` at line 8, Update the usage string in gdoc2md.py to use a skill-relative python3 invocation instead of "python gdoc2md.py"; specifically change the example usage line ("python gdoc2md.py [--comments] [--include-resolved] <url> [output]") to use "python3" and a skill-relative path (e.g., "python3 scripts/gdoc2md.py [--comments] [--include-resolved] <url> [output]") so it follows the same-skill script reference convention.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Duplicate comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 367-373: Sanitize comment and reply text before composing
footnote_def to prevent embedded newlines from breaking footnote formatting: in
the block that builds body_parts (where comment['author'], comment['content'],
reply['author'], reply['content'] are used) normalize or escape newlines (e.g.,
replace newline characters with spaces or " \\n" escapes) and trim
trailing/leading whitespace, then join body_parts into footnote_def using the
sanitized strings (referenced symbols: body_parts, comment['content'],
reply['content'], footnote_def, label).
- Around line 375-377: The code currently accepts anchors even when
_normalize(anchor) returns an empty string, causing accidental matches; after
computing norm_anchor = _normalize(anchor) in the anchor handling block, check
whether norm_anchor is non-empty (truthy) and only proceed to set
search_from/use the anchor if it is; otherwise treat the comment as unanchored
(skip anchor-specific logic). Update the logic around norm_anchor and any
subsequent matching/search variables (e.g., search_from) in the same block so
whitespace-only anchors are ignored.
---
Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Line 8: Update the usage string in gdoc2md.py to use a skill-relative python3
invocation instead of "python gdoc2md.py"; specifically change the example usage
line ("python gdoc2md.py [--comments] [--include-resolved] <url> [output]") to
use "python3" and a skill-relative path (e.g., "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]") so it follows the same-skill
script reference convention.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 331153d2-04d7-4d0d-a109-103a688675d0
📒 Files selected for processing (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
a004e05 to b6537ea (Compare)
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 50-59: The CLI currently defines only --no-comments and
--include-resolved, so passing --comments errors; change argument parsing to
accept both --comments and --no-comments as mutually exclusive flags and set a
single boolean attribute (e.g., args.comments) that drives downstream logic;
update creation of the arguments around parser.add_argument for comments (and
the similar block at the other occurrence) to use
parser.add_mutually_exclusive_group(), add both --comments (store_true) and
--no-comments (store_false) mapped to the same dest ('comments'), and ensure
code that checks comment behavior and the existing --include-resolved handling
uses args.comments (and treats include_resolved as ignored when args.comments is
False).
- Around line 429-446: The loop in _find_original_end currently treats
whitespace as the literal set " \t\n\r", causing mismatches with _normalize
which uses \s+; update both whitespace checks inside _find_original_end (the if
original[i] in " \t\n\r" branches) to use original[i].isspace() so all Unicode
whitespace is handled consistently when computing consumed/chars_left and
in_space; adjust both the initial consumed loop and the chars_left loop
references to the same isspace() check to align behavior with _normalize().
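A sketch of the mutually exclusive flag pair from the first comment. Defaulting `comments` to True is an inference from the existing `--no-comments` opt-out flag, not something the review states:

```python
import argparse

parser = argparse.ArgumentParser()
group = parser.add_mutually_exclusive_group()
# Both flags map to the same dest, so passing both together is rejected.
group.add_argument("--comments", dest="comments", action="store_true")
group.add_argument("--no-comments", dest="comments", action="store_false")
# Assumed default: comments on unless --no-comments is given.
parser.set_defaults(comments=True)
parser.add_argument("--include-resolved", action="store_true")

args = parser.parse_args(["--no-comments"])
```

With this wiring, downstream code can check a single `args.comments` boolean regardless of which flag the user passed.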
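The second comment's point about `str.isspace()` versus the literal set `" \t\n\r"` can be illustrated with a toy consumption counter. This is not the script's `_find_original_end`, just a demonstration that Unicode whitespace such as a non-breaking space is now counted consistently with a `\s+`-based normalizer:

```python
def consumed_chars(original: str, start: int, n_norm: int) -> int:
    """Count how many original characters cover n_norm normalized
    characters, treating any run of Unicode whitespace as one space."""
    i = start
    left = n_norm
    in_space = False
    while i < len(original) and left > 0:
        if original[i].isspace():          # was: original[i] in " \t\n\r"
            if not in_space:
                left -= 1                  # a whitespace run maps to one space
            in_space = True
        else:
            left -= 1
            in_space = False
        i += 1
    return i - start
```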
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: fc1384bd-653c-4f65-8ff1-ca0fc7124b1f
📒 Files selected for processing (2)
plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
f8fa9e9 to 16cbc66 (Compare)
@coderabbitai resume

@coderabbitai review

✅ Actions performed: Review triggered.

✅ Actions performed: Reviews resumed.
75a2196 to b8d8d6d (Compare)
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 379-394: The stored positions in used_positions (from norm_md)
become stale after inserting a label into markdown; change the tracking to
record offsets in the current markdown string (or a stable marker) instead of
raw norm_md positions: after finding pos in norm_md and computing end_of_anchor
via _find_original_end/_snap_to_word_boundary, map that location to the
corresponding character offset in markdown (or insert a temporary placeholder
token and track its index) and add that markdown offset to used_positions;
update the collision check logic that currently compares against norm_md
positions to compare against these markdown offsets so subsequent iterations see
the updated text and avoid re-matching anchors, and ensure fn_index/footnotes
handling remains the same.
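One way to track offsets against the current markdown rather than stale `norm_md` positions, as the comment suggests. This is a simplified sketch with hypothetical names, not the script's function:

```python
def insert_labels_tracking_markdown_offsets(markdown: str, matches: list) -> str:
    """matches: (markdown_offset, label) pairs. Collisions are checked
    against markdown offsets; a running shift accounts for the text
    already inserted by earlier labels."""
    used_offsets = set()
    shift = 0
    for offset, label in sorted(matches):
        if offset in used_offsets:
            continue  # collision check against markdown offsets, not norm_md
        used_offsets.add(offset)
        markdown = markdown[: offset + shift] + label + markdown[offset + shift :]
        shift += len(label)
    return markdown
```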
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: f6405d70-d423-46f5-8db5-65c6299ba37b
📒 Files selected for processing (3)
plugins/docs-tools/.claude-plugin/plugin.json, plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
✅ Files skipped from review due to trivial changes (2)
- plugins/docs-tools/.claude-plugin/plugin.json
- plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 365-393: The loop currently searches norm_anchor inside norm_md
and mutates markdown as it finds matches, which invalidates later searches;
instead, first scan the original markdown (use _normalize, norm_md,
_find_original_end and _snap_to_word_boundary) to collect all valid
end_of_anchor offsets and their corresponding label/footnote_def entries without
modifying markdown or used_offsets, then sort those offsets descending and apply
insertions into markdown in reverse order (update used_offsets, append to
footnotes and increment fn_index as you insert) so earlier insertions don't
shift later match locations.
- Line 8: Update the usage example string in gdoc2md.py to use a skill-relative
path instead of the environment variable: replace the current example text
"python3 ${CLAUDE_SKILL_DIR}/scripts/gdoc2md.py [--comments]
[--include-resolved] <url> [output]" with "python3 scripts/gdoc2md.py
[--comments] [--include-resolved] <url> [output]" so the docstring follows the
same-skill reference guideline; edit the top-level usage/README string in the
gdoc2md.py file where the example appears.
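The two-pass scheme in the first comment above, reduced to its second pass: resolve all (offset, label) pairs against the unmodified markdown first, then insert back to front so no insertion shifts an offset that is still pending:

```python
def apply_insertions_reverse(markdown: str, insertions: list) -> str:
    """insertions: (offset, label) pairs computed in a first pass
    against the original markdown. Applied in descending offset order."""
    for offset, label in sorted(insertions, reverse=True):
        markdown = markdown[:offset] + label + markdown[offset:]
    return markdown
```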
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 5c87909b-3927-45b2-b625-1c438ace99c5
📒 Files selected for processing (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
Pull comment threads from the Drive v3 API and insert them as Markdown footnotes when --comments is passed. Resolved threads are excluded by default; --include-resolved brings them back. Anchor matching uses whitespace-normalized fuzzy search with word-boundary snapping so footnote references never split a word. Argument parsing migrated from sys.argv to argparse; backward compatible.

Closes redhat-documentation#112
Made-with: Cursor
…nchors
used_positions stored norm_md offsets that became stale after each
footnote insertion shifted the normalized string. Switch to tracking
original markdown offsets (end_of_anchor) which remain stable across
insertions. Re-normalize per iteration instead of after insertion.
Also restores ${CLAUDE_SKILL_DIR} in SKILL.md and docstring paths.
Refactor insert_comment_footnotes to a two-pass approach: first resolve all anchor positions against the unmodified markdown, then apply insertions from end to start so earlier labels cannot invalidate later anchor matches.
1ae9e4d to eb4837a (Compare)
Refactor insert_comment_footnotes to resolve all anchor positions against the unmodified markdown first, then apply insertions in reverse order so earlier labels cannot invalidate later matches. Regenerate deps.json after rebase picked up create_mr.sh -> create_mr.py migration from upstream.
🧹 Nitpick comments (1)
plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py (1)
378-386: Avoid redundant `_find_original_end` call. When the loop finds an unused offset at line 378, the same `_find_original_end` call is repeated at line 384 with identical arguments (pos == candidate). Consider caching the result.
Suggested optimization

```diff
  while True:
      candidate = norm_md.find(norm_anchor, search_from)
      if candidate == -1:
          break
      orig_end = _find_original_end(markdown, norm_md, candidate, len(norm_anchor))
      if orig_end not in used_offsets:
          pos = candidate
+         end_of_anchor = orig_end
          break
      search_from = candidate + 1
  if pos != -1:
-     end_of_anchor = _find_original_end(
-         markdown,
-         norm_md,
-         pos,
-         len(norm_anchor),
-     )
      end_of_anchor = _snap_to_word_boundary(markdown, end_of_anchor)
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py` around lines 378 - 386, The code calls _find_original_end(markdown, norm_md, candidate, len(norm_anchor)) twice for the same candidate; cache the first result (e.g., orig_end) when you detect orig_end not in used_offsets and set pos=candidate, then reuse that cached value instead of calling _find_original_end again before passing it to _snap_to_word_boundary and adding to used_offsets; update uses of pos/candidate so the second branch uses the cached end_of_anchor value to avoid redundant computation.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py`:
- Around line 378-386: The code calls _find_original_end(markdown, norm_md,
candidate, len(norm_anchor)) twice for the same candidate; cache the first
result (e.g., orig_end) when you detect orig_end not in used_offsets and set
pos=candidate, then reuse that cached value instead of calling
_find_original_end again before passing it to _snap_to_word_boundary and adding
to used_offsets; update uses of pos/candidate so the second branch uses the
cached end_of_anchor value to avoid redundant computation.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Enterprise
Run ID: 57d36b67-15d1-4f61-8ff8-65403b1570cf
📒 Files selected for processing (3)
plugins/docs-tools/.claude-plugin/plugin.json, plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md, plugins/docs-tools/skills/docs-convert-gdoc-md/scripts/gdoc2md.py
✅ Files skipped from review due to trivial changes (1)
- plugins/docs-tools/.claude-plugin/plugin.json
🚧 Files skipped from review as they are similar to previous changes (1)
- plugins/docs-tools/skills/docs-convert-gdoc-md/SKILL.md
/lgtm provided this was tested and verified locally.
Summary
Pull comment threads from the Drive v3 API and insert them as Markdown footnotes when --comments is passed on the command line. Resolved threads are excluded by default; --include-resolved brings them back. Anchor matching uses whitespace-normalized fuzzy search with word-boundary snapping so footnote references never split a word. Argument parsing migrated from sys.argv to argparse; backward compatible.

Changes
Test plan
Closes #112
Made-with: Cursor