fix(jike): harden export and plugin packaging#2
Conversation
Converge repository scripts onto package entry points and add packaged export commands. Harden Markdown export content, image downloads, URL parameters, token output, and marketplace metadata. Authored-by: GPT-5 <codex@openai.com>
📝 WalkthroughSummary by CodeRabbit
WalkthroughThis release consolidates duplicated script implementations into reusable library modules, introduces a complete export pipeline with security hardening, enhances the HTTP client with automatic retry logic and typed responses, and establishes a new CLI architecture. The package version is bumped to 0.4.2 with dependency constraints tightened and comprehensive CI testing added. ChangesCore Library and CLI Release v0.4.2
Release Configuration, Dependencies, and Documentation
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes The PR introduces substantial new code (export pipeline, CLI refactoring, test suite) across multiple functional areas. While individual components are well-structured, reviewers must verify security hardening (image validation, HTML escaping, path safety), automatic retry logic correctness (401/429 handling), checkpoint resumability, and proper argument dispatch across the new module boundaries. The refactoring consolidates duplicated logic, requiring validation that the split is clean and all behaviors are preserved. Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 79528fca6d
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| return tokens | ||
|
|
||
| def _write_token_file(path: Path, tokens: TokenPair) -> None: | ||
| fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) |
There was a problem hiding this comment.
Reject symlink targets for auth token output
Writing --out with os.open(..., O_CREAT|O_TRUNC) follows symlinks, so on a shared or attacker-controlled directory an attacker can pre-create a symlink and make jike auth --out ... overwrite an unintended file. This is a security regression in the new file-output path despite advertising secure token storage; use O_NOFOLLOW (where available) plus an atomic temp-file replace strategy in the destination directory to avoid symlink clobbering.
Useful? React with 👍 / 👎.
| return self._request(method, path, retry_on_401=False, **kwargs) | ||
|
|
||
| if resp.status_code == 429 and retry_on_429: | ||
| time.sleep(_retry_after(resp)) | ||
| return self._request(method, path, retry_on_429=False, **kwargs) |
There was a problem hiding this comment.
Preserve both retry guards across recursive requests
The retry recursion resets the other guard: the 401 path retries without forwarding retry_on_429, and the 429 path retries without forwarding retry_on_401. If the server alternates 401 and 429 responses (for example during token expiry plus rate limiting), the function can recurse repeatedly until a RecursionError instead of failing cleanly after one retry of each class.
Useful? React with 👍 / 👎.
| return _api_call(method, path, at, rt, retry=False, **kwargs) | ||
| if resp.status_code == 429 and retry_429: | ||
| time.sleep(_retry_after(resp)) | ||
| return _api_call(method, path, at, rt, retry_429=False, **kwargs) |
There was a problem hiding this comment.
Keep retry flags when recursing in export API helper
As in the main client, _api_call drops one retry flag when handling the other status (401 retry omits retry_429, 429 retry omits retry). Under alternating 401/429 responses, this can keep re-enabling retries and recurse until stack overflow rather than terminating after bounded retries.
Useful? React with 👍 / 👎.
| return tokens | ||
|
|
||
| def _write_token_file(path: Path, tokens: TokenPair) -> None: | ||
| fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) |
There was a problem hiding this comment.
Enforce 0600 when overwriting existing token files
The mode argument on os.open(..., O_CREAT|O_TRUNC, 0o600) only applies when creating a new file, so existing --out files keep their old permissions (for example 0644). That means tokens can remain readable by other local users even though the CLI reports they were written with mode 0600. Explicitly fchmod the opened FD (or chmod the path) after opening.
Useful? React with 👍 / 👎.
There was a problem hiding this comment.
Actionable comments posted: 8
🧹 Nitpick comments (1)
.github/workflows/ci.yml (1)
19-22: ⚡ Quick winSecurity: Pin GitHub Actions to commit hashes.
Actions are not pinned to hashes, which creates a supply-chain risk if tags are moved or compromised. Pinning to commit hashes (with tag comments) is more secure.
🔒 Proposed fix
- - uses: actions/checkout@v4 - - uses: actions/setup-python@v5 + - uses: actions/checkout@11bd71901bbe5b1630ceea73d27597364c9af683 # v4.2.2 + - uses: actions/setup-python@0b93645e9fea7318ecaed2b359559ac225c90a2b # v5.3.0 with: python-version: ${{ matrix.python-version }}🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In @.github/workflows/ci.yml around lines 19 - 22, Pin the workflow steps using actions/checkout@v4 and actions/setup-python@v5 to their specific commit SHAs instead of the floating tag; update the uses entries for actions/checkout and actions/setup-python to the corresponding full commit hash (and keep the tag as a comment for readability) so the CI references an immutable version.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/ci.yml:
- Line 19: The checkout step using actions/checkout@v4 currently leaves
GITHUB_TOKEN persisted; update that checkout step configuration to add
persist-credentials: false so the workflow does not expose the runner's token to
subsequent steps (keep other options like fetch-depth or submodules unchanged
unless needed). Locate the step invoking actions/checkout@v4 in the CI workflow
and add the persist-credentials: false input to the step definition.
In `@scripts/auth.py`:
- Around line 6-8: The code computes SRC = Path(__file__).resolve().parents[1] /
"src" and unconditionally inserts it into sys.path; instead, first check
SRC.is_dir() and fail fast if it's missing to avoid falling back to an installed
"jike" package—e.g., if not SRC.is_dir(): print an explicit error and exit (or
raise SystemExit/RuntimeError) rather than mutating sys.path; only call
sys.path.insert(0, str(SRC)) when the directory exists, and apply the same
SRC.is_dir() guard to the other thin wrapper scripts that perform the same
import setup.
In `@src/jike/auth.py`:
- Around line 183-187: The _write_token_file function currently truncates the
user-provided path (following symlinks) and can leave partial JSON; change it to
write atomically by creating a new temporary file in path.parent (ensure the
parent directory exists), open that temp with mode 0o600, write tokens.to_dict()
JSON to it, flush and call os.fsync on the file descriptor to persist data,
close it, then atomically replace the target with os.replace(tmp_path, path);
this prevents symlink clobbering and truncated JSON while keeping TokenPair
serialization unchanged.
In `@src/jike/client_cli.py`:
- Around line 97-102: _warn_process_visible_tokens currently only checks for
"--access-token" and "--refresh-token" exact entries and misses equals-style
flags; update the check in _warn_process_visible_tokens to detect either the
flag alone or any argument that starts with "--access-token=" or
"--refresh-token=" (e.g., use membership OR startswith checks against argv
entries) so the warning is printed for both space-separated and equals-style
token usages.
In `@src/jike/client.py`:
- Around line 57-63: The recursive calls in _request drop the other retry flag;
when handling a 401 you must call _request with retry_on_401=False while
preserving retry_on_429, and when handling a 429 call _request with
retry_on_429=False while preserving retry_on_401, so update the two recursive
invocations to pass both retry_on_401 and retry_on_429 (only flipping the flag
you consumed) so the caller's preferences are not silently changed.
In `@src/jike/export_utils.py`:
- Around line 162-165: The atomic_write function can fail when the target's
parent directories don't exist; before creating the temporary file (tmp) ensure
its parent directory exists by calling Path.mkdir with parents=True and
exist_ok=True on tmp.parent (or path.parent) so the directory tree is created
before tmp.write_text and os.replace are called; keep the rest of atomic_write
(tmp name and os.replace) unchanged.
In `@src/jike/export.py`:
- Around line 115-117: _warn_process_visible_tokens currently only detects
separate flag tokens like "--access-token" or "--refresh-token" and misses
equals-style forms like "--access-token=..."; update the check to detect both
forms by iterating argv and treating an arg as a match if it equals the flag or
startswith the flag + "=" (e.g., any(a == "--access-token" or
a.startswith("--access-token=") or a == "--refresh-token" or
a.startswith("--refresh-token=") for a in argv)), then print the same stderr
warning when any match is found.
- Around line 86-98: The pagination loop in export_user posts can hang if the
API returns a truthy load_more_key but the cursor doesn't advance; modify the
loop in the block using fetch_user_posts to detect non-advancing cursors by
tracking the previous load_more_key (and optionally previous (at, rt) or total
posts count) and aborting after a small number of repeated, unchanged cursors
(or when returned batch length is zero despite a truthy load_more_key); when
aborting, persist the checkpoint via _save_checkpoint and return the collected
posts, at, rt; reference variables/functions: fetch_user_posts, load_more_key,
_save_checkpoint, posts, page, RATE_LIMIT_DELAY.
---
Nitpick comments:
In @.github/workflows/ci.yml:
- Around line 19-22: Pin the workflow steps using actions/checkout@v4 and
actions/setup-python@v5 to their specific commit SHAs instead of the floating
tag; update the uses entries for actions/checkout and actions/setup-python to
the corresponding full commit hash (and keep the tag as a comment for
readability) so the CI references an immutable version.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro Plus
Run ID: b8ca556b-be9a-4c97-a3bd-3572eb7f7106
📒 Files selected for processing (30)
.agents/plugins/marketplace.json.claude-plugin/marketplace.json.claude-plugin/plugin.json.codex-plugin/plugin.json.github/workflows/ci.yml.gitignoreCHANGELOG.mdLICENSEREADME.mdSECURITY.mdSKILL.mdpyproject.tomlreferences/api.mdscripts/auth.pyscripts/client.pyscripts/export.pyskills/jike/SKILL.mdsrc/jike/__init__.pysrc/jike/__main__.pysrc/jike/auth.pysrc/jike/client.pysrc/jike/client_cli.pysrc/jike/export.pysrc/jike/export_utils.pysrc/jike/types.pytests/conftest.pytests/test_auth.pytests/test_client.pytests/test_export.pytests/test_main.py
| matrix: | ||
| python-version: ["3.10", "3.13"] | ||
| steps: | ||
| - uses: actions/checkout@v4 |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Security: Set persist-credentials: false to prevent token exposure.
The workflow does not set persist-credentials: false, leaving the GITHUB_TOKEN available in subsequent steps. If any step writes files or uploads artifacts, the token could be exposed.
🔒 Proposed fix
- uses: actions/checkout@v4
+ with:
+ persist-credentials: false
- uses: actions/setup-python@v5📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| - uses: actions/checkout@v4 | |
| - uses: actions/checkout@v4 | |
| with: | |
| persist-credentials: false |
🧰 Tools
🪛 zizmor (1.25.2)
[warning] 19-19: credential persistence through GitHub Actions artifacts (artipacked): does not set persist-credentials: false
(artipacked)
[error] 19-19: unpinned action reference (unpinned-uses): action is not pinned to a hash (required by blanket policy)
(unpinned-uses)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In @.github/workflows/ci.yml at line 19, The checkout step using
actions/checkout@v4 currently leaves GITHUB_TOKEN persisted; update that
checkout step configuration to add persist-credentials: false so the workflow
does not expose the runner's token to subsequent steps (keep other options like
fetch-depth or submodules unchanged unless needed). Locate the step invoking
actions/checkout@v4 in the CI workflow and add the persist-credentials: false
input to the step definition.
| SRC = Path(__file__).resolve().parents[1] / "src" | ||
| if str(SRC) not in sys.path: | ||
| sys.path.insert(0, str(SRC)) |
There was a problem hiding this comment.
Fail fast when the bundled src/ tree is missing.
If scripts/../src is absent here, the later import jike... will fall through to any other jike package already on sys.path, which can silently run a stale installed version instead of this checkout/plugin bundle. Please guard SRC.is_dir() before mutating sys.path, and mirror the same check in the other thin wrappers in this PR.
Proposed fix
SRC = Path(__file__).resolve().parents[1] / "src"
+if not SRC.is_dir():
+ raise RuntimeError(f"Expected bundled src directory at {SRC}")
if str(SRC) not in sys.path:
sys.path.insert(0, str(SRC))📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| SRC = Path(__file__).resolve().parents[1] / "src" | |
| if str(SRC) not in sys.path: | |
| sys.path.insert(0, str(SRC)) | |
| SRC = Path(__file__).resolve().parents[1] / "src" | |
| if not SRC.is_dir(): | |
| raise RuntimeError(f"Expected bundled src directory at {SRC}") | |
| if str(SRC) not in sys.path: | |
| sys.path.insert(0, str(SRC)) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@scripts/auth.py` around lines 6 - 8, The code computes SRC =
Path(__file__).resolve().parents[1] / "src" and unconditionally inserts it into
sys.path; instead, first check SRC.is_dir() and fail fast if it's missing to
avoid falling back to an installed "jike" package—e.g., if not SRC.is_dir():
print an explicit error and exit (or raise SystemExit/RuntimeError) rather than
mutating sys.path; only call sys.path.insert(0, str(SRC)) when the directory
exists, and apply the same SRC.is_dir() guard to the other thin wrapper scripts
that perform the same import setup.
| def _write_token_file(path: Path, tokens: TokenPair) -> None: | ||
| fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) | ||
| with os.fdopen(fd, "w", encoding="utf-8") as f: | ||
| json.dump(tokens.to_dict(), f, indent=2) | ||
| f.write("\n") |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Locate and inspect the relevant function and surrounding context
sed -n '140,230p' src/jike/auth.py
# Show imports to see whether tempfile/os.replace/etc are already used elsewhere in the module
sed -n '1,120p' src/jike/auth.py
# Search for other token file write logic and for where the destination path is derived
rg -n "_write_token_file|token_file|out\b|--out|write_token|replace\(|O_TRUNC|fdopen\(" src/jike/auth.pyRepository: MidnightDarling/jike-skill
Length of output: 8425
🏁 Script executed:
#!/bin/bash
set -euo pipefail
# Check python version constraints
ls -la
fd -a -t f 'pyproject.toml|setup.cfg|setup.py|requirements*.txt|Pipfile|tox.ini' . | sed -n '1,200p' || true
if [ -f pyproject.toml ]; then
rg -n "requires-python|python_requires" pyproject.toml || true
fi
if [ -f setup.cfg ]; then
rg -n "python_requires" setup.cfg || true
fi
if [ -f setup.py ]; then
rg -n "python_requires" setup.py || true
fi
if [ -f tox.ini ]; then
rg -n "envlist|basepython" tox.ini || true
fi
# Also sanity-check whether missing_ok is used elsewhere
rg -n "missing_ok=True" -S src . || trueRepository: MidnightDarling/jike-skill
Length of output: 2536
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,260p' tests/test_auth.py
rg -n "out|--out|_write_token_file|token file|symlink|O_NOFOLLOW|replace|atomic" tests/test_auth.py src/jike/auth.pyRepository: MidnightDarling/jike-skill
Length of output: 10604
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "_write_token_file|--out|Tokens written to|mode 0600|args\.out|O_TRUNC|replace\(" tests/test_auth.py
# If any matches exist, print the surrounding sections
for n in $(rg -n "_write_token_file|--out|Tokens written to|mode 0600|args\.out" tests/test_auth.py | cut -d: -f1 | sort -u); do
echo "---- around line $n ----"
sed -n "$((n-25)),$((n+80))p" tests/test_auth.py
doneRepository: MidnightDarling/jike-skill
Length of output: 52
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "args\.out|--out|_write_token_file|Tokens written" tests/test_auth.py || true
rg -n "symlink|O_NOFOLLOW|os\.replace|atomic" tests/test_auth.py || trueRepository: MidnightDarling/jike-skill
Length of output: 52
🏁 Script executed:
#!/bin/bash
set -euo pipefail
echo "== matches for --out / _write_token_file =="
rg -n "args\.out|--out|_write_token_file|Tokens written" tests/test_auth.py
echo "== matches for symlink / replace / atomic in tests =="
rg -n "symlink|O_NOFOLLOW|os\.replace|atomic" tests/test_auth.py || trueRepository: MidnightDarling/jike-skill
Length of output: 114
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '1,220p' pyproject.toml
rg -n "classifiers|Programming Language|Operating System|python_requires|platform" pyproject.toml || trueRepository: MidnightDarling/jike-skill
Length of output: 2653
🏁 Script executed:
#!/bin/bash
set -euo pipefail
rg -n "Windows|win32|os\.name|POSIX|posix|nt\." README.md SECURITY.md SKILL.md pyproject.toml .github -S || true
ls -la .github || true
fd -t f .github -a || trueRepository: MidnightDarling/jike-skill
Length of output: 1725
🏁 Script executed:
#!/bin/bash
set -euo pipefail
sed -n '140,230p' SKILL.md
sed -n '170,230p' README.md
sed -n '1,120p' .github/workflows/ci.ymlRepository: MidnightDarling/jike-skill
Length of output: 6759
Harden token --out writes against symlink clobbering and partial/truncated JSON.
src/jike/auth.py:184truncates the destination withos.open(... O_TRUNC ...)on the user-provided path, which follows symlinks and can leave partial JSON if the process is interrupted.- Write to a temp file in
path.parent(mode0600),flush+fsync, then atomically commit withos.replace(tmp_path, path)(and ensure the parent dir exists) to avoid both symlink-target clobbering and incomplete writes.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/auth.py` around lines 183 - 187, The _write_token_file function
currently truncates the user-provided path (following symlinks) and can leave
partial JSON; change it to write atomically by creating a new temporary file in
path.parent (ensure the parent directory exists), open that temp with mode
0o600, write tokens.to_dict() JSON to it, flush and call os.fsync on the file
descriptor to persist data, close it, then atomically replace the target with
os.replace(tmp_path, path); this prevents symlink clobbering and truncated JSON
while keeping TokenPair serialization unchanged.
| def _warn_process_visible_tokens(argv: list[str]) -> None: | ||
| if "--access-token" in argv or "--refresh-token" in argv: | ||
| print( | ||
| "[!] Token flags may be visible to other local users; prefer env vars.", | ||
| file=sys.stderr, | ||
| ) |
There was a problem hiding this comment.
Token warning misses --access-token=... / --refresh-token=... forms.
Users passing equals-style token flags won’t see the process-visibility warning.
Suggested fix
def _warn_process_visible_tokens(argv: list[str]) -> None:
- if "--access-token" in argv or "--refresh-token" in argv:
+ if any(
+ arg == "--access-token"
+ or arg.startswith("--access-token=")
+ or arg == "--refresh-token"
+ or arg.startswith("--refresh-token=")
+ for arg in argv
+ ):
print(
"[!] Token flags may be visible to other local users; prefer env vars.",
file=sys.stderr,
)📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def _warn_process_visible_tokens(argv: list[str]) -> None: | |
| if "--access-token" in argv or "--refresh-token" in argv: | |
| print( | |
| "[!] Token flags may be visible to other local users; prefer env vars.", | |
| file=sys.stderr, | |
| ) | |
| def _warn_process_visible_tokens(argv: list[str]) -> None: | |
| if any( | |
| arg == "--access-token" | |
| or arg.startswith("--access-token=") | |
| or arg == "--refresh-token" | |
| or arg.startswith("--refresh-token=") | |
| for arg in argv | |
| ): | |
| print( | |
| "[!] Token flags may be visible to other local users; prefer env vars.", | |
| file=sys.stderr, | |
| ) |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/client_cli.py` around lines 97 - 102, _warn_process_visible_tokens
currently only checks for "--access-token" and "--refresh-token" exact entries
and misses equals-style flags; update the check in _warn_process_visible_tokens
to detect either the flag alone or any argument that starts with
"--access-token=" or "--refresh-token=" (e.g., use membership OR startswith
checks against argv entries) so the warning is printed for both space-separated
and equals-style token usages.
| if resp.status_code == 401 and retry_on_401: | ||
| self._refresh() | ||
| self._refresh(previous_access_token=self._tokens.access_token) | ||
| return self._request(method, path, retry_on_401=False, **kwargs) | ||
|
|
||
| if resp.status_code == 429 and retry_on_429: | ||
| time.sleep(_retry_after(resp)) | ||
| return self._request(method, path, retry_on_429=False, **kwargs) | ||
| resp.raise_for_status() |
There was a problem hiding this comment.
Preserve retry flags across recursive retries.
Line 59 and Line 62 re-enter _request without carrying the other retry flag, which can silently re-enable a retry mode the caller disabled.
Suggested patch
if resp.status_code == 401 and retry_on_401:
self._refresh(previous_access_token=self._tokens.access_token)
- return self._request(method, path, retry_on_401=False, **kwargs)
+ return self._request(
+ method,
+ path,
+ retry_on_401=False,
+ retry_on_429=retry_on_429,
+ **kwargs,
+ )
if resp.status_code == 429 and retry_on_429:
time.sleep(_retry_after(resp))
- return self._request(method, path, retry_on_429=False, **kwargs)
+ return self._request(
+ method,
+ path,
+ retry_on_401=retry_on_401,
+ retry_on_429=False,
+ **kwargs,
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| if resp.status_code == 401 and retry_on_401: | |
| self._refresh() | |
| self._refresh(previous_access_token=self._tokens.access_token) | |
| return self._request(method, path, retry_on_401=False, **kwargs) | |
| if resp.status_code == 429 and retry_on_429: | |
| time.sleep(_retry_after(resp)) | |
| return self._request(method, path, retry_on_429=False, **kwargs) | |
| resp.raise_for_status() | |
| if resp.status_code == 401 and retry_on_401: | |
| self._refresh(previous_access_token=self._tokens.access_token) | |
| return self._request( | |
| method, | |
| path, | |
| retry_on_401=False, | |
| retry_on_429=retry_on_429, | |
| **kwargs, | |
| ) | |
| if resp.status_code == 429 and retry_on_429: | |
| time.sleep(_retry_after(resp)) | |
| return self._request( | |
| method, | |
| path, | |
| retry_on_401=retry_on_401, | |
| retry_on_429=False, | |
| **kwargs, | |
| ) | |
| resp.raise_for_status() |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/client.py` around lines 57 - 63, The recursive calls in _request
drop the other retry flag; when handling a 401 you must call _request with
retry_on_401=False while preserving retry_on_429, and when handling a 429 call
_request with retry_on_429=False while preserving retry_on_401, so update the
two recursive invocations to pass both retry_on_401 and retry_on_429 (only
flipping the flag you consumed) so the caller's preferences are not silently
changed.
| def atomic_write(path: Path, content: str) -> None: | ||
| tmp = path.with_suffix(path.suffix + ".partial") | ||
| tmp.write_text(content, encoding="utf-8") | ||
| os.replace(tmp, path) |
There was a problem hiding this comment.
Ensure parent directories exist before atomic writes.
atomic_write can fail for valid nested output paths because the parent directory may not exist.
Suggested fix
def atomic_write(path: Path, content: str) -> None:
+ path.parent.mkdir(parents=True, exist_ok=True)
tmp = path.with_suffix(path.suffix + ".partial")
tmp.write_text(content, encoding="utf-8")
os.replace(tmp, path)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/export_utils.py` around lines 162 - 165, The atomic_write function
can fail when the target's parent directories don't exist; before creating the
temporary file (tmp) ensure its parent directory exists by calling Path.mkdir
with parents=True and exist_ok=True on tmp.parent (or path.parent) so the
directory tree is created before tmp.write_text and os.replace are called; keep
the rest of atomic_write (tmp name and os.replace) unchanged.
| while True: | ||
| page += 1 | ||
| print(f" Fetching page {page}...", file=sys.stderr, end="", flush=True) | ||
| data, at, rt = fetch_user_posts(username, at, rt, load_more_key=load_more_key) | ||
| batch = data.get("data", []) | ||
| posts.extend(batch) | ||
| print(f" got {len(batch)} posts (total: {len(posts)})", file=sys.stderr) | ||
| load_more_key = data.get("loadMoreKey") | ||
| if checkpoint_path: | ||
| _save_checkpoint(checkpoint_path, username, posts, load_more_key) | ||
| if not load_more_key or not batch: | ||
| return posts, at, rt | ||
| time.sleep(RATE_LIMIT_DELAY) |
There was a problem hiding this comment.
Guard pagination against non-advancing cursors to prevent infinite loops.
If the API keeps returning a truthy loadMoreKey with non-empty data but doesn’t advance the cursor, this loop never terminates.
Suggested fix
def fetch_all_posts(
@@
) -> tuple[list[dict], str, str]:
posts, load_more_key = _load_checkpoint(checkpoint_path, username) if resume and checkpoint_path else ([], None)
if resume and posts and load_more_key is None:
return posts, at, rt
page = 0
+ seen_cursors: set[str] = set()
while True:
page += 1
print(f" Fetching page {page}...", file=sys.stderr, end="", flush=True)
data, at, rt = fetch_user_posts(username, at, rt, load_more_key=load_more_key)
batch = data.get("data", [])
posts.extend(batch)
print(f" got {len(batch)} posts (total: {len(posts)})", file=sys.stderr)
load_more_key = data.get("loadMoreKey")
+ if load_more_key is not None:
+ cursor_key = json.dumps(load_more_key, sort_keys=True, ensure_ascii=False)
+ if cursor_key in seen_cursors:
+ raise ValueError("pagination cursor did not advance")
+ seen_cursors.add(cursor_key)
if checkpoint_path:
_save_checkpoint(checkpoint_path, username, posts, load_more_key)
if not load_more_key or not batch:
return posts, at, rt
time.sleep(RATE_LIMIT_DELAY)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/export.py` around lines 86 - 98, The pagination loop in export_user
posts can hang if the API returns a truthy load_more_key but the cursor doesn't
advance; modify the loop in the block using fetch_user_posts to detect
non-advancing cursors by tracking the previous load_more_key (and optionally
previous (at, rt) or total posts count) and aborting after a small number of
repeated, unchanged cursors (or when returned batch length is zero despite a
truthy load_more_key); when aborting, persist the checkpoint via
_save_checkpoint and return the collected posts, at, rt; reference
variables/functions: fetch_user_posts, load_more_key, _save_checkpoint, posts,
page, RATE_LIMIT_DELAY.
| def _warn_process_visible_tokens(argv: list[str]) -> None: | ||
| if "--access-token" in argv or "--refresh-token" in argv: | ||
| print("[!] Token flags may be visible to other local users; prefer env vars.", file=sys.stderr) |
There was a problem hiding this comment.
Token warning misses --access-token=... / --refresh-token=... forms.
Current matching only catches split-arg style and silently skips warning for equals-style flags.
Suggested fix
def _warn_process_visible_tokens(argv: list[str]) -> None:
- if "--access-token" in argv or "--refresh-token" in argv:
+ if any(
+ arg == "--access-token"
+ or arg.startswith("--access-token=")
+ or arg == "--refresh-token"
+ or arg.startswith("--refresh-token=")
+ for arg in argv
+ ):
print("[!] Token flags may be visible to other local users; prefer env vars.", file=sys.stderr)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@src/jike/export.py` around lines 115 - 117, _warn_process_visible_tokens
currently only detects separate flag tokens like "--access-token" or
"--refresh-token" and misses equals-style forms like "--access-token=...";
update the check to detect both forms by iterating argv and treating an arg as a
match if it equals the flag or startswith the flag + "=" (e.g., any(a ==
"--access-token" or a.startswith("--access-token=") or a == "--refresh-token" or
a.startswith("--refresh-token=") for a in argv)), then print the same stderr
warning when any match is found.
MidnightDarling
left a comment
There was a problem hiding this comment.
/code-review --effort max 审查报告
5 angles × 8 候选并行扫描 → verifier 验证 → 严重度排序 → Top 15 已挂到对应代码行(见上方 inline 评论)。
严重度分布
| 级别 | 数量 | 涉及 |
|---|---|---|
| 安全 · 严重 | 3 | SSRF 旁路(图片下载未禁止重定向)、Content-Type 头缺失绕过、os.open 0o600 mode 对已存在文件无效 |
| 安全/隐私 | 3 | 内联图 URL 无 host 白名单、images_dir 父目录 symlink TOCTOU、--access-token=VALUE 等号形式漏告警 |
| 可用性回归 | 3 | validate_username 拒绝 CJK / 含点用户名、转发块条件收窄丢失数据、图片下载失败不再 fallback URL |
| 逻辑 Bug | 3 | 401/429 重试递归丢失对方开关、json_path_for 多点后缀覆盖、fetch_all_posts 无上限 |
| 稳健性 | 3 | _load_checkpoint 非 dict 触发 AttributeError、丢失 KeyboardInterrupt 处理、refresh_tokens 与 _extract_tokens 契约不一致 |
必须修复(合并前)
- 图片下载
allow_redirects=False(评论 #1) - Content-Type 默认拒绝(评论 #2)
_write_token_file显式fchmod(0o600)或 tmp+replace(评论 #3)- 内联图 URL 走 host 白名单 / 警示降级(评论 #4)
--access-token=VALUE等号形式纳入告警判定(评论 #6)
建议修复(同 PR 内)
6/7/8/9/10/11/12/13/14/15 —— 详见对应 inline 评论。
测试缺口
tests/test_export.py:27把 "notes.md.backup→notes.md.json" 这个 footgun 钉死成了 API 契约 —— 应在修复json_path_for时同步移除/翻转该断言。- 无任何并发写入测试覆盖
atomic_write与download_image的.partial竞争(次级风险,未进 Top 15)。 - 无
--access-token=VALUE等号形式告警的负面测试。
文档/配置层附带建议(未进 Top 15)
.github/workflows/ci.yml的 wheel 内容校验列表里没有jike_skill/SKILL.md,但pyproject.tomlforce-include 了它 —— CI 假阴性窗口。SKILL.md:123Operations 表把export列在和feed/post/search同一组里,可能让用户误以为python3 scripts/client.py export ...也能跑(实际只走__main__.py路径)。README.md命令表只文档化了jike-export,对jike-api/jike-auth这两个同样在pyproject.toml注册的 entry point 没有记录。
由于 PR 作者与审查账号同一,本次以 COMMENT 提交(不允许 self-REQUEST_CHANGES),实质等同请求修改。
Generated by Claude Code
| raise ValueError("images directory must not be a symlink") | ||
| if filepath.exists(): | ||
| return _relative(filepath, output_dir) | ||
| with requests.get(safe, timeout=30, stream=True) as resp: |
There was a problem hiding this comment.
[安全 · 严重] SSRF / 域名白名单旁路:未禁止重定向
requests.get(safe, timeout=30, stream=True) 未设置 allow_redirects=False。host_matches 只校验初始 URL 的主机,但 requests 默认会跟随 30x 跳转。
攻击场景:白名单内的 CDN(如 cdn.ruguoapp.com)或任意被攻击者影响的资源若返回 302 Location: http://169.254.169.254/latest/meta-data/ 或 http://127.0.0.1:6379/...,下载会跟随跳到任意内网/云元数据地址,并将其响应作为 "图片" 落盘为 post_NNNN_orig_N.png。允许列表给了一种虚假的安全感。
建议:
with requests.get(safe, timeout=30, stream=True, allow_redirects=False) as resp:
if resp.is_redirect or resp.is_permanent_redirect:
raise ValueError("image host returned a redirect")或在跟随前手动验证每跳的 host_matches。
Generated by Claude Code
| resp.raise_for_status() | ||
| ctype = resp.headers.get("Content-Type", "").split(";", 1)[0].lower() | ||
| size = int(resp.headers.get("Content-Length", "0") or 0) | ||
| if ctype and ctype not in IMAGE_TYPES: |
There was a problem hiding this comment.
[安全 · 严重] Content-Type 白名单可被「无头」响应绕过
if ctype and ctype not in IMAGE_TYPES —— 当响应缺失 Content-Type 头时,ctype 为空字符串,and ctype 短路成 False,整个 Content-Type 类型检查被跳过。攻击者只要让 CDN 边缘节点(或同源缓存)回一个不带 Content-Type 的 200 即可绕过类型校验,把任意小于 20 MB 的 HTML / 可执行文件 / shell 脚本落盘为 .png/.jpg,再被 Markdown 渲染器或下游消费者拉起。
修复(应"默认拒绝"):
ctype = resp.headers.get("Content-Type", "").split(";", 1)[0].strip().lower()
if ctype not in IMAGE_TYPES:
raise ValueError("unexpected image content type")Generated by Claude Code
| def _write_token_file(path: Path, tokens: TokenPair) -> None: | ||
| fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600) | ||
| with os.fdopen(fd, "w", encoding="utf-8") as f: | ||
| json.dump(tokens.to_dict(), f, indent=2) |
There was a problem hiding this comment.
[安全 · 严重] os.open 的 mode 参数对已存在文件无效 —— 0o600 承诺被虚假宣告
POSIX 语义:os.open(path, O_WRONLY|O_CREAT|O_TRUNC, 0o600) 仅在首次创建时应用 mode;若 tokens.json 已经存在(任意一次 re-run,或先前 touch tokens.json 留下的 0o644),mode 被静默忽略,但 auth.main() 仍然打印 [+] Tokens written to {args.out} with mode 0600 给用户错觉。Jike refresh_token 是长期凭据,世界可读的影响和 README 中"--out creates 0600"的保证完全不符 —— SECURITY.md 里也明确承诺了这个属性。
修复:
fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
try:
os.fchmod(fd, 0o600) # tighten regardless of pre-existing bits
with os.fdopen(fd, "w", encoding="utf-8") as f:
...
except BaseException:
os.close(fd); raise或者写到临时文件 → os.chmod(0o600) → os.replace(tmp, path),使最终 inode 一定是新创建的。
Generated by Claude Code
|
|
||
| def image_line(url: str, images: bool, images_dir: Optional[Path], output_dir: Path, index: int, key: str) -> Optional[str]: | ||
| src = download_image(url, images_dir, output_dir, index, key) if images and images_dir else url | ||
| return f"})" if src else None |
There was a problem hiding this comment.
[安全/隐私] 当未 --download-images 时,内联到 Markdown 的 URL 未做 host 白名单
image_line(url, images=False, ...) 直接返回 [img]({md_url(url)})。safe_url 只校验 http(s)+netloc,不校验主机。IMAGE_HOST_SUFFIXES 的白名单只在 download_image 分支里生效。
攻击场景:攻击者发一条即刻动态,pictures[0].picUrl 指向 https://attacker.example.com/track.png?u=victim。导出器把这条原样写入受害者本地 Markdown。受害者一旦用支持远程加载的 Markdown 渲染器(VS Code preview / Obsidian / 任意网页转换工具)打开导出文件,浏览器/插件就会替攻击者发起请求,泄露 IP/Referrer/UA,并构成持久化的"邮件追踪像素"等同物。
建议:对内联 URL 也走 host_matches(... , _image_host_suffixes()),命中失败时降级为纯文本占位符或保留原 URL 但用代码块标注不可信。
Generated by Claude Code
| try: | ||
| images_dir.mkdir(parents=True, exist_ok=True) | ||
| if images_dir.is_symlink(): | ||
| raise ValueError("images directory must not be a symlink") |
There was a problem hiding this comment.
[安全] images_dir TOCTOU:mkdir(parents=True, exist_ok=True) 会跟随父目录里的已存在 symlink
images_dir.is_symlink() 只检查叶节点;mkdir(parents=True, exist_ok=True) 会穿过任何已存在的中间符号链接组件,然后在被指向的目录里创建叶。叶本身不是 symlink,is_symlink() 返回 False,校验通过,但 tmp.open('wb') / os.replace 实际写到了攻击者控制的位置。
经验复现:
os.symlink("/tmp/attacker_dir", "alice_images")
Path("alice_images/sub").mkdir(parents=True, exist_ok=True)
# Path("alice_images/sub").is_symlink() → False
# 但 realpath 解析到 /tmp/attacker_dir/sub修复(任一):
- 收紧到
images_dir.mkdir(parents=False, exist_ok=True),并对解析后的images_dir.resolve()做白名单校验; - 或用
os.open(..., O_NOFOLLOW)+openat()风格写入; - 或要求
images_dir必须在output_dir.resolve()之下(前缀校验)。
Generated by Claude Code
| def json_path_for(output_path: str, username: str) -> Path: | ||
| if output_path == "-": | ||
| return Path(f"{validate_username(username)}_jike_export.json") | ||
| return Path(output_path).with_suffix(".json") |
There was a problem hiding this comment.
[Bug · 中] Path.with_suffix(".json") 只替换最末一段后缀 —— 多点文件名会被悄悄改写
Path("notes.md.backup").with_suffix(".json") → Path("notes.md.json")(覆盖一个早就可能存在的 Markdown 导出 sidecar 同名 .json)。同样地:
Path("archive.tar.gz").with_suffix(".json")→archive.tar.json(语义上的.gz信息丢了)Path(".posts").with_suffix(".json")→posts.json(前缀的.也被吃掉)
更糟糕的是 tests/test_export.py:27 把"notes.md.backup → notes.md.json"作为预期行为写进了断言,这等于把这个 footgun 钉死成了 API 契约。
建议:
def json_path_for(output_path, username):
if output_path == "-":
return Path(f"{validate_username(username)}_jike_export.json")
p = Path(output_path)
return p.with_name(p.name + ".json") # 始终追加,不删除原后缀并更新对应测试。
Generated by Claude Code
| if resume and posts and load_more_key is None: | ||
| return posts, at, rt | ||
| page = 0 | ||
| while True: |
There was a problem hiding this comment.
[DoS · 中] fetch_all_posts 缺少最大页数 / 最大条数上限,可被恶意或异常分页打成 OOM + 磁盘满
while True: ... if not load_more_key or not batch: return —— 唯一的退出条件是服务器自愿把 loadMoreKey 设为 falsy 或返回空批。一个返回同一个 batch + 同一个非空 loadMoreKey 的不良/敌对服务端会让循环以每 0.5 秒一页的速度无限扩张 posts,同时不断重写 checkpoint 文件,直到进程被 OOM kill 或磁盘耗尽。
虽然这条 bug 在主分支 scripts/export.py 中已存在(即不是本 PR 新引入),但本 PR 是"加固导出"的窗口,正适合一并修掉:
MAX_PAGES = 5000
MAX_POSTS = 200_000
seen_keys: set[str] = set()
...
if page > MAX_PAGES or len(posts) > MAX_POSTS:
raise ValueError("export aborted: pagination limit reached")
key_hash = json.dumps(load_more_key, sort_keys=True)
if key_hash in seen_keys:
raise ValueError("export aborted: repeated loadMoreKey detected")
seen_keys.add(key_hash)Generated by Claude Code
| data = json.loads(path.read_text(encoding="utf-8")) | ||
| except (OSError, json.JSONDecodeError) as exc: | ||
| raise ValueError("invalid checkpoint file") from exc | ||
| if data.get("username") != username or not isinstance(data.get("posts"), list): |
There was a problem hiding this comment.
[Bug · 中] _load_checkpoint 在 checkpoint 顶层不是 dict 时抛 AttributeError,逃出现有 except
data = json.loads(path.read_text(encoding="utf-8"))
if data.get("username") != username or not isinstance(data.get("posts"), list):当 checkpoint 文件顶层是列表([...])或纯量(数字 / 字符串)时,data.get(...) 不存在,触发 AttributeError。_load_checkpoint 自己只 catch (OSError, json.JSONDecodeError),export.main 只 catch (requests.RequestException, ValueError) —— AttributeError 一路泄漏,CLI 终端不是 {"error": ...} 的统一格式,而是原始 traceback,违反了"对外只暴露 error class"的安全意图。
修复:
try:
data = json.loads(path.read_text(encoding="utf-8"))
if not isinstance(data, dict):
raise ValueError("invalid checkpoint format")
except (OSError, json.JSONDecodeError) as exc:
raise ValueError("invalid checkpoint file") from excGenerated by Claude Code
| sys.exit(1) | ||
| except ValueError as exc: | ||
| print(json.dumps({"error": exc.__class__.__name__}), file=sys.stderr) | ||
| sys.exit(1) |
There was a problem hiding this comment.
[UX 回归] 丢失了 KeyboardInterrupt → exit 130 的清洁处理
主分支 scripts/export.py:419-421:
except KeyboardInterrupt:
print("\nExport interrupted.", file=sys.stderr)
sys.exit(130)新版 src/jike/export.py:main 只 catch RequestException / ValueError。用户在长时间分页拉取(通常几十秒到几分钟)中 Ctrl-C 时,会得到原始 Python traceback 而不是干净退出码 130。CI、调度脚本、shell trap 等下游消费者依赖此契约。
建议在最外层加:
except KeyboardInterrupt:
print(json.dumps({"error": "interrupted"}), file=sys.stderr)
sys.exit(130)Generated by Claude Code
| "x-jike-refresh-token", token_pair.refresh_token | ||
| ), | ||
| access_token=resp.headers.get("x-jike-access-token", token_pair.access_token), | ||
| refresh_token=resp.headers.get("x-jike-refresh-token", token_pair.refresh_token), |
There was a problem hiding this comment.
[一致性 · 低] refresh_tokens 只读 headers,与 _extract_tokens 兼顾 body+headers 的契约不一致
本 PR 加固了 _extract_tokens —— 支持从 JSON body 的 x-jike-access-token / access_token 字段或响应头中任一处取到 token。但 refresh_tokens 仍然只读响应头:
return TokenPair(
access_token=resp.headers.get("x-jike-access-token", token_pair.access_token),
refresh_token=resp.headers.get("x-jike-refresh-token", token_pair.refresh_token),
)后果:若 Jike 服务端任何时候改成把轮换后的 token 放在 body 里(这正是 _extract_tokens 已经在准备应对的情形),refresh_tokens 会静默继续使用旧 token,下一次请求就会 401,并且循环触发 refresh,构成"刷不动"的死局。
建议:把 token 提取逻辑统一到 _extract_tokens(resp),refresh_tokens 也复用之,失败时返回旧 token + 警告。
Generated by Claude Code
|
| 候选 | 文件:行 | Filter 评级 | 落选原因 |
|---|---|---|---|
| Content-Type 白名单可被缺失头绕过 | export_utils.py:99 |
conf 4 / DROP | 单独影响仅是"图片扩展名的字节落盘";要造成实际伤害需要与 SSRF 组合(已单列)。 |
os.open(..., 0o600) mode 对已存在文件无效 |
auth.py:184 |
conf 7 / DROP | 文档承诺 vs 实现的差异真实存在,但触发条件需要预先有较宽松权限的同名文件;happy-path 安全。 |
images_dir.mkdir 的父目录 symlink TOCTOU |
export_utils.py:90 |
conf 4 / DROP | 攻击者需要对受害者 CWD 或 --images-dir 父目录已有写权 —— 在那种前提下有更直接的攻击向量;未跨越权限边界。 |
以上三项仍然值得在本 PR 中一并修掉(属于"应硬化的次级问题"),详见 /code-review 在对应行上的 inline 评论。
总结:/security-review 维度下,1 个 HIGH 必修(重定向 SSRF),3 个建议级硬化。结合 /code-review 的 15 条 inline 评论一起处理。
Generated by Claude Code
Summary
Verification
Attribution
Implemented by GPT-5 codex@openai.com. Linear ARFD-365 has been updated to Done with evidence comment under GPT identity.
Closes ARFD-365