fix: guard trafilatura import to prevent cascading tool load failure on Python 3.13 #1597
he-yufeng wants to merge 2 commits into MoonshotAI:main
Conversation
fix: guard trafilatura import to prevent cascading tool load failure on Python 3.13

On Python 3.13, charset-normalizer ships mypyc-compiled binaries that are incompatible, causing trafilatura to fail at import time. Since web/__init__.py unconditionally imports FetchURL (which imports trafilatura at module level), the entire web package fails to load, taking SearchWeb down with it even though SearchWeb has no trafilatura dependency. Wrap the trafilatura import in try/except and fall back to returning raw page content when it's unavailable. This keeps both tools loadable regardless of the trafilatura situation. Fixes MoonshotAI#1572
Force-pushed from 6f04ab1 to d78db0c
Rebased and verified.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d78db0cc04
```python
if not _has_trafilatura:
    # trafilatura unavailable (e.g. charset-normalizer binary
    # incompatible with current Python), return raw HTML trimmed
    builder.write(resp_text[:50000])
    return builder.ok("trafilatura is not available; returning raw page content.")
```
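In context, the surrounding fetch logic might look like the following sketch. The standalone function `extract_or_raw` and its signature are illustrative assumptions; the PR's actual code lives inside `fetch_with_http_get`:

```python
def extract_or_raw(resp_text, has_trafilatura):
    """Return extracted main content when trafilatura is importable,
    otherwise fall back to the raw body trimmed to 50,000 characters."""
    if not has_trafilatura:
        # Fallback branch added by this PR: raw page content, trimmed.
        return resp_text[:50000]
    import trafilatura
    extracted = trafilatura.extract(resp_text)
    # trafilatura.extract returns None when it finds no main content,
    # so fall back to the trimmed raw body in that case as well.
    return extracted if extracted else resp_text[:50000]
```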
Restrict raw-content fallback to HTML responses
When trafilatura is unavailable, this branch returns ok for every non-empty response body, regardless of Content-Type. In fetch_with_http_get, that means binary/media endpoints (for example application/pdf or images) now get treated as successful page fetches and can return garbled decoded bytes, whereas the previous behavior would fail extraction and surface an error. This is a regression for agents that rely on is_error to decide whether to retry with another tool/path; consider gating this fallback to HTML/text-like types and preserving an error for non-text content.
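One way to gate the fallback as the review suggests is to check the response's Content-Type header before returning raw text. The helper name and the accepted MIME types below are illustrative, not from the PR:

```python
# Hypothetical helper: decide whether the raw-content fallback should
# treat a response as text-like, based on its Content-Type header.
TEXT_LIKE_PREFIXES = ("text/",)
TEXT_LIKE_TYPES = {"application/xhtml+xml", "application/xml", "application/json"}

def is_text_like(content_type: str) -> bool:
    # Strip parameters such as "; charset=utf-8" and normalize case
    # before comparing against the allowed prefixes and exact types.
    mime = content_type.split(";", 1)[0].strip().lower()
    return mime.startswith(TEXT_LIKE_PREFIXES) or mime in TEXT_LIKE_TYPES
```

With a check like this, `application/pdf` or image responses would still surface an error (preserving `is_error` semantics for retrying agents), while HTML and other text-like bodies would use the raw-content fallback.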
Summary
On Python 3.13, charset-normalizer ships mypyc-compiled .so binaries that are incompatible with the interpreter, causing trafilatura to fail at import time. Since web/__init__.py unconditionally does from .fetch import FetchURL (which has a bare import trafilatura at module level), the entire web package fails to load, taking SearchWeb down with it even though SearchWeb has zero trafilatura dependency.

Changes:
- Wrap the trafilatura import in try/except and set a _has_trafilatura flag
- fetch_with_http_get falls back to returning raw page content (trimmed to 50k chars) instead of crashing
- The service fetch path (_fetch_with_service) is completely unaffected
- SearchWeb now loads normally regardless of the trafilatura situation

Root Cause
Test Plan
- SearchWeb loads on Python 3.13 without the charset-normalizer workaround
- FetchURL loads and returns raw content for HTML pages when trafilatura is unavailable
- FetchURL behavior unchanged when trafilatura is available
- pytest tests/tools/test_fetch_url.py

Fixes #1572