14 changes: 13 additions & 1 deletion src/kimi_cli/tools/web/fetch.py
@@ -2,7 +2,6 @@
from typing import override

import aiohttp
import trafilatura
from kosong.tooling import CallableTool2, ToolReturnValue
from pydantic import BaseModel, Field

@@ -14,6 +13,13 @@
from kimi_cli.utils.aiohttp import new_client_session
from kimi_cli.utils.logging import logger

try:
    import trafilatura

    _has_trafilatura = True
except Exception:
    _has_trafilatura = False


class Params(BaseModel):
    url: str = Field(description="The URL to fetch content from.")
@@ -92,6 +98,12 @@ async def fetch_with_http_get(params: Params) -> ToolReturnValue:
brief="Empty response body",
)

if not _has_trafilatura:
    # trafilatura unavailable (e.g. charset-normalizer binary
    # incompatible with current Python), return raw HTML trimmed
    builder.write(resp_text[:50000])
    return builder.ok("trafilatura is not available; returning raw page content.")
Comment on lines +101 to +105

P2: Restrict raw-content fallback to HTML responses

When trafilatura is unavailable, this branch returns ok for every non-empty response body, regardless of Content-Type. In fetch_with_http_get, that means binary/media endpoints (for example application/pdf or images) now get treated as successful page fetches and can return garbled decoded bytes, whereas the previous behavior would fail extraction and surface an error. This is a regression for agents that rely on is_error to decide whether to retry with another tool/path; consider gating this fallback to HTML/text-like types and preserving an error for non-text content.
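
One possible shape for that gate, as a minimal sketch rather than the PR's implementation: `is_text_like` and `TEXT_LIKE_TYPES` are hypothetical names, the set of accepted media types is an assumption, and treating a missing header as non-text is a judgment call that may be too strict for some servers.

```python
from typing import Optional

# Hypothetical allow-list of non-"text/*" media types that still decode
# to readable text. The exact membership is an assumption.
TEXT_LIKE_TYPES = frozenset(
    {"application/xhtml+xml", "application/xml", "application/json"}
)


def is_text_like(content_type: Optional[str]) -> bool:
    """Return True when a Content-Type header value denotes text-like content."""
    if not content_type:
        # Missing header: treated as non-text here so the caller keeps
        # reporting an error (assumption; could be relaxed).
        return False
    # Drop parameters such as "; charset=utf-8" and normalize case.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type.startswith("text/") or media_type in TEXT_LIKE_TYPES
```

With a check like this, the raw-content branch would return `builder.ok(...)` only when the response's `Content-Type` passes, and would otherwise fall through to an error result so that `is_error` stays meaningful for PDF or image endpoints.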



extracted_text = trafilatura.extract(
    resp_text,
    include_comments=True,