feat: FetchURL tool supports downloading images #842

Open
bj456736 wants to merge 1 commit into MoonshotAI:main from bj456736:feat/fetch-url-image-support

@bj456736

Contributor

Summary

This PR adds image support to the FetchURL tool, allowing users to fetch images from the web and have them displayed directly in the conversation.

Problem

Previously, when a URL pointed to an image (e.g. https://example.com/chart.png), the FetchURL tool would try to parse the binary image data as HTML using Readability, which always failed or produced garbage. Users had no way to view web images through the fetch tool.

Solution

1. Extended UrlFetchResult interface

Added a new image field and 'image' kind to the result union.
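A minimal sketch of what such a union could look like. The `mimeType` and `data` fields and the `'image'` kind appear in this PR; the `'text'` variant, the `width`/`height` field names, and the `describeResult` helper are assumptions made for illustration, and the real interface may differ.

```typescript
// Hypothetical shapes: only `kind: 'image'`, `image.mimeType`, and `image.data`
// are taken from the PR; everything else is assumed for illustration.
interface FetchedImage {
  mimeType: string; // e.g. "image/png"
  data: string; // base64-encoded image bytes
  width: number | null; // null when dimensions could not be sniffed
  height: number | null;
}

type UrlFetchResult =
  | { kind: 'text'; content: string } // assumed name for the existing text variant
  | { kind: 'image'; image: FetchedImage };

// Narrowing on `kind` lets callers handle each variant safely.
function describeResult(result: UrlFetchResult): string {
  switch (result.kind) {
    case 'text':
      return `text (${result.content.length} chars)`;
    case 'image': {
      const { mimeType, width, height } = result.image;
      const size =
        width !== null && height !== null ? `${width}x${height}` : 'unknown size';
      return `image ${mimeType}, ${size}`;
    }
  }
}
```

A discriminated union keeps existing text-handling callers unchanged while forcing new code to check `kind` before touching `image`.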

2. LocalFetchURLProvider detects image responses

When the response Content-Type starts with image/, the provider reads the binary data, enforces the maxBytes limit, converts the bytes to base64, sniffs the image dimensions, and returns kind: 'image'.
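The detection steps can be sketched roughly as follows. This is not the PR's code: `isImageContentType`, `sniffPngDimensions`, and `inspectImage` are hypothetical helpers, only PNG sniffing is shown, and base64 encoding is left out (in Node it would typically be `Buffer.from(bytes).toString('base64')`).

```typescript
// Hypothetical helpers illustrating the detection flow; names are assumptions.
interface SniffedImage {
  mimeType: string;
  byteLength: number;
  width: number | null;
  height: number | null;
}

function isImageContentType(contentType: string | null): boolean {
  return contentType !== null && contentType.trim().toLowerCase().startsWith('image/');
}

// PNG layout: 8-byte signature, then the IHDR chunk
// (4-byte length, 4-byte type "IHDR", 4-byte width, 4-byte height, big-endian).
function sniffPngDimensions(bytes: Uint8Array): { width: number; height: number } | null {
  const PNG_SIGNATURE = [0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a];
  const IHDR = [0x49, 0x48, 0x44, 0x52];
  if (bytes.length < 24) return null;
  const matches = (expected: number[], offset: number): boolean =>
    expected.every((b, i) => bytes[offset + i] === b);
  if (!matches(PNG_SIGNATURE, 0) || !matches(IHDR, 12)) return null;
  const view = new DataView(bytes.buffer, bytes.byteOffset, bytes.byteLength);
  return { width: view.getUint32(16), height: view.getUint32(20) };
}

function inspectImage(contentType: string, bytes: Uint8Array, maxBytes: number): SniffedImage {
  // Reject oversized payloads before doing any further work.
  if (bytes.byteLength > maxBytes) {
    throw new Error(`Image is ${bytes.byteLength} bytes, above the ${maxBytes}-byte limit`);
  }
  // Drop parameters such as "; charset=..." from the header value.
  const mimeType = contentType.split(';')[0].trim().toLowerCase();
  const dims = mimeType === 'image/png' ? sniffPngDimensions(bytes) : null;
  return {
    mimeType,
    byteLength: bytes.byteLength,
    width: dims?.width ?? null,
    height: dims?.height ?? null,
  };
}
```

Returning `null` dimensions instead of throwing lets the fetch succeed for formats the sniffer does not recognize.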

3. FetchURLTool emits multimodal output

When the provider returns an image, the tool builds a ContentPart[] array with system info, image wrapper tags, and an image_url part with a data: URL.
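As an illustration, the multimodal output could be assembled along these lines. The `image_url` part with `imageUrl.url` set to a `data:` URL matches the snippet quoted in the review below; the text part shape, the exact wording of the system note, the `<image>` wrapper tags, and the `buildImageParts` name are assumptions.

```typescript
// Assumed ContentPart shape; only the 'image_url' variant is taken from the PR.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; imageUrl: { url: string } };

interface FetchedImage {
  mimeType: string;
  data: string; // base64-encoded bytes
  width: number | null;
  height: number | null;
}

// Hypothetical builder: system info, opening tag, image, closing tag.
function buildImageParts(url: string, image: FetchedImage): ContentPart[] {
  const size =
    image.width !== null && image.height !== null
      ? `${image.width}x${image.height} px`
      : 'unknown dimensions';
  return [
    { type: 'text', text: `<system>Fetched ${image.mimeType} image (${size}).</system>` },
    { type: 'text', text: `<image url="${url}">` },
    { type: 'image_url', imageUrl: { url: `data:${image.mimeType};base64,${image.data}` } },
    { type: 'text', text: '</image>' },
  ];
}
```

Wrapping the image between text parts gives the model the source URL and the dimensions alongside the pixels.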

4. Updated tool description

The FetchURL description now mentions image support.

Testing

Added 4 new test cases covering PNG/JPEG fetching, ContentPart output, and null dimensions. All 28 fetch-url related tests pass.

Notes

MoonshotFetchURLProvider is unchanged; it falls back to LocalFetchURLProvider on any error, so image support is automatically available through the fallback path.

- Extend UrlFetchResult to support image responses (kind='image')
- LocalFetchURLProvider detects image/* content-type and returns base64 data
- FetchURLTool emits image_url ContentPart for multimodal models
- Include image dimensions sniffing for coordinate guidance
- Update tool description to mention image support
- Add tests for PNG/JPEG image fetching and ContentPart output

Closes MoonshotAI#626
@changeset-bot

changeset-bot Bot commented Jun 17, 2026


⚠️ No Changeset found

Latest commit: a19a3bd

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

@pkg-pr-new

pkg-pr-new Bot commented Jun 17, 2026

pnpm dlx https://pkg.pr.new/@moonshot-ai/kimi-code@a19a3bd
npx https://pkg.pr.new/@moonshot-ai/kimi-code@a19a3bd

commit: a19a3bd

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a19a3bd75c

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

Comment on lines +118 to +119
type: 'image_url',
imageUrl: { url: `data:${image.mimeType};base64,${image.data}` },

P2: Gate fetched images on image_in capability

I checked the built-in tool registration: ReadMediaFile is gated on modelCapabilities.image_in || modelCapabilities.video_in in packages/agent-core/src/agent/tool/index.ts, while FetchURL is still registered unconditionally whenever a urlFetcher exists. With a text-only model (image_in: false), fetching any image/* URL now appends this image_url part to the conversation, so the next provider request can be rejected by models that do not accept image input instead of returning an actionable tool error. Please gate this image result on the current model capability or degrade it to a text-only/error result for non-vision models.
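One way to degrade to a text-only result is sketched below. This is an assumption about how the gate might look, not code from the repository: `gateImageParts` is a hypothetical helper, the `ContentPart` text variant is assumed, and only the `image_in` flag and the `image_url` part shape come from this PR.

```typescript
// Assumed shapes; only `image_in` and the 'image_url' variant are from the PR.
type ContentPart =
  | { type: 'text'; text: string }
  | { type: 'image_url'; imageUrl: { url: string } };

interface ModelCapabilities {
  image_in: boolean;
}

// Hypothetical gate: pass parts through for vision models, otherwise replace
// any image-bearing output with an actionable text message.
function gateImageParts(parts: ContentPart[], capabilities: ModelCapabilities): ContentPart[] {
  const hasImage = parts.some((part) => part.type === 'image_url');
  if (!hasImage || capabilities.image_in) return parts;
  return [
    {
      type: 'text',
      text: 'The URL returned an image, but the current model does not accept image input.',
    },
  ];
}
```

With this shape, a text-only model receives a message it can act on rather than a provider rejection on the next request.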
