bittoby (Contributor) commented Feb 9, 2026

Closes: #956

Description

This PR lets all worker agents (Developer, Browser, Document, Social Media), not just the Multi-Modal Agent, receive multi-modal information (image attachments) through task decomposition.

Problem: Previously, when users attached images to complex tasks, only the Multi-Modal Agent could access them. Worker agents received image file paths in additional_info but couldn't use them, causing subtasks requiring image analysis to fail.

Solution:

  • Propagate parent task attachments to all subtasks during decomposition
  • Include additional_info (image paths) in task assignment payloads
  • Add comprehensive logging for attachment propagation tracking
  • Add error handling to prevent failures when copying attachment metadata
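
The steps above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the `Task` dataclass, the `propagate_attachments` and `build_assignment_payload` helpers, and the `image_paths` key are all hypothetical names chosen for the example. Only the `additional_info` field comes from the description.

```python
import copy
import logging
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional

logger = logging.getLogger("attachment_propagation")


@dataclass
class Task:
    """Hypothetical stand-in for the project's task object."""
    id: str
    content: str
    # Assumed to hold attachment metadata such as image file paths.
    additional_info: Optional[Dict[str, Any]] = None
    subtasks: List["Task"] = field(default_factory=list)


def propagate_attachments(parent: Task, subtasks: List[Task]) -> None:
    """Copy the parent's additional_info onto every subtask.

    Keys already set on a subtask take precedence over the parent's.
    A failed copy is logged and skipped so decomposition can continue.
    """
    if not parent.additional_info:
        logger.debug("Task %s has no attachments to propagate", parent.id)
        return
    for sub in subtasks:
        try:
            # Deep copy so subtasks never share mutable state with the parent.
            merged = copy.deepcopy(parent.additional_info)
            merged.update(sub.additional_info or {})
            sub.additional_info = merged
            logger.info("Propagated attachments from %s to %s", parent.id, sub.id)
        except Exception:
            # Leave the subtask untouched rather than failing the whole task.
            logger.exception("Could not copy attachments to subtask %s", sub.id)


def build_assignment_payload(task: Task, worker_id: str) -> Dict[str, Any]:
    """Build an assignment payload that carries additional_info along."""
    return {
        "worker_id": worker_id,
        "task_id": task.id,
        "content": task.content,
        "additional_info": task.additional_info or {},
    }
```

In this sketch, a parent task created with `additional_info={"image_paths": ["/tmp/ui.png"]}` would hand the same paths to each subtask, and the payload sent to, say, a Developer Agent would include them. Copying rather than sharing the dictionary is a design choice assumed here so that one worker cannot mutate another's metadata.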

Impact: Worker agents can now receive image context, enabling use cases like:

  • Developer Agent analyzing UI screenshots to write code
  • Browser Agent using reference images for web research
  • Document Agent incorporating images into reports
  • Social Media Agent analyzing images for content creation

What is the purpose of this pull request?

  • Bug fix
  • New Feature
  • Documentation update
  • Other

Note: This is both a bug fix (attachments weren't propagating) and a feature enabler (prepares for multi-modal toolkit integration in worker agents).

bittoby (Contributor, Author) commented Feb 9, 2026

@Wendong-Fan Could you please review this PR? thanks

Wendong-Fan added this to the Sprint 14 milestone Feb 9, 2026
Wendong-Fan (Contributor) commented

> @Wendong-Fan Could you please review this PR? thanks

Thanks for @bittoby's contribution! Could @nitpicker55555 and @Zephyroam help check this?

Development

Successfully merging this pull request may close these issues.

[Feature Request] All worker could accept multi-modal information
