Skip to content

Conversation

@fdubut
Copy link
Contributor

@fdubut fdubut commented Jan 22, 2026

Description

This PR adds support for image edition/remix in OpenAIImageTarget, specifically being able to pass one text prompt (always required) and 1-16 image(s). It also fixes a handful of minor inconsistencies with class parameters (with different OpenAI models supporting different values).

Tests and Documentation

  • Updated unit tests to include support for image edition/remix, and the new requirements for input types.
  • Added an example of image edition/remix to the multi-modal target documentation.
  • Split up the multi-modal target notebook into four notebooks (one per output modality).

- file: code/targets/3_non_open_ai_chat_targets
- file: code/targets/4_non_llm_targets
- file: code/targets/5_multi_modal_targets
sections:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the hierarchy doesn't go deeper. Did you build the docs and check?

- file: code/targets/5.4_openai_tts_target
- file: code/targets/6_rate_limiting
- file: code/targets/7_http_target
- file: code/targets/8_openai_responses_target
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this also belong into the same category? At that point almost everything is in there so we might as well not have an additional level 🙂

Copy link
Contributor

@romanlutz romanlutz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is really good! Let us know what you think about the comments 🙂

"""
# Extract text and images from message pieces
text_prompt = message.message_pieces[0].converted_value
image_paths = [piece.converted_value for piece in message.message_pieces[1:]]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're making assumptions about the order here. It feels safer not to do that because when we load such seed prompts from somewhere else it may be ordered differently.

image_files = []
for img_path in image_paths:
try:
image_files.append(open(img_path, "rb"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm unable to look it up right now but we might have images stored in the cloud that won't be load able this way (I think). Roman TODO: look this up

image_files = []
for img_path in image_paths:
try:
image_files.append(open(img_path, "rb"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason for not wrapping this in a with block (context manager)? That would save you from having to do the close in two places

return [response]
return response

async def _send_edit_request_async(self, message: Message) -> Message:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm wondering if this is meant to be a different target (with just 1 method different). It is a different API after all.

My worry: what if there's a different image API tomorrow that takes text + image + image (let's say to remix image 1 together with image 2 based on text)? Then we would not be able to accommodate that well.

Option: name it OpenAIImageEditTarget (or Remix? Whatever the API is called is best).

raise ValueError(
f"All the message pieces after the first one must be image_path. Received: {data_type}."
)
else:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Super nit: handle this case first to lower indentation levels 😉

text_piece = MessagePiece(
role="user",
conversation_id="123",
original_value="edit this image",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would love to have an opposite order test case (image + text), too

await image_target.send_prompt_async(message=request)
finally:
if os.path.isfile(image_piece.original_value):
os.remove(image_piece.original_value)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need integration tests. Ideally, for the new functionality (entra and api key based) but also for the various models. I can help identify what we have so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants