-
Notifications
You must be signed in to change notification settings - Fork 646
FEAT: Support image edition/remix in OpenAIImageTarget #1322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| - file: code/targets/3_non_open_ai_chat_targets | ||
| - file: code/targets/4_non_llm_targets | ||
| - file: code/targets/5_multi_modal_targets | ||
| sections: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe the hierarchy doesn't go deeper. Did you build the docs and check?
| - file: code/targets/5.4_openai_tts_target | ||
| - file: code/targets/6_rate_limiting | ||
| - file: code/targets/7_http_target | ||
| - file: code/targets/8_openai_responses_target |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't this also belong into the same category? At that point almost everything is in there so we might as well not have an additional level 🙂
romanlutz
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is really good! Let us know what you think about the comments 🙂
| """ | ||
| # Extract text and images from message pieces | ||
| text_prompt = message.message_pieces[0].converted_value | ||
| image_paths = [piece.converted_value for piece in message.message_pieces[1:]] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We're making assumptions about the order here. It feels safer not to do that because when we load such seed prompts from somewhere else it may be ordered differently.
| image_files = [] | ||
| for img_path in image_paths: | ||
| try: | ||
| image_files.append(open(img_path, "rb")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm unable to look it up right now but we might have images stored in the cloud that won't be load able this way (I think). Roman TODO: look this up
| image_files = [] | ||
| for img_path in image_paths: | ||
| try: | ||
| image_files.append(open(img_path, "rb")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason for not wrapping this in a with block (context manager)? That would save you from having to do the close in two places
| return [response] | ||
| return response | ||
|
|
||
| async def _send_edit_request_async(self, message: Message) -> Message: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if this is meant to be a different target (with just 1 method different). It is a different API after all.
My worry: what if there's a different image API tomorrow that takes text + image + image (let's say to remix image 1 together with image 2 based on text)? Then we would not be able to accommodate that well.
Option: name it OpenAIImageEditTarget (or Remix? Whatever the API is called is best).
| raise ValueError( | ||
| f"All the message pieces after the first one must be image_path. Received: {data_type}." | ||
| ) | ||
| else: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Super nit: handle this case first to lower indentation levels 😉
| text_piece = MessagePiece( | ||
| role="user", | ||
| conversation_id="123", | ||
| original_value="edit this image", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would love to have an opposite order test case (image + text), too
| await image_target.send_prompt_async(message=request) | ||
| finally: | ||
| if os.path.isfile(image_piece.original_value): | ||
| os.remove(image_piece.original_value) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We also need integration tests. Ideally, for the new functionality (entra and api key based) but also for the various models. I can help identify what we have so far.
Description
This PR adds support for image edition/remix in
OpenAIImageTarget, specifically being able to pass one text prompt (always required) and 1-16 image(s). It also fixes a handful of minor inconsistencies with class parameters (with different OpenAI models supporting different values).Tests and Documentation