FEAT: Support image edition/remix in OpenAIImageTarget #1322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

Open

fdubut wants to merge 1 commit into Azure:main from fdubut:image_target

Contributor

fdubut commented Jan 22, 2026

Description

This PR adds support for image edition/remix in OpenAIImageTarget, specifically being able to pass one text prompt (always required) and 1-16 image(s). It also fixes a handful of minor inconsistencies with class parameters (with different OpenAI models supporting different values).

Tests and Documentation

Updated unit tests to include support for image edition/remix, and the new requirements for input types.
Added an example of image edition/remix to the multi-modal target documentation.
Split up the multi-modal target notebook into four notebooks (one per output modality).


          Support for text+image input for OpenAIImageTarget

2994ad1

romanlutz reviewed

View reviewed changes

doc/_toc.yml

    
                    - file: code/targets/3_non_open_ai_chat_targets

                    - file: code/targets/4_non_llm_targets

                    - file: code/targets/5_multi_modal_targets

                      sections:

Contributor

romanlutz Jan 23, 2026

I believe the hierarchy doesn't go deeper. Did you build the docs and check?

doc/_toc.yml

    
                      - file: code/targets/5.4_openai_tts_target

                    - file: code/targets/6_rate_limiting

                    - file: code/targets/7_http_target

                    - file: code/targets/8_openai_responses_target

Contributor

romanlutz Jan 23, 2026

Wouldn't this also belong into the same category? At that point almost everything is in there so we might as well not have an additional level 🙂

romanlutz reviewed

View reviewed changes

Contributor

romanlutz left a comment

This is really good! Let us know what you think about the comments 🙂

pyrit/prompt_target/openai/openai_image_target.py

    
                      """

                      # Extract text and images from message pieces

                      text_prompt = message.message_pieces[0].converted_value

                      image_paths = [piece.converted_value for piece in message.message_pieces[1:]]

Contributor

romanlutz Jan 23, 2026

We're making assumptions about the order here. It feels safer not to do that because when we load such seed prompts from somewhere else it may be ordered differently.

pyrit/prompt_target/openai/openai_image_target.py

    
                      image_files = []

                      for img_path in image_paths:

                          try:

                              image_files.append(open(img_path, "rb"))

Contributor

romanlutz Jan 23, 2026

I'm unable to look it up right now but we might have images stored in the cloud that won't be load able this way (I think). Roman TODO: look this up

pyrit/prompt_target/openai/openai_image_target.py

    
                      image_files = []

                      for img_path in image_paths:

                          try:

                              image_files.append(open(img_path, "rb"))

Contributor

romanlutz Jan 23, 2026

Any reason for not wrapping this in a with block (context manager)? That would save you from having to do the close in two places

pyrit/prompt_target/openai/openai_image_target.py

    
                      return [response]

                      return response

                  async def _send_edit_request_async(self, message: Message) -> Message:

Contributor

romanlutz Jan 23, 2026

I'm wondering if this is meant to be a different target (with just 1 method different). It is a different API after all.

My worry: what if there's a different image API tomorrow that takes text + image + image (let's say to remix image 1 together with image 2 based on text)? Then we would not be able to accommodate that well.

Option: name it OpenAIImageEditTarget (or Remix? Whatever the API is called is best).

pyrit/prompt_target/openai/openai_image_target.py

    
                                  raise ValueError(

                                      f"All the message pieces after the first one must be image_path. Received: {data_type}."

                                  )

                      else:

Contributor

romanlutz Jan 23, 2026

Super nit: handle this case first to lower indentation levels 😉

tests/unit/target/test_image_target.py

    
                  text_piece = MessagePiece(

                      role="user",

                      conversation_id="123",

                      original_value="edit this image",

Contributor

romanlutz Jan 23, 2026

Would love to have an opposite order test case (image + text), too

tests/unit/target/test_image_target.py

    
                          await image_target.send_prompt_async(message=request)

                  finally:

                      if os.path.isfile(image_piece.original_value):

                          os.remove(image_piece.original_value)

Contributor

romanlutz Jan 23, 2026

We also need integration tests. Ideally, for the new functionality (entra and api key based) but also for the various models. I can help identify what we have so far.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet