
Added Image and video gen interactive tasks #31

Open
ParamThakkar123 wants to merge 1 commit into main from add/image-gen-interact

@ParamThakkar123 commented Apr 1, 2026

Changes

  • Added image-gen-interactive/ directory with interactive image and video generation
  • Created main.py with Gradio interface supporting multiple diffusion models (SDXL, SD, Flux, etc.)
  • Added support for text-to-image and text-to-video generation
  • Implemented model caching and GPU optimization
  • Added task.yaml with resource requirements and setup dependencies

Features

  • Interactive generation of images and videos using state-of-the-art diffusion models
  • Support for multiple model architectures (Stable Diffusion XL, Flux, ModelScope, etc.)
  • Configurable parameters: model selection, prompts, dimensions, inference steps, guidance scale
  • Video generation with frame control and automatic MP4 export
  • Optimized for GPU acceleration with torch.float16

Parameters

  • HF_TOKEN: Hugging Face token (required, set as secret)
  • Model selection from predefined list
  • Prompt and negative prompt inputs
  • Width/height, steps, guidance scale controls

How to Test

  1. In TransformerLab, select the 'image-gen-interactive' task
  2. Ensure HF_TOKEN secret is set in app settings
  3. Configure generation parameters
  4. Run the task and access the Gradio interface
  5. Test image generation with different models and prompts
  6. Try video generation (requires compatible models like ModelScope)

@greninja self-assigned this Apr 17, 2026
@greninja commented Apr 17, 2026

Some feedback so far:

  1. Ideally, changing the model from the default (SDXL 1.0 (1024x1024)) to, say, SD 1.5 (512x512) should automatically update the width and height to 512, but currently this doesn't happen.
  2. Maybe we can add a short description or explainer of what "Inference steps" and "Guidance scale" are?
  3. Not sure if this has anything to do with the code in this PR, but I get this warning:
/home/shadab/projects/transformerlab/transformerlab-examples/.venv/lib/python3.11/site-packages/diffusers/image_processor.py:142: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

Maybe some pixel values in the generated image are NaN or inf, so casting them to uint8 (0-255) gives undefined results. I'm guessing it won't crash the app, just that the output image may have corrupted pixels, so it's OK to ignore for testing purposes and focus on the other fixes.
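The guess in item 3 can be checked directly. This sketch (the helper `to_uint8_checked` is hypothetical, not part of diffusers or this PR) counts non-finite values before the same `(images * 255).round()` cast shown in the warning, then replaces them and clips to [0, 1] so the cast is well-defined. It assumes float images in [0, 1], which is what diffusers pipelines return with `output_type="np"`.

```python
import numpy as np
from typing import Tuple


def to_uint8_checked(images: np.ndarray) -> Tuple[np.ndarray, int]:
    """Convert float images in [0, 1] to uint8, reporting non-finite pixels.

    Hypothetical helper: returns (uint8_images, number_of_nan_or_inf_values).
    """
    n_bad = int((~np.isfinite(images)).sum())
    # Replace NaN with 0 and +/-inf with the range limits so the cast is defined.
    clean = np.nan_to_num(images, nan=0.0, posinf=1.0, neginf=0.0)
    clean = np.clip(clean, 0.0, 1.0)
    return (clean * 255).round().astype(np.uint8), n_bad
```

A nonzero count means the pipeline itself produced NaN/inf values (running parts of it in torch.float16 is one commonly reported source), so the count is worth logging; the replacement only avoids the undefined cast and does not fix the root cause.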
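Items 1 and 2 could be handled in the same place in main.py. The sketch below assumes the dropdown labels keep the "(WIDTHxHEIGHT)" suffix seen in "SDXL 1.0 (1024x1024)" and "SD 1.5 (512x512)", parses it on every model change, and uses Gradio's `info=` text on the sliders as the short explainer. `native_size`, `build_demo`, `DEFAULT_SIZE`, the slider ranges, and the defaults are hypothetical placeholders, not values from the PR.

```python
import re
from typing import Sequence, Tuple

# Hypothetical fallback for labels without a "(WxH)" suffix, e.g. a bare "Flux".
DEFAULT_SIZE: Tuple[int, int] = (1024, 1024)
_SIZE_RE = re.compile(r"\((\d+)\s*x\s*(\d+)\)")

STEPS_INFO = (
    "Number of denoising iterations. More steps are slower, "
    "with diminishing returns in quality."
)
GUIDANCE_INFO = (
    "Classifier-free guidance: how strongly the image follows the prompt. "
    "Higher values stick closer to the prompt; very high values can look over-saturated."
)


def native_size(model_label: str) -> Tuple[int, int]:
    """Parse '(WIDTHxHEIGHT)' out of a dropdown label like 'SD 1.5 (512x512)'."""
    match = _SIZE_RE.search(model_label)
    if match is None:
        return DEFAULT_SIZE
    return int(match.group(1)), int(match.group(2))


def build_demo(model_labels: Sequence[str]):
    """Wire the model dropdown to the width/height sliders (Gradio Blocks)."""
    import gradio as gr  # lazy import: native_size() stays usable without Gradio

    w0, h0 = native_size(model_labels[0])
    with gr.Blocks() as demo:
        model = gr.Dropdown(choices=list(model_labels), value=model_labels[0], label="Model")
        # Ranges and defaults below are placeholders.
        width = gr.Slider(minimum=256, maximum=1536, value=w0, step=64, label="Width")
        height = gr.Slider(minimum=256, maximum=1536, value=h0, step=64, label="Height")
        steps = gr.Slider(minimum=1, maximum=100, value=30, step=1,
                          label="Inference steps", info=STEPS_INFO)
        guidance = gr.Slider(minimum=0.0, maximum=20.0, value=7.5, step=0.5,
                             label="Guidance scale", info=GUIDANCE_INFO)
        # steps and guidance would be passed to the generate handler (not shown).
        # Returning a (width, height) tuple updates both sliders in one event.
        model.change(fn=native_size, inputs=model, outputs=[width, height])
    return demo
```

`build_demo(labels).launch()` would start it. Parsing the label keeps a single source of truth for each model's size; a dict from model to size works equally well if the labels ever drop the suffix.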

  4. The task.yaml file structure may be slightly off. Importing it throws this error:
  title: Extra inputs are not permitted; cpus: Extra inputs are not permitted; memory: Extra inputs are not permitted; accelerators: Extra inputs are not permitted; env_vars: Extra inputs are not permitted; description: Extra inputs are not permitted; interactive: Extra inputs are not permitted

Maybe refer to this: task-submission. Specifically, based on the template in TaskYamlSpec, these fields are not allowed at the top level:


  - title
  - command (should be run)
  - cpus (should be nested under resources)
  - memory (should be nested under resources)
  - accelerators (should be nested under resources)
  - env_vars (should be envs)
  - description
  - interactive
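The field mapping above is mechanical enough to script. A minimal sketch, assuming task.yaml has already been loaded into a dict (e.g. with PyYAML): it applies only the renames and moves listed in this comment and reports, rather than guesses a new home for, the keys that are simply not allowed. `migrate_task` is a hypothetical helper; TaskYamlSpec remains the authoritative schema.

```python
from typing import Any, Dict, List, Tuple

# Mapping taken from the review comment; the authoritative schema is TaskYamlSpec.
RENAMED = {"command": "run", "env_vars": "envs"}
MOVED_UNDER_RESOURCES = ("cpus", "memory", "accelerators")
NOT_ALLOWED_AT_TOP_LEVEL = ("title", "description", "interactive")


def migrate_task(old: Dict[str, Any]) -> Tuple[Dict[str, Any], List[str]]:
    """Return (new_task, dropped_keys) for a dict loaded from the old task.yaml."""
    new: Dict[str, Any] = {}
    resources: Dict[str, Any] = dict(old.get("resources") or {})
    dropped: List[str] = []
    for key, value in old.items():
        if key in RENAMED:
            new[RENAMED[key]] = value
        elif key in MOVED_UNDER_RESOURCES:
            resources[key] = value
        elif key in NOT_ALLOWED_AT_TOP_LEVEL:
            dropped.append(key)  # reported only: decide by hand where these belong
        elif key != "resources":
            new[key] = value  # every other key is copied through unchanged
    if resources:
        new["resources"] = resources
    return new, dropped
```

If PyYAML is available, `yaml.safe_dump(new, sort_keys=False)` writes the result back out while keeping the original key order.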

  5. Also, no matter the prompt ("testing", "draw a cat playing soccer", or even "a sky full of butterflies"), it shows this:
    "Potential NSFW content was detected in one or more images. A black image will be returned instead. Try again with a different prompt and/or seed."
    and then just returns a black image.
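Regarding item 5, the user only sees a black image while the reason goes to the console. The sketch below (hypothetical helper names, not code from this PR) turns the safety checker's per-image flags into a message the Gradio handler could display. It assumes the Stable Diffusion pipeline output's `nsfw_content_detected` field, a list of booleans or None when no safety checker ran; other pipelines' outputs may not have the field at all.

```python
from typing import List, Optional, Sequence


def blocked_indices(nsfw_flags: Optional[Sequence[bool]]) -> List[int]:
    """0-based indices of images the safety checker replaced with black frames.

    `nsfw_flags` is the pipeline output's `nsfw_content_detected` value
    (None when no safety checker ran).
    """
    if not nsfw_flags:
        return []
    return [i for i, flagged in enumerate(nsfw_flags) if flagged]


def safety_message(nsfw_flags: Optional[Sequence[bool]], n_images: int) -> Optional[str]:
    """Return a user-facing message, or None if nothing was blocked."""
    blocked = blocked_indices(nsfw_flags)
    if not blocked:
        return None
    if len(blocked) == n_images:
        # Every image blocked, even for harmless prompts, points at false
        # positives worth investigating rather than at the prompt itself.
        return ("The safety checker flagged every image and returned black frames. "
                "If the prompt is harmless, this may be a false positive.")
    numbers = ", ".join(str(i + 1) for i in blocked)  # 1-based for display
    return f"The safety checker replaced image(s) {numbers} of {n_images} with black frames."
```

In the generate handler this could be called as `safety_message(getattr(result, "nsfw_content_detected", None), len(result.images))`, showing the result with `gr.Warning(msg)` when it is not None. For checking whether the flags are false positives during local testing, diffusers' Stable Diffusion pipelines accept `safety_checker=None` in `from_pretrained`; whether the shipped task should ever do that is a separate decision for this PR.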

@greninja

  6. The task.yaml is missing github_repo_url and github_repo_dir.


@greninja left a comment


Refer to the feedback comments above.
