Automate prompts, seeds, metrics, and reproducible run logging.
Built for AI researchers, labs, and developers to evaluate image and video diffusion models faster and compare results consistently.
⭐ Star the repo for updates ⭐
Product Vision: AI Research
DreamLayer AI is an open-source benchmarking and evaluation platform for image generation models and video generation models. It automates prompts, seeds, metrics, configs, and reproducible run logging so researchers and developers can compare model quality faster and more consistently. It runs locally with a React frontend, Flask-based services, SQLite run storage, and ComfyUI integration for image workflows.
Compare model outputs across prompts, seeds, configs, and metrics with reproducible run logging.
DreamLayer AI is built for:
- AI researchers comparing diffusion models across prompts, seeds, and metrics
- ML Engineers evaluating image and video generation quality
- Labs and teams building internal benchmarking workflows for generative models
- Open-source model creators testing checkpoints, LoRAs, and workflows
- Developers integrating custom metrics and evaluation pipelines
DreamLayer can benchmark:
- Image generation model outputs
- Video generation model outputs
- Prompt-to-image alignment
- Image quality and aesthetic quality
- Object-level prompt adherence
- Temporal video consistency
- Reference-based image and video similarity metrics
Status: ✨ Now live
Easiest way to run DreamLayer 😃
- Download this repo
- Open the folder in Cursor (an AI-native code editor)
- Type `run it` or press the "Run" button, then follow the guided steps
Cursor will:
- Walk you through each setup step
- Install Python and Node dependencies
- Create a virtual environment
- Start the backend and frontend
- Output a localhost:8080 link you can open in your browser
⏱️ Takes about 5-10 minutes. No terminal needed. Just click, run, and you’re in. 🚀
On macOS, PyTorch setup may take a few retries. Just keep pressing Run when prompted. Cursor will guide you through it.
Linux:

```bash
./install_linux_dependencies.sh
```

macOS:

```bash
./install_mac_dependencies.sh
```

Windows (PowerShell):

```powershell
# If needed, allow script execution for this session:
Set-ExecutionPolicy -Scope Process -ExecutionPolicy Bypass
.\install_windows_dependencies.ps1
```

To start DreamLayer, Linux:

```bash
./start_dream_layer.sh
```

macOS:

```bash
./start_dream_layer.sh
```

Windows:

```bat
start_dream_layer.bat
```

Environment variables:

- `DLVENV_PATH` (used by `install_dependencies_linux`): preferred path to the Python virtual environment; default is `/tmp/dlvenv`
- `DREAMLAYER_COMFYUI_CPU_MODE` (used by `start_dream_layer`): if no NVIDIA drivers are available, run using CPU only; default is `false`
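As a sketch of how these variables behave, the following hypothetical Python snippet reads them with the documented defaults (the real launch scripts are shell and batch files; only the variable names and defaults come from the documentation above):

```python
import os

# Hypothetical sketch: resolve the documented launch-time settings.
# Defaults come from the README; the real scripts are shell/batch.
DLVENV_PATH = os.environ.get("DLVENV_PATH", "/tmp/dlvenv")
CPU_MODE = os.environ.get("DREAMLAYER_COMFYUI_CPU_MODE", "false").lower() == "true"

print(DLVENV_PATH, CPU_MODE)
```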
- Frontend: http://localhost:8080
- ComfyUI: http://localhost:8188
DreamLayer ships without weights to keep the download small. You have two ways to add models:
DreamLayer can also call external APIs (OpenAI DALL·E, Flux, Ideogram).
To enable them:
Edit your .env file in the repository root (./.env):
```
OPENAI_API_KEY=sk-...
BFL_API_KEY=flux-...
IDEOGRAM_API_KEY=id-...
STABILITY_API_KEY=sk-...
```

Once a key is present, the model becomes visible in the dropdown. No key = feature stays hidden.
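The "no key = hidden" behavior can be sketched as a simple environment check. This is an illustrative sketch, not the actual backend logic, and the model names in the mapping are assumptions:

```python
import os

# Hypothetical key-to-model mapping; the real gating lives in the backend.
API_MODELS = {
    "dall-e-3": "OPENAI_API_KEY",
    "flux-pro": "BFL_API_KEY",
    "ideogram-v3": "IDEOGRAM_API_KEY",
    "sd-turbo": "STABILITY_API_KEY",
}

def visible_models(env=os.environ):
    """Return only the API models whose key is present in the environment."""
    return [name for name, key in API_MODELS.items() if env.get(key)]

print(visible_models({"OPENAI_API_KEY": "sk-test"}))  # → ['dall-e-3']
```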
Step 1: Download .safetensors or .ckpt files from:
- Hugging Face
- Civitai
- Your own training runs
Step 2: Place the models in the appropriate folders (auto-created on first run):
- `Checkpoints/` → full checkpoints (`.safetensors`)
- `Lora/` → LoRA & LoCon files
- `ControlNet/` → ControlNet models
- `VAE/` → optional VAEs
Step 3: Click Settings ▸ Refresh Model List in the UI — the models appear in dropdowns.
Tip: Use symbolic links if your checkpoints live on another drive.
The installation scripts will automatically install all dependencies and set up the environment.
For FID scoring, download the CIFAR-10 reference dataset:
```bash
python scripts/fetch_datasets.py
```

Note: The YOLO model (`yolov8n.pt`, ~6MB) for object detection metrics auto-downloads on first use.
| 🔍 Feature | 🚀 How it's better |
|---|---|
| Automated Benchmarking | One run sweeps N prompts × M seeds × K samplers. Metrics compute live during generation, so a manual benchmark that would take 1 to 2 weeks finishes in 3 to 5 hours per model. |
| Reproducibility by Default | Every run persists to SQLite with prompt, negative prompt, seed, sampler, steps, CFG, model hash, LoRA stack, ControlNet config, and all computed metrics. Replay any run by run_id. |
| Image and Video Metrics, Built In | Image: CLIPScore (ViT-L/14), FID, LAION aesthetic, color harmony, sharpness, YOLOv8 composition F1. Video: FVD (I3D), SSIM, PSNR, LPIPS, temporal flickering, subject and background consistency (DINO), motion smoothness. Custom metrics pluggable. |
| Multi-Modal Today | Image and video evaluation are available out of the box. Audio benchmarking is on the roadmap. See the Metrics section below for the exact call graph and storage schema. |
| Reference-Free and Reference-Based | Works without a ground-truth image or video for CLIPScore, aesthetics, YOLO composition, temporal flickering, and DINO consistency. Add a reference video to unlock SSIM, PSNR, LPIPS. FID operates on a reference set. |
| Cached, Incremental, Comparable | Metrics persist per run in a dedicated SQLite table and return instantly on re-fetch. Batch backfill endpoints recompute missing metrics across the full history. Compare any two runs side by side via the comparison API. |
| Researcher-Friendly Exports | Run locally on your own GPU (CUDA, MPS, or CPU fallback). Export to CSV per run or a ZIP report bundle with images, metadata, and metrics for leaderboard submission or paper appendices. |
DreamLayer supports a working set of common image and video evaluation metrics, including CLIPScore, FID, aesthetic scoring, LPIPS, SSIM, PSNR, composition precision/recall/F1, temporal flickering, subject consistency, background consistency, and motion smoothness. These metrics run either automatically during generation or on demand per run, are exposed through live HTTP routes, and persist to SQLite for reproducible benchmarking and comparison.
- CLIPScore: prompt-to-image alignment using cosine similarity between CLIP text and image embeddings. Higher is better (0 to 1). No reference needed. Backbone: CLIP ViT-L/14.
- FID (Fréchet Inception Distance): distribution distance between generated images and a reference image set. CIFAR-10 ships as the default reference. Lower is better. Reference required. Backbone: Inception-V3.
- LAION Aesthetic Score: learned aesthetic quality prediction from CLIP embeddings. Higher is better (0 to 10). No reference needed. Backbone: LAION linear predictor on CLIP ViT-L/14.
- Color Harmony, Saturation Balance, Value Contrast: HSV-space color theory analysis using k-means clustering. Higher is better (0 to 1). No reference needed. Backbone: OpenCV.
- Technical Quality: sharpness, noise level, and artifact detection per image. Higher is better (0 to 1). No reference needed. Backbone: Laplacian variance plus heuristics.
- Composition Precision, Recall, F1: object-level prompt adherence, comparing detected objects against a prompt-derived object list. Higher is better (0 to 1). No reference needed. Backbone: YOLOv8n.
- FVD (Fréchet Video Distance): distribution distance between two sets of videos in I3D feature space. Lower is better. Reference required.
- Video SSIM: per-frame structural similarity, reported as mean and standard deviation across frames. Higher is better (0 to 1). Reference required.
- Video PSNR: per-frame peak signal-to-noise ratio, reported as mean and standard deviation. Higher is better (dB). Reference required.
- Video LPIPS: per-frame learned perceptual similarity between generated and reference frames. Lower is better. Reference required. Backbone: LPIPS with AlexNet.
- Temporal Flickering: frame-to-frame stability using mean absolute error between consecutive frames. Higher is better (0 to 1). No reference needed.
- Subject Consistency: how stable the main subject’s appearance is across frames. Higher is better (0 to 1). No reference needed. Backbone: DINO feature similarity.
- Background Consistency: how stable the background is across frames. Higher is better (0 to 1). No reference needed. Backbone: DINO feature similarity.
- Motion Smoothness: smoothness of optical flow between consecutive frames. Higher is better (0 to 1). No reference needed. Backbone: OpenCV optical flow.
- Per-Frame Aesthetic: LAION aesthetic score applied to each frame, reported as a mean. Higher is better (0 to 10). No reference needed. Backbone: LAION predictor on CLIP ViT-L/14.
Temporal Flickering, Subject Consistency, Background Consistency, and Motion Smoothness are adapted from VBench (CVPR 2024).
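CLIPScore, as defined above, reduces to a cosine similarity between normalized embeddings. The sketch below shows just that computation with NumPy; the random stand-in vectors are placeholders for real CLIP ViT-L/14 text and image embeddings:

```python
import numpy as np

def clip_score(text_emb: np.ndarray, image_emb: np.ndarray) -> float:
    """Cosine similarity between L2-normalized text and image embeddings.

    Sketch of the CLIPScore definition above; real scores come from
    CLIP ViT-L/14 embeddings, which these vectors merely stand in for.
    """
    t = text_emb / np.linalg.norm(text_emb)
    i = image_emb / np.linalg.norm(image_emb)
    return float(np.clip(t @ i, 0.0, 1.0))  # clamp into the reported 0-to-1 range

emb = np.ones(768)           # 768 dims, matching ViT-L/14's embedding width
print(clip_score(emb, emb))  # → 1.0 for identical embeddings
```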
- Live during image generation: CLIPScore, LAION aesthetic, color metrics, technical quality, and YOLO composition. Results are written to the metrics table as soon as the image is saved.
- On demand for images: FID. Requires the CIFAR-10 reference stats (run `python scripts/fetch_datasets.py` once), then a POST /api/runs/calculate-metrics call, or the batch backfill script for historical runs.
- On demand for video: all video metrics. Trigger per video with POST /api/calculate-video-metrics, or batch across all unscored videos with POST /api/calculate-all-video-metrics. Results are cached to the video_metrics table and return instantly on re-fetch.
Metrics persist across three dedicated SQLite tables:
- metrics: image scalar metrics and aesthetic sub-scores
- composition_metrics: YOLO precision, recall, F1, detected objects, missing objects
- video_metrics: FVD, SSIM, PSNR, LPIPS, plus a JSON blob of per-frame arrays and VBench-style quality metrics
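The compute-once, return-from-cache behavior described above can be sketched with the standard-library `sqlite3` module. The table and column names here are simplified illustrations, not the actual DreamLayer schema:

```python
import sqlite3

# Simplified sketch of the cache-then-return pattern described above.
# Table and column names are illustrative, not the real schema.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE metrics (run_id TEXT, name TEXT, value REAL, "
    "PRIMARY KEY (run_id, name))"
)

def get_metric(run_id, name, compute):
    """Return a cached metric, computing and persisting it only on a miss."""
    row = db.execute(
        "SELECT value FROM metrics WHERE run_id=? AND name=?", (run_id, name)
    ).fetchone()
    if row:
        return row[0]
    value = compute()
    db.execute("INSERT INTO metrics VALUES (?, ?, ?)", (run_id, name, value))
    return value

first = get_metric("run-42", "clip_score", lambda: 0.31)    # computed and stored
second = get_metric("run-42", "clip_score", lambda: 999.0)  # served from cache
print(first, second)  # → 0.31 0.31
```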
You can export any run or slice of runs to CSV through the report bundle endpoint, or download a ZIP containing images, prompts, configs, and every computed metric for leaderboard submissions or paper appendices.
- Python 3.8+
- Node.js 16+
- 8GB+ RAM recommended
- Star this repository.
- Share the screenshot on X ⁄ Twitter with #DreamLayerAI to spread the word.
All contributions (code, docs, art, tutorials) are welcome!
- Create a PR and follow the evidence requirements in the template.
- See CHANGELOG Guidelines for detailed contribution process.
DreamLayer AI will ship under the GPL-3.0 license when the code is released.
All trademarks and closed-source models referenced belong to their respective owners.
DreamLayer AI includes a comprehensive test suite covering all functionality, including ClipScore integration, database operations, and API endpoints.
```bash
# Install test dependencies
pip install -r tests/requirements.txt

# Run all tests
python tests/run_all_tests.py

# Run specific test categories
python tests/run_all_tests.py unit        # Unit tests only
python tests/run_all_tests.py integration # Integration tests only
python tests/run_all_tests.py api         # API endpoint tests
python tests/run_all_tests.py clipscore   # ClipScore functionality tests

# Run with verbose output
python tests/run_all_tests.py all -v
```

| Test File | Coverage | Description |
|---|---|---|
| `test_txt2img_server.py` | Text-to-Image API | Tests txt2img generation and database integration |
| `test_img2img_server.py` | Image-to-Image API | Tests img2img generation and database integration |
| `test_run_registry.py` | Run Registry API | Tests database-first API with ClipScore retrieval |
| `test_report_bundle.py` | Report Generation | Tests Mac-compatible report bundle creation |
| `test_clip_score.py` | ClipScore Integration | Tests CLIP model calculation and database storage |
| `test_database_integration.py` | Database Operations | Tests 3-table schema and database operations |
- ✅ Unit Tests - Individual component testing
- ✅ Integration Tests - End-to-end workflow testing
- ✅ API Tests - HTTP endpoint testing with Flask test client
- ✅ Database Tests - SQLite operations with temporary test databases
- ✅ Mock Testing - External dependency mocking (ComfyUI, CLIP model)
- ✅ Error Handling - Edge cases and error condition testing
- ✅ Mac Compatibility - ZIP file generation testing
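The mock-testing approach listed above can be sketched with `unittest.mock`: replace an expensive external dependency (here, a stand-in for the CLIP scorer) so the surrounding logic runs without loading the real model. The function names below are hypothetical, not the suite's actual API:

```python
from unittest.mock import MagicMock

# Hypothetical wrapper standing in for code that calls a CLIP scorer.
def score_run(compute_clip_score, prompt, image_path):
    """Toy wrapper that rounds whatever the scorer returns."""
    return round(compute_clip_score(prompt, image_path), 3)

# Mock the expensive dependency instead of loading the real CLIP model.
mock_scorer = MagicMock(return_value=0.3141)
result = score_run(mock_scorer, "a red cube", "out.png")
print(result)  # → 0.314
mock_scorer.assert_called_once_with("a red cube", "out.png")
```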
```bash
# Run specific test file
python -m pytest tests/test_clip_score.py -v

# Run specific test method
python -m pytest tests/test_clip_score.py::TestClipScore::test_clip_score_calculation_with_mock -v

# Run with coverage report
python -m pytest tests/ --cov=dream_layer_backend --cov-report=html
```

The test suite requires these additional dependencies:

- `pytest` - Test framework
- `pytest-cov` - Coverage reporting
- `pytest-mock` - Mocking utilities
- `requests-mock` - HTTP request mocking

Install with: `pip install -r tests/requirements.txt`
Yes. All five are fully implemented and persisted to SQLite. CLIPScore computes live during image generation. FID runs on demand against a reference image set. Video SSIM, Video PSNR, and Video LPIPS run on demand against a reference video. Batch backfill endpoints recompute missing metrics across the full run history.
ComfyUI is a node-based generation interface. DreamLayer is a benchmarking workbench built on top of ComfyUI for image workflows, paired with dedicated Flask services for run logging, metric computation, comparison APIs, and CSV or ZIP exports. ComfyUI handles "make this image." DreamLayer handles "benchmark these models across these prompts and seeds, log everything, and let me compare results."
Automatic1111, InvokeAI, and Forge are excellent generation UIs. DreamLayer is a capable generation UI as well, but it adds benchmarking infrastructure on top: persistent SQLite logging with full prompt, seed, sampler, and config metadata; built-in image and video quality metrics; side-by-side run comparison; batch metric backfills; and CSV or ZIP exports for leaderboard submission. None of those generation UIs ship with end-to-end evaluation tooling.
VBench, EvalCrafter, HEIM, and similar evaluation frameworks are standardized benchmark suites: they define fixed prompts, tasks, and scoring methods so you can report comparable benchmark results. DreamLayer is benchmarking infrastructure: you bring your own prompts, models, and configs, then run generation, scoring, run logging, and comparison workflows in one place. The two are complementary. DreamLayer’s evaluation stack also draws on HELM-style benchmarking concepts and includes video quality metrics inspired by VBench, such as temporal flickering, subject consistency, background consistency, and motion smoothness.
Can I benchmark Stable Diffusion, Flux, DALL·E, Gemini, Runway, Luma, Ideogram, and Stability AI models with DreamLayer?
Yes. DreamLayer can benchmark both local open-source models and supported API-based models. For local workflows, that includes models like Stable Diffusion 1.5, SDXL, Flux, and custom checkpoints. For API-based workflows, DreamLayer supports models shown in the UI such as Luma Labs Photon, Black Forest Labs Flux Pro, OpenAI DALL·E 3, Google Gemini Nano Banana, Runway Gen 4, Ideogram V3, and Stability AI SD Turbo. Add local model files to the Checkpoints/, Lora/, ControlNet/, and VAE/ folders, or add API keys to .env, and supported models appear in the UI for benchmarking.
Yes for Luma AI, Runway ML, and Google's Veo3. DreamLayer integrates with their video APIs out of the box via the txt2vid_server — just add the API key to .env. Sora support depends on OpenAI exposing a public video generation API. For local open-source video models that run through ComfyUI, drop the checkpoint into the appropriate folder and refresh the model list.
Yes, this is a core use case. Every run persists to SQLite with the full prompt, negative prompt, seed, sampler, steps, CFG, model hash, LoRA stack, ControlNet config, and all computed metrics. You can replay any run by run_id, sweep across multiple seeds or samplers in one batch, and compare any two runs side by side via the comparison API.
DreamLayer computes CLIPScore as the cosine similarity between CLIP text and image embeddings using the openai/clip-vit-large-patch14 backbone. The score lands in the 0 to 1 range, where higher values indicate stronger prompt-to-image alignment. No reference image is needed. CLIPScore computes live during image generation and writes directly to the metrics table, surfaced via the run registry API and included in CSV exports.
DreamLayer calculates FID using torchmetrics.image.fid.FrechetInceptionDistance with Inception-V3 features at 2048 dimensions. The default reference set is CIFAR-10, which you fetch once with python scripts/fetch_datasets.py. Lower FID indicates a closer distributional match to the reference. FID is on-demand: trigger per run via POST /api/runs/calculate-metrics, or batch-backfill across historical runs.
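To make the formula behind FID concrete, the sketch below computes the Fréchet distance between two Gaussians, simplified to diagonal covariances so the matrix square root becomes elementwise. This illustrates the math only; the real metric applies the full-covariance formula to Inception-V3 feature statistics via torchmetrics:

```python
import numpy as np

def frechet_distance_diag(mu1, var1, mu2, var2):
    """Fréchet distance ||mu1-mu2||^2 + Tr(S1+S2-2(S1*S2)^0.5) for
    diagonal covariances (a simplification for illustration; real FID
    uses full covariance matrices of Inception-V3 features)."""
    mu1, var1, mu2, var2 = map(np.asarray, (mu1, var1, mu2, var2))
    mean_term = np.sum((mu1 - mu2) ** 2)
    cov_term = np.sum(var1 + var2 - 2.0 * np.sqrt(var1 * var2))
    return float(mean_term + cov_term)

# Identical feature distributions give a distance of 0.
print(frechet_distance_diag([0.0, 1.0], [1.0, 2.0], [0.0, 1.0], [1.0, 2.0]))  # → 0.0
```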
Yes. The metric pipeline is modular. Each metric is implemented as a standalone calculator in dream_layer_backend_utils/, registered with the database layer, and surfaced through the existing metrics, composition_metrics, or video_metrics tables. Add your computation in the same pattern as the existing calculators and register it with the database queries module to flow through the registry, comparison API, CSV export, and ZIP report bundle.
Yes. Drop .safetensors files into the auto-created Lora/, ControlNet/, and VAE/ folders, then refresh the model list in Settings. The full stack of active LoRAs (with weights), ControlNet config, and VAE choice persists with every run, so you can replay an exact LoRA and ControlNet combination by run_id or compare results across LoRA variants in a single batch.
Yes. A single benchmark run sweeps N prompts across M seeds across K samplers, and you can vary CFG, steps, and resolution per cell. Every cell becomes a row in the runs table with its own run_id and metrics. The comparison API lets you slice the resulting matrix any way you need: by sampler, by CFG value, by seed, or any combination.
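The N × M × K sweep described above is essentially a Cartesian product over parameters. A minimal sketch, where the prompts, seeds, and samplers are placeholders and each cell would become one row in the runs table:

```python
from itertools import product

# Placeholder sweep axes; real runs would use your own prompts and configs.
prompts = ["a red cube", "a glass fox"]
seeds = [0, 1, 2]
samplers = ["euler", "dpmpp_2m"]

# One dict per cell of the benchmark grid: 2 prompts x 3 seeds x 2 samplers.
cells = [
    {"prompt": p, "seed": s, "sampler": k, "cfg": 7.0, "steps": 30}
    for p, s, k in product(prompts, seeds, samplers)
]
print(len(cells))  # → 12 runs
```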
Yes, on both Intel and Apple Silicon Macs. The install script ./install_mac_dependencies.sh handles PyTorch and dependency setup on either architecture. On Apple Silicon (M1, M2, M3), DreamLayer uses the MPS (Metal Performance Shaders) backend automatically for GPU-accelerated metric computation. On Intel Macs or when MPS is unavailable, DreamLayer falls back to CPU, which works for every metric but runs slower.
A run is one image or video generation event tied to a unique run_id. DreamLayer logs the prompt, negative prompt, seed, sampler, steps, CFG, model hash, LoRA stack, ControlNet config, VAE, batch size, generation type (txt2img, img2img, txt2vid, img2vid), the workflow JSON, the output filename, and every metric computed for that output. Runs persist to SQLite indefinitely and can be replayed, exported, or compared at any time.
Every run is assigned a run_id that links to its full configuration in SQLite: prompt, negative prompt, seed, sampler, steps, CFG, model hash, LoRA stack, and ControlNet config. Replay by run_id from the run registry to regenerate the exact image with the exact metrics, or fork a run by changing one parameter (such as the sampler or seed) for a controlled comparison.
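The replay-and-fork pattern described above can be sketched as copying a run's stored config and overriding one field. The registry lookup is simulated with an in-memory dict, and the config keys mirror the fields the README says are logged:

```python
# Stand-in for the SQLite run registry; keys mirror the logged fields.
registry = {
    "run-7f3a": {
        "prompt": "a lighthouse at dusk", "negative_prompt": "", "seed": 1234,
        "sampler": "euler", "steps": 30, "cfg": 7.0, "model_hash": "abc123",
    }
}

def fork(run_id, **overrides):
    """Copy a run's config, changing only the named parameters."""
    config = dict(registry[run_id])  # shallow copy; original stays intact
    config.update(overrides)
    return config

# Controlled comparison: same seed and prompt, different sampler.
variant = fork("run-7f3a", sampler="dpmpp_2m")
print(variant["sampler"], variant["seed"])  # → dpmpp_2m 1234
```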
No. DreamLayer runs locally on your machine, and prompts, generated images, configs, and metrics stay in your local filesystem and SQLite database by default. The only exception is when you choose to use an API-based model such as DALL·E, Flux, Ideogram, Stability AI, Runway, Luma, or Gemini, in which case the relevant request data is sent to that provider for generation. DreamLayer does not perform telemetry, analytics, or background uploads on its own.
Yes. Every Flask service exposes HTTP endpoints (txt2img, img2img, video metrics, run registry, report bundle) that you can call from a CI job. A typical pattern: trigger a fixed prompt set against a candidate model, fetch CLIPScore and aesthetic metrics from the run registry, compare against a baseline run_id from the previous release, and fail the build if any metric regresses beyond a defined threshold.
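The regression gate in that CI pattern boils down to a threshold comparison against a baseline. A minimal sketch, where the metric names and values are illustrative and "higher is better" is assumed for every metric listed:

```python
def check_regressions(baseline, candidate, tolerance=0.02):
    """Return the metrics that regressed by more than `tolerance`.

    Assumes every metric is higher-is-better; a missing candidate
    metric counts as a regression.
    """
    return [
        name for name, base in baseline.items()
        if candidate.get(name, float("-inf")) < base - tolerance
    ]

# Illustrative values fetched from a previous release's run and the candidate.
baseline = {"clip_score": 0.31, "aesthetic": 6.1}
candidate = {"clip_score": 0.30, "aesthetic": 5.2}
failed = check_regressions(baseline, candidate)
print(failed)  # → ['aesthetic']
```

A CI job would fail the build when `failed` is non-empty.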
Benchmark runtime depends on the model, hardware, batch size, and selected metrics. In one representative image benchmark, DreamLayer processed 200 prompts in 45 minutes per model on an Intel MacBook Pro across API-based models including Photon, Flux Pro, DALL·E 3, Nano Banana, Runway Gen 4, Ideogram V3, and Stability SD Turbo. Using the same prompts, seeds, and configs across runs, DreamLayer handled generation, scoring, and output aggregation automatically. Larger batches and heavier metrics increase total runtime, but DreamLayer still makes reproducible benchmarking much faster than running the workflow manually.
We’re grateful to our earliest supporters who starred the repo and supported us from the start 🚀
