A comprehensive framework for creating structured image search evaluation datasets. This tool automates the entire pipeline from image preprocessing to dataset upload, making it easy to build high-quality benchmarks for image retrieval systems.
- Complete Pipeline Automation: End-to-end workflow from raw images to published datasets
- Flexible Adapter System: Pluggable adapters for vision annotation, relevance judging, and similarity scoring
- Batch Processing: Efficient batch processing for large-scale datasets
- Query Planning: Intelligent query generation with diversity and difficulty balancing
- Comprehensive Analysis: Automatic generation of dataset summaries, statistics, and visualizations
- Hugging Face Integration: Direct upload to Hugging Face Hub with dataset cards and summaries
- Cost Tracking: Monitor API costs throughout the pipeline
- Progress Tracking: Built-in progress bars and logging for long-running operations
```shell
pip install imsearch-benchmaker
```

Install with specific adapters:

```shell
# OpenAI adapters (for vision annotation and relevance judging)
pip install imsearch-benchmaker[openai]

# Local CLIP adapter (for similarity scoring)
pip install imsearch-benchmaker[local]

# All adapters
pip install imsearch-benchmaker[all]
```

Or install from source:

```shell
git clone https://github.com/waggle-sensor/imsearch_benchmaker.git
cd imsearch_benchmaker
pip install -e .
```

Create a config.toml file with your benchmark settings:
```toml
# Basic configuration
benchmark_name = "MyBenchmark"
benchmark_description = "A benchmark for image search"
log_level = "INFO"

# File paths
image_root_dir = "/path/to/images"
images_jsonl = "outputs/images.jsonl"
annotations_jsonl = "outputs/annotations.jsonl"
query_plan_jsonl = "outputs/query_plan.jsonl"
qrels_jsonl = "outputs/qrels.jsonl"
summary_output_dir = "outputs/summary"
hf_dataset_dir = "outputs/hf_dataset"

# Vision adapter configuration
[vision_config]
adapter = "openai"
model = "gpt-4o"

# Judge adapter configuration
[judge_config]
adapter = "openai"
model = "gpt-4o"

# Similarity adapter configuration
[similarity_config]
adapter = "local_clip"
model_name = "openai/clip-vit-base-patch32"
```

Then run the full pipeline:

```shell
# Or set the IMSEARCH_BENCHMAKER_CONFIG_PATH env variable to the path of the config file
benchmaker all --config config.toml
```

This will run the entire pipeline:
- Preprocess: Build `images.jsonl` from your image directory
- Vision: Annotate images with tags, taxonomies, and metadata
- Query Plan: Generate diverse queries with candidate images
- Judge: Evaluate relevance of candidates for each query
- Postprocess: Calculate similarity scores and generate summaries
- Upload: Upload the dataset to Hugging Face Hub
Configuration is done via TOML files (JSON is also supported). The framework uses a BenchmarkConfig class that supports:
- Benchmark metadata: Name, description, author information
- Column mappings: Customize column names for your data structure
- Column names: All fields starting with `column_` or `columns_` define dataset column names
- File paths: Input and output file locations
- Metadata JSONL: Optional path to a metadata JSONL file for merging additional metadata into `images.jsonl`
- Adapter settings: Configure vision, judge, and similarity adapters
- Vision metadata columns: List of columns to extract into `VisionImage.metadata` for prompt interpolation
- Query planning: Control query generation parameters
- Hugging Face: Repository settings for dataset upload
- Logging: Control the logging level
See example/config.toml for a complete configuration example or check the BenchmarkConfig class documentation for more details.
Fields starting with `_` (e.g., `_hf_token`, `_openai_api_key`) are considered sensitive fields.
The rights_map.json file (configured via meta_json in your config) allows you to assign license, DOI, and dataset name metadata to images during preprocessing. This is useful when images come from multiple sources with different licensing requirements and you want to track which original dataset each image came from.
The rights_map.json file has the following structure:
```json
{
  "default": {
    "license": "UNKNOWN",
    "doi": "UNKNOWN",
    "dataset_name": "UNKNOWN"
  },
  "files": {
    "path/to/specific/image.jpg": {
      "license": "CC BY 4.0",
      "doi": "10.1234/example",
      "dataset_name": "CustomDataset"
    }
  },
  "prefixes": [
    {
      "prefix": "sage/",
      "license": "UNKNOWN",
      "doi": "10.1109/ICSENS.2016.7808975",
      "dataset_name": "Sage"
    },
    {
      "prefix": "wildfire/",
      "license": "CC BY 4.0",
      "doi": "10.3390/f14091697",
      "dataset_name": "Wildfire"
    }
  ]
}
```

Metadata is assigned to images using the following priority order (most specific first):
- Exact file match: If an image ID appears in the `files` object, use that metadata
- Longest prefix match: If the image ID starts with any prefix in the `prefixes` array, use the metadata from the longest matching prefix
- Default: Use the metadata from the `default` object
If dataset_name is missing or set to "UNKNOWN" in the rights map, the framework will automatically extract the dataset name from the image ID:
- If the image ID contains a `/` separator (e.g., `sage/imagesampler-bottom-2726/image.jpg`), it extracts the prefix before the first `/` (e.g., `sage`)
- If the image ID has no `/` separator, it uses `"UNKNOWN"` as the dataset name
This allows you to track which original datasets contributed to your benchmark, which is useful for generating dataset proportion visualizations.
For an image with ID `sage/imagesampler-bottom-2726/image.jpg`:
- If `files` contains an exact match, use that metadata (including `dataset_name`)
- Otherwise, if the ID starts with `sage/`, use the `sage/` prefix metadata (including `dataset_name: "Sage"`)
- Otherwise, use the `default` metadata
- If `dataset_name` is still `"UNKNOWN"` or missing, extract `"sage"` from the image ID prefix
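The resolution rules above can be sketched as a small function (a hypothetical re-implementation for illustration; `resolve_rights` is not part of the framework's API):

```python
def resolve_rights(image_id: str, rights_map: dict) -> dict:
    # 1. Exact file match
    meta = rights_map.get("files", {}).get(image_id)
    if meta is None:
        # 2. Longest matching prefix
        matches = [p for p in rights_map.get("prefixes", [])
                   if image_id.startswith(p["prefix"])]
        if matches:
            meta = max(matches, key=lambda p: len(p["prefix"]))
        else:
            # 3. Fall back to the default entry
            meta = rights_map.get("default", {})
    meta = dict(meta)          # copy so the rights map is not mutated
    meta.pop("prefix", None)   # drop the matching key if it came from "prefixes"
    # Fallback: derive dataset_name from the image ID prefix
    if meta.get("dataset_name", "UNKNOWN") == "UNKNOWN":
        meta["dataset_name"] = image_id.split("/")[0] if "/" in image_id else "UNKNOWN"
    return meta
```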
Set the path to your rights map file in config.toml:
```toml
meta_json = "path/to/rights_map.json"
```

Or pass it via the command line:

```shell
benchmaker preprocess --meta-json path/to/rights_map.json
```

If no `meta_json` is provided, you'll be prompted for default license, DOI, and dataset name values during preprocessing.
The dataset name is stored in the original_dataset_name column (configurable via column_original_dataset_name in your config) and is used to generate dataset proportion visualizations in the summary output.
See example/rights_map.json for a complete example.
If you have existing metadata (e.g., human-annotated labels, categories, or other pre-existing annotations) that you want to pass to the vision model to help guide its annotations, you can use the metadata extraction and prompt interpolation feature.
Create a metadata.jsonl file where each row contains an image_id plus any additional metadata columns you want to merge:
```json
{"image_id": "sage/image1.jpg", "existing_label": "wildfire", "category": "outdoor"}
{"image_id": "sage/image2.jpg", "existing_label": "smoke", "category": "outdoor"}
{"image_id": "wildfire/image3.jpg", "existing_label": "flame", "category": "emergency"}
```

Set the path to your metadata JSONL file in config.toml:

```toml
metadata_jsonl = "inputs/metadata.jsonl"
```

During preprocessing, the metadata from this file will be merged into images.jsonl by matching image_id values.
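Conceptually this merge is a left join on `image_id`: images without a matching metadata row pass through unchanged. A minimal sketch of the idea (`merge_metadata` is a hypothetical helper, not the framework's implementation):

```python
import json

def merge_metadata(images_path: str, metadata_path: str, out_path: str) -> None:
    # Index metadata rows by image_id
    meta = {}
    with open(metadata_path) as f:
        for line in f:
            row = json.loads(line)
            meta[row["image_id"]] = {k: v for k, v in row.items() if k != "image_id"}
    # Left-join onto images.jsonl
    with open(images_path) as src, open(out_path, "w") as dst:
        for line in src:
            image = json.loads(line)
            image.update(meta.get(image["image_id"], {}))
            dst.write(json.dumps(image) + "\n")
```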
Specify which columns from images.jsonl should be extracted into VisionImage.metadata for use in vision model prompts:
```toml
[vision_config]
vision_metadata_columns = ["existing_label", "category"]
```

Write template placeholders in your vision model prompts (in config.toml) to include the metadata:
```toml
[vision_config]
system_prompt = """You are labeling images for a retrieval benchmark.
This image has existing label: {metadata.existing_label}
Category: {metadata.category}
Use this information to guide your annotations."""

user_prompt = """Analyze the image and output JSON with:
- summary: <= 30 words, factual, no speculation
- tags: choose 12-18 tags from the provided enum list
- confidence: 0..1 per field"""
```

The framework will automatically interpolate `{metadata.column_name}` placeholders with the actual values from the metadata when building vision requests.
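The interpolation behaves like a simple placeholder substitution. A sketch of the idea (illustrative only, not the library's actual code; here an unknown column becomes an empty string):

```python
import re

def interpolate(prompt: str, metadata: dict) -> str:
    # Replace each {metadata.column_name} placeholder with its value
    return re.sub(
        r"\{metadata\.(\w+)\}",
        lambda m: str(metadata.get(m.group(1), "")),
        prompt,
    )
```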
- Create `metadata.jsonl` with `image_id` and your metadata columns
- Set `metadata_jsonl = "inputs/metadata.jsonl"` in config
- Run `benchmaker preprocess`; metadata will be merged into `images.jsonl`
- Set `vision_metadata_columns = ["existing_label", "category"]` in `[vision_config]`
- Add `{metadata.existing_label}` and `{metadata.category}` placeholders to your prompts
- Run `benchmaker vision`; the vision model will receive the metadata in its prompts
This allows you to leverage existing annotations or metadata to help the vision model produce more accurate or consistent annotations.
```shell
# Set the path to the config file so you don't have to pass it to each command
export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"

# Run complete pipeline
benchmaker all

# Individual steps
benchmaker preprocess
benchmaker vision
benchmaker plan
benchmaker judge
benchmaker postprocess similarity
benchmaker postprocess summary
benchmaker postprocess add-metadata  # (optional) add vision_metadata_columns to qrels
benchmaker upload
```

```shell
# Check if image URLs are reachable
benchmaker check-urls --images-jsonl outputs/images.jsonl

# Clean intermediate files
benchmaker clean --config config.toml

# List OpenAI batches
benchmaker list-batches --config config.toml
```

For more control over the vision annotation process:
```shell
# Set the path to the config file so you don't have to pass it to each command
export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"

# Create batch input
benchmaker vision-make

# Submit batch
benchmaker vision-submit

# Wait for completion
benchmaker vision-wait

# Download results
benchmaker vision-download

# Parse results
benchmaker vision-parse

# Retry failed requests
benchmaker vision-retry
```

Similar granular commands are available for the judge step:
```shell
export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"

benchmaker judge-make
benchmaker judge-submit
benchmaker judge-wait
benchmaker judge-download
benchmaker judge-parse
benchmaker judge-retry
```

The framework uses an adapter pattern for extensibility. Adapters are automatically discovered and registered. You can use different adapters for different tasks simultaneously; for example, OpenAI for vision annotation and Google Gemini for relevance judging. Simply configure each adapter in your config.toml file.
- OpenAI (vision adapter): Uses the OpenAI API for image annotation with structured outputs
  - Tags, taxonomies, boolean fields
  - Confidence scores
  - Controlled vocabularies
- OpenAI (judge adapter): Uses the OpenAI API to evaluate query-image relevance
  - Binary and graded relevance labels
  - Confidence scores
- Local CLIP (similarity adapter): Local CLIP models for similarity scoring
  - Supports any CLIP model from Hugging Face
  - No API costs
The framework supports creating custom adapters for vision annotation, relevance judging, and similarity scoring. Adapters are discovered when placed in the `imsearch_benchmaker/adapters/` directory and registered in the `imsearch_benchmaker/adapters/__init__.py` file. For detailed instructions, code examples, and best practices, see Creating Custom Adapters.
You can use different adapters for different tasks in the same pipeline. For example:
```toml
# Use OpenAI for vision annotation
[vision_config]
adapter = "openai"
model = "gpt-4o"

# Use Google Gemini for relevance judging
[judge_config]
adapter = "gemini"
model = "gemini-pro"

# Use local CLIP for similarity scoring
[similarity_config]
adapter = "local_clip"
model_name = "openai/clip-vit-base-patch32"
```

Each adapter is configured independently, allowing you to choose the best service for each task based on cost, performance, or feature requirements.
```
┌─────────────┐
│   Images    │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌──────────────┐
│ Preprocess  │────▶│ images.jsonl │
└──────┬──────┘     └──────────────┘
       │
       ▼
┌─────────────┐
│   Vision    │────▶ annotations.jsonl
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Query Plan  │────▶ query_plan.jsonl
└──────┬──────┘
       │
       ▼
┌─────────────┐
│    Judge    │────▶ qrels.jsonl
└──────┬──────┘
       │
       ▼
┌─────────────┐
│ Postprocess │────▶ qrels_with_score.jsonl
└──────┬──────┘      + summary/
       │
       ▼
┌─────────────┐
│   Upload    │────▶ Hugging Face Hub
└─────────────┘
```
- Preprocess: Converts raw images into JSONL format with metadata (image IDs, licenses, DOIs)
- Vision Annotation: Uses vision adapter to annotate images with summaries, tags, and categorical facets
- Query Planning: Selects seed images and creates candidate pools (positives, neutrals, hard/easy negatives) for each query that will be created by the judge adapter. Use `query_plan_pos_total` for positive candidates (all facets match) and `query_plan_neutral_total` for neutral candidates (one facet off).
- Judge Generation: Uses the judge adapter to generate queries and assign binary relevance labels to each candidate image in the query's candidate pool.
- Postprocessing: Computes similarity scores for all query-image pairs using the similarity adapter. Also, generates exploratory data analysis visualizations and statistics.
- Hugging Face Upload: Prepares the dataset in Hugging Face format for publication and uploads to the Hugging Face dataset repository.
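For intuition on the similarity step above: CLIP-style adapters typically embed the query text and each image, then score the pair by cosine similarity of the two embedding vectors. Stripped of the model itself, the scoring reduces to (a generic sketch with illustrative vectors, not the adapter's actual code):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Cosine of the angle between two (nonzero) embedding vectors
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm
```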
- `images.jsonl`: Image metadata and URLs
- `seeds.jsonl`: Seed images for query generation
- `annotations.jsonl`: Vision annotations (tags, taxonomies, etc.)
- `query_plan.jsonl`: Generated queries with candidate images
- `qrels.jsonl`: Relevance judgments (query-image pairs)
- `qrels_with_score.jsonl`: QRELs with similarity scores
- `summary/`: Directory containing:
  - Dataset statistics (CSV)
  - Visualizations (PNG):
    - Dataset proportion donut chart (showing percentage breakdown by original dataset)
    - Image proportion donuts (for taxonomy columns)
    - Query relevancy distributions
    - Relevance overview charts
    - Similarity score analysis
    - Confidence analysis
    - Cost summaries
    - Word clouds
- `hf_dataset/`: Hugging Face dataset ready for upload
- Row order per query: For each `query_id`, the first row is always the seed image (the ground-truth match for the query). The metadata in that first row (e.g. `summary`, `tags`, taxonomy fields) is the matching metadata for the query. All following rows for the same `query_id` are candidate images, and their metadata describes each candidate. This matters for evaluation: use the first row of each query as the ground-truth match and the remaining rows as candidates.
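When consuming the published dataset, the row-order guarantee makes it easy to split each query into its seed and candidates. A hypothetical helper (assuming `qrels.jsonl`-style rows with a `query_id` column, as described above):

```python
import json
from collections import defaultdict

def split_queries(qrels_path: str) -> dict:
    """Group qrels rows by query_id; the first row per query is the seed."""
    groups = defaultdict(list)
    with open(qrels_path) as f:
        for line in f:
            row = json.loads(line)
            groups[row["query_id"]].append(row)
    # (seed, candidates) per query, relying on file order within each query
    return {qid: (rows[0], rows[1:]) for qid, rows in groups.items()}
```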
See the example/ directory for:
- Complete configuration file (`config.toml`)
- Sample input files
- Example outputs
- Dataset card template
You can also find more examples in the imsearch_benchmarks repository, which contains benchmarks created with this framework for use in Sage Image Search.
- Python >= 3.11
- Core dependencies (automatically installed): see `imsearch_benchmaker/requirements.txt`
- Optional adapter dependencies: see `imsearch_benchmaker/adapters/{adapter_name}/requirements.txt`
Combine imsearch_benchmaker with imsearch_eval for a complete image search evaluation pipeline: imsearch_benchmaker creates the benchmarks, and imsearch_eval uses them to evaluate the performance of an image search system.
Contributions are welcome! Please feel free to submit a Pull Request.
- Author: Francisco Lozano
- Email: francisco.lozano@northwestern.edu
- Affiliation: Northwestern University
- GitHub: FranciscoLozCoding
For issues, questions, or contributions, please open an issue on GitHub.
If you use this framework in your research, please cite:
```bibtex
@software{imsearch_benchmaker,
  title = {Image Search Benchmark Maker},
  author = {Lozano, Francisco},
  organization = {Northwestern University},
  orcid = {0009-0003-8823-4046},
  year = {2026},
  url = {https://github.com/waggle-sensor/imsearch_benchmaker}
}
```

- Fix bug: when `vision-submit` is run with a file input on the CLI, the batch ID is saved in the same directory as the input file; it should be saved in the same location as when config.toml is used
- Add pytest and create a testing pipeline
- Add support for more adapters