
Image Search Benchmark Maker

A comprehensive framework for creating structured image search evaluation datasets. This tool automates the entire pipeline from image preprocessing to dataset upload, making it easy to build high-quality benchmarks for image retrieval systems.


Features

  • Complete Pipeline Automation: End-to-end workflow from raw images to published datasets
  • Flexible Adapter System: Pluggable adapters for vision annotation, relevance judging, and similarity scoring
  • Batch Processing: Efficient batched requests for large-scale datasets
  • Query Planning: Intelligent query generation with diversity and difficulty balancing
  • Comprehensive Analysis: Automatic generation of dataset summaries, statistics, and visualizations
  • Hugging Face Integration: Direct upload to Hugging Face Hub with dataset cards and summaries
  • Cost Tracking: Monitor API costs throughout the pipeline
  • Progress Tracking: Built-in progress bars and logging for long-running operations

Installation

Basic Installation

pip install imsearch-benchmaker

With Optional Adapters

Install with specific adapters:

# OpenAI adapters (for vision annotation and relevance judging)
pip install imsearch-benchmaker[openai]

# Local CLIP adapter (for similarity scoring)
pip install imsearch-benchmaker[local]

# All adapters
pip install imsearch-benchmaker[all]

Development Installation

git clone https://github.com/waggle-sensor/imsearch_benchmaker.git
cd imsearch_benchmaker
pip install -e .

Quick Start

1. Create a Configuration File

Create a config.toml file with your benchmark settings:

# Basic configuration
benchmark_name = "MyBenchmark"
benchmark_description = "A benchmark for image search"
log_level = "INFO"

# File paths
image_root_dir = "/path/to/images"
images_jsonl = "outputs/images.jsonl"
annotations_jsonl = "outputs/annotations.jsonl"
query_plan_jsonl = "outputs/query_plan.jsonl"
qrels_jsonl = "outputs/qrels.jsonl"
summary_output_dir = "outputs/summary"
hf_dataset_dir = "outputs/hf_dataset"

# Vision adapter configuration
[vision_config]
adapter = "openai"
model = "gpt-4o"

# Judge adapter configuration
[judge_config]
adapter = "openai"
model = "gpt-4o"

# Similarity adapter configuration
[similarity_config]
adapter = "local_clip"
model_name = "openai/clip-vit-base-patch32"

2. Run the Complete Pipeline

benchmaker all --config config.toml  # or set the IMSEARCH_BENCHMAKER_CONFIG_PATH env variable to the config file path

This will run the entire pipeline:

  1. Preprocess: Build images.jsonl from your image directory
  2. Vision: Annotate images with tags, taxonomies, and metadata
  3. Query Plan: Generate diverse queries with candidate images
  4. Judge: Evaluate relevance of candidates for each query
  5. Postprocess: Calculate similarity scores and generate summaries
  6. Upload: Upload dataset to Hugging Face Hub

Configuration

Configuration is done via TOML files (JSON is also supported). The framework uses a BenchmarkConfig class that supports:

  • Benchmark metadata: Name, description, author information
  • Column mappings: Customize column names for your data structure
    • Column Names: All fields starting with column_ or columns_ define dataset column names
  • File paths: Input and output file locations
    • Metadata JSONL: Optional path to metadata JSONL file for merging additional metadata into images.jsonl
  • Adapter settings: Configure vision, judge, and similarity adapters
    • Vision metadata columns: List of columns to extract into VisionImage.metadata for prompt interpolation
  • Query planning: Control query generation parameters
  • Hugging Face: Repository settings for dataset upload
  • Logging: Control logging level

See example/config.toml for a complete configuration example or check the BenchmarkConfig class documentation for more details.

Sensitive Fields

Fields whose names start with _ (e.g., _hf_token, _openai_api_key) are treated as sensitive.

Rights Map (Metadata Configuration)

The rights_map.json file (configured via meta_json in your config) allows you to assign license, DOI, and dataset name metadata to images during preprocessing. This is useful when images come from multiple sources with different licensing requirements and you want to track which original dataset each image came from.

Syntax

The rights_map.json file has the following structure:

{
  "default": {
    "license": "UNKNOWN",
    "doi": "UNKNOWN",
    "dataset_name": "UNKNOWN"
  },
  "files": {
    "path/to/specific/image.jpg": {
      "license": "CC BY 4.0",
      "doi": "10.1234/example",
      "dataset_name": "CustomDataset"
    }
  },
  "prefixes": [
    {
      "prefix": "sage/",
      "license": "UNKNOWN",
      "doi": "10.1109/ICSENS.2016.7808975",
      "dataset_name": "Sage"
    },
    {
      "prefix": "wildfire/",
      "license": "CC BY 4.0",
      "doi": "10.3390/f14091697",
      "dataset_name": "Wildfire"
    }
  ]
}

Matching Rules

Metadata is assigned to images using the following priority order (most specific first):

  1. Exact file match: If an image ID appears in the files object, use that metadata
  2. Longest prefix match: If the image ID starts with any prefix in the prefixes array, use the metadata from the longest matching prefix
  3. Default: Use the metadata from the default object
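The priority order above can be sketched in Python. This is a simplified illustration of the documented matching rules, not the framework's actual implementation; the rights_map argument mirrors the JSON structure shown earlier, and resolve_rights is a hypothetical name:

```python
def resolve_rights(image_id: str, rights_map: dict) -> dict:
    """Resolve license/DOI/dataset metadata for an image ID.

    Priority: exact file match > longest prefix match > default.
    Illustrative sketch, not the framework's own code.
    """
    # 1. Exact file match
    files = rights_map.get("files", {})
    if image_id in files:
        return files[image_id]
    # 2. Longest prefix match
    matches = [p for p in rights_map.get("prefixes", [])
               if image_id.startswith(p["prefix"])]
    if matches:
        best = max(matches, key=lambda p: len(p["prefix"]))
        return {k: v for k, v in best.items() if k != "prefix"}
    # 3. Fall back to defaults
    return rights_map.get("default", {})
```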

Dataset Name Fallback

If dataset_name is missing or set to "UNKNOWN" in the rights map, the framework will automatically extract the dataset name from the image ID:

  • If the image ID contains a / separator (e.g., sage/imagesampler-bottom-2726/image.jpg), it extracts the prefix before the first / (e.g., sage)
  • If the image ID has no / separator, it uses "UNKNOWN" as the dataset name

This allows you to track which original datasets contributed to your benchmark, which is useful for generating dataset proportion visualizations.

Example

For an image with ID sage/imagesampler-bottom-2726/image.jpg:

  • If files contains an exact match, use that metadata (including dataset_name)
  • Otherwise, if it starts with sage/, use the sage/ prefix metadata (including dataset_name: "Sage")
  • Otherwise, use the default metadata
  • If dataset_name is still "UNKNOWN" or missing, extract "sage" from the image ID prefix

Configuration

Set the path to your rights map file in config.toml:

meta_json = "path/to/rights_map.json"

Or pass it via command line:

benchmaker preprocess --meta-json path/to/rights_map.json

If no meta_json is provided, you'll be prompted for default license, DOI, and dataset name values during preprocessing.

The dataset name is stored in the original_dataset_name column (configurable via column_original_dataset_name in your config) and is used to generate dataset proportion visualizations in the summary output.

See example/rights_map.json for a complete example.

Using Existing Metadata to Help Vision Models

If you have existing metadata (e.g., human-annotated labels, categories, or other pre-existing annotations) that you want to pass to the vision model to help guide its annotations, you can use the metadata extraction and prompt interpolation feature.

Step 1: Create a Metadata JSONL File

Create a metadata.jsonl file where each row contains an image_id plus any additional metadata columns you want to merge:

{"image_id": "sage/image1.jpg", "existing_label": "wildfire", "category": "outdoor"}
{"image_id": "sage/image2.jpg", "existing_label": "smoke", "category": "outdoor"}
{"image_id": "wildfire/image3.jpg", "existing_label": "flame", "category": "emergency"}

Step 2: Configure Metadata Merging

Set the path to your metadata JSONL file in config.toml:

metadata_jsonl = "inputs/metadata.jsonl"

During preprocessing, the metadata from this file will be merged into images.jsonl by matching image_id values.
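Conceptually, the merge is a join on image_id. A minimal sketch (illustrative only; the framework's internal implementation may differ):

```python
import json

def merge_metadata(images_path: str, metadata_path: str, out_path: str) -> None:
    """Merge metadata rows into images.jsonl by matching image_id.

    Sketch of the documented merge step, not the framework's own code.
    """
    # Index metadata rows by image_id
    meta = {}
    with open(metadata_path) as f:
        for line in f:
            row = json.loads(line)
            meta[row["image_id"]] = row
    # Merge the extra columns into each image row
    with open(images_path) as fin, open(out_path, "w") as fout:
        for line in fin:
            img = json.loads(line)
            extra = meta.get(img.get("image_id"), {})
            merged = {**img, **{k: v for k, v in extra.items() if k != "image_id"}}
            fout.write(json.dumps(merged) + "\n")
```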

Step 3: Configure Metadata Column Extraction

Specify which columns from images.jsonl should be extracted into VisionImage.metadata for use in vision model prompts:

[vision_config]
vision_metadata_columns = ["existing_label", "category"]

Step 4: Use Metadata in Prompts

Write template placeholders in your vision model prompts (in config.toml) to include the metadata:

[vision_config]
system_prompt = """You are labeling images for a retrieval benchmark.
This image has existing label: {metadata.existing_label}
Category: {metadata.category}
Use this information to guide your annotations."""
user_prompt = """Analyze the image and output JSON with:
- summary: <= 30 words, factual, no speculation
- tags: choose 12-18 tags from the provided enum list
- confidence: 0..1 per field"""

The framework will automatically interpolate {metadata.column_name} placeholders with the actual values from the metadata when building vision requests.
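The interpolation itself can be pictured as a small substitution pass (a sketch of the documented behavior; the framework's own logic may handle edge cases differently, and interpolate_prompt is a hypothetical name):

```python
import re

def interpolate_prompt(template: str, metadata: dict) -> str:
    """Replace {metadata.column_name} placeholders with values from the
    metadata dict, leaving unknown placeholders untouched."""
    def sub(match: re.Match) -> str:
        return str(metadata.get(match.group(1), match.group(0)))
    return re.sub(r"\{metadata\.(\w+)\}", sub, template)
```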

Example Workflow

  1. Create metadata.jsonl with image_id and your metadata columns
  2. Set metadata_jsonl = "inputs/metadata.jsonl" in config
  3. Run benchmaker preprocess - metadata will be merged into images.jsonl
  4. Set vision_metadata_columns = ["existing_label", "category"] in [vision_config]
  5. Add {metadata.existing_label} and {metadata.category} placeholders in your prompts
  6. Run benchmaker vision - the vision model will receive the metadata in its prompts

This allows you to leverage existing annotations or metadata to help the vision model produce more accurate or consistent annotations.

CLI Commands

Main Pipeline Commands

# Set the path to the config file so you don't have to pass it to each command
export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"

# Run complete pipeline
benchmaker all

# Individual steps
benchmaker preprocess
benchmaker vision
benchmaker plan
benchmaker judge
benchmaker postprocess similarity
benchmaker postprocess summary
benchmaker postprocess add-metadata   # (optional) add vision_metadata_columns to qrels
benchmaker upload

Utility Commands

# Check if image URLs are reachable
benchmaker check-urls --images-jsonl outputs/images.jsonl

# Clean intermediate files
benchmaker clean --config config.toml

# List OpenAI batches
benchmaker list-batches --config config.toml

Granular Control (Vision)

For more control over the vision annotation process:

# Set the path to the config file so you don't have to pass it to each command
export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"

# Create batch input
benchmaker vision-make

# Submit batch
benchmaker vision-submit

# Wait for completion
benchmaker vision-wait

# Download results
benchmaker vision-download

# Parse results
benchmaker vision-parse

# Retry failed requests
benchmaker vision-retry

Granular Control (Judge)

Similar granular commands are available for the judge step:

export IMSEARCH_BENCHMAKER_CONFIG_PATH="path/to/config.toml"
benchmaker judge-make
benchmaker judge-submit
benchmaker judge-wait
benchmaker judge-download
benchmaker judge-parse
benchmaker judge-retry

Adapters

The framework uses an adapter pattern for extensibility. Adapters are automatically discovered and registered. You can use different adapters for different tasks simultaneously - for example, OpenAI for vision annotation and Google Gemini for relevance judging. Simply configure each adapter in your config.toml file.

Vision Adapters

  • OpenAI: Uses OpenAI API for image annotation with structured outputs
    • Tags, taxonomies, boolean fields
    • Confidence scores
    • Controlled vocabularies

Judge Adapters

  • OpenAI: Uses OpenAI API to evaluate query-image relevance
    • Binary and graded relevance labels
    • Confidence scores

Similarity Adapters

  • Local CLIP: Local CLIP models for similarity scoring
    • Supports any CLIP model from Hugging Face
    • No API costs

Creating Custom Adapters

The framework supports creating custom adapters for vision annotation, relevance judging, and similarity scoring. Adapters are discovered when placed in the imsearch_benchmaker/adapters/ directory and registered in the imsearch_benchmaker/adapters/__init__.py file. For detailed instructions, code examples, and best practices, see Creating Custom Adapters.

Mixing Adapters

You can use different adapters for different tasks in the same pipeline. For example:

# Use OpenAI for vision annotation
[vision_config]
adapter = "openai"
model = "gpt-4o"

# Use Google Gemini for relevance judging
[judge_config]
adapter = "gemini"
model = "gemini-pro"

# Use local CLIP for similarity scoring
[similarity_config]
adapter = "local_clip"
model_name = "openai/clip-vit-base-patch32"

Each adapter is configured independently, allowing you to choose the best service for each task based on cost, performance, or feature requirements.

Pipeline Overview

┌─────────────┐
│   Images    │
└──────┬──────┘
       │
       ▼
┌─────────────┐     ┌─────────────┐
│ Preprocess  │────▶│ images.jsonl│
└─────────────┘     └──────┬───────┘
                           │
                           ▼
                    ┌─────────────┐
                    │   Vision    │────▶ annotations.jsonl
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │ Query Plan  │────▶ query_plan.jsonl
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │    Judge    │────▶ qrels.jsonl
                    └──────┬──────┘
                           │
                           ▼
                    ┌─────────────┐
                    │ Postprocess │────▶ qrels_with_score.jsonl
                    └──────┬──────┘      + summary/
                           │
                           ▼
                    ┌─────────────┐
                    │   Upload    │────▶ Hugging Face Hub
                    └─────────────┘

Pipeline Steps

  1. Preprocess: Converts raw images into JSONL format with metadata (image IDs, licenses, DOIs)
  2. Vision Annotation: Uses vision adapter to annotate images with summaries, tags, and categorical facets
  3. Query Planning: Selects seed images and builds a candidate pool (positives, neutrals, hard/easy negatives) for each query to be generated by the judge adapter. query_plan_pos_total sets the number of positive candidates (all facets match) and query_plan_neutral_total the number of neutral candidates (one facet off).
  4. Judge Generation: Uses the judge adapter to generate queries and assign binary relevance labels to each candidate image in the query's candidate pool.
  5. Postprocessing: Computes similarity scores for all query-image pairs using the similarity adapter. Also, generates exploratory data analysis visualizations and statistics.
  6. Hugging Face Upload: Prepares the dataset in Hugging Face format for publication and uploads to the Hugging Face dataset repository.

Output Files

Intermediate Files

  • images.jsonl: Image metadata and URLs
  • seeds.jsonl: Seed images for query generation
  • annotations.jsonl: Vision annotations (tags, taxonomies, etc.)
  • query_plan.jsonl: Generated queries with candidate images
  • qrels.jsonl: Relevance judgments (query-image pairs)

Final Outputs

  • qrels_with_score.jsonl: QRELs with similarity scores
  • summary/: Directory containing:
    • Dataset statistics (CSV)
    • Visualizations (PNG):
      • Dataset proportion donut chart (showing percentage breakdown by original dataset)
      • Image proportion donuts (for taxonomy columns)
      • Query relevancy distributions
      • Relevance overview charts
      • Similarity score analysis
      • Confidence analysis
    • Cost summaries
    • Word clouds
  • hf_dataset/: Hugging Face dataset ready for upload
    • Row order per query: For each query_id, the first row is always the seed image (the ground-truth match for the query), and its metadata (e.g. summary, tags, taxonomy fields) describes that match. All subsequent rows for the same query_id are candidate images, each carrying its own metadata. When evaluating with this dataset, treat the first row of each query as the ground-truth match and the remaining rows as candidates.
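Consumers of the dataset can exploit this row-order guarantee directly. A minimal sketch (split_queries is a hypothetical helper; it assumes rows for the same query_id are contiguous, as in the exported dataset, and that the default query_id column name is used):

```python
from itertools import groupby

def split_queries(rows: list[dict]) -> dict:
    """Group dataset rows by query_id, returning (seed, candidates) per query.

    Relies on the documented row order: the first row for each query_id
    is the seed (ground-truth) image. Illustrative sketch for evaluation.
    """
    out = {}
    for qid, group in groupby(rows, key=lambda r: r["query_id"]):
        group = list(group)
        out[qid] = (group[0], group[1:])  # seed first, then candidates
    return out
```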

Examples

See the example/ directory for:

  • Complete configuration file (config.toml)
  • Sample input files
  • Example outputs
  • Dataset card template

More examples are available in the imsearch_benchmarks repository, which contains the benchmarks created with this framework for use in Sage Image Search.

Requirements

  • Python >= 3.11
  • Core dependencies (automatically installed):
    • see imsearch_benchmaker/requirements.txt
  • Optional adapter dependencies:
    • see imsearch_benchmaker/adapters/{adapter_name}/requirements.txt

imsearch_benchmaker + imsearch_eval

Combine imsearch_benchmaker and imsearch_eval to create a complete pipeline for image search evaluation. imsearch_benchmaker creates the benchmarks and imsearch_eval uses them to evaluate the performance of the image search system.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Author

Support

For issues, questions, or contributions, please open an issue on GitHub.

Citation

If you use this framework in your research, please cite:

@software{imsearch_benchmaker,
  title = {Image Search Benchmark Maker},
  author = {Lozano, Francisco},
  organization = {Northwestern University},
  orcid = {0009-0003-8823-4046},
  year = {2026},
  url = {https://github.com/waggle-sensor/imsearch_benchmaker}
}

TODOs

  • Fix bug: when vision-submit is run with a file input on the CLI, the batch ID is saved in the same directory as the input file instead of the location used when the path comes from config.toml
  • Add pytest and create a testing pipeline
  • Add support for more adapters
