🏭 Auto Annotation Pipeline

Automated annotation pipeline for CVAT that uses YOLOv8 segmentation, SAM2 masking, and embedding-based label assignment to annotate, at scale, images containing many objects that each carry multiple attributes.

📋 Overview

This project automates the end-to-end annotation workflow for the CVAT annotation tool. It was built to handle datasets where images contain many objects with complex multi-attribute annotations (category, material, colour, application, grade, etc.) — making manual annotation impractical.

Pipeline Flow

S3 Images → YOLO/SAM2 Segmentation → Object Cropping → Embedding Matching → Label Assignment → COCO/CVAT Export

flowchart LR
    A[AWS S3 Bucket] -->|Download Images| B[data_acquisition]
    B --> C[segmentation_pipeline]
    C -->|YOLO Masks / SAM2 Polygons| D[Object Crops]
    D --> E[auto_labeling]
    E -->|PE Embeddings + FAISS| F[Portfolio JSONs]
    F --> G[label_mapping]
    G -->|Category + Attribute Mapping| H[COCO JSON with Attributes]
    H --> I[format_converters]
    I -->|Datumaro / CVAT XML| J[CVAT Import Ready]

📁 Project Structure

Auto annotation/
│
├── data_acquisition/             # Download images from AWS S3
│   ├── aws_s3.py                 # Basic S3 downloader with credentials
│   ├── aws_s3v2.py               # S3 downloader with JSONL exclusion list
│   └── Umbergaun.py              # HTTP-based image scraper (public bucket)
│
├── segmentation_pipeline/        # Object detection & mask generation
│   ├── pipeline_main.py          # 🔥 Main orchestrator (runs all steps)
│   ├── pipeline.py               # SAM2 multi-worker pipeline with ROI
│   ├── roi_sam2_pipeline.py      # Production Virtual ROI + SAM2 pipeline
│   ├── auto_yolo_polygon.py      # YOLOv8-seg → COCO polygon (with S3 download)
│   ├── auto_yolo_polygon1.py     # Extended: streaming JSONL + OOM-safe
│   ├── auto_polygon_download.py  # YOLOv8-seg → COCO + CVAT manifest + crop
│   ├── auto_polygon_download1.py # With resume, checkpoint, corruption scan
│   ├── auto_masking_yolo.py      # YOLO mask-based segmentation
│   ├── sam_virtual.py            # SAM2 with auto-detect vision box ROI
│   ├── sam_virtual_v2.py         # SAM2 variant (v2)
│   ├── sam_virtual_v3.py         # SAM2 variant (v3)
│   ├── sam_virtual_v4.py         # SAM2 variant (v4)
│   ├── virtualroisam.py          # Virtual ROI + SAM pipeline
│   ├── yolomask.py               # YOLO mask utilities
│   ├── sam2_coco_batch.py        # SAM2 batch COCO generation
│   ├── roi.py                    # Interactive ROI selector (GUI)
│   ├── roi_batch.py              # Batch ROI processing
│   ├── roi_detector.py           # Automatic ROI detection
│   ├── manual_roi.py             # Manual ROI input (headless)
│   └── verify_pipeline.py        # Unit tests for geometry utils
│
├── auto_labeling/                # Embedding-based automatic label assignment
│   ├── auto_label.py             # FAISS nearest-neighbor labeling (v1)
│   ├── auto_label_v2.py          # Top-K voting with similarity threshold
│   ├── auto_label_v3.py          # Real-time HNSW matcher (no pre-built index)
│   ├── auto_label_v4.py          # Matcher with output organization
│   ├── Portfolio.py              # Single-bank portfolio inference
│   ├── batch_portfolio.py        # Multi-bank portfolio (parallel matching)
│   ├── batch_portfolio_v2.py     # Portfolio v2 with improvements
│   ├── portfolio_inference_parallel.py  # Parallel portfolio inference
│   └── matching_1bank.py         # Single bank matching utility
│
├── label_mapping/                # Category & attribute normalization for CVAT
│   ├── mapping.py                # Portfolio → COCO merge (multiple iterations)
│   ├── mapping_stats.py          # Mapping statistics & diagnostics
│   ├── label_mapping.py          # Label normalization rules
│   ├── label_updater.py          # Batch label update utility
│   ├── final_annotation_mapping.py  # Official CVAT category + attribute mapping
│   ├── final_mapping_v2.py       # Mapping variant (v2)
│   ├── final_mapping_v3.py       # Mapping variant (v3)
│   ├── update_annotation.py      # Annotation update utility
│   ├── annotation_cleaning.py    # Clean up annotations
│   ├── annotation_clean_object.py # Object-level annotation cleanup
│   └── redistribute_annotation_jobwise.py  # Split annotations into CVAT jobs
│
├── format_converters/            # Annotation format conversion tools
│   ├── coco_to_datumaro.py       # COCO → Datumaro JSON conversion
│   ├── coco_datumaro.py          # Datumaro format handler
│   ├── coco_structure.py         # COCO JSON structure validator
│   ├── datumaro_convertor.py     # Datumaro conversion utility
│   └── clean_coco_yolo.py        # Clean COCO for YOLO compatibility
│
├── colour_classification/        # Train & run colour classifiers
│   ├── colour_classifier.py      # ResNet18 colour classifier trainer
│   ├── colour_classifier2.py     # Colour classifier variant
│   ├── train_classifier.py       # ResNet101 with class-aware augmentation
│   └── train_resnet50.py         # ResNet50 colour classifier
│
├── embedding_tools/              # Reference bank building & embedding utilities
│   ├── embedding_extraction.py   # Extract PE embeddings from reference images
│   ├── embedding_extraction_2.py # Embedding extraction (v2)
│   ├── embedding_visualization.py # t-SNE / UMAP embedding visualization
│   ├── refrence_embeddingbuilder.py # Build reference embedding index
│   ├── bank_building.py          # Multi-attribute bank builder
│   ├── bank_loader.py            # Bank loading utility
│   ├── build_single_bank.py      # Build single attribute bank
│   ├── build_single_class_bank.py # Build single class bank
│   └── bluehd_matching.py        # Blue HD specific matching
│
├── visualization/                # Annotation & result visualization
│   ├── visualize.py              # COCO polygon visualization overlay
│   ├── visualize1.py             # Visualization variant
│   └── visualize_label.py        # Label-aware visualization
│
├── utilities/                    # Helper scripts & image processing
│   ├── data_prepration.py        # Crop annotations from Label Studio JSON
│   ├── image_croping.py          # Image cropping utilities
│   ├── cropping_coco.py          # COCO-based image cropping
│   ├── crop_yolo_mask.py         # Crop using YOLO masks
│   ├── label_crop.py             # Label-based cropping
│   ├── folder_flat.py            # Flatten folder structure
│   ├── interactive_selector.py   # Interactive image selection tool
│   ├── version_5.py              # Pipeline version 5 iteration
│   ├── testing.py                # Test/debug scripts
│   └── testing1.py               # Test/debug scripts
│
├── model_weights/                # Pre-trained model checkpoints
│   ├── best.pt                   # YOLOv8m-seg best weights (~55 MB)
│   └── sam2.1_l.pt               # SAM 2.1 Large checkpoint (~449 MB)
│
├── setup/                        # Environment setup scripts
│   ├── install_sam.sh            # SAM2 + venv setup (CUDA 12.x)
│   └── Miniconda3-latest-Linux-x86_64.sh  # Miniconda installer
│
├── data_outputs/                 # Generated output data files
│   ├── auto_labeled_objects.csv  # Auto-labeled results
│   ├── colorhd_matched_as_bluehd.txt  # Colour matching results
│   └── colour                    # Colour data file
│
├── ai-data-engine/               # Separate AI Data Engine project (Docker-based)
│   ├── backend/                  # Backend API
│   ├── frontend/                 # Frontend UI
│   ├── inference-worker/         # ML inference workers
│   ├── workers/                  # Background workers
│   ├── deploy/                   # Deployment configs
│   ├── docker-compose.yml        # Docker orchestration
│   └── docs/                     # Documentation
│
└── README.md                     # ← You are here

🚀 Quick Start

Prerequisites

| Requirement | Version |
| --- | --- |
| Python | 3.10+ |
| CUDA | 12.x |
| PyTorch | 2.4+ |
| GPU | NVIDIA (CUDA-capable) |

1. Environment Setup

# Option A: Use the provided install script (Linux)
chmod +x setup/install_sam.sh
bash setup/install_sam.sh

# Option B: Manual setup
python3.10 -m venv venv
source venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics opencv-python-headless pycocotools faiss-cpu boto3 supervision tqdm pandas pillow

2. Configure AWS Credentials

Create a .env file in the project root:

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
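
How the scripts read this file is not shown here. As a minimal sketch, assuming a plain KEY=VALUE file and no extra dependency, the values could be copied into the process environment, where boto3 looks up `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` by default. `parse_env` and `load_env` are hypothetical helper names, not functions from this repository.

```python
import os
from pathlib import Path


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blank lines and '#' comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")  # split on the first '=' only
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def load_env(path: str = ".env") -> dict[str, str]:
    """Hypothetical helper: copy values from a .env file into os.environ."""
    env_file = Path(path)
    if not env_file.exists():
        return {}
    values = parse_env(env_file.read_text(encoding="utf-8"))
    for key, value in values.items():
        # Variables already set in the shell take precedence over the file.
        os.environ.setdefault(key, value)
    return values
```

Calling `load_env()` once at the top of a script, before any boto3 client is created, would be enough under these assumptions. Keep the .env file out of version control.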

3. Run the Pipeline

# Full pipeline (S3 download → YOLO segmentation → Portfolio inference)
python segmentation_pipeline/pipeline_main.py

Or run individual steps:

# Step 1: Download images from S3
python data_acquisition/aws_s3v2.py

# Step 2: YOLO segmentation → COCO polygons + crops
python segmentation_pipeline/auto_polygon_download1.py

# Step 3: Portfolio inference (auto-label assignment)
python auto_labeling/batch_portfolio.py

# Step 4: Map labels to official CVAT categories
python label_mapping/final_annotation_mapping.py

# Step 5: Convert to Datumaro format for CVAT import
python format_converters/coco_to_datumaro.py

🧠 How It Works

1. Image Acquisition (`data_acquisition`)

Images are downloaded from AWS S3 buckets (e.g., the wi-dataset bucket) with date-range filtering and parallel downloads. Images that were already annotated are listed in JSONL exclusion files and skipped to avoid re-processing.
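
The selection logic can be sketched roughly as follows. This is an illustration, not the code in aws_s3v2.py: it assumes the date range is applied to each object's `LastModified` timestamp, that the exclusion file stores one JSON object per line with a `key` field, and that `objects` has the shape of the `Contents` entries returned by boto3's `list_objects_v2`. The function names are hypothetical.

```python
import json
from datetime import datetime, timezone


def load_exclusions(jsonl_text: str, field: str = "key") -> set[str]:
    """Collect already-annotated keys from JSONL text (one JSON object per line)."""
    excluded = set()
    for line in jsonl_text.splitlines():
        if line.strip():
            excluded.add(json.loads(line)[field])
    return excluded


def select_keys(objects, start, end, excluded):
    """Keep keys modified in [start, end) that are not in the exclusion set."""
    return [
        obj["Key"]
        for obj in objects
        if start <= obj["LastModified"] < end and obj["Key"] not in excluded
    ]


# Example listing (made-up keys and dates).
objects = [
    {"Key": "cam1/a.jpg", "LastModified": datetime(2024, 5, 1, tzinfo=timezone.utc)},
    {"Key": "cam1/b.jpg", "LastModified": datetime(2024, 5, 2, tzinfo=timezone.utc)},
    {"Key": "cam1/c.jpg", "LastModified": datetime(2024, 6, 9, tzinfo=timezone.utc)},
]
excluded = load_exclusions('{"key": "cam1/b.jpg"}\n')
selected = select_keys(
    objects,
    start=datetime(2024, 5, 1, tzinfo=timezone.utc),
    end=datetime(2024, 6, 1, tzinfo=timezone.utc),
    excluded=excluded,
)
print(selected)  # ['cam1/a.jpg']
```

Filtering before downloading keeps the parallel download step limited to images that still need annotation.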

2. Object Segmentation (`segmentation_pipeline`)

Two segmentation backends are supported:

YOLOv8m-seg: Instance segmentation producing polygon masks in COCO format
SAM2 (Segment Anything Model 2): Zero-shot segmentation with Virtual ROI calibration

A Virtual ROI system restricts segmentation to a fixed conveyor-belt region within the frame, avoiding background noise. The ROI can be set interactively (GUI) or programmatically.
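
One simple way to apply such a region is to drop every detected polygon whose centre falls outside it. The sketch below assumes a rectangular ROI given as `(x_min, y_min, x_max, y_max)` in pixels and COCO-style flat polygons; the actual scripts may instead crop the frame before segmentation or use a non-rectangular region. The coordinates and function names are made up for illustration.

```python
def polygon_center(flat_points):
    """Mean of the vertices of a COCO-style flat polygon [x1, y1, x2, y2, ...]."""
    xs = flat_points[0::2]
    ys = flat_points[1::2]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def inside_roi(flat_points, roi):
    """True if the polygon's vertex mean lies inside roi = (x_min, y_min, x_max, y_max)."""
    cx, cy = polygon_center(flat_points)
    x_min, y_min, x_max, y_max = roi
    return x_min <= cx <= x_max and y_min <= cy <= y_max


def filter_by_roi(polygons, roi):
    """Keep only polygons centred inside the ROI."""
    return [p for p in polygons if inside_roi(p, roi)]


# Hypothetical belt region and two detections.
belt_roi = (100, 200, 1800, 900)
on_belt = [300, 300, 400, 300, 400, 400, 300, 400]   # centre (350, 350)
background = [10, 10, 50, 10, 50, 50, 10, 50]        # centre (30, 30)
kept = filter_by_roi([on_belt, background], belt_roi)
```

Here `kept` contains only the `on_belt` polygon. The vertex mean is a cheap stand-in for the true area centroid and is usually close enough for compact objects.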

3. Automatic Label Assignment (`auto_labeling`)

Cropped objects are matched against pre-built reference embedding banks using:

PE (Perception Encoder) CLIP backbone for feature extraction
FAISS HNSW index for efficient nearest-neighbor search
Top-K voting with similarity thresholds for robust label assignment
Multi-bank portfolio matching — each bank handles one attribute dimension (category, material, colour, application, grade, etc.)
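
The voting step can be illustrated without FAISS. The sketch below swaps the HNSW index for a brute-force cosine search in pure Python so that it runs anywhere; the similarity-weighted vote, the default threshold, and the `Unknown` fallback are assumptions made for illustration, not the exact rules in auto_label_v2.py.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vote_label(query, bank, k=3, min_similarity=0.5, fallback="Unknown"):
    """Pick a label from the k nearest bank entries.

    bank is a list of (embedding, label) pairs. Neighbours below
    min_similarity are ignored; the rest vote with their similarity.
    """
    scored = sorted(((cosine(query, emb), label) for emb, label in bank), reverse=True)
    votes = defaultdict(float)
    for similarity, label in scored[:k]:
        if similarity >= min_similarity:
            votes[label] += similarity
    if not votes:
        return fallback, 0.0
    best = max(votes, key=votes.get)
    return best, votes[best]


def match_portfolio(query, banks, **kwargs):
    """Run one vote per attribute bank, e.g. {'Material': bank, 'Colour': bank}."""
    return {attribute: vote_label(query, bank, **kwargs)[0] for attribute, bank in banks.items()}


# Toy 2-D embeddings; real PE embeddings have hundreds of dimensions.
material_bank = [
    ([1.0, 0.0], "PET"),
    ([0.9, 0.1], "PET"),
    ([0.0, 1.0], "HDPE"),
    ([0.1, 0.9], "HDPE"),
]
label, score = vote_label([1.0, 0.05], material_bank)
```

With this toy bank the query lands on `PET`, because the third-nearest neighbour (an `HDPE` entry) falls below the threshold and does not vote. A query that resembles nothing in the bank returns the fallback label instead of a forced match.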

4. Label Mapping & Normalization (`label_mapping`)

Portfolio results are merged into the COCO JSON with official CVAT category IDs and normalized attribute values. The system supports 51 object categories and 9 attribute dimensions: Material, Colour, Application, General Application, Grade, Cap Material, Feature, Generic Material Type, and Damaged.
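
A minimal sketch of the merge, assuming a portfolio result is a flat dict with a `category` key plus one key per attribute, and that attributes are stored under an `attributes` key on each COCO annotation. The alias table and function names are hypothetical; the category IDs are taken from the schema table in this README.

```python
CATEGORY_IDS = {"Unknown": 1, "Bottle": 2, "Can": 6}                   # subset of the 51 classes
ALIASES = {"pet bottle": "Bottle", "food grade": "Foodgrade"}          # hypothetical normalization rules


def normalize(value, aliases):
    """Map a raw label to its canonical form; unmapped values pass through stripped."""
    cleaned = value.strip()
    return aliases.get(cleaned.lower(), cleaned)


def merge_portfolio(annotation, portfolio, category_ids, aliases):
    """Return a copy of a COCO annotation with category_id and attributes filled in."""
    merged = dict(annotation)
    category = normalize(portfolio["category"], aliases)
    # Categories missing from the official list fall back to "Unknown".
    merged["category_id"] = category_ids.get(category, category_ids["Unknown"])
    merged["attributes"] = {
        name: normalize(value, aliases)
        for name, value in portfolio.items()
        if name != "category"
    }
    return merged


annotation = {"id": 7, "image_id": 3, "segmentation": [[10, 10, 50, 10, 50, 50]]}
portfolio = {"category": "PET bottle", "Material": "PET", "Grade": "food grade"}
merged = merge_portfolio(annotation, portfolio, CATEGORY_IDS, ALIASES)
```

In this example `merged["category_id"]` is 2 (Bottle) and the attributes become `{"Material": "PET", "Grade": "Foodgrade"}`, while the original annotation dict is left unchanged.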

5. Format Conversion (`format_converters`)

Final annotations are converted to CVAT-compatible formats:

COCO JSON (with polygon segmentation + attributes)
Datumaro JSON (native CVAT format)
CVAT manifest.jsonl (for cloud-storage-backed tasks)
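
The COCO to Datumaro step can be sketched as below. This is a simplified illustration rather than the logic in coco_to_datumaro.py: the field names follow the general Datumaro JSON layout (0-based `label_id`, flat `points` lists, image `size` as height then width) and are assumptions that should be checked against a file exported from your CVAT version before relying on them.

```python
def coco_to_datumaro(coco):
    """Convert a COCO dict with polygon annotations to a simplified Datumaro-style dict."""
    categories = sorted(coco["categories"], key=lambda c: c["id"])
    # Datumaro labels are addressed by position, so map COCO ids to 0-based indices.
    label_index = {c["id"]: i for i, c in enumerate(categories)}

    items = {}
    for img in coco["images"]:
        stem = img["file_name"].rsplit(".", 1)[0]  # item id without the extension
        items[img["id"]] = {
            "id": stem,
            "annotations": [],
            "image": {"path": img["file_name"], "size": [img["height"], img["width"]]},
        }

    for ann in coco["annotations"]:
        target = items[ann["image_id"]]["annotations"]
        for polygon in ann["segmentation"]:  # one output shape per polygon
            target.append({
                "id": len(target),
                "type": "polygon",
                "label_id": label_index[ann["category_id"]],
                "points": polygon,
                "attributes": ann.get("attributes", {}),
                "group": 0,
                "z_order": 0,
            })

    labels = [{"name": c["name"], "parent": "", "attributes": []} for c in categories]
    return {
        "info": {},
        "categories": {"label": {"labels": labels}},
        "items": list(items.values()),
    }
```

The result can be written with `json.dump` and imported into CVAT as a Datumaro dataset, assuming the layout matches what your CVAT version expects.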

📊 CVAT Annotation Schema

Object Categories (51 classes)

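| ID | Category | ID | Category |
| --- | --- | --- | --- |
| 1 | Unknown | 14 | Shiny Wrapper |
| 2 | Bottle | 15 | Wrapper |
| 3 | Tray | 23 | Cup |
| 4 | Mixed | 25 | Jar |
| 5 | Tray (merged with lid) | 37 | Cap |
| 6 | Can | 41 | Bucket |
| 7 | Film | 49 | Container |
| 50 | Glass | 51 | Object |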

Attribute Dimensions

| Attribute | Example Values |
| --- | --- |
| Material | PET, HDPE, PP, PS, HIPS, MLP, Glass, Paper, Steel, Aluminum |
| Colour | Clear transparent, White opaque, Blue opaque, Mixed, Coloured |
| Application | Drinking Water, Milk Packaging, Toilet cleaner, Shampoo, Edible Oil |
| General Application | Hair care, Skin care, Bathroom care, Kitchen Essential, Food |
| Grade | Foodgrade, Non-foodgrade |
| Cap Material | Plastic, Metal |
| Feature | Cylinder, Circular, Rectangle, Flat, Irregular |
| Generic Material Type | Rigid, Flexible |
| Damaged | Undamaged, Broken |

🔧 Key Technologies

| Component | Technology |
| --- | --- |
| Object Detection | YOLOv8m-seg (Ultralytics) |
| Mask Generation | SAM 2.1 (Segment Anything Model 2) |
| Feature Extraction | PE-Core CLIP (B16-224 / L14-336) |
| Similarity Search | FAISS (HNSW index) |
| Colour Classification | ResNet-18 / ResNet-50 / ResNet-101 (fine-tuned) |
| Annotation Format | COCO JSON, Datumaro, CVAT |
| Cloud Storage | AWS S3 (boto3) |
| GPU Acceleration | CUDA + PyTorch AMP (float16) |

⚠️ Notes

Model weights (model_weights/) contain large files (~500 MB total) — these should be downloaded separately or tracked with Git LFS.
Hardcoded paths — Many scripts contain hardcoded Linux paths (e.g., /home/wi/Avinash_Works/...). Update these to match your environment before running.
ROI Configuration — The Virtual ROI coordinates are calibrated for specific camera setups (Umbergaum 1 & 2). Recalibrate using roi.py for new camera positions.
Scripts contain multiple commented-out iterations preserving the development history of each module.

📄 License

Internal / Proprietary — not for public distribution.
