rushillllchhaya/Auto-Annotation

🏭 Auto Annotation Pipeline

Automated annotation pipeline for CVAT that combines YOLOv8 segmentation, SAM2 masking, and embedding-based label assignment to annotate, at scale, images containing many objects with multiple attributes each.


📋 Overview

This project automates the end-to-end annotation workflow for the CVAT annotation tool. It was built to handle datasets where images contain many objects with complex multi-attribute annotations (category, material, colour, application, grade, etc.) — making manual annotation impractical.

Pipeline Flow

S3 Images → YOLO/SAM2 Segmentation → Object Cropping → Embedding Matching → Label Assignment → COCO/CVAT Export
flowchart LR
    A[AWS S3 Bucket] -->|Download Images| B[data_acquisition]
    B --> C[segmentation_pipeline]
    C -->|YOLO Masks / SAM2 Polygons| D[Object Crops]
    D --> E[auto_labeling]
    E -->|PE Embeddings + FAISS| F[Portfolio JSONs]
    F --> G[label_mapping]
    G -->|Category + Attribute Mapping| H[COCO JSON with Attributes]
    H --> I[format_converters]
    I -->|Datumaro / CVAT XML| J[CVAT Import Ready]

📁 Project Structure

Auto annotation/
│
├── data_acquisition/             # Download images from AWS S3
│   ├── aws_s3.py                 # Basic S3 downloader with credentials
│   ├── aws_s3v2.py               # S3 downloader with JSONL exclusion list
│   └── Umbergaun.py              # HTTP-based image scraper (public bucket)
│
├── segmentation_pipeline/        # Object detection & mask generation
│   ├── pipeline_main.py          # 🔥 Main orchestrator (runs all steps)
│   ├── pipeline.py               # SAM2 multi-worker pipeline with ROI
│   ├── roi_sam2_pipeline.py      # Production Virtual ROI + SAM2 pipeline
│   ├── auto_yolo_polygon.py      # YOLOv8-seg → COCO polygon (with S3 download)
│   ├── auto_yolo_polygon1.py     # Extended: streaming JSONL + OOM-safe
│   ├── auto_polygon_download.py  # YOLOv8-seg → COCO + CVAT manifest + crop
│   ├── auto_polygon_download1.py # With resume, checkpoint, corruption scan
│   ├── auto_masking_yolo.py      # YOLO mask-based segmentation
│   ├── sam_virtual.py            # SAM2 with auto-detect vision box ROI
│   ├── sam_virtual_v2.py         # SAM2 variant (v2)
│   ├── sam_virtual_v3.py         # SAM2 variant (v3)
│   ├── sam_virtual_v4.py         # SAM2 variant (v4)
│   ├── virtualroisam.py          # Virtual ROI + SAM pipeline
│   ├── yolomask.py               # YOLO mask utilities
│   ├── sam2_coco_batch.py        # SAM2 batch COCO generation
│   ├── roi.py                    # Interactive ROI selector (GUI)
│   ├── roi_batch.py              # Batch ROI processing
│   ├── roi_detector.py           # Automatic ROI detection
│   ├── manual_roi.py             # Manual ROI input (headless)
│   └── verify_pipeline.py        # Unit tests for geometry utils
│
├── auto_labeling/                # Embedding-based automatic label assignment
│   ├── auto_label.py             # FAISS nearest-neighbor labeling (v1)
│   ├── auto_label_v2.py          # Top-K voting with similarity threshold
│   ├── auto_label_v3.py          # Real-time HNSW matcher (no pre-built index)
│   ├── auto_label_v4.py          # Matcher with output organization
│   ├── Portfolio.py              # Single-bank portfolio inference
│   ├── batch_portfolio.py        # Multi-bank portfolio (parallel matching)
│   ├── batch_portfolio_v2.py     # Portfolio v2 with improvements
│   ├── portfolio_inference_parallel.py  # Parallel portfolio inference
│   └── matching_1bank.py         # Single bank matching utility
│
├── label_mapping/                # Category & attribute normalization for CVAT
│   ├── mapping.py                # Portfolio → COCO merge (multiple iterations)
│   ├── mapping_stats.py          # Mapping statistics & diagnostics
│   ├── label_mapping.py          # Label normalization rules
│   ├── label_updater.py          # Batch label update utility
│   ├── final_annotation_mapping.py  # Official CVAT category + attribute mapping
│   ├── final_mapping_v2.py       # Mapping variant (v2)
│   ├── final_mapping_v3.py       # Mapping variant (v3)
│   ├── update_annotation.py      # Annotation update utility
│   ├── annotation_cleaning.py    # Clean up annotations
│   ├── annotation_clean_object.py # Object-level annotation cleanup
│   └── redistribute_annotation_jobwise.py  # Split annotations into CVAT jobs
│
├── format_converters/            # Annotation format conversion tools
│   ├── coco_to_datumaro.py       # COCO → Datumaro JSON conversion
│   ├── coco_datumaro.py          # Datumaro format handler
│   ├── coco_structure.py         # COCO JSON structure validator
│   ├── datumaro_convertor.py     # Datumaro conversion utility
│   └── clean_coco_yolo.py        # Clean COCO for YOLO compatibility
│
├── colour_classification/        # Train & run colour classifiers
│   ├── colour_classifier.py      # ResNet18 colour classifier trainer
│   ├── colour_classifier2.py     # Colour classifier variant
│   ├── train_classifier.py       # ResNet101 with class-aware augmentation
│   └── train_resnet50.py         # ResNet50 colour classifier
│
├── embedding_tools/              # Reference bank building & embedding utilities
│   ├── embedding_extraction.py   # Extract PE embeddings from reference images
│   ├── embedding_extraction_2.py # Embedding extraction (v2)
│   ├── embedding_visualization.py # t-SNE / UMAP embedding visualization
│   ├── refrence_embeddingbuilder.py # Build reference embedding index
│   ├── bank_building.py          # Multi-attribute bank builder
│   ├── bank_loader.py            # Bank loading utility
│   ├── build_single_bank.py      # Build single attribute bank
│   ├── build_single_class_bank.py # Build single class bank
│   └── bluehd_matching.py        # Blue HD specific matching
│
├── visualization/                # Annotation & result visualization
│   ├── visualize.py              # COCO polygon visualization overlay
│   ├── visualize1.py             # Visualization variant
│   └── visualize_label.py        # Label-aware visualization
│
├── utilities/                    # Helper scripts & image processing
│   ├── data_prepration.py        # Crop annotations from Label Studio JSON
│   ├── image_croping.py          # Image cropping utilities
│   ├── cropping_coco.py          # COCO-based image cropping
│   ├── crop_yolo_mask.py         # Crop using YOLO masks
│   ├── label_crop.py             # Label-based cropping
│   ├── folder_flat.py            # Flatten folder structure
│   ├── interactive_selector.py   # Interactive image selection tool
│   ├── version_5.py              # Pipeline version 5 iteration
│   ├── testing.py                # Test/debug scripts
│   └── testing1.py               # Test/debug scripts
│
├── model_weights/                # Pre-trained model checkpoints
│   ├── best.pt                   # YOLOv8m-seg best weights (~55 MB)
│   └── sam2.1_l.pt               # SAM 2.1 Large checkpoint (~449 MB)
│
├── setup/                        # Environment setup scripts
│   ├── install_sam.sh            # SAM2 + venv setup (CUDA 12.x)
│   └── Miniconda3-latest-Linux-x86_64.sh  # Miniconda installer
│
├── data_outputs/                 # Generated output data files
│   ├── auto_labeled_objects.csv  # Auto-labeled results
│   ├── colorhd_matched_as_bluehd.txt  # Colour matching results
│   └── colour                    # Colour data file
│
├── ai-data-engine/               # Separate AI Data Engine project (Docker-based)
│   ├── backend/                  # Backend API
│   ├── frontend/                 # Frontend UI
│   ├── inference-worker/         # ML inference workers
│   ├── workers/                  # Background workers
│   ├── deploy/                   # Deployment configs
│   ├── docker-compose.yml        # Docker orchestration
│   └── docs/                     # Documentation
│
└── README.md                     # ← You are here

🚀 Quick Start

Prerequisites

| Requirement | Version |
| --- | --- |
| Python | 3.10+ |
| CUDA | 12.x |
| PyTorch | 2.4+ |
| GPU | NVIDIA (CUDA-capable) |

1. Environment Setup

# Option A: Use the provided install script (Linux)
chmod +x setup/install_sam.sh
bash setup/install_sam.sh

# Option B: Manual setup
python3.10 -m venv venv
source venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics opencv-python-headless pycocotools faiss-cpu boto3 supervision tqdm pandas pillow

2. Configure AWS Credentials

Create a .env file in the project root:

AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
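
A minimal sketch of how a script could read these values, assuming the .env file contains plain KEY=value lines. `load_env_file` and `apply_env` are hypothetical helpers, not functions from this repository; the actual scripts may load credentials differently (for example with `python-dotenv`).

```python
import os
from pathlib import Path


def load_env_file(path=".env"):
    """Parse KEY=value lines into a dict, skipping blank lines and comments."""
    values = {}
    env_path = Path(path)
    if not env_path.exists():
        return values
    for raw in env_path.read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def apply_env(values):
    """Export values without overwriting variables already set in the shell."""
    for key, value in values.items():
        os.environ.setdefault(key, value)
```

Once `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY` are present in the process environment, boto3 picks them up without further configuration.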

3. Run the Pipeline

# Full pipeline (S3 download → YOLO segmentation → Portfolio inference)
python segmentation_pipeline/pipeline_main.py

Or run individual steps:

# Step 1: Download images from S3
python data_acquisition/aws_s3v2.py

# Step 2: YOLO segmentation → COCO polygons + crops
python segmentation_pipeline/auto_polygon_download1.py

# Step 3: Portfolio inference (auto-label assignment)
python auto_labeling/batch_portfolio.py

# Step 4: Map labels to official CVAT categories
python label_mapping/final_annotation_mapping.py

# Step 5: Convert to Datumaro format for CVAT import
python format_converters/coco_to_datumaro.py

🧠 How It Works

1. Image Acquisition (data_acquisition)

Images are downloaded from AWS S3 buckets (e.g., wi-dataset bucket) with date-range filtering and parallel download support. Previously annotated images are excluded via JSONL exclusion lists to avoid re-processing.
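
The filtering described above can be sketched as follows. This is an illustration, not the code in aws_s3v2.py: it assumes the date range is applied to each object's `LastModified` timestamp and that the exclusion list stores one JSON object per line with a `key` field. Both are assumptions, as are all function names. The boto3 calls used (`list_objects_v2` paginator, `download_file`) are standard, and boto3 is imported lazily so the selection logic runs without it.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from pathlib import Path


def load_exclusions(jsonl_path, key_field="key"):
    """Collect already-annotated S3 keys from a JSONL file (field name assumed)."""
    excluded = set()
    with open(jsonl_path, encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                excluded.add(json.loads(line)[key_field])
    return excluded


def select_keys(objects, start, end, excluded):
    """Keep image keys modified in [start, end) that are not excluded."""
    image_exts = (".jpg", ".jpeg", ".png")
    return [
        obj["Key"]
        for obj in objects
        if start <= obj["LastModified"] < end
        and obj["Key"].lower().endswith(image_exts)
        and obj["Key"] not in excluded
    ]


def list_objects(bucket, prefix=""):
    """Yield object records from S3; needs boto3 and valid credentials."""
    import boto3

    paginator = boto3.client("s3").get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        yield from page.get("Contents", [])


def download_all(bucket, keys, dest_dir, workers=8):
    """Download the selected keys in parallel with a thread pool."""
    import boto3

    s3 = boto3.client("s3")
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    def fetch(key):
        s3.download_file(bucket, key, str(dest / Path(key).name))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(fetch, keys))
```

A typical call chain would be `select_keys(list_objects(bucket), start, end, load_exclusions(path))` followed by `download_all(...)`, with `start` and `end` given as timezone-aware `datetime` values because S3 returns timezone-aware timestamps.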

2. Object Segmentation (segmentation_pipeline)

Two segmentation backends are supported:

  • YOLOv8m-seg: Instance segmentation producing polygon masks in COCO format
  • SAM2 (Segment Anything Model 2): Zero-shot segmentation with Virtual ROI calibration

A Virtual ROI system restricts segmentation to a fixed conveyor-belt region within the frame, avoiding background noise. The ROI can be set interactively (GUI) or programmatically.
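
As a rough illustration of the ROI restriction, the sketch below keeps only polygons whose vertex centroid falls inside a rectangular ROI and converts the survivors to COCO-style annotations. The rectangular ROI, the centroid test, and the function names are assumptions made for this example; the repository's pipelines may use a different ROI shape or overlap criterion. The placeholder category ID 51 corresponds to "Object" in the schema table.

```python
def polygon_area(points):
    """Shoelace area of a polygon given as [(x, y), ...]."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def centroid_in_roi(points, roi):
    """True if the mean of the vertices lies inside roi = (x_min, y_min, x_max, y_max)."""
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    x_min, y_min, x_max, y_max = roi
    return x_min <= cx <= x_max and y_min <= cy <= y_max


def to_coco_annotation(points, ann_id, image_id, category_id):
    """Build a COCO annotation dict with a flat [x1, y1, x2, y2, ...] polygon."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        "segmentation": [[c for p in points for c in p]],
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": polygon_area(points),
        "iscrowd": 0,
    }


def filter_and_convert(polygons, roi, image_id, category_id=51, start_id=1):
    """Drop degenerate or out-of-ROI polygons, then emit COCO annotations."""
    kept = [p for p in polygons if len(p) >= 3 and centroid_in_roi(p, roi)]
    return [
        to_coco_annotation(p, start_id + i, image_id, category_id)
        for i, p in enumerate(kept)
    ]
```

Filtering on the centroid rather than on full containment keeps objects that straddle the ROI edge only when most of their outline is inside, which is one simple way to trade recall against background noise.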

3. Automatic Label Assignment (auto_labeling)

Cropped objects are matched against pre-built reference embedding banks using:

  • PE (Perception Encoder) CLIP backbone for feature extraction
  • FAISS HNSW index for efficient nearest-neighbor search
  • Top-K voting with similarity thresholds for robust label assignment
  • Multi-bank portfolio matching — each bank handles one attribute dimension (category, material, colour, application, grade, etc.)
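
The voting step can be illustrated with a small, dependency-free sketch. It replaces the FAISS HNSW index with a brute-force cosine search so that it runs anywhere, and it weights each vote by its similarity; both choices are assumptions for illustration, as are the function names and the default values of `k` and `min_sim`. Each bank is modelled as a list of `(embedding, label)` pairs.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vote_label(query, bank, k=5, min_sim=0.6):
    """Return (label, summed similarity) from the top-k neighbours, or (None, 0.0)."""
    scored = sorted(
        ((cosine(query, emb), label) for emb, label in bank),
        key=lambda pair: pair[0],
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in scored:
        if sim >= min_sim:  # neighbours below the threshold do not vote
            votes[label] += sim
    if not votes:
        return None, 0.0
    best = max(votes, key=votes.get)
    return best, votes[best]


def portfolio(query, banks, **kwargs):
    """Run one vote per attribute bank and collect the winning labels."""
    return {attr: vote_label(query, bank, **kwargs)[0] for attr, bank in banks.items()}
```

Returning `None` when no neighbour clears the threshold lets downstream mapping leave an attribute unset instead of forcing a low-confidence label.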

4. Label Mapping & Normalization (label_mapping)

Portfolio results are merged into the COCO JSON with official CVAT category IDs and normalized attribute values. The system supports 51 object categories and 9 attribute dimensions: Material, Colour, Application, General Application, Grade, Cap Material, Feature, Generic Material Type, and Damaged.
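
A minimal sketch of such a merge is shown below. The category IDs are taken from the schema table in this README; the normalization rules, the `attributes` key on each annotation, and the function names are hypothetical and only illustrate the idea. The real rules live in the label_mapping scripts.

```python
# Subset of the category IDs listed in the CVAT schema table.
CATEGORY_IDS = {
    "Unknown": 1, "Bottle": 2, "Tray": 3, "Can": 6,
    "Film": 7, "Cup": 23, "Jar": 25, "Cap": 37,
}

# Hypothetical normalization rules: lower-cased raw value -> official value.
NORMALIZE = {
    "Grade": {
        "foodgrade": "Foodgrade",
        "food grade": "Foodgrade",
        "non foodgrade": "Non-foodgrade",
        "non food grade": "Non-foodgrade",
    },
    "Colour": {"clear": "Clear transparent", "white": "White opaque"},
}


def normalize(attr, value):
    """Map a raw attribute value to its official spelling when a rule exists."""
    key = " ".join(str(value).lower().replace("-", " ").split())
    return NORMALIZE.get(attr, {}).get(key, value)


def merge_portfolio(coco, portfolio_by_ann):
    """Write category IDs and normalized attributes into COCO annotations in place."""
    for ann in coco["annotations"]:
        result = portfolio_by_ann.get(ann["id"])
        if result is None:
            continue  # no portfolio result: leave the annotation untouched
        category = result.get("Category")
        ann["category_id"] = CATEGORY_IDS.get(category, CATEGORY_IDS["Unknown"])
        ann["attributes"] = {
            attr: normalize(attr, val)
            for attr, val in result.items()
            if attr != "Category" and val is not None
        }
    return coco
```

Falling back to "Unknown" (ID 1) for unrecognized category names keeps the output importable while making unmapped labels easy to find later.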

5. Format Conversion (format_converters)

Final annotations are converted to CVAT-compatible formats:

  • COCO JSON (with polygon segmentation + attributes)
  • Datumaro JSON (a dataset format that CVAT imports directly)
  • CVAT manifest.jsonl (for cloud-storage-backed tasks)
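
The COCO to Datumaro step can be sketched roughly as below. The output layout (`categories.label.labels`, `items[].annotations[]` with `type`, `label_id`, `points`) reflects the structure commonly seen in Datumaro JSON exports, but treat it as an assumption and verify it against the Datumaro version your CVAT instance uses. `coco_to_datumaro` here is an illustrative function, not the code in format_converters/coco_to_datumaro.py.

```python
def coco_to_datumaro(coco):
    """Convert polygon annotations from a COCO dict to a Datumaro-style dict."""
    cats = sorted(coco["categories"], key=lambda c: c["id"])
    # Datumaro refers to labels by their position in the label list.
    label_index = {c["id"]: i for i, c in enumerate(cats)}

    items = {}
    for img in coco["images"]:
        items[img["id"]] = {
            "id": img["file_name"].rsplit(".", 1)[0],
            "annotations": [],
            "image": {"path": img["file_name"], "size": [img["height"], img["width"]]},
        }

    for ann in coco["annotations"]:
        seg = ann.get("segmentation", [])
        if not isinstance(seg, list):
            continue  # skip RLE masks; only polygons are handled in this sketch
        item = items[ann["image_id"]]
        for ring in seg:
            item["annotations"].append({
                "id": len(item["annotations"]),
                "type": "polygon",
                "label_id": label_index[ann["category_id"]],
                "points": list(ring),
                "attributes": dict(ann.get("attributes", {})),
                "group": 0,
                "z_order": 0,
            })

    labels = [{"name": c["name"], "parent": "", "attributes": []} for c in cats]
    return {
        "info": {},
        "categories": {"label": {"labels": labels, "attributes": []}},
        "items": list(items.values()),
    }
```

The result can be written with `json.dump` and checked by importing a small sample into a test CVAT task before converting a full dataset.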

📊 CVAT Annotation Schema

Object Categories (51 classes, 16 listed below)

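| ID | Category | ID | Category |
| --- | --- | --- | --- |
| 1 | Unknown | 14 | Shiny Wrapper |
| 2 | Bottle | 15 | Wrapper |
| 3 | Tray | 23 | Cup |
| 4 | Mixed | 25 | Jar |
| 5 | Tray (merged with lid) | 37 | Cap |
| 6 | Can | 41 | Bucket |
| 7 | Film | 49 | Container |
| 50 | Glass | 51 | Object |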

Attribute Dimensions

| Attribute | Example Values |
| --- | --- |
| Material | PET, HDPE, PP, PS, HIPS, MLP, Glass, Paper, Steel, Aluminum |
| Colour | Clear transparent, White opaque, Blue opaque, Mixed, Coloured |
| Application | Drinking Water, Milk Packaging, Toilet cleaner, Shampoo, Edible Oil |
| General Application | Hair care, Skin care, Bathroom care, Kitchen Essential, Food |
| Grade | Foodgrade, Non-foodgrade |
| Cap Material | Plastic, Metal |
| Feature | Cylinder, Circular, Rectangle, Flat, Irregular |
| Generic Material Type | Rigid, Flexible |
| Damaged | Undamaged, Broken |

🔧 Key Technologies

| Component | Technology |
| --- | --- |
| Object Detection | YOLOv8m-seg (Ultralytics) |
| Mask Generation | SAM 2.1 (Segment Anything Model 2) |
| Feature Extraction | PE-Core CLIP (B16-224 / L14-336) |
| Similarity Search | FAISS (HNSW index) |
| Colour Classification | ResNet-18 / ResNet-50 / ResNet-101 (fine-tuned) |
| Annotation Format | COCO JSON, Datumaro, CVAT |
| Cloud Storage | AWS S3 (boto3) |
| GPU Acceleration | CUDA + PyTorch AMP (float16) |

⚠️ Notes

  • Model weights (model_weights/) contain large files (~500 MB total) — these should be downloaded separately or tracked with Git LFS.
  • Hardcoded paths — Many scripts contain hardcoded Linux paths (e.g., /home/wi/Avinash_Works/...). Update these to match your environment before running.
  • ROI Configuration — The Virtual ROI coordinates are calibrated for specific camera setups (Umbergaum 1 & 2). Recalibrate using roi.py for new camera positions.
  • Scripts contain multiple commented-out iterations preserving the development history of each module.

📄 License

Internal / Proprietary — not for public distribution.
