Automated annotation pipeline for CVAT that uses YOLOv8 segmentation, SAM2 masking, and embedding-based label assignment to annotate images containing many objects and attributes at scale.
This project automates the end-to-end annotation workflow for the CVAT annotation tool. It was built for datasets where images contain many objects with complex multi-attribute annotations (category, material, colour, application, grade, etc.), which makes manual annotation impractical.
S3 Images → YOLO/SAM2 Segmentation → Object Cropping → Embedding Matching → Label Assignment → COCO/CVAT Export
flowchart LR
A[AWS S3 Bucket] -->|Download Images| B[data_acquisition]
B --> C[segmentation_pipeline]
C -->|YOLO Masks / SAM2 Polygons| D[Object Crops]
D --> E[auto_labeling]
E -->|PE Embeddings + FAISS| F[Portfolio JSONs]
F --> G[label_mapping]
G -->|Category + Attribute Mapping| H[COCO JSON with Attributes]
H --> I[format_converters]
I -->|Datumaro / CVAT XML| J[CVAT Import Ready]
Auto annotation/
│
├── data_acquisition/ # Download images from AWS S3
│ ├── aws_s3.py # Basic S3 downloader with credentials
│ ├── aws_s3v2.py # S3 downloader with JSONL exclusion list
│ └── Umbergaun.py # HTTP-based image scraper (public bucket)
│
├── segmentation_pipeline/ # Object detection & mask generation
│ ├── pipeline_main.py # 🔥 Main orchestrator (runs all steps)
│ ├── pipeline.py # SAM2 multi-worker pipeline with ROI
│ ├── roi_sam2_pipeline.py # Production Virtual ROI + SAM2 pipeline
│ ├── auto_yolo_polygon.py # YOLOv8-seg → COCO polygon (with S3 download)
│ ├── auto_yolo_polygon1.py # Extended: streaming JSONL + OOM-safe
│ ├── auto_polygon_download.py # YOLOv8-seg → COCO + CVAT manifest + crop
│ ├── auto_polygon_download1.py # With resume, checkpoint, corruption scan
│ ├── auto_masking_yolo.py # YOLO mask-based segmentation
│ ├── sam_virtual.py # SAM2 with auto-detect vision box ROI
│ ├── sam_virtual_v2.py # SAM2 variant (v2)
│ ├── sam_virtual_v3.py # SAM2 variant (v3)
│ ├── sam_virtual_v4.py # SAM2 variant (v4)
│ ├── virtualroisam.py # Virtual ROI + SAM pipeline
│ ├── yolomask.py # YOLO mask utilities
│ ├── sam2_coco_batch.py # SAM2 batch COCO generation
│ ├── roi.py # Interactive ROI selector (GUI)
│ ├── roi_batch.py # Batch ROI processing
│ ├── roi_detector.py # Automatic ROI detection
│ ├── manual_roi.py # Manual ROI input (headless)
│ └── verify_pipeline.py # Unit tests for geometry utils
│
├── auto_labeling/ # Embedding-based automatic label assignment
│ ├── auto_label.py # FAISS nearest-neighbor labeling (v1)
│ ├── auto_label_v2.py # Top-K voting with similarity threshold
│ ├── auto_label_v3.py # Real-time HNSW matcher (no pre-built index)
│ ├── auto_label_v4.py # Matcher with output organization
│ ├── Portfolio.py # Single-bank portfolio inference
│ ├── batch_portfolio.py # Multi-bank portfolio (parallel matching)
│ ├── batch_portfolio_v2.py # Portfolio v2 with improvements
│ ├── portfolio_inference_parallel.py # Parallel portfolio inference
│ └── matching_1bank.py # Single bank matching utility
│
├── label_mapping/ # Category & attribute normalization for CVAT
│ ├── mapping.py # Portfolio → COCO merge (multiple iterations)
│ ├── mapping_stats.py # Mapping statistics & diagnostics
│ ├── label_mapping.py # Label normalization rules
│ ├── label_updater.py # Batch label update utility
│ ├── final_annotation_mapping.py # Official CVAT category + attribute mapping
│ ├── final_mapping_v2.py # Mapping variant (v2)
│ ├── final_mapping_v3.py # Mapping variant (v3)
│ ├── update_annotation.py # Annotation update utility
│ ├── annotation_cleaning.py # Clean up annotations
│ ├── annotation_clean_object.py # Object-level annotation cleanup
│ └── redistribute_annotation_jobwise.py # Split annotations into CVAT jobs
│
├── format_converters/ # Annotation format conversion tools
│ ├── coco_to_datumaro.py # COCO → Datumaro JSON conversion
│ ├── coco_datumaro.py # Datumaro format handler
│ ├── coco_structure.py # COCO JSON structure validator
│ ├── datumaro_convertor.py # Datumaro conversion utility
│ └── clean_coco_yolo.py # Clean COCO for YOLO compatibility
│
├── colour_classification/ # Train & run colour classifiers
│ ├── colour_classifier.py # ResNet18 colour classifier trainer
│ ├── colour_classifier2.py # Colour classifier variant
│ ├── train_classifier.py # ResNet101 with class-aware augmentation
│ └── train_resnet50.py # ResNet50 colour classifier
│
├── embedding_tools/ # Reference bank building & embedding utilities
│ ├── embedding_extraction.py # Extract PE embeddings from reference images
│ ├── embedding_extraction_2.py # Embedding extraction (v2)
│ ├── embedding_visualization.py # t-SNE / UMAP embedding visualization
│ ├── refrence_embeddingbuilder.py # Build reference embedding index
│ ├── bank_building.py # Multi-attribute bank builder
│ ├── bank_loader.py # Bank loading utility
│ ├── build_single_bank.py # Build single attribute bank
│ ├── build_single_class_bank.py # Build single class bank
│ └── bluehd_matching.py # Blue HD specific matching
│
├── visualization/ # Annotation & result visualization
│ ├── visualize.py # COCO polygon visualization overlay
│ ├── visualize1.py # Visualization variant
│ └── visualize_label.py # Label-aware visualization
│
├── utilities/ # Helper scripts & image processing
│ ├── data_prepration.py # Crop annotations from Label Studio JSON
│ ├── image_croping.py # Image cropping utilities
│ ├── cropping_coco.py # COCO-based image cropping
│ ├── crop_yolo_mask.py # Crop using YOLO masks
│ ├── label_crop.py # Label-based cropping
│ ├── folder_flat.py # Flatten folder structure
│ ├── interactive_selector.py # Interactive image selection tool
│ ├── version_5.py # Pipeline version 5 iteration
│ ├── testing.py # Test/debug scripts
│ └── testing1.py # Test/debug scripts
│
├── model_weights/ # Pre-trained model checkpoints
│ ├── best.pt # YOLOv8m-seg best weights (~55 MB)
│ └── sam2.1_l.pt # SAM 2.1 Large checkpoint (~449 MB)
│
├── setup/ # Environment setup scripts
│ ├── install_sam.sh # SAM2 + venv setup (CUDA 12.x)
│ └── Miniconda3-latest-Linux-x86_64.sh # Miniconda installer
│
├── data_outputs/ # Generated output data files
│ ├── auto_labeled_objects.csv # Auto-labeled results
│ ├── colorhd_matched_as_bluehd.txt # Colour matching results
│ └── colour # Colour data file
│
├── ai-data-engine/ # Separate AI Data Engine project (Docker-based)
│ ├── backend/ # Backend API
│ ├── frontend/ # Frontend UI
│ ├── inference-worker/ # ML inference workers
│ ├── workers/ # Background workers
│ ├── deploy/ # Deployment configs
│ ├── docker-compose.yml # Docker orchestration
│ └── docs/ # Documentation
│
└── README.md # ← You are here
| Requirement | Version |
|---|---|
| Python | 3.10+ |
| CUDA | 12.x |
| PyTorch | 2.4+ |
| GPU | NVIDIA (CUDA-capable) |
# Option A: Use the provided install script (Linux)
chmod +x setup/install_sam.sh
bash setup/install_sam.sh
# Option B: Manual setup
python3.10 -m venv venv
source venv/bin/activate
pip install torch torchvision --index-url https://download.pytorch.org/whl/cu121
pip install ultralytics opencv-python-headless pycocotools faiss-cpu boto3 supervision tqdm pandas pillow
Create a .env file in the project root:
AWS_ACCESS_KEY_ID=your_access_key
AWS_SECRET_ACCESS_KEY=your_secret_key
# Full pipeline (S3 download → YOLO segmentation → Portfolio inference)
python segmentation_pipeline/pipeline_main.py
Or run individual steps:
# Step 1: Download images from S3
python data_acquisition/aws_s3v2.py
# Step 2: YOLO segmentation → COCO polygons + crops
python segmentation_pipeline/auto_polygon_download1.py
# Step 3: Portfolio inference (auto-label assignment)
python auto_labeling/batch_portfolio.py
# Step 4: Map labels to official CVAT categories
python label_mapping/final_annotation_mapping.py
# Step 5: Convert to Datumaro format for CVAT import
python format_converters/coco_to_datumaro.py
Images are downloaded from AWS S3 buckets (e.g., wi-dataset bucket) with date-range filtering and parallel download support. Previously annotated images are excluded via JSONL exclusion lists to avoid re-processing.
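The selection logic described above can be sketched as follows. This is a minimal illustration, not the code in aws_s3v2.py: the function names, the `image` field in the JSONL records, and the half-open date window are all assumptions. The `objects` argument is assumed to look like the `Contents` entries returned by boto3's `list_objects_v2` (dicts with `Key` and `LastModified`), and `fetch` stands in for whatever function actually downloads one key.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def load_exclusions(jsonl_path, key_field="image"):
    """Read a JSONL file and return the set of already-annotated image keys.

    key_field is a hypothetical field name; adjust it to the real records.
    """
    excluded = set()
    for line in Path(jsonl_path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if line:  # skip blank lines
            excluded.add(json.loads(line)[key_field])
    return excluded


def select_keys(objects, start, end, excluded):
    """Keep keys modified within [start, end) that are not in the exclusion set."""
    return [
        obj["Key"]
        for obj in objects
        if start <= obj["LastModified"] < end and obj["Key"] not in excluded
    ]


def download_all(keys, fetch, max_workers=8):
    """Call fetch(key) for every key on a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(fetch, keys))
```

Filtering before downloading keeps the exclusion check cheap: only object metadata is inspected, and the thread pool is given just the keys that still need processing.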
Two segmentation backends are supported:
- YOLOv8m-seg: Instance segmentation producing polygon masks in COCO format
- SAM2 (Segment Anything Model 2): Zero-shot segmentation with Virtual ROI calibration
A Virtual ROI system restricts segmentation to a fixed conveyor-belt region within the frame, avoiding background noise. The ROI can be set interactively (GUI) or programmatically.
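One way to picture the Virtual ROI is as a crop window: segmentation runs on the cropped region, and the resulting polygons are shifted back into full-frame coordinates. The sketch below assumes an axis-aligned rectangular ROI given as `(x_min, y_min, x_max, y_max)`; the actual ROI shape used by the pipeline scripts is not stated here, and both function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
ROI = Tuple[int, int, int, int]  # assumed layout: (x_min, y_min, x_max, y_max) in frame pixels


def roi_to_frame(polygon: List[Point], roi: ROI) -> List[Point]:
    """Shift polygon points predicted on the ROI crop back into full-frame coordinates."""
    x0, y0, _, _ = roi
    return [(x + x0, y + y0) for x, y in polygon]


def centroid_in_roi(polygon: List[Point], roi: ROI) -> bool:
    """True if the mean of the polygon vertices lies inside the ROI rectangle."""
    x0, y0, x1, y1 = roi
    cx = sum(x for x, _ in polygon) / len(polygon)
    cy = sum(y for _, y in polygon) / len(polygon)
    return x0 <= cx <= x1 and y0 <= cy <= y1
```

`centroid_in_roi` is useful for the alternative setup, where the model sees the whole frame and detections outside the belt region are discarded afterwards.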
Cropped objects are matched against pre-built reference embedding banks using:
- PE (Perception Encoder) CLIP backbone for feature extraction
- FAISS HNSW index for efficient nearest-neighbor search
- Top-K voting with similarity thresholds for robust label assignment
- Multi-bank portfolio matching — each bank handles one attribute dimension (category, material, colour, application, grade, etc.)
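The voting step in the list above can be illustrated with a small, dependency-free sketch. It uses a brute-force cosine search in place of the FAISS HNSW index, so it shows the decision rule rather than the real search backend. The function names, the default `k` and `min_sim` values, and the `Unknown` fallback are assumptions for illustration.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def vote_label(query, bank, k=5, min_sim=0.6, unknown="Unknown"):
    """Return (label, votes) by majority vote over the top-k neighbours.

    bank is a list of (embedding, label) pairs. Neighbours whose similarity
    falls below min_sim are ignored; if none remain, the fallback label wins.
    """
    scored = sorted(((cosine(query, emb), label) for emb, label in bank), reverse=True)
    kept = [label for sim, label in scored[:k] if sim >= min_sim]
    if not kept:
        return unknown, 0
    label, votes = Counter(kept).most_common(1)[0]
    return label, votes


def match_portfolio(query, banks, **kwargs):
    """Run one vote per attribute bank; banks maps attribute name -> bank."""
    return {name: vote_label(query, bank, **kwargs)[0] for name, bank in banks.items()}
```

Voting over several neighbours makes the label less sensitive to a single mislabelled reference crop, and the similarity threshold keeps objects that resemble nothing in the bank from being forced into a class.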
Portfolio results are merged into the COCO JSON with official CVAT category IDs and normalized attribute values. The system supports 51 object categories and 9 attribute dimensions: Material, Colour, Application, General Application, Grade, Cap Material, Feature, Generic Material Type, and Damaged.
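A merge of this kind might look like the sketch below. The category IDs are taken from the category table in this README; everything else is assumed for illustration, including the shape of the `portfolio` dict, the normalization rules in `CANONICAL`, and the function names. The real rules live in the label_mapping scripts.

```python
# Subset of the official category IDs listed in this README.
CATEGORY_IDS = {"Unknown": 1, "Bottle": 2, "Tray": 3, "Can": 6, "Film": 7}

# Hypothetical normalization rules: lowercase raw value -> canonical value.
CANONICAL = {
    "Grade": {
        "foodgrade": "Foodgrade",
        "food grade": "Foodgrade",
        "non-foodgrade": "Non-foodgrade",
    },
    "Material": {"pet": "PET", "hdpe": "HDPE", "pp": "PP"},
}


def normalize(attribute, value):
    """Map a raw label to its canonical spelling; unknown values pass through stripped."""
    cleaned = value.strip()
    return CANONICAL.get(attribute, {}).get(cleaned.lower(), cleaned)


def merge_portfolio(coco, portfolio):
    """Attach category IDs and normalized attributes to COCO annotations, in place.

    portfolio (assumed shape):
        {annotation_id: {"category": str, "attributes": {name: raw_value}}}
    """
    for ann in coco["annotations"]:
        result = portfolio.get(ann["id"])
        if result is None:
            continue  # no portfolio result for this object; leave it untouched
        ann["category_id"] = CATEGORY_IDS.get(result["category"], CATEGORY_IDS["Unknown"])
        ann["attributes"] = {
            name: normalize(name, raw) for name, raw in result["attributes"].items()
        }
    return coco
```

Unrecognized category names fall back to ID 1 (Unknown) here so that a bad match never produces an invalid CVAT label.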
Final annotations are converted to CVAT-compatible formats:
- COCO JSON (with polygon segmentation + attributes)
- Datumaro JSON (native CVAT format)
- CVAT manifest.jsonl (for cloud-storage-backed tasks)
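For the COCO output, each object becomes one annotation entry with a flat polygon, a bounding box, and an area. The sketch below builds such an entry from a list of `(x, y)` vertices. The function name is hypothetical, and the `attributes` key is assumed to be the place where the per-object attributes are stored; it is not part of the base COCO specification.

```python
def polygon_to_coco(points, ann_id, image_id, category_id, attributes=None):
    """Build one COCO annotation dict from a list of (x, y) polygon vertices."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    # Shoelace formula: pair each vertex with the next one, wrapping around.
    area = abs(sum(
        x1 * y2 - x2 * y1
        for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1])
    )) / 2.0
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": category_id,
        # COCO polygons are flat lists: [x1, y1, x2, y2, ...]
        "segmentation": [[coord for point in points for coord in point]],
        # COCO boxes are [x, y, width, height]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
        "area": area,
        "iscrowd": 0,
        "attributes": attributes or {},
    }
```

Deriving the box and area from the polygon keeps the three fields consistent, which matters when the file is validated or converted to another format.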
| ID | Category | ID | Category |
|---|---|---|---|
| 1 | Unknown | 14 | Shiny Wrapper |
| 2 | Bottle | 15 | Wrapper |
| 3 | Tray | 23 | Cup |
| 4 | Mixed | 25 | Jar |
| 5 | Tray (merged with lid) | 37 | Cap |
| 6 | Can | 41 | Bucket |
| 7 | Film | 49 | Container |
| 50 | Glass | 51 | Object |
| Attribute | Example Values |
|---|---|
| Material | PET, HDPE, PP, PS, HIPS, MLP, Glass, Paper, Steel, Aluminum |
| Colour | Clear transparent, White opaque, Blue opaque, Mixed, Coloured |
| Application | Drinking Water, Milk Packaging, Toilet cleaner, Shampoo, Edible Oil |
| General Application | Hair care, Skin care, Bathroom care, Kitchen Essential, Food |
| Grade | Foodgrade, Non-foodgrade |
| Cap Material | Plastic, Metal |
| Feature | Cylinder, Circular, Rectangle, Flat, Irregular |
| Generic Material Type | Rigid, Flexible |
| Damaged | Undamaged, Broken |
| Component | Technology |
|---|---|
| Object Detection | YOLOv8m-seg (Ultralytics) |
| Mask Generation | SAM 2.1 (Segment Anything Model 2) |
| Feature Extraction | PE-Core CLIP (B16-224 / L14-336) |
| Similarity Search | FAISS (HNSW index) |
| Colour Classification | ResNet-18 / ResNet-50 / ResNet-101 (fine-tuned) |
| Annotation Format | COCO JSON, Datumaro, CVAT |
| Cloud Storage | AWS S3 (boto3) |
| GPU Acceleration | CUDA + PyTorch AMP (float16) |
- Model weights (model_weights/) contain large files (~500 MB total); these should be downloaded separately or tracked with Git LFS.
- Hardcoded paths: Many scripts contain hardcoded Linux paths (e.g., /home/wi/Avinash_Works/...). Update these to match your environment before running.
- ROI Configuration: The Virtual ROI coordinates are calibrated for specific camera setups (Umbergaum 1 & 2). Recalibrate using roi.py for new camera positions.
- Scripts contain multiple commented-out iterations preserving the development history of each module.
Internal / Proprietary — not for public distribution.