dataset-quality

Here are 15 public repositories matching this topic...

Industrial computer vision workflow for welding defect inspection using YOLO, OpenCV preprocessing, dataset QA, threshold governance, and edge-readiness analysis.

  • Updated May 1, 2026
  • Jupyter Notebook

GenProof detects model collapse risk in pre-training datasets before training begins. It combines semantic entropy, tail-density, and AI detection into a composite probability score (ICS). Built with FastAPI and scikit-learn to help ensure data quality and compliance.

  • Updated Apr 27, 2026
  • TeX

(WIP) 'Aporia' in Greek means 'inconsistent'. A Python library that detects and fixes dataset issues using both rule-based methods and ML models. It evaluates dataset quality across multiple metrics, including missing values, duplicates, outliers, class imbalance, and label consistency, and suggests fixes based on the metric scores.

  • Updated Mar 28, 2025
  • Jupyter Notebook

How much labeled data do you actually need to deploy a parking occupancy system at a never-before-seen lot? A supervision study spanning CLIP zero-shot → ResNet-18 few-shot → full supervision on 432k parking space crops, with dataset annotation error discovery. Trained on NVIDIA A100 via IU Big Red 200.

  • Updated Apr 6, 2026
  • Python

The Dataset Quality Scoring Engine (DQS) evaluates the quality of any dataset using automated, model-agnostic metrics. The system processes user-uploaded datasets, computes embeddings, analyzes statistical and semantic properties, and outputs a standardized quality score.

  • Updated Apr 8, 2026
  • JavaScript
