
feat: add show_progress option to dataset loaders and savers #2181

Open
satishkc7 wants to merge 4 commits into roboflow:develop from satishkc7:feat/progress-bars-dataset-ops


@satishkc7

Summary

Closes #183

Adds an optional show_progress parameter to all time-consuming dataset operations so users can see loading/saving progress via a tqdm progress bar.

  • DetectionDataset.from_coco(show_progress=True)
  • DetectionDataset.from_pascal_voc(show_progress=True)
  • DetectionDataset.from_yolo(show_progress=True)
  • DetectionDataset.as_coco(show_progress=True)
  • DetectionDataset.as_yolo(show_progress=True)
  • DetectionDataset.as_pascal_voc(show_progress=True)

Details

  • Defaults to False - fully backward compatible, no existing code breaks
  • Uses tqdm.auto (already a project dependency) so progress bars work correctly in both terminal and Jupyter notebook environments
  • The parameter is propagated from the public DetectionDataset methods down to the internal format loader/saver functions
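The propagation described above can be sketched roughly as follows. This is a minimal illustration, not the PR's code: `ToyDataset`, `from_names`, and `_load_annotations` are hypothetical stand-ins for the public `DetectionDataset` methods and the internal format loaders, and the fallback `tqdm` exists only so the sketch runs where tqdm is not installed.

```python
from typing import Iterable, List

try:
    from tqdm.auto import tqdm
except ImportError:
    # Fallback so the sketch still runs where tqdm is not installed.
    def tqdm(iterable, **kwargs):
        return iterable


def _load_annotations(
    image_names: Iterable[str], show_progress: bool = False
) -> List[str]:
    """Hypothetical internal loader standing in for a format-specific one."""
    results = []
    # disable=True turns tqdm into a silent pass-through, so the default
    # (show_progress=False) prints nothing and behaves like a plain loop.
    for name in tqdm(
        list(image_names), desc="Loading annotations", disable=not show_progress
    ):
        results.append(f"{name}.txt")
    return results


class ToyDataset:
    """Hypothetical public API that forwards show_progress unchanged."""

    def __init__(self, annotations: List[str]) -> None:
        self.annotations = annotations

    @classmethod
    def from_names(
        cls, image_names: Iterable[str], show_progress: bool = False
    ) -> "ToyDataset":
        return cls(_load_annotations(image_names, show_progress=show_progress))


quiet = ToyDataset.from_names(["a", "b", "c"])
verbose = ToyDataset.from_names(["a", "b", "c"], show_progress=True)
```

Driving the bar through `disable=not show_progress` keeps a single code path for both modes, which is one way to keep the output identical whether or not the bar is shown.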

Test plan

  • Load a COCO dataset with show_progress=True and confirm the bar renders
  • Load a YOLO dataset with show_progress=True and confirm the bar renders
  • Load a Pascal VOC dataset with show_progress=True and confirm the bar renders
  • Confirm existing tests pass with no changes (default show_progress=False)

@satishkc7 satishkc7 requested a review from SkalskiP as a code owner March 24, 2026 20:32
@CLAassistant

CLAassistant commented Mar 24, 2026

CLA assistant check
All committers have signed the CLA.

Add optional tqdm progress bars to all time-consuming dataset operations.
Addresses roboflow#183.

- load_coco_annotations / DetectionDataset.from_coco
- load_pascal_voc_annotations / DetectionDataset.from_pascal_voc
- load_yolo_annotations / DetectionDataset.from_yolo
- save_dataset_images / DetectionDataset.as_coco / as_yolo / as_pascal_voc

The show_progress parameter defaults to False for full backward
compatibility. Uses tqdm.auto so progress bars work in both terminal
and Jupyter notebook environments.
@satishkc7 satishkc7 force-pushed the feat/progress-bars-dataset-ops branch from b9abdbd to c9201d0 Compare March 24, 2026 21:16
@Borda Borda requested a review from Copilot March 26, 2026 13:11
@codecov

codecov bot commented Mar 26, 2026

Codecov Report

❌ Patch coverage is 50.84746% with 29 lines in your changes missing coverage. Please review.
✅ Project coverage is 76%. Comparing base (d94db74) to head (c9201d0).
⚠️ Report is 1 commit behind head on develop.

❌ Your patch check has failed because the patch coverage (51%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.
❌ Your project check has failed because the head coverage (76%) is below the target coverage (95%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@           Coverage Diff           @@
##           develop   #2181   +/-   ##
=======================================
- Coverage       76%     76%   -0%     
=======================================
  Files           62      62           
  Lines         7547    7561   +14     
=======================================
+ Hits          5714    5722    +8     
- Misses        1833    1839    +6     

Contributor

Copilot AI left a comment

Pull request overview

Adds an optional show_progress: bool = False flag to dataset load/export APIs so long-running dataset operations can display a tqdm progress bar (via tqdm.auto) without breaking existing callers.

Changes:

  • Add show_progress to DetectionDataset.from_* and DetectionDataset.as_* public methods and propagate it into format-specific loaders.
  • Wrap COCO/YOLO/Pascal VOC annotation loading loops with tqdm progress bars (disabled by default).
  • Add show_progress support to save_dataset_images (image export) with a tqdm progress bar.

Reviewed changes

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.

Show a summary per file
| File | Description |
| --- | --- |
| src/supervision/dataset/utils.py | Adds show_progress to save_dataset_images and wraps image export in tqdm. |
| src/supervision/dataset/formats/yolo.py | Adds show_progress to YOLO loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/formats/pascal_voc.py | Adds show_progress to Pascal VOC loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/formats/coco.py | Adds show_progress to COCO loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/core.py | Exposes show_progress on DetectionDataset.from_*/as_* and forwards it into image saving + loaders. |
Comments suppressed due to low confidence (3)

src/supervision/dataset/core.py:567

  • show_progress is only applied to save_dataset_images(...) here; exporting annotations (and even data.yaml) can be the slow part for large datasets, and currently has no progress indication. Consider propagating show_progress into save_yolo_annotations/save_data_yaml (and adding tqdm there), or rename/clarify the parameter/docs so users don’t expect progress when only saving annotations.
        if images_directory_path is not None:
            save_dataset_images(
                dataset=self,
                images_directory_path=images_directory_path,
                show_progress=show_progress,
            )
        if annotations_directory_path is not None:
            save_yolo_annotations(
                dataset=self,
                annotations_directory_path=annotations_directory_path,
                min_image_area_percentage=min_image_area_percentage,
                max_image_area_percentage=max_image_area_percentage,
                approximation_percentage=approximation_percentage,
            )
        if data_yaml_path is not None:
            save_data_yaml(data_yaml_path=data_yaml_path, classes=self.classes)

src/supervision/dataset/core.py:680

  • show_progress is only wired to image saving; save_coco_annotations(...) can be time-consuming on large datasets but gets no progress indication. Consider adding show_progress support to save_coco_annotations (and forwarding it from here) so as_coco(show_progress=True) provides feedback even when only saving annotations.
        if images_directory_path is not None:
            save_dataset_images(
                dataset=self,
                images_directory_path=images_directory_path,
                show_progress=show_progress,
            )
        if annotations_path is not None:
            save_coco_annotations(
                dataset=self,
                annotation_path=annotations_path,
                min_image_area_percentage=min_image_area_percentage,
                max_image_area_percentage=max_image_area_percentage,
                approximation_percentage=approximation_percentage,
            )

src/supervision/dataset/core.py:388

  • as_pascal_voc(..., show_progress=...) currently only shows progress for image saving; when exporting only annotations (or when annotations dominate runtime), users won’t see any progress. Consider wrapping the annotations export loop in a tqdm controlled by show_progress (or adjusting the parameter/docs to reflect that it only applies to image saving).
        if images_directory_path:
            save_dataset_images(
                dataset=self,
                images_directory_path=images_directory_path,
                show_progress=show_progress,
            )
        if annotations_directory_path:
            Path(annotations_directory_path).mkdir(parents=True, exist_ok=True)
            for image_path, image, annotations in self:
                annotation_name = Path(image_path).stem
                annotations_path = os.path.join(
                    annotations_directory_path, f"{annotation_name}.xml"
                )
                image_name = Path(image_path).name
                pascal_voc_xml = detections_to_pascal_voc(
                    detections=annotations,
                    classes=self.classes,
                    filename=image_name,
                    image_shape=image.shape,
                    min_image_area_percentage=min_image_area_percentage,
                    max_image_area_percentage=max_image_area_percentage,
                    approximation_percentage=approximation_percentage,
                )

                with open(annotations_path, "w") as f:
                    f.write(pascal_voc_xml)

Comment on lines 224 to 225
f"Images must be 'RGB' or 'grayscale', \
but {image_path} mode is '{image.mode}'."

Copilot AI Mar 26, 2026

The ValueError message here picks up the continuation line's leading indentation because the string is split with a backslash line continuation: Python drops the backslash and the newline, but the spaces before "but" stay in the message ("..., <indentation>but ..."). This makes the error message noisy and hard to read; please format this as a single-line string (or use implicit string concatenation instead of a backslash continuation).

Suggested change
- f"Images must be 'RGB' or 'grayscale', \
- but {image_path} mode is '{image.mode}'."
+ f"Images must be 'RGB' or 'grayscale', but {image_path} mode is '{image.mode}'."
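The effect of the backslash continuation can be checked with a small sketch. The variable values and both function names below are made up for illustration; only the message text comes from the code under review.

```python
image_path = "img.png"  # illustrative value
mode = "CMYK"  # illustrative value


def message_with_backslash() -> str:
    # Backslash-newline is removed from the literal, but the indentation
    # of the continuation line becomes part of the string.
    return f"Images must be 'RGB' or 'grayscale', \
        but {image_path} mode is '{mode}'."


def message_concatenated() -> str:
    # Adjacent string literals are joined at compile time, so source
    # indentation never leaks into the message.
    return (
        "Images must be 'RGB' or 'grayscale', "
        f"but {image_path} mode is '{mode}'."
    )


noisy = message_with_backslash()
clean = message_concatenated()
```

Implicit concatenation also keeps each source line short, which may matter if the project enforces a line-length limit.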

Comment on lines +218 to +226
image = Image.open(image_path)
lines = read_txt_file(file_path=annotation_path, skip_empty=True)
w, h = image.size
resolution_wh = (w, h)
if image.mode not in ("RGB", "L"):
    raise ValueError(
        f"Images must be 'RGB' or 'grayscale', \
        but {image_path} mode is '{image.mode}'."
    )

Copilot AI Mar 26, 2026

Image.open(image_path) keeps the underlying file handle open until the image is closed. In a large dataset loop this can exhaust file descriptors; use a context manager (with Image.open(...) as img:) or explicitly close the image after reading size/mode.

Suggested change
- image = Image.open(image_path)
- lines = read_txt_file(file_path=annotation_path, skip_empty=True)
- w, h = image.size
- resolution_wh = (w, h)
- if image.mode not in ("RGB", "L"):
-     raise ValueError(
-         f"Images must be 'RGB' or 'grayscale', \
-         but {image_path} mode is '{image.mode}'."
-     )
+ lines = read_txt_file(file_path=annotation_path, skip_empty=True)
+ with Image.open(image_path) as image:
+     w, h = image.size
+     resolution_wh = (w, h)
+     if image.mode not in ("RGB", "L"):
+         raise ValueError(
+             f"Images must be 'RGB' or 'grayscale', \
+             but {image_path} mode is '{image.mode}'."
+         )

Comment on lines 569 to 576
@classmethod
def from_coco(
    cls,
    images_directory_path: str,
    annotations_path: str,
    force_masks: bool = False,
    show_progress: bool = False,
) -> DetectionDataset:

Copilot AI Mar 26, 2026

New show_progress code paths (both loaders and exporters) aren’t covered by tests. Please add unit/integration tests that call from_coco/from_yolo/from_pascal_voc and as_coco/as_yolo/as_pascal_voc with show_progress=True (ideally by monkeypatching tqdm.auto.tqdm to a no-op) to ensure the option doesn’t change outputs or raise exceptions.
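One possible shape for such a test, sketched with the standard library only. Everything here is hypothetical: `fake_module` stands in for a module that imports `tqdm`, and `load_annotations` stands in for a real loader. The actual tests would patch the name where the supervision modules look it up.

```python
from types import SimpleNamespace
from unittest import mock

# Hypothetical stand-in for a module that did `from tqdm.auto import tqdm`.
fake_module = SimpleNamespace(tqdm=None)


def load_annotations(names, show_progress=False):
    """Hypothetical loader that wraps its loop in fake_module.tqdm."""
    iterator = fake_module.tqdm(names, disable=not show_progress)
    return [name.upper() for name in iterator]


def passthrough_tqdm(iterable, **kwargs):
    # No-op replacement: record the keyword arguments, return the iterable.
    passthrough_tqdm.calls.append(kwargs)
    return iterable


passthrough_tqdm.calls = []

with mock.patch.object(fake_module, "tqdm", passthrough_tqdm):
    quiet = load_annotations(["a", "b"], show_progress=False)
    verbose = load_annotations(["a", "b"], show_progress=True)
```

Recording the keyword arguments lets a test check both that the outputs match and that the flag actually reached the progress bar.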

Copilot generated this review using guidance from repository custom instructions.
@Borda Borda self-assigned this Mar 26, 2026
satishkc7 and others added 3 commits March 26, 2026 09:35
- Add show_progress param to save_yolo_annotations and save_coco_annotations
- Wrap Pascal VOC annotation export loop in as_pascal_voc with tqdm
- Pass show_progress through as_yolo and as_coco to their annotation savers
- Add test_show_progress.py covering all loaders and savers with both
  show_progress=True and show_progress=False
@satishkc7
Author

The Codecov report (51% patch coverage) is from the first commit and is now stale.

Since then, two follow-up commits were pushed:

  • Propagated show_progress into all annotation savers: save_yolo_annotations, save_coco_annotations, and the Pascal VOC annotation loop in as_pascal_voc
  • Added tests/dataset/test_show_progress.py with 20 tests covering every loader and saver (both show_progress=True and show_progress=False, plus output consistency checks for each)

All 20 tests pass. Could you re-trigger CI so Codecov picks up the updated coverage? Thanks!

Development

Successfully merging this pull request may close these issues.

Show Progress in time consuming tasks

4 participants