feat: add show_progress option to dataset loaders and savers #2181
satishkc7 wants to merge 4 commits into roboflow:develop
Add optional tqdm progress bars to all time-consuming dataset operations. Addresses roboflow#183.
- load_coco_annotations / DetectionDataset.from_coco
- load_pascal_voc_annotations / DetectionDataset.from_pascal_voc
- load_yolo_annotations / DetectionDataset.from_yolo
- save_dataset_images / DetectionDataset.as_coco / as_yolo / as_pascal_voc

The show_progress parameter defaults to False for full backward compatibility. Uses tqdm.auto so progress bars work in both terminal and Jupyter notebook environments.
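For readers skimming the thread, the pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `load_annotation_names` and its body are hypothetical, and the `ImportError` fallback exists only so the snippet runs where tqdm is not installed. It relies on tqdm's `disable` argument, which turns the wrapper into a silent pass-through.

```python
from typing import Iterable, List

try:
    from tqdm.auto import tqdm
except ImportError:  # fallback only so this sketch runs without tqdm installed
    def tqdm(iterable, **kwargs):
        return iterable


def load_annotation_names(
    paths: Iterable[str], show_progress: bool = False
) -> List[str]:
    """Hypothetical loader; the name and body are illustrative only."""
    names = []
    # disable=True makes tqdm a silent pass-through, so the default
    # (show_progress=False) behaves like the plain loop.
    for path in tqdm(paths, desc="Loading annotations", disable=not show_progress):
        names.append(path.rsplit("/", 1)[-1])
    return names
```

Calling `load_annotation_names(paths)` and `load_annotation_names(paths, show_progress=True)` should return the same list; only the rendering of the bar differs.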
b9abdbd to c9201d0
Codecov Report
❌ Your patch check has failed because the patch coverage (51%) is below the target coverage (95%). You can increase the patch coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #2181 +/- ##
=======================================
- Coverage 76% 76% -0%
=======================================
Files 62 62
Lines 7547 7561 +14
=======================================
+ Hits 5714 5722 +8
- Misses 1833 1839 +6
Pull request overview
Adds an optional show_progress: bool = False flag to dataset load/export APIs so long-running dataset operations can display a tqdm progress bar (via tqdm.auto) without breaking existing callers.
Changes:
- Add show_progress to DetectionDataset.from_* and DetectionDataset.as_* public methods and propagate it into format-specific loaders.
- Wrap COCO/YOLO/Pascal VOC annotation loading loops with tqdm progress bars (disabled by default).
- Add show_progress support to save_dataset_images (image export) with a tqdm progress bar.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| src/supervision/dataset/utils.py | Adds show_progress to save_dataset_images and wraps image export in tqdm. |
| src/supervision/dataset/formats/yolo.py | Adds show_progress to YOLO loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/formats/pascal_voc.py | Adds show_progress to Pascal VOC loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/formats/coco.py | Adds show_progress to COCO loader and wraps per-image annotation processing in tqdm. |
| src/supervision/dataset/core.py | Exposes show_progress on DetectionDataset.from_*/as_* and forwards it into image saving + loaders. |
Comments suppressed due to low confidence (3)
src/supervision/dataset/core.py:567
show_progress is only applied to save_dataset_images(...) here; exporting annotations (and even data.yaml) can be the slow part for large datasets, and currently has no progress indication. Consider propagating show_progress into save_yolo_annotations / save_data_yaml (and adding tqdm there), or rename/clarify the parameter/docs so users don’t expect progress when only saving annotations.
if images_directory_path is not None:
    save_dataset_images(
        dataset=self,
        images_directory_path=images_directory_path,
        show_progress=show_progress,
    )
if annotations_directory_path is not None:
    save_yolo_annotations(
        dataset=self,
        annotations_directory_path=annotations_directory_path,
        min_image_area_percentage=min_image_area_percentage,
        max_image_area_percentage=max_image_area_percentage,
        approximation_percentage=approximation_percentage,
    )
if data_yaml_path is not None:
    save_data_yaml(data_yaml_path=data_yaml_path, classes=self.classes)
src/supervision/dataset/core.py:680
show_progress is only wired to image saving; save_coco_annotations(...) can be time-consuming on large datasets but gets no progress indication. Consider adding show_progress support to save_coco_annotations (and forwarding it from here) so as_coco(show_progress=True) provides feedback even when only saving annotations.
if images_directory_path is not None:
    save_dataset_images(
        dataset=self,
        images_directory_path=images_directory_path,
        show_progress=show_progress,
    )
if annotations_path is not None:
    save_coco_annotations(
        dataset=self,
        annotation_path=annotations_path,
        min_image_area_percentage=min_image_area_percentage,
        max_image_area_percentage=max_image_area_percentage,
        approximation_percentage=approximation_percentage,
    )
src/supervision/dataset/core.py:388
as_pascal_voc(..., show_progress=...) currently only shows progress for image saving; when exporting only annotations (or when annotations dominate runtime), users won’t see any progress. Consider wrapping the annotations export loop in a tqdm controlled by show_progress (or adjusting the parameter/docs to reflect that it only applies to image saving).
if images_directory_path:
    save_dataset_images(
        dataset=self,
        images_directory_path=images_directory_path,
        show_progress=show_progress,
    )
if annotations_directory_path:
    Path(annotations_directory_path).mkdir(parents=True, exist_ok=True)
    for image_path, image, annotations in self:
        annotation_name = Path(image_path).stem
        annotations_path = os.path.join(
            annotations_directory_path, f"{annotation_name}.xml"
        )
        image_name = Path(image_path).name
        pascal_voc_xml = detections_to_pascal_voc(
            detections=annotations,
            classes=self.classes,
            filename=image_name,
            image_shape=image.shape,
            min_image_area_percentage=min_image_area_percentage,
            max_image_area_percentage=max_image_area_percentage,
            approximation_percentage=approximation_percentage,
        )
        with open(annotations_path, "w") as f:
            f.write(pascal_voc_xml)
f"Images must be 'RGB' or 'grayscale', \
but {image_path} mode is '{image.mode}'."
The ValueError message here includes a literal backslash + newline/indentation because the string is split across lines ("..., \n but ..."). This makes the error message noisy and hard to read; please format this as a single-line string (or use implicit string concatenation without embedding a backslash/newline).
Suggested change
- f"Images must be 'RGB' or 'grayscale', \
- but {image_path} mode is '{image.mode}'."
+ f"Images must be 'RGB' or 'grayscale', but {image_path} mode is '{image.mode}'."
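A small standalone sketch of the effect under discussion, using hypothetical helper names and a made-up path. As far as Python's string-literal rules go, a backslash at the end of a line inside a regular (non-raw) string is a line continuation: the backslash and the newline are dropped, but the next line's leading indentation stays in the string. The second helper shows implicit string concatenation as one alternative to a single long line.

```python
def message_with_backslash(image_path: str, mode: str) -> str:
    # The continuation line's leading spaces become part of the message.
    return f"Images must be 'RGB' or 'grayscale', \
        but {image_path} mode is '{mode}'."


def message_concatenated(image_path: str, mode: str) -> str:
    # Adjacent literals are joined at compile time; no stray whitespace.
    return (
        "Images must be 'RGB' or 'grayscale', "
        f"but {image_path} mode is '{mode}'."
    )


noisy = message_with_backslash("img.png", "CMYK")
clean = message_concatenated("img.png", "CMYK")
print(repr(noisy))
print(repr(clean))
```

Both helpers carry the same information; the first one just has a run of spaces in the middle of the message.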
image = Image.open(image_path)
lines = read_txt_file(file_path=annotation_path, skip_empty=True)
w, h = image.size
resolution_wh = (w, h)
if image.mode not in ("RGB", "L"):
    raise ValueError(
        f"Images must be 'RGB' or 'grayscale', \
        but {image_path} mode is '{image.mode}'."
    )
Image.open(image_path) keeps the underlying file handle open until the image is closed. In a large dataset loop this can exhaust file descriptors; use a context manager (with Image.open(...) as img:) or explicitly close the image after reading size/mode.
Suggested change
- image = Image.open(image_path)
- lines = read_txt_file(file_path=annotation_path, skip_empty=True)
- w, h = image.size
- resolution_wh = (w, h)
- if image.mode not in ("RGB", "L"):
-     raise ValueError(
-         f"Images must be 'RGB' or 'grayscale', \
-         but {image_path} mode is '{image.mode}'."
-     )
+ lines = read_txt_file(file_path=annotation_path, skip_empty=True)
+ with Image.open(image_path) as image:
+     w, h = image.size
+     resolution_wh = (w, h)
+     if image.mode not in ("RGB", "L"):
+         raise ValueError(
+             f"Images must be 'RGB' or 'grayscale', \
+             but {image_path} mode is '{image.mode}'."
+         )
@classmethod
def from_coco(
    cls,
    images_directory_path: str,
    annotations_path: str,
    force_masks: bool = False,
    show_progress: bool = False,
) -> DetectionDataset:
New show_progress code paths (both loaders and exporters) aren’t covered by tests. Please add unit/integration tests that call from_coco/from_yolo/from_pascal_voc and as_coco/as_yolo/as_pascal_voc with show_progress=True (ideally by monkeypatching tqdm.auto.tqdm to a no-op) to ensure the option doesn’t change outputs or raise exceptions.
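One possible shape for such a test, sketched with the standard library's `unittest.mock` and a hypothetical stand-in loader (`load_labels`) rather than the real `DetectionDataset` methods. In the actual test suite the patch target would be wherever the loaders import tqdm, which this sketch does not assume; the `ImportError` fallback is only there so the snippet runs without tqdm installed.

```python
from unittest import mock

try:
    from tqdm.auto import tqdm
except ImportError:  # fallback only so this sketch runs without tqdm installed
    def tqdm(iterable, **kwargs):
        return iterable


def load_labels(names, show_progress=False):
    """Hypothetical stand-in for a loader that accepts show_progress."""
    return [name.lower() for name in tqdm(names, disable=not show_progress)]


def noop_tqdm(iterable, **kwargs):
    """No-op replacement: returns the iterable and draws nothing."""
    return iterable


def outputs_match(names):
    """Run the loader with and without progress while tqdm is patched out."""
    # Patch the name the loader actually looks up, then restore it on exit.
    with mock.patch.dict(load_labels.__globals__, {"tqdm": noop_tqdm}):
        quiet = load_labels(names, show_progress=False)
        loud = load_labels(names, show_progress=True)
    return quiet == loud
```

Under pytest the same idea is usually written with the `monkeypatch` fixture; the assertion of interest is simply that both calls return identical results and neither raises.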
- Add show_progress param to save_yolo_annotations and save_coco_annotations
- Wrap Pascal VOC annotation export loop in as_pascal_voc with tqdm
- Pass show_progress through as_yolo and as_coco to their annotation savers
- Add test_show_progress.py covering all loaders and savers with both show_progress=True and show_progress=False
…ssage in YOLO loader
The Codecov report (51% patch coverage) is from the first commit and is now stale. Since then, two follow-up commits were pushed:
All 20 tests pass. Could you re-trigger CI so Codecov picks up the updated coverage? Thanks!
Summary
Closes #183
Adds an optional show_progress parameter to all time-consuming dataset operations so users can see loading/saving progress via a tqdm progress bar.
- DetectionDataset.from_coco(show_progress=True)
- DetectionDataset.from_pascal_voc(show_progress=True)
- DetectionDataset.from_yolo(show_progress=True)
- DetectionDataset.as_coco(show_progress=True)
- DetectionDataset.as_yolo(show_progress=True)
- DetectionDataset.as_pascal_voc(show_progress=True)
Details
- Defaults to False - fully backward compatible, no existing code breaks
- Uses tqdm.auto (already a project dependency) so progress bars work correctly in both terminal and Jupyter notebook environments
- […] DetectionDataset methods down to the internal format loader/saver functions
Test plan
- […] show_progress=True and confirm the bar renders
- […] show_progress=True and confirm the bar renders
- […] show_progress=True and confirm the bar renders
- […] show_progress=False)