Load Image Dataset interface #4

@NapsterInBlue

Alright, Round 2!

Main reason I wanted to build out this library a few weeks ago was to have a streamlined way to:

  • Quickly label arbitrarily many images (we're killing it here)
  • Have a simple interface for loading all of your image data for ML purposes

Proposed Flow

After using the quickLabel CLI and spitting out our resulting labels.csv, we should be able to use that file to create a one-call loader for all of our data.

Maybe something like

>>> from quickLabel.data.loader import load_data
>>> PATH = '/usr/my_proj/data/labels.csv'
>>> X, y = load_data(PATH)
>>> X.shape
(NUM_IMAGES, X_DIM, Y_DIM, 3)
>>> y.shape
(NUM_IMAGES,)

And then you're off doing whatever keras/pytorch/sklearn/etc implementation you're used to doing.

Particulars

This would essentially mean creating a file under quickLabel/data called loader.py that

  • Creates an X and y of type np.ndarray
  • Iterates through each record of the .csv and per-row:
    • Loads up the image in PIL for X (PIL already decodes to RGB, so a BGR → RGB conversion is only needed if we read with OpenCV instead)
    • Appends the label value to y
  • Finally returns the two to the user
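To make the steps above concrete, here is a rough sketch of what loader.py might look like. It is not a settled design: the column names `path` and `label` are hypothetical (the real labels.csv layout may differ), relative paths are assumed to be relative to the CSV's folder, and it assumes every image has the same dimensions, which is exactly the hangup discussed below.

```python
import csv
from pathlib import Path

import numpy as np
from PIL import Image


def load_data(csv_path, path_col="path", label_col="label"):
    """Load every image listed in the CSV into X and its label into y.

    Assumes (hypothetically) a header row containing `path_col` and
    `label_col`, and that all images share the same dimensions.
    """
    csv_path = Path(csv_path)
    images, labels = [], []
    with csv_path.open(newline="") as f:
        for row in csv.DictReader(f):
            img_path = Path(row[path_col])
            if not img_path.is_absolute():
                # Resolve relative paths against the CSV's own folder.
                img_path = csv_path.parent / img_path
            with Image.open(img_path) as img:
                # PIL decodes in RGB order; convert() also normalizes
                # grayscale/RGBA/palette images to 3 channels.
                images.append(np.asarray(img.convert("RGB")))
            labels.append(row[label_col])
    # np.stack raises ValueError if the image shapes differ.
    return np.stack(images), np.array(labels)
```

One thing to note: arrays built from PIL images come out as (height, width, channels), so X would have shape (NUM_IMAGES, height, width, 3).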

One Hangup

How do we want to handle variable-sized images?

For instance, say our data is of all shapes and sizes: rectangles, squares, similar shapes at different resolutions, etc.

I see three possible solutions, all of which involve a preliminary scan through the data (before loading anything into X or y) to get max_X, min_X, max_Y, and min_Y values, which we then use to:

  1. Upscale all images to the max using the same function you called here
  2. Downscale all images to the min the same way
  3. Determine the max dimensions in both directions and just pad everything to fit the space
    • (This is the one I'm considering most)
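As a rough sketch of option 3, the preliminary scan and the padding step could look something like this. The helper names `scan_sizes`, `pad_to`, and `load_padded` are made up for illustration, and centering on a black canvas is just one possible choice of padding.

```python
import numpy as np
from PIL import Image


def scan_sizes(paths):
    """Return (max_w, max_h, min_w, min_h) across all images.

    Image.open is lazy, so reading .size only needs the file header,
    not the full pixel data.
    """
    sizes = []
    for p in paths:
        with Image.open(p) as img:
            sizes.append(img.size)  # PIL reports (width, height)
    widths, heights = zip(*sizes)
    return max(widths), max(heights), min(widths), min(heights)


def pad_to(img, width, height, fill=(0, 0, 0)):
    """Center `img` on a `width` x `height` canvas filled with `fill`."""
    img = img.convert("RGB")
    canvas = Image.new("RGB", (width, height), fill)
    left = (width - img.width) // 2
    top = (height - img.height) // 2
    canvas.paste(img, (left, top))
    return canvas


def load_padded(paths):
    """Pad every image to the largest width and height, then stack."""
    max_w, max_h, _, _ = scan_sizes(paths)
    arrays = []
    for p in paths:
        with Image.open(p) as img:
            arrays.append(np.asarray(pad_to(img, max_w, max_h)))
    return np.stack(arrays)
```

Options 1 and 2 would reuse the same scan but swap `pad_to` for something like `img.resize((width, height))`, at the cost of distorting images whose aspect ratio differs from the target.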

Or any other ideas you might have here


I'm happy to knock this out over the next week or so, but want to make sure you think this is a good idea before I dive in.
