An explanation of how to store all the datasets in our team

CV-AC/Dataset_management

The Oulu CMV dataset management scheme lets our group members easily access all the existing datasets (both our own datasets and popular benchmark datasets) hosted on the CSC platform ⭐⭐⭐⭐⭐.

1. Introduction

We store the datasets on two CSC platform servers (Puhti/Mahti and Allas) 🐧

a. Allas for cold storage

Allas is for long-term storage/backup. Data here is not deleted easily, and unstructured data is accepted (you can put all your trash here).

b. Puhti/Mahti for hot storage

Puhti/Mahti is for quick access from running experiments. Access is fast, but the data WILL be deleted (after 90 days), and unstructured data is NOT accepted (a few big chunk files are better than many small ones).
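
Since files on Puhti/Mahti scratch are removed after 90 days, it can help to list what is close to that age before it disappears. The Python sketch below is hypothetical (`stale_files` is not a CSC tool) and assumes the cleanup is based on each file's last access time; check CSC's documentation for the actual rule.

```python
import os
import time
from pathlib import Path

# 90 days is the retention period quoted in this README; treat it as an
# assumption and confirm the current scratch policy with CSC.
DEFAULT_MAX_AGE_DAYS = 90


def stale_files(root, max_age_days=DEFAULT_MAX_AGE_DAYS, now=None):
    """Return files under `root` not accessed within `max_age_days`.

    Assumption: the scratch cleanup is driven by last access time (st_atime).
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 24 * 60 * 60
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # lstat() so that broken symlinks do not raise an error
            if path.lstat().st_atime < cutoff:
                stale.append(path)
    return sorted(stale)
```

For example, `stale_files("/scratch/project_2009201/FER")` lists candidates under the FER folder; anything you still need can be pulled again from Allas.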

Tips

  • Basically, you only need to upload your datasets to Allas; you can pull the data from Allas to Mahti/Puhti whenever you need it.
  • Make sure that you pull your datasets from Allas to Puhti or Mahti into the same subfolder (keep the structure well-organized).
  • Converting between Allas and Puhti/Mahti
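
The same-subfolder rule can be checked mechanically. The sketch below assumes you already have the object names stored in Allas as relative paths (for example `MER/SMIC/smic.tar`, a hypothetical name), exported with whichever Allas client you use. It maps each name onto the scratch folder without changing the hierarchy and reports what has not been pulled yet. Both function names are hypothetical.

```python
from pathlib import Path, PurePosixPath


def expected_scratch_paths(object_names, scratch_root):
    """Map Allas object names (e.g. "MER/SMIC/smic.tar") to the paths they
    should occupy under the scratch root, keeping the same subfolders."""
    root = Path(scratch_root)
    paths = []
    for name in object_names:
        rel = PurePosixPath(name)
        # Refuse names that would land outside the project folder.
        if rel.is_absolute() or ".." in rel.parts:
            raise ValueError(f"unsafe object name: {name!r}")
        paths.append(root.joinpath(*rel.parts))
    return paths


def missing_in_scratch(object_names, scratch_root):
    """Return the object names whose mirrored file is not yet on scratch."""
    paths = expected_scratch_paths(object_names, scratch_root)
    return [name for name, path in zip(object_names, paths) if not path.exists()]
```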

2. Dataset storage arrangement

So far, we have applied for the following projects to store the datasets, and we will apply for more depending on the final space used:

  • MVG_dataset1 -> face
  • MVG_dataset2 -> body
  • MVG_dataset3 -> General
  • MVG_dataset4 -> EmotionAI

Below is the detailed data arrangement:

a. MVG_dataset1

Project name: MVG_dataset1

Project ID: project_2009201

Content: face datasets

Puhti address: /scratch/project_2009201/

Allas address: project_2009201:default:username

Folder hierarchy (people in charge):

.
├── FER (Xingxun)
│   └── ...
├── rPPG (Marko)
│   └── ...
├── Antispoofing (Marko)
│   └── ...
├── Forgery (Yang)
│   └── ...
├── Others (Haotian)
│   └── ...
└── MER (Yante)
    ├── SMIC
    └── CASME

b. MVG_dataset2

Project name: MVG_dataset2

Project ID: project_2009202

Content: body datasets

Puhti address: /scratch/project_2009202/

Allas address: project_2009202:default:username

Folder hierarchy (people in charge):

.
├── Action (Atif)
│   └── ...
├── Gesture (Atif)
│   └── ...
├── Other (Atif)
│   └── ...
└── MGs (Haoyu)
    ├── iMiGUE
    └── SMG (70G)

c. MVG_dataset3

Project name: MVG_dataset3

Project ID: project_2003455

Content: general datasets

Puhti address: /scratch/project_2003455/

Allas address: project_2003455:default:username

Folder hierarchy (people in charge):

.
├── Image (Zhuo)
│   ├── ImageNet (150G)
│   └── CIFAR100
└── others
    └── ...

d. MVG_dataset4 (TBD)

Project name: MVG_dataset4

Project ID: project_2009204

Content: EmotionAI-related datasets

Folder hierarchy (people in charge):

.
├── EmotionAI (Hanlin)
│   └── ...
└── Sundown (Qianru)
    └── ...
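
The project arrangement above can also be kept as a small lookup table for scripts, so that paths are not hard-coded in many places. This is a hypothetical Python sketch (`DatasetProject`, `PROJECTS` and `scratch_root_for` are illustrative names); the values are copied from the lists above, and MVG_dataset4 has no scratch address because none is listed yet.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DatasetProject:
    name: str
    project_id: str
    content: str
    scratch_root: Optional[str]  # None while no Puhti address is listed


# Values copied from the arrangement in this README; MVG_dataset4 is TBD.
PROJECTS = {
    "face": DatasetProject("MVG_dataset1", "project_2009201", "face datasets", "/scratch/project_2009201/"),
    "body": DatasetProject("MVG_dataset2", "project_2009202", "body datasets", "/scratch/project_2009202/"),
    "general": DatasetProject("MVG_dataset3", "project_2003455", "general datasets", "/scratch/project_2003455/"),
    "emotionai": DatasetProject("MVG_dataset4", "project_2009204", "EmotionAI-related datasets", None),
}


def scratch_root_for(category: str) -> str:
    """Return the scratch root for a category such as "face" or "General".

    Raises KeyError for an unknown category and LookupError when the
    project has no scratch address yet.
    """
    project = PROJECTS[category.lower()]
    if project.scratch_root is None:
        raise LookupError(f"{project.name} has no scratch address yet")
    return project.scratch_root
```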

3. TO DO LIST

  • Basic structure of the dataset management
  • Assign a person in charge of each project space
  • Finalize the dataset folder structure
  • Confirm all the dataset sizes
  • Complete the dataset credits
  • Upload all the datasets to Allas
  • Upload all the datasets to Puhti/Mahti (/scratch)
  • Apply for extra space for extremely large datasets, like EmotionAI
  • Add more documentation
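
For the "Confirm all the dataset sizes" item, a sketch like the following sums file sizes per top-level folder and formats them like the annotations in the trees above (e.g. `SMG (70G)`), with one decimal place. It assumes `G` means GiB (2**30 bytes) and skips symlinked folders; the function names are hypothetical.

```python
import os
from pathlib import Path

GIB = 1024 ** 3  # assumption: the "G" in the README trees means GiB


def folder_sizes(root):
    """Return {subfolder name: total bytes} for each top-level subfolder of root."""
    sizes = {}
    for entry in sorted(Path(root).iterdir()):
        # Only count real directories; top-level files and symlinks are skipped.
        if not entry.is_dir() or entry.is_symlink():
            continue
        total = 0
        for dirpath, _dirnames, filenames in os.walk(entry):
            for name in filenames:
                total += os.lstat(os.path.join(dirpath, name)).st_size
        sizes[entry.name] = total
    return sizes


def format_sizes(sizes):
    """Format sizes in the style of the tree annotations, e.g. 'SMG (70.0G)'."""
    return [f"{name} ({size / GIB:.1f}G)" for name, size in sizes.items()]
```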

4. Maintenance instructions

  • 👉 If you need to change the high-level dataset structure (like Face or General), please contact chen.haoyu@oulu.fi so we can keep this README consistent.
  • 🚀 Work with BIG file chunks: NEVER upload a huge number of small files; instead, upload one big compressed file (you can extract and process the data later in your own scratch space).
  • ⬆️ Updating both servers? The content of the two servers does not need to be exactly the same, but make sure the folder-level arrangement is consistent.
  • 🗳 GDPR regulation: basically, you don't need to take any extra action (the GDPR form was filled out when applying for the space).
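
Following the big-chunk rule, a dataset folder can be packed into one archive before uploading it to Allas. This Python sketch uses the standard `tarfile` module; `pack_dataset` is a hypothetical helper, and gzip compression is an assumption (already-compressed media such as videos may not shrink much, so `compress=False` just bundles the files).

```python
import tarfile
from pathlib import Path


def pack_dataset(src_dir, archive_path, compress=True):
    """Pack src_dir into a single tar archive (gzip-compressed by default).

    Entries are stored under src_dir's own name (e.g. SMIC/...), so
    extracting the archive in scratch recreates the same subfolder.
    """
    src = Path(src_dir)
    archive = Path(archive_path)
    # The archive must not be written inside the folder being packed.
    if src.resolve() in archive.resolve().parents:
        raise ValueError("archive_path must be outside src_dir")
    mode = "w:gz" if compress else "w"
    with tarfile.open(archive, mode) as tar:
        tar.add(src, arcname=src.name)  # adds the folder recursively
    return archive
```

After pulling the archive to scratch, `tar -xzf SMIC.tar.gz` recreates the `SMIC/` subfolder in place.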

5. Dataset credits

Please add or update the dataset links you are in charge of here.
