The Oulu CMV dataset management lets our group members easily access all existing datasets (both our own datasets and other popular benchmark datasets) stored on the CSC platform.
We store the datasets on the CSC platform on two servers (Puhti/Mahti and Allas):
a. Allas for cold storage
Allas is for long-term storage and backup. Data is not deleted easily, and unstructured data is accepted (you can put anything here).
b. Puhti/Mahti for hot storage
Puhti/Mahti is for quick access from running experiments. Access is fast, but the data WILL be deleted (after 90 days), and unstructured data is NOT accepted (big chunk files are better).
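Because /scratch data is removed after 90 days, it helps to see which files are close to that limit. Below is a minimal Python sketch, assuming the cleanup looks at the last time a file was accessed or modified (the exact CSC cleaning rule may differ, so check the CSC documentation); `files_at_risk` is a hypothetical helper, not a CSC tool.

```python
import os
import time
from pathlib import Path

# Assumption: /scratch files are removed once they have not been used for
# 90 days (the figure quoted above); the real CSC cleaning rule may differ.
PURGE_AFTER_DAYS = 90
SECONDS_PER_DAY = 24 * 3600


def files_at_risk(root, days=PURGE_AFTER_DAYS, now=None):
    """Return files under `root` whose last access and last modification
    are both older than `days` days, sorted by path."""
    now = time.time() if now is None else now
    cutoff = now - days * SECONDS_PER_DAY
    at_risk = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                st = path.stat()
            except OSError:
                # Broken symlink or file removed while walking: skip it.
                continue
            # Treat the more recent of atime/mtime as "last used".
            if max(st.st_atime, st.st_mtime) < cutoff:
                at_risk.append(path)
    return sorted(at_risk)
```

For example, `files_at_risk("/scratch/project_2009201")` returns a sorted list of paths; anything it lists should already have a copy in Allas.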
Tips
- Basically, you only need to upload your datasets to Allas; you can then pull the data from Allas to Puhti/Mahti whenever you need it.
- Make sure you pull your datasets from Allas to Puhti or Mahti into the same subfolder, so the structure stays well organized.
- Converting between Allas and Puhti/Mahti
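The "same subfolder" rule can be made mechanical: take the dataset's path relative to the project (as it is organized in Allas) and reuse it under /scratch/<project ID>/. A minimal sketch, assuming the Allas layout uses the same relative paths as the folder hierarchies listed below; `scratch_destination` and the example path `FER/some_dataset.tar` are hypothetical, and the actual transfer would still be done with the CSC Allas tools.

```python
from pathlib import PurePosixPath

SCRATCH_ROOT = PurePosixPath("/scratch")


def scratch_destination(project_id, relative_path):
    """Map a dataset path inside the project (e.g. 'FER/some_dataset.tar')
    to the same subfolder under /scratch/<project_id>/."""
    rel = PurePosixPath(relative_path)
    # Reject absolute paths and '..' so data cannot land outside the project folder.
    if rel.is_absolute() or ".." in rel.parts:
        raise ValueError(f"expected a relative path inside the project, got {relative_path!r}")
    return SCRATCH_ROOT / project_id / rel
```

For example, `scratch_destination("project_2009201", "FER/some_dataset.tar")` gives `/scratch/project_2009201/FER/some_dataset.tar`; rejecting `..` keeps a typo from writing outside the project folder.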
So far, we have applied for two projects to store the datasets (MVG_dataset1 and MVG_dataset2), and we will apply for more depending on the space finally used. The planned mapping is:
- MVG_dataset1 -> face
- MVG_dataset2 -> body
- MVG_dataset3 -> General
- MVG_dataset4 -> EmotionAI
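If you script your transfers, the domain-to-project mapping can be kept in one place. A minimal sketch with the project IDs taken from the arrangement below; `PROJECTS`, `project_for` and `scratch_root` are hypothetical names, and since MVG_dataset4 is still TBD (and has no Puhti address listed), its scratch path is only extrapolated from the /scratch/<project ID>/ pattern.

```python
# Domain -> (project name, project ID), transcribed from the arrangement below.
# MVG_dataset4 is still TBD, so its entry may change.
PROJECTS = {
    "face": ("MVG_dataset1", "project_2009201"),
    "body": ("MVG_dataset2", "project_2009202"),
    "general": ("MVG_dataset3", "project_2003455"),
    "emotionai": ("MVG_dataset4", "project_2009204"),
}


def project_for(domain):
    """Return (project name, project ID) for a dataset domain, case-insensitively."""
    try:
        return PROJECTS[domain.strip().lower()]
    except KeyError:
        known = ", ".join(sorted(PROJECTS))
        raise KeyError(f"unknown domain {domain!r}; known domains: {known}") from None


def scratch_root(domain):
    """Scratch folder of the domain's project, following /scratch/<project ID>/."""
    _name, project_id = project_for(domain)
    return f"/scratch/{project_id}/"
```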
Below is the detailed data arrangement:
a. MVG_dataset1
Project name: MVG_dataset1
Project ID: project_2009201
Content: face datasets
Puhti address: /scratch/project_2009201/
Allas address: project_2009201:default:username
Folder hierarchy (people in charge):
.
├── FER (Xingxun)
│   └── ...
├── rPPG (Marko)
│   └── ...
├── Antispoofing (Marko)
│   └── ...
├── Forgery (Yang)
│   └── ...
├── Others (Haotian)
│   └── ...
└── MER (Yante)
    ├── SMIC
    └── CASME
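To keep the agreed hierarchy identical wherever a copy is made (e.g. a fresh /scratch space), the top-level folders can be created from a single description. A minimal sketch using the MVG_dataset1 layout above; `FACE_LAYOUT` and `create_layout` are hypothetical, the "..." placeholders and the people in charge are left out, and existing folders are never modified.

```python
from pathlib import Path

# MVG_dataset1 layout from the hierarchy above ("..." subfolders omitted).
FACE_LAYOUT = {
    "FER": {},
    "rPPG": {},
    "Antispoofing": {},
    "Forgery": {},
    "Others": {},
    "MER": {"SMIC": {}, "CASME": {}},
}


def create_layout(root, layout):
    """Create nested folders under `root` from a {name: sub-layout} dict.

    Folders that already exist are left as they are. Returns every folder
    path that was created or already present, parents before children.
    """
    paths = []
    for name, children in layout.items():
        path = Path(root) / name
        path.mkdir(parents=True, exist_ok=True)
        paths.append(path)
        paths.extend(create_layout(path, children))
    return paths
```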
b. MVG_dataset2
Project name: MVG_dataset2
Project ID: project_2009202
Content: body datasets
Puhti address: /scratch/project_2009202/
Allas address: project_2009202:default:username
Folder hierarchy (people in charge):
.
├── Action (Atif)
│   └── ...
├── Gesture (Atif)
│   └── ...
├── Other (Atif)
│   └── ...
└── MGs (Haoyu)
    ├── iMiGUE
    └── SMG (70G)
c. MVG_dataset3
Project name: MVG_dataset3
Project ID: project_2003455
Content: general datasets
Puhti address: /scratch/project_2003455/
Allas address: project_2003455:default:username
Folder hierarchy (people in charge):
.
└── Image (Zhuo)
    ├── ImageNet (150G)
    ├── CIFAR100
    └── others
        └── ...
d. MVG_dataset4 (TBD)
Project name: MVG_dataset4
Project ID: project_2009204
Content: EmotionAI-related datasets
Folder hierarchy (people in charge):
.
├── EmotionAI (Hanlin)
│   └── ...
└── Sundown (Qianru)
    └── ...
To-do list:
- Basic structure of the dataset management
- Assign a person in charge of each project space
- Finalize the dataset folder structure
- Confirm all the dataset sizes
- Complete the dataset credits
- Upload all the datasets to Allas
- Upload all the datasets to Puhti/Mahti (/scratch)
- Apply for extra space for extremely large datasets, like EmotionAI
- Add more documents
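For the "Confirm all the dataset sizes" item, here is a minimal sketch that totals file sizes under a dataset folder and formats them in the same style as the sizes noted above (e.g. 70G). It assumes "G" means GiB (1024^3 bytes) and counts apparent file sizes, which can differ from the quota usage CSC reports; `dataset_size_bytes` and `human_size` are hypothetical helpers.

```python
import os


def dataset_size_bytes(root):
    """Total size in bytes of all files under `root`; symlinks are skipped."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def human_size(n_bytes):
    """Format a byte count with binary units, e.g. 75161927680 -> '70.0G'."""
    size = float(n_bytes)
    for unit in ("B", "K", "M", "G", "T"):
        # Stop at the first unit where the value drops below 1024 (or at T).
        if size < 1024 or unit == "T":
            return f"{size:.1f}{unit}"
        size /= 1024
```

For example, `human_size(dataset_size_bytes("/scratch/project_2009202/MGs/SMG"))` gives a figure comparable to the 70G noted for SMG.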
- If you need to change the high-level dataset structure (like Face, General), please contact chen.haoyu@oulu.fi so we can update this README and keep it consistent.
- Work with BIG file chunks: NEVER upload a huge number of small files; instead, upload one big compressed file (you can unpack and process the data later in your own scratch space).
- Updating both servers? You don't need to keep the content of the two servers exactly the same, but make sure the folder-level arrangement is consistent.
- GDPR regulation: basically, you don't need to take extra actions (the GDPR form was filled out when applying for the space).
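The two practical notes above, packing data into one big file and keeping the folder-level arrangement consistent across servers, can be sketched with the Python standard library. This is a sketch under assumptions: `pack_directory`, `folder_prefixes` and `layout_differences` are hypothetical helpers, and the path lists are assumed to come from an object listing of the Allas bucket and from a walk over /scratch/<project ID>/.

```python
import tarfile
from pathlib import Path, PurePosixPath


def pack_directory(src_dir, archive_path):
    """Pack a whole dataset folder into one .tar.gz so it is uploaded as a
    single big file instead of many small ones.

    Write the archive outside `src_dir`, otherwise it would include itself.
    """
    src = Path(src_dir)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname makes entries start at the dataset folder's own name.
        tar.add(str(src), arcname=src.name)
    return Path(archive_path)


def folder_prefixes(relative_paths, depth=1):
    """Folder prefixes (up to `depth` levels) of relative file paths such as
    Allas object names or paths under /scratch/<project ID>/."""
    prefixes = set()
    for p in relative_paths:
        parts = PurePosixPath(p).parts[:-1]  # drop the file name
        for level in range(1, min(depth, len(parts)) + 1):
            prefixes.add("/".join(parts[:level]))
    return prefixes


def layout_differences(allas_paths, scratch_paths, depth=1):
    """Return (folders only in Allas, folders only on /scratch)."""
    a = folder_prefixes(allas_paths, depth)
    b = folder_prefixes(scratch_paths, depth)
    return sorted(a - b), sorted(b - a)
```

Folders that appear only in the Allas listing are simply datasets not pulled yet, which is fine; folders that appear only on /scratch point to data placed outside the agreed arrangement.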
Please add or update the dataset links you are in charge of here.