Code and pre-trained model for "Decoding Viewer Emotions in Video Ads" (Antonov et al., Nature Scientific Reports, 2024). The Temporal Shift Augmented Module (TSAM) predicts viewers' emotional reactions to video advertisements from short 5-second excerpts, processing both video frames and audio.
pip install -r requirements.txt

ffmpeg is also required for preprocessing.
from huggingface_hub import snapshot_download
# Download dataset (video clips and CSV splits)
snapshot_download(
repo_id="dnamodel/adcumen-viewer-emotions",
repo_type="dataset",
local_dir="./adcumen-data"
)
# Download model weights
snapshot_download(
repo_id="dnamodel/tsam-viewer-emotions",
local_dir="./adcumen-data"
)

python setup_data.py --input ./adcumen-data --workers 8

This extracts video frames, audio, and model weights into the expected directory structure. Run python setup_data.py --help for all options.
python predict.py \
--data config/default.json \
--model weights \
--type test \
--id test_run

Predictions are saved to ./data/predicted/test_run/.
The dataset contains 26,635 five-second video clips from video advertisements, annotated for eight emotional categories:
| Emotion | Total | Train | Validation | Test |
|---|---|---|---|---|
| Anger | 2,894 | 2,282 | 404 | 208 |
| Contempt | 3,317 | 2,581 | 367 | 369 |
| Disgust | 3,061 | 2,564 | 254 | 243 |
| Fear | 3,166 | 2,549 | 317 | 300 |
| Happiness | 3,577 | 2,918 | 383 | 276 |
| Neutral | 3,491 | 2,771 | 398 | 322 |
| Sadness | 3,576 | 2,886 | 346 | 344 |
| Surprise | 3,553 | 2,841 | 387 | 325 |
| Total | 26,635 | 21,392 | 2,856 | 2,387 |
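As a quick sanity check on the table, the split sizes can be compared against the total. The sketch below hard-codes the per-emotion counts from the table and computes the share of clips in each split; the names `COUNTS` and `split_totals` are illustrative and not part of the repository.

```python
# Per-emotion clip counts copied from the table: (train, validation, test).
COUNTS = {
    "Anger": (2282, 404, 208),
    "Contempt": (2581, 367, 369),
    "Disgust": (2564, 254, 243),
    "Fear": (2549, 317, 300),
    "Happiness": (2918, 383, 276),
    "Neutral": (2771, 398, 322),
    "Sadness": (2886, 346, 344),
    "Surprise": (2841, 387, 325),
}


def split_totals(counts):
    """Sum each split column across all emotions."""
    train = sum(c[0] for c in counts.values())
    val = sum(c[1] for c in counts.values())
    test = sum(c[2] for c in counts.values())
    return train, val, test


train, val, test = split_totals(COUNTS)
total = train + val + test
for name, n in (("train", train), ("validation", val), ("test", test)):
    print(f"{name}: {n} clips ({n / total:.1%})")
```

The three splits account for all 26,635 clips, at roughly 80%, 11%, and 9% respectively.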
Dataset: huggingface.co/datasets/dnamodel/adcumen-viewer-emotions
- training.csv, validation.csv, testing.csv: dataset splits with columns Video_Name, Start_Second, Label, Clips_Name
- 5-second_MP4_Clips.zip: the 26,635 five-second video clips (MP4)
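The split files can be inspected without any of the project code. The following is a minimal sketch using only the standard library. It assumes each CSV has a header row with the four column names listed above; `count_labels` is a hypothetical helper, not a function from this repository, and the path in the usage comment is a guess at where the download places the file.

```python
import csv
from collections import Counter


def count_labels(csv_file):
    """Count clips per Label value in an open split file.

    Assumes a header row containing a 'Label' column, as listed above.
    """
    reader = csv.DictReader(csv_file)
    return Counter(row["Label"] for row in reader)


# Usage (the path is an assumption about the download layout):
# with open("./adcumen-data/training.csv", newline="") as f:
#     print(count_labels(f).most_common())
```

Comparing the resulting counts with the table above is a cheap way to confirm that a download is complete.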
Model weights: huggingface.co/dnamodel/tsam-viewer-emotions
- backbone_weights.tar: ResNet50 backbone pre-trained on ImageNet-21K
- tsam_weights.tar: TSAM model checkpoint (best balanced accuracy)
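The TSAM checkpoint is described as the one with the best balanced accuracy. Assuming the standard definition (the unweighted mean of per-class recall), the metric can be sketched as below. This is an illustration only, not the repository's evaluation code, and `balanced_accuracy` is a hypothetical name.

```python
from collections import defaultdict


def balanced_accuracy(y_true, y_pred):
    """Unweighted mean of per-class recall over the classes present in y_true."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = defaultdict(int)  # correctly predicted samples per true class
    support = defaultdict(int)  # total samples per true class
    for t, p in zip(y_true, y_pred):
        support[t] += 1
        if t == p:
            correct[t] += 1
    if not support:
        raise ValueError("no samples given")
    recalls = [correct[c] / support[c] for c in support]
    return sum(recalls) / len(recalls)


# Toy example with two classes: recall is 2/2 for "Fear" and 1/2 for "Anger".
print(balanced_accuracy(["Fear", "Fear", "Anger", "Anger"],
                        ["Fear", "Fear", "Anger", "Fear"]))  # 0.75
```

Unlike plain accuracy, this gives each emotion equal weight, which matters when class sizes differ within a split (the test split has 208 Anger clips but 369 Contempt clips).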
.
├── setup_data.py # Preprocesses HuggingFace download
├── predict.py # Run inference with trained model
├── train.py # Train TSAM model
├── config/
│ └── default.json # Default config (relative paths)
├── lib/
│ ├── dataset/ # Data loading (video + audio)
│ ├── model/ # TSAM architecture
│ └── utils/ # Training utilities
├── mvlib/ # Video processing library
├── DataAdcumen/ # Split files and VDB
├── requirements.txt
└── LICENCE
- Python 3.10+
- PyTorch 2.5+
- ffmpeg (system install required for both preprocessing and audio loading)
- CUDA-capable GPU (for inference)
See requirements.txt for Python packages.
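Before running preprocessing or inference, it can help to confirm that the requirements above are met. The following preflight check is a sketch, not a script shipped with the repository: `version_tuple` and `check_environment` are hypothetical helpers, and the check only inspects what is visible from the current Python process.

```python
import importlib.util
import re
import shutil
import sys


def version_tuple(version):
    """Turn a version string such as '2.5.1+cu121' into (2, 5, 1)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def check_environment():
    """Return a list of problems found; an empty list means all checks passed."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    if shutil.which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    if importlib.util.find_spec("torch") is None:
        problems.append("PyTorch is not installed")
    else:
        import torch
        if version_tuple(torch.__version__) < (2, 5):
            problems.append(f"PyTorch 2.5+ is required, found {torch.__version__}")
        if not torch.cuda.is_available():
            problems.append("no CUDA-capable GPU is visible to PyTorch")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```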
python train.py \
--config config/default.json \
--cuda_ids 0 \
--run_id my_experiment

The dataset leverages System1's proprietary "Test Your Ad" tool for public, educational, and illustrative use. The advertisements and excerpts, while derived from System1's tool, remain the property of their original owners. Usage beyond this study's scope requires explicit permission from those owners. By accessing the dataset, you agree to these conditions.
The TSAM software and associated documentation are made available under a custom license that permits use solely for academic research and non-commercial evaluation. See LICENCE for full terms. For commercial use inquiries, contact Warwick Ventures at ventures@warwick.ac.uk.
@article{antonov2024decoding,
title={Decoding viewer emotions in video ads},
author={Antonov, Alexey and Kumar, Shravan Sampath and Wei, Jiefei and Headley, William and Wood, Orlando and Montana, Giovanni},
journal={Scientific Reports},
volume={14},
pages={25680},
year={2024},
publisher={Nature Publishing Group}
}

For questions, suggestions, or collaborations, please contact Giovanni Montana at g.montana@warwick.ac.uk.
