annitziak/sunweb_thesis

LEVERAGING CLUSTERING FOR BOOKING PATTERN FORECASTING: A CASE STUDY

Introduction

This thesis project evaluates several models for forecasting booking patterns, using data from Sunweb, a leading holiday package provider in Europe. The core objective is to determine how effective it is to incorporate clustering into forecasting models. The rationale is straightforward: booking patterns within a cluster are more homogeneous, so models trained at the cluster level can better capture the underlying trends. Through a comparative analysis, I assess the value added by clustering over traditional forecasting models. Specifically, the study evaluates ensemble models that combine clustering information with other forecasting methods and compares them against a benchmark model.
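The rationale for cluster-level training can be illustrated with a minimal sketch. It does not use the thesis data or the thesis models: the data is synthetic, the two clusters and their slopes are assumptions chosen for illustration, and the errors are computed in-sample only.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)

# Synthetic stand-in data (not the Sunweb dataset): two hypothetical
# clusters whose bookings respond differently to the same feature.
n = 200
x = rng.uniform(0, 10, size=(2 * n, 1))
cluster = np.repeat([0, 1], n)
slope = np.where(cluster == 0, 2.0, -1.0)
y = slope * x[:, 0] + rng.normal(0, 0.5, size=2 * n)

# One model trained on all observations at once.
global_model = LinearRegression().fit(x, y)
global_mse = mean_squared_error(y, global_model.predict(x))

# One model per cluster, each trained on its own homogeneous subset.
pred = np.empty_like(y)
for c in np.unique(cluster):
    mask = cluster == c
    model = LinearRegression().fit(x[mask], y[mask])
    pred[mask] = model.predict(x[mask])
cluster_mse = mean_squared_error(y, pred)

print(f"global MSE:        {global_mse:.3f}")
print(f"cluster-level MSE: {cluster_mse:.3f}")
```

In this toy setting the single global model has to average two opposing trends, while each cluster-level model fits one trend. Whether the same gain appears on real booking data is the question the thesis investigates.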

Research Question: To what extent can clustering techniques improve the accuracy of booking forecasts for holiday packages?

Navigation in Repository

├── code/
│   ├── forecasting_models.ipynb/
│   │   ├── Data preparation
│   │   ├── Model definition 
│   │   ├── Clustering
│   │   ├── Ensemble models
│   │   └── Evaluation
│   └── clustering.ipynb/
│       ├── Data preparation
│       ├── Clustering
│       └── Understand Clustering
├── visualizations/
│   ├── visualisations.ipynb
│   └── figures(13)    
├── README.md
├── cheatsheet.md
├── requirements.txt
└── final_results_tables.md

File Descriptions

  • forecasting_models.ipynb: contains the code for all forecasting models (1 benchmark model, 4 other forecasting models, 6 ensemble models), including hyperparameter tuning, model selection, and evaluation.
  • clustering.ipynb: contains the code for the 3 clustering models, including clustering optimization, visualisations, and the insights generated.
  • visualisations.ipynb: contains the code to reproduce all figures used in the thesis.

Description of the dataset

The dataset used in this analysis consists of cross-sectional data indexed by each holiday package's departure week. For each departure week, the dataset includes the destination airport, the destination country, and the number of passengers who booked a holiday package during a specific booking week. For the booking weeks present in the dataset, the corresponding revenue and margin generated by Sunweb are also recorded.
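As a rough illustration of this layout, the sketch below builds a tiny table and reshapes it into one booking pattern per departure week. The column names, week labels, airport code, and all values are hypothetical placeholders, not the actual Sunweb schema or data.

```python
import pandas as pd

# Hypothetical records: one row per (departure week, booking week).
records = pd.DataFrame({
    "departure_week": ["2023-W27", "2023-W27", "2023-W27", "2023-W28", "2023-W28"],
    "destination_airport": ["AAA"] * 5,
    "destination_country": ["Country A"] * 5,
    "booking_week": ["2023-W01", "2023-W02", "2023-W03", "2023-W01", "2023-W02"],
    "passengers": [4, 6, 10, 3, 5],
    "revenue": [2000.0, 3100.0, 5200.0, 1500.0, 2600.0],
    "margin": [300.0, 450.0, 800.0, 200.0, 390.0],
})

# One row per departure week and airport, one column per booking week.
booking_pattern = records.pivot_table(
    index=["departure_week", "destination_airport"],
    columns="booking_week",
    values="passengers",
    aggfunc="sum",
    fill_value=0,  # booking weeks with no rows count as zero bookings
)

# Cumulative passengers booked up to each booking week.
cumulative = booking_pattern.cumsum(axis=1)
print(cumulative)
```

Each row of `booking_pattern` is then a short time series that a clustering or forecasting model could consume.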

Forecasting models used

  1. Linear Regression
    • Trained under two scenarios: model 4.1.2.1 and model 4.1.2.2
  2. Weighted Linear Regression
    • Trained under two scenarios: model 4.1.3.1 and model 4.1.3.2
  3. Ridge Regression
    • Hyperparameter tuning for the regularization strength alpha
  4. Regression Tree
    • Hyperparameter tuning for max_depth, min_samples_leaf, min_samples_split, max_features
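
The tuning steps listed above could be sketched with scikit-learn's `GridSearchCV`. This is a minimal sketch, not the procedure used in the notebook: the data is synthetic, and the candidate grids, scoring metric, and number of folds are assumptions.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeRegressor

# Synthetic regression data standing in for the booking features.
X, y = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=0)

# Ridge regression: tune the regularization strength alpha.
ridge_search = GridSearchCV(
    Ridge(),
    param_grid={"alpha": [0.01, 0.1, 1.0, 10.0, 100.0]},  # assumed grid
    scoring="neg_mean_squared_error",
    cv=5,
)
ridge_search.fit(X, y)

# Regression tree: tune the four hyperparameters named above.
tree_search = GridSearchCV(
    DecisionTreeRegressor(random_state=0),
    param_grid={  # assumed grid
        "max_depth": [3, 5, None],
        "min_samples_leaf": [1, 5, 10],
        "min_samples_split": [2, 10],
        "max_features": [None, "sqrt"],
    },
    scoring="neg_mean_squared_error",
    cv=5,
)
tree_search.fit(X, y)

print("Best Ridge alpha:", ridge_search.best_params_["alpha"])
print("Best tree settings:", tree_search.best_params_)
```

For data ordered in time, a time-aware splitter such as `TimeSeriesSplit` may be a better choice for `cv` than the default shuffled-free K-fold used here.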

Clustering models used

  1. TS k-means Clustering
    • Used in two models: Clustering 4.2.1.1, based on the average booking patterns, and Clustering 4.2.1.2, based on the booking patterns aggregated in pairs.
  2. K-means clustering
    • Used in Clustering 4.2.2.1, based on global features extracted from the booking patterns.
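
The feature-based variant can be sketched as follows. This is only an illustration with scikit-learn's `KMeans`: the booking patterns are synthetic, and the three global features (mean, standard deviation, peak week) and the choice of two clusters are assumptions, not the features or settings used in the thesis. The TS k-means models work on the series themselves (tslearn is listed among the required packages) and are not shown here.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
weeks = np.arange(30)

def make_pattern(peak):
    """Synthetic booking pattern: a bump around `peak` plus small noise."""
    curve = np.exp(-0.5 * ((weeks - peak) / 3.0) ** 2)
    return curve + rng.normal(0, 0.01, size=weeks.size)

# 20 "early booking" patterns followed by 20 "late booking" patterns.
patterns = np.array(
    [make_pattern(5) for _ in range(20)] + [make_pattern(22) for _ in range(20)]
)

# Hypothetical global features summarising each pattern.
features = np.column_stack([
    patterns.mean(axis=1),    # average booking level
    patterns.std(axis=1),     # variability across booking weeks
    patterns.argmax(axis=1),  # week with the most bookings
])

# Standardise so that no single feature dominates the distance.
scaled = StandardScaler().fit_transform(features)

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(scaled)
print(labels)
```

Summarising each series with a few features makes plain k-means applicable, at the cost of discarding the exact shape of the curve, which is what the time-series variant retains.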

Packages to include

List of required Python packages:

  • pandas
  • seaborn
  • matplotlib
  • numpy
  • scikit-learn
  • tqdm
  • tslearn
  • scipy
  • statsmodels
  • fastdtw

These dependencies are listed in requirements.txt. To install them, run pip install -r requirements.txt.

Conclusions

  1. Clustering is a valuable method in booking pattern forecasting.
  2. Ensemble models that used clustering outperformed all other models, including the benchmark model. The general models, by contrast, did not always outperform the benchmark.
  3. Clustering booking patterns is effective for grouping similar booking behaviors due to its unsupervised nature.
  4. The best performing clustering method was the TS k-means algorithm.
  5. Linear regression-based models generally outperformed non-linear ones.

Contact

Created by @annitziak. Feel free to contact me!

About

Thesis 2024: A comparative study on enhancing holiday package booking forecasts using clustering-based ensemble models, leveraging data from Sunweb.
