
feat(model): Train LightGBM model and log ablation to MLflow#72

Merged
olveirap merged 3 commits into main from feature/issue-58-train-lgbm
Mar 23, 2026

Conversation

@olveirap
Owner

@olveirap olveirap commented Mar 23, 2026

Fixes #58, Closes #17

This PR:

  • Adds coverage_80pct to LightGBM's cross-validation metrics.
  • Computes mean metrics for cross-validation runs in run_train.py.
  • Incorporates an explicit ablation study in run_train.py where LightGBM is trained without congestion features (travel_time_var, delay_index, disruption_flag).
  • Saves the trained model to models/lgbm_forecaster.pkl.
  • Sets an experimental tag for the TFT model run.
  • Updates RESULTS.md with the ablation test numbers.
  • Updates .gitignore to un-ignore pulsecast/models/.

- Compute coverage_80pct in LightGBM cross_validate
- Run ablation in run_train.py with base features (no congestion)
- Log LightGBM run metrics and save model artifact
- Set experimental tag for TFT run
- Fix TFT missing column names for covariates
- Update RESULTS.md with empirical metrics

Refs #58
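The coverage metric and mean-CV aggregation described in the bullets above can be sketched roughly as follows. This is an illustrative sketch, not the actual run_train.py code: the function names `coverage_80pct` and `mean_cv_metrics`, and the dict shape of the per-fold metrics, are assumptions.

```python
import numpy as np

def coverage_80pct(y_true, y_low, y_high):
    """Fraction of actuals that fall inside the 80% prediction interval
    (e.g. between the 0.1- and 0.9-quantile forecasts)."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(y_low)) & (y_true <= np.asarray(y_high))
    return float(inside.mean())

def mean_cv_metrics(fold_metrics):
    """Average each metric across CV folds.

    fold_metrics: list of dicts, one per fold,
    e.g. [{"mae": 4.1, "coverage_80pct": 0.79}, ...].
    Returns a dict with "mean_" prefixed keys, ready for MLflow logging.
    """
    keys = fold_metrics[0].keys()
    return {f"mean_{k}": float(np.mean([m[k] for m in fold_metrics]))
            for k in keys}
```

With quantile forecasts for the lower and upper bounds, `coverage_80pct` should sit near 0.80 when the intervals are well calibrated.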
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the model training pipeline by introducing an ablation study for the LightGBM model, allowing for a clearer understanding of feature importance. It also enhances the robustness of MLflow tracking by logging mean cross-validation metrics and explicitly tagging experimental runs. These changes contribute to a more transparent and analytically sound machine learning development process.

Highlights

  • LightGBM Model Enhancements: Added 'coverage_80pct' to LightGBM's cross-validation metrics and implemented logging of mean metrics across CV folds.
  • Ablation Study Integration: Introduced an explicit ablation study for LightGBM, training a version without congestion-related features (travel_time_var, delay_index, disruption_flag) to assess their impact.
  • MLflow Tracking Improvements: Enhanced MLflow logging by setting an 'experimental' tag for TFT model runs and saving the trained LightGBM model as an artifact.
  • Data Feature Refinement: Updated the TFT model to use more specific feature names like 'origin_delay_index_lag1' and 'dest_delay_index_lag1', and ensured 'route_id' is cast to UTF-8 for proper handling.
  • Documentation and Configuration Updates: Updated 'RESULTS.md' with new LightGBM performance numbers and marked TFT as experimental, and modified '.gitignore' to correctly manage the 'models/' directory.
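The renamed covariates mentioned in the highlights (e.g. 'origin_delay_index_lag1', 'dest_delay_index_lag1') are one-step lag features. A minimal numpy sketch of such a lag, assuming a NaN-padded shift; the helper name `add_lag1` is hypothetical:

```python
import numpy as np

def add_lag1(series):
    """Shift a 1-D series back by one step, NaN-padding the first slot,
    in the spirit of features like origin_delay_index_lag1."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    out[0] = np.nan          # no prior observation for the first row
    out[1:] = series[:-1]    # value at t comes from t-1
    return out
```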

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively introduces an ablation study for the LightGBM model, enhancing the model evaluation process. The changes to the training script improve modularity and MLflow logging by calculating and logging mean cross-validation metrics. The addition of the 80% coverage interval metric is also a valuable improvement. I have one suggestion regarding the maintainability of feature selection for the ablation study. Overall, this is a solid contribution to the project's MLOps capabilities.

Replaces brittle string matching with an explicit set of congestion features to exclude during the ablation study run.
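The suggested change can be sketched as follows, assuming the ablation selects columns by name; `CONGESTION_FEATURES` and `ablation_features` are illustrative names, not the actual run_train.py identifiers:

```python
# Explicit set of congestion features to drop for the ablation run,
# rather than matching substrings of column names.
CONGESTION_FEATURES = {"travel_time_var", "delay_index", "disruption_flag"}

def ablation_features(all_features):
    """Return the base feature list with congestion features removed,
    preserving the original column order."""
    return [f for f in all_features if f not in CONGESTION_FEATURES]
```

An explicit set keeps the ablation stable if a new, unrelated column name happens to contain a matched substring.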

Refs #58
@olveirap olveirap merged commit 7bd27a5 into main Mar 23, 2026
3 checks passed
@olveirap olveirap deleted the feature/issue-58-train-lgbm branch March 23, 2026 22:33


Development

Successfully merging this pull request may close these issues.

[Phase 2] Train LightGBM demand model and log ablation to MLflow
Populate RESULTS.md with ablation study metrics and interpretation

1 participant