
feat(model): Train LightGBM model and log ablation to MLflow#72

Merged
olveirap merged 3 commits into main from feature/issue-58-train-lgbm
Mar 23, 2026

Conversation

@olveirap
Owner

@olveirap olveirap commented Mar 23, 2026

Fixes #58, Closes #17

This PR:

  • Adds coverage_80pct to LightGBM's cross-validation metrics.
  • Computes mean metrics for cross-validation runs in run_train.py.
  • Incorporates an explicit ablation study in run_train.py where LightGBM is trained without congestion features (travel_time_var, delay_index, disruption_flag).
  • Saves the trained model to models/lgbm_forecaster.pkl.
  • Sets an experimental tag for the TFT model run.
  • Updates RESULTS.md with the ablation test numbers.
  • Updates .gitignore to un-ignore pulsecast/models/.

- Compute coverage_80pct in LightGBM cross_validate
- Run ablation in run_train.py with base features (no congestion)
- Log LightGBM run metrics and save model artifact
- Set experimental tag for TFT run
- Fix TFT missing column names for covariates
- Update RESULTS.md with empirical metrics

Refs #58
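The coverage metric and mean-CV aggregation described in the bullets above can be sketched roughly as follows. This is an illustrative sketch, not the actual run_train.py code: the function names `coverage_80pct` and `mean_cv_metrics`, and the dict shape of the per-fold metrics, are assumptions.

```python
import numpy as np

def coverage_80pct(y_true, y_low, y_high):
    """Fraction of actuals that fall inside the 80% prediction interval
    (e.g. between the 0.1- and 0.9-quantile forecasts)."""
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(y_low)) & (y_true <= np.asarray(y_high))
    return float(inside.mean())

def mean_cv_metrics(fold_metrics):
    """Average each metric across CV folds.

    fold_metrics: list of dicts, one per fold,
    e.g. [{"mae": 4.1, "coverage_80pct": 0.79}, ...].
    Returns a dict with "mean_" prefixed keys, ready for MLflow logging.
    """
    keys = fold_metrics[0].keys()
    return {f"mean_{k}": float(np.mean([m[k] for m in fold_metrics]))
            for k in keys}
```

With quantile forecasts for the lower and upper bounds, `coverage_80pct` should sit near 0.80 when the intervals are well calibrated.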
@gemini-code-assist
Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly refines the model training pipeline by introducing an ablation study for the LightGBM model, allowing for a clearer understanding of feature importance. It also enhances the robustness of MLflow tracking by logging mean cross-validation metrics and explicitly tagging experimental runs. These changes contribute to a more transparent and analytically sound machine learning development process.

Highlights

  • LightGBM Model Enhancements: Added 'coverage_80pct' to LightGBM's cross-validation metrics and implemented logging of mean metrics across CV folds.
  • Ablation Study Integration: Introduced an explicit ablation study for LightGBM, training a version without congestion-related features (travel_time_var, delay_index, disruption_flag) to assess their impact.
  • MLflow Tracking Improvements: Enhanced MLflow logging by setting an 'experimental' tag for TFT model runs and saving the trained LightGBM model as an artifact.
  • Data Feature Refinement: Updated the TFT model to use more specific feature names like 'origin_delay_index_lag1' and 'dest_delay_index_lag1', and ensured 'route_id' is cast to UTF-8 for proper handling.
  • Documentation and Configuration Updates: Updated 'RESULTS.md' with new LightGBM performance numbers and marked TFT as experimental, and modified '.gitignore' to correctly manage the 'models/' directory.
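The renamed covariates mentioned in the highlights (e.g. 'origin_delay_index_lag1', 'dest_delay_index_lag1') are one-step lag features. A minimal numpy sketch of such a lag, assuming a NaN-padded shift; the helper name `add_lag1` is hypothetical:

```python
import numpy as np

def add_lag1(series):
    """Shift a 1-D series back by one step, NaN-padding the first slot,
    in the spirit of features like origin_delay_index_lag1."""
    series = np.asarray(series, dtype=float)
    out = np.empty_like(series)
    out[0] = np.nan          # no prior observation for the first row
    out[1:] = series[:-1]    # value at t comes from t-1
    return out
```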

Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request effectively introduces an ablation study for the LightGBM model, enhancing the model evaluation process. The changes to the training script improve modularity and MLflow logging by calculating and logging mean cross-validation metrics. The addition of the 80% coverage interval metric is also a valuable improvement. I have one suggestion regarding the maintainability of feature selection for the ablation study. Overall, this is a solid contribution to the project's MLOps capabilities.

Replaces brittle string matching with an explicit set of congestion features to exclude during the ablation study run.
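The suggested change can be sketched as follows, assuming the ablation selects columns by name; `CONGESTION_FEATURES` and `ablation_features` are illustrative names, not the actual run_train.py identifiers:

```python
# Explicit set of congestion features to drop for the ablation run,
# rather than matching substrings of column names.
CONGESTION_FEATURES = {"travel_time_var", "delay_index", "disruption_flag"}

def ablation_features(all_features):
    """Return the base feature list with congestion features removed,
    preserving the original column order."""
    return [f for f in all_features if f not in CONGESTION_FEATURES]
```

An explicit set keeps the ablation stable if a new, unrelated column name happens to contain a matched substring.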

Refs #58
@olveirap olveirap merged commit 7bd27a5 into main Mar 23, 2026
3 checks passed
@olveirap olveirap deleted the feature/issue-58-train-lgbm branch March 23, 2026 22:33


Development

Successfully merging this pull request may close these issues.

[Phase 2] Train LightGBM demand model and log ablation to MLflow
Populate RESULTS.md with ablation study metrics and interpretation

1 participant