feat(model): Train LightGBM model and log ablation to MLflow#72
- Compute coverage_80pct in LightGBM cross_validate
- Run ablation in run_train.py with base features (no congestion)
- Log LightGBM run metrics and save model artifact
- Set experimental tag for TFT run
- Fix TFT missing column names for covariates
- Update RESULTS.md with empirical metrics

Refs #58
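The coverage metric added above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes the interval bounds come from two quantile models (e.g. LightGBM trained with `objective="quantile"` at alpha 0.1 and 0.9), so the 0.1–0.9 band is the 80% prediction interval.

```python
import numpy as np

def coverage_80pct(y_true, y_pred_lo, y_pred_hi):
    """Fraction of actuals that fall inside the predicted 80% interval.

    Assumes y_pred_lo / y_pred_hi are the 0.1 and 0.9 quantile
    predictions, so the band between them targets 80% coverage.
    """
    y_true = np.asarray(y_true, dtype=float)
    inside = (y_true >= np.asarray(y_pred_lo)) & (y_true <= np.asarray(y_pred_hi))
    return float(inside.mean())
```

A well-calibrated model should score close to 0.80 on held-out folds; values far below indicate intervals that are too narrow.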
Summary of Changes (Gemini Code Assist): This pull request refines the model training pipeline by introducing an ablation study for the LightGBM model, allowing for a clearer understanding of feature importance. It also makes MLflow tracking more robust by logging mean cross-validation metrics and explicitly tagging experimental runs.
Code Review
This pull request effectively introduces an ablation study for the LightGBM model, enhancing the model evaluation process. The changes to the training script improve modularity and MLflow logging by calculating and logging mean cross-validation metrics. The addition of the 80% coverage interval metric is also a valuable improvement. I have one suggestion regarding the maintainability of feature selection for the ablation study. Overall, this is a solid contribution to the project's MLOps capabilities.
Replaces brittle string matching with an explicit set of congestion features to exclude during the ablation study run. Refs #58
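The fix described in this commit can be sketched as an explicit exclusion set. The constant and helper names below are hypothetical (the real list lives in `run_train.py`); only the three feature names come from the PR.

```python
# Explicit list of congestion features excluded during the ablation run.
# An explicit set fails loudly in review if a feature is renamed, unlike
# substring matching (e.g. `"delay" in name`), which silently drifts.
CONGESTION_FEATURES = {"travel_time_var", "delay_index", "disruption_flag"}

def ablation_features(all_features):
    """Return the base feature list with congestion features removed,
    preserving the original column order."""
    return [f for f in all_features if f not in CONGESTION_FEATURES]
```

Keeping the set next to the training code also documents exactly which columns the "no congestion" ablation drops.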
Fixes #58, Closes #17
This PR:
- Adds `coverage_80pct` to LightGBM's cross-validation metrics in `run_train.py`.
- Adds an ablation run in `run_train.py` where LightGBM is trained without congestion features (`travel_time_var`, `delay_index`, `disruption_flag`).
- Saves the model artifact to `models/lgbm_forecaster.pkl`.
- Sets the `experimental` tag for the TFT model run.
- Updates `RESULTS.md` with the ablation test numbers.
- Updates `.gitignore` to un-ignore `pulsecast/models/`.
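The mean cross-validation metrics mentioned in the review can be aggregated with a small helper before logging. This is a sketch under assumed names (`mean_cv_metrics`, the `mean_` prefix); the actual aggregation in `run_train.py` may differ.

```python
import statistics

def mean_cv_metrics(fold_metrics):
    """Collapse a list of per-fold metric dicts into one dict of means,
    prefixing each key with `mean_` so fold-level and aggregate metrics
    stay distinguishable in MLflow."""
    keys = fold_metrics[0].keys()
    return {f"mean_{k}": statistics.mean(m[k] for m in fold_metrics)
            for k in keys}
```

The resulting dict can then be passed to `mlflow.log_metrics(...)` inside the run, and the TFT run marked with `mlflow.set_tag("experimental", "true")` so it is easy to filter out of production comparisons.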