krisoye/mdlr

mdlr

An R toolkit for streamlined model development, training, and forecasting at scale.

Description

mdlr provides a consistent, tidy-friendly workflow to:

  • load and prepare regression-ready datasets (including column selection and date filtering)
  • generate rolling/expanding training windows and forecast dates
  • fit a wide array of models with a single interface (base R, parsnip engines including glmnet, mgcv, rstanarm, and more)
  • iterate over model permutations (formulas, parameters, dates, and sub-models)
  • export fitted model artifacts and predictions to disk in tidy, partitioned layouts

Core exported functions (see @man/): fit_mdl, train_and_predict, gather_regression_source, generate_training_dates, generate_stat_grp, generate_model_permutations, iterate_train_and_predict, iterate_stat_grp, and export_mdl_stats. See @DESCRIPTION for dependencies and package metadata.

Features

  • Unified modeling interface: Use the same fit_mdl/train_and_predict surface across base R model functions and parsnip engines (e.g., stats::lm, glmnet, mgcv, rstanarm).
  • Dataset sourcing for regression: gather_regression_source automatically loads Hive-style partitioned data (e.g., via arrow::open_dataset) and re-attaches meta columns for modeling with precise column selection helpers.
  • Time-window generation: generate_training_dates creates rolling or expanding backtest windows with forward forecast dates, driven by first/last train dates and window sizes.
  • Statistical grouping: generate_stat_grp builds hierarchical statistical groups with configurable thresholds and naming, enabling robust segment-based modeling.
  • Batch experimentation: generate_model_permutations and iterate_train_and_predict help cross model parameters with backtest dates and iterate training/forecasting runs.
  • Tidy exports at scale: Write tidy, glance, and forecast outputs using pluggable writers (e.g., arrow::write_dataset, readr::write_rds) with optional partitioning by MODEL_ID, DATE, SUB_MODEL, etc.

Installation

mdlr is an R package. From the project root directory:

# Option 1: Install from source in this folder
install.packages("devtools")
devtools::install()

# Option 2: Develop locally without install
devtools::load_all()

Required imports are listed in @DESCRIPTION (e.g., dplyr, glmnet, lubridate, magrittr, assertthat, butcher, etlr). Some functionality uses additional packages if you opt into them:

  • For parquet/dataset IO and partitioned exports: arrow
  • For unified modeling engines: parsnip
  • For summaries: broom, broom.mixed
  • For writing RDS: readr
  • For examples in tests: fs, rstanarm, mgcv, glmnet

Install as needed, for example:

install.packages(c("arrow", "parsnip", "broom", "readr", "fs"))
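Because these packages are optional, a plain base R guard (an idiom, not an mdlr API) can fail fast with a clear message when one is missing:

```r
# Base R idiom: check an optional dependency at runtime before using it.
if (!requireNamespace("arrow", quietly = TRUE)) {
  stop("Package 'arrow' is needed for partitioned exports. ",
       "Install it with install.packages(\"arrow\").")
}
```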

Usage

Below are distilled examples inspired by @tests/ to help you get started quickly.

  • Fit a linear model (base R) via fit_mdl:
library(mdlr)
library(dplyr)

orig_frame <- tibble::tibble(
  ID3 = 1:60,
  N1 = sqrt(ID3),
  N2 = sqrt(N1),
  N3 = sqrt(N2) + rnorm(60)
)

fit <- fit_mdl(
  .data = orig_frame,
  .mdl_formula = formula("ID3 ~ N1 + N2 + N3"),
  .mdl_fxn = stats::lm
)

broom::tidy(fit)

  • Fit elastic net via parsnip + glmnet:
fit_en <- fit_mdl(
  .data = orig_frame,
  .mdl_formula = formula("ID3 ~ N1 + N2 + N3"),
  .mdl_fxn = parsnip::linear_reg,
  .mdl_parameters = list(penalty = double(1), mixture = double(1)),
  .engine_parameters = list(engine = "glmnet")
)

  • Train and export model artifacts:
base_dir <- etlr::create_temp_dir()
mdl_id <- "example_mdl"

mdl <- train_and_predict(
  .data_train = orig_frame,
  .mdl_formula = formula("ID3 ~ N3"),
  .mdl_fxn = parsnip::linear_reg,
  .mdl_id = mdl_id,
  .tidy_file_path = file.path(base_dir, "TIDY"),
  .glance_file_path = file.path(base_dir, "GLANCE"),
  .export_stat_fxn = arrow::write_dataset,
  .partitioning = c("MODEL_ID", "DATE", "SUB_MODEL"),
  .score_index_columns = c("DATE", "SYMBOL")
)

# Later, read exports
arrow::open_dataset(file.path(base_dir, "TIDY")) |> dplyr::collect()
arrow::open_dataset(file.path(base_dir, "GLANCE")) |> dplyr::collect()

  • Forecast export (train on one date, score on a future date):
train_data <- orig_frame |> dplyr::mutate(DATE = as.Date("2024-10-18"))
forecast_data <- orig_frame |> dplyr::mutate(DATE = as.Date("2024-10-25"))

forecast_dir <- file.path(base_dir, "FORECAST")

mdl <- train_and_predict(
  .data_train = train_data,
  .data_forecast = forecast_data,
  .mdl_formula = formula("ID3 ~ N3"),
  .mdl_fxn = parsnip::linear_reg,
  .mdl_id = mdl_id,
  .mdl_forecast_folder = forecast_dir,
  .score_index_columns = c("DATE", "SYMBOL"),
  .partitioning = c("MODEL_ID", "DATE")
)

arrow::open_dataset(forecast_dir) |> dplyr::collect()

  • Generate rolling/expanding training windows:
dates <- generate_training_dates(
  .first_train_date = lubridate::as_date("2005-01-01"),
  .last_train_date = lubridate::as_date("2005-03-30"),
  .training_window_weeks = 5,
  .increment_by = "1 week",
  .fwd_forecast_weeks = 1,
  .rolling = TRUE # set FALSE for expanding
)

  • Build hierarchical statistical groups:
stat_grp <- generate_stat_grp(
  .data = dplyr::tibble(RS_SUBIND = "SM1", RS_INDUSTRY = "MD1", RS_INDGRP = "LG1", RS_SECTOR = "XX1"),
  .stat_grouping_hierarchy = c("RS_SUBIND", "RS_INDUSTRY", "RS_INDGRP", "RS_SECTOR"),
  .stat_grouping_threshold = 25,
  .default_prefix = "STAT"
)

  • Source regression-ready data from Hive-style partitions:
src <- gather_regression_source(
  .datapath = "/path/to/dataset",
  .load_fxn = arrow::open_dataset,
  .context_vars = \() dplyr::any_of(c("DATE", "GROUP", "SYMBOL")),
  .response_vars = \() dplyr::any_of(c("y")),
  .covariate_vars = \() dplyr::matches("^x\\d$"),
  .min_train_date = lubridate::as_date("2024-01-01"),
  .forecast_date = lubridate::as_date("2024-12-01")
)
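  • Cross model parameters with backtest dates. Note this is a hypothetical sketch: the argument names and return shapes below are illustrative assumptions, not mdlr's documented signatures — check @man/ (e.g., ?generate_model_permutations) after installing:

```r
# HYPOTHETICAL sketch only: argument names are assumptions, not
# mdlr's documented interface.
perms <- generate_model_permutations(
  .mdl_formulas   = list(formula("ID3 ~ N3"), formula("ID3 ~ N1 + N3")),
  .mdl_parameters = list(list(penalty = 0), list(penalty = 0.1)),
  .training_dates = dates  # e.g., output of generate_training_dates() above
)

# One training/forecasting run per permutation.
results <- iterate_train_and_predict(perms, .data = orig_frame)
```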

For more examples, see the test files in @tests/.

Contributing

Contributions are welcome! Typical flow:

  • Open an issue describing the enhancement or bug.
  • Create a feature branch from main.
  • Add tests under tests/testthat/ demonstrating the change.
  • Ensure R CMD check passes locally.
  • Submit a pull request with a concise description and rationale.

Please keep code readable, prefer tidy principles, and ensure exports remain stable for downstream consumers. When adding features that rely on optional packages (e.g., arrow, parsnip), gate usage behind parameters and document clearly in @man/.
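As one concrete shape for that guidance (illustrative only — export_stats_sketch is a hypothetical helper, not an mdlr export), new features can mirror the pluggable-writer pattern of train_and_predict's .export_stat_fxn: accept the function as a parameter so the optional package is only touched when the caller supplies it.

```r
# Illustrative sketch: the optional package (e.g., arrow or readr) is
# referenced only through the caller-supplied .export_fxn parameter,
# so it is never loaded unless the caller opts in.
export_stats_sketch <- function(.data, .path, .export_fxn = saveRDS) {
  .export_fxn(.data, .path)
  invisible(.path)
}

# Callers opt in to arrow explicitly:
# export_stats_sketch(df, "out/", .export_fxn = arrow::write_dataset)
```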
