TimoDimi/replication_triptych

Replication package for "Evaluating Probabilistic Classifiers: The Triptych"

Timo Dimitriadis and Alexander I. Jordan

Overview & contents

The code in this replication material generates the 12 figures and 3 tables for the paper "Evaluating Probabilistic Classifiers: The Triptych". Each figure and table is generated separately by its corresponding script file Figure_[xx]_*.R or Table_[xx]_*.R, respectively.

The main contents of the repository are the following:

  • plots/: folder of generated plots as PDF files
  • tables/: folder of generated tables as txt files
  • data-raw/: folder of raw data files and the functions for processing them
  • data/: folder of processed data files
  • Figure_[xx]_*.R: R scripts to create the respective figures
  • Table_[xx]_*.R: R scripts to create the respective tables

Instructions & computational requirements

All file paths are relative to the root of the replication package. Please set your working directory accordingly, or open the .Rproj file using RStudio.

The analysis files Figure_[xx]_*.R and Table_[xx]_*.R can be run individually, in any order.
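Because the scripts are independent, they can also be run in one batch. The following is a minimal sketch, assuming the working directory is the root of the replication package; the helper names find_analysis_scripts and run_analysis_scripts are hypothetical and not part of the repository.

```r
# Hypothetical helper (not part of the replication package): collect the
# analysis scripts that follow the Figure_[xx]_*.R / Table_[xx]_*.R naming.
find_analysis_scripts <- function(root = ".") {
  files <- list.files(root, pattern = "^(Figure|Table)_.*\\.R$")
  sort(files)
}

# Run each script in a fresh environment so that objects created by one
# script do not leak into the next.
run_analysis_scripts <- function(root = ".") {
  old_wd <- setwd(root)
  on.exit(setwd(old_wd), add = TRUE)
  for (script in find_analysis_scripts(".")) {
    message("Running ", script)
    source(script, local = new.env(parent = globalenv()))
  }
  invisible(NULL)
}

# Usage, from the root of the replication package:
# run_analysis_scripts()
```

Sourcing each script into its own environment is a design choice of this sketch: it mimics running every file in isolation, which matches the statement that the order does not matter.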

These analyses were run on R 4.3.1, and we explicitly use the following packages in the analysis files: triptych (0.1.2), ggplot2 (3.4.3), patchwork (1.1.3), dplyr (1.1.3), tidyr (1.3.0), purrr (1.0.2), grid (base R), lubridate (1.9.2).
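To see whether a local R library matches the versions listed above, a small check along the following lines may help. It is only a sketch: the helper compare_versions is hypothetical, and the renv.lock file remains the authoritative list of dependencies.

```r
# Package versions as stated above (grid ships with base R, so it is omitted).
expected <- c(
  triptych = "0.1.2", ggplot2 = "3.4.3", patchwork = "1.1.3",
  dplyr = "1.1.3", tidyr = "1.3.0", purrr = "1.0.2", lubridate = "1.9.2"
)

# Hypothetical helper: look up the installed version of each package and
# compare it with the expected one; NA marks a package that is not installed.
compare_versions <- function(expected) {
  installed <- vapply(names(expected), function(pkg) {
    tryCatch(
      as.character(utils::packageVersion(pkg)),
      error = function(e) NA_character_
    )
  }, character(1))
  data.frame(
    package = names(expected),
    expected = unname(expected),
    installed = unname(installed),
    matches = !is.na(installed) & unname(installed == expected),
    row.names = NULL
  )
}

print(compare_versions(expected))
```

A mismatch does not necessarily mean the scripts will fail; it only indicates that the environment differs from the one the analyses were run on.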

A comprehensive list of dependencies can be found in the renv.lock file. For a convenient setup in a (local) R session, we recommend using the renv package. The following steps are required once:

# install.packages("renv")
renv::activate()
renv::restore() # install dependencies
renv::status() # check environment

Data availability and provenance

Solar Flare Forecasts

The prepared forecast-observation data are located at data/C1_flares.rda and data/M1_flares.rda, for the classes C1.0+ and M1.0+ of solar flare intensity. These files are generated by the script data-raw/prepare_SolarFlares.R using the pre-processed data files SF.FC.C1.rda and SF.FC.M1.rda from Dimitriadis and Jordan (2021, https://doi.org/10.5281/zenodo.4699945). That replication package contains a description of the pre-processing of the original data on solar flares from Leka and Park (2019, https://doi.org/10.7910/DVN/HYP74O).
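The prepared .rda files can be inspected with base R. Since the names of the objects stored inside each file are not stated here, the sketch below makes no assumption about them and simply returns whatever the file contains; the helper read_rda is hypothetical.

```r
# Hypothetical helper: load an .rda file into its own environment and
# return the stored objects as a named list, without assuming their names.
read_rda <- function(path) {
  env <- new.env()
  object_names <- load(path, envir = env)
  mget(object_names, envir = env)
}

# Usage, from the root of the replication package:
# flares <- read_rda("data/C1_flares.rda")
# names(flares)               # names of the stored objects
# str(flares, max.level = 1)  # brief overview of their structure
```

Loading into a separate environment avoids silently overwriting objects in the global workspace, which a plain load() call would do.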

SPF Forecasts for Economic Recessions

The prepared forecast-observation data are located at data/spf.gpd.long.rda. They are also available from Dimitriadis and Jordan (2021, https://doi.org/10.5281/zenodo.4699945), a replication package that contains a description of the pre-processing of the original data from the Federal Reserve Bank of Philadelphia (https://www.philadelphiafed.org/surveys-and-data/).

Fragile Families Challenge

The Fragile Families Challenge (FFC) is a scientific mass collaboration in which 160 teams built predictions for six variables; we analyze two binary ones (eviction and job training). The prepared forecast and outcome data are located in the data/ folder, as the files FFC_Eviction.rda and FFC_JobTraining.rda.

The forecasts (submissions) of the 160 teams, together with the realizations, originate from Salganik et al. (2020, https://doi.org/10.7910/DVN/CXSECU), located in the data/derived/submissions.csv.zip file. The 9 benchmark forecasts have to be generated separately by obtaining data files from https://opr.princeton.edu/archive/ as described in Salganik et al. (2020). We prepare the FFC data from these two files, which are not included in this repository, using the script prepare_FragileFamilyChallenge.R.

References

Dimitriadis T, Jordan AI. 2021. Replication package for "Stable reliability diagrams for probabilistic classifiers" (v1.0.0) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4699946

Leka KD, Park S. 2019. A Comparison of Flare Forecasting Methods II: Data and Supporting Code. Harvard Dataverse, V1, UNF:6:yz1noMojlzL7SZM+9flXhQ== [fileUNF]. https://doi.org/10.7910/DVN/HYP74O

Salganik M, Lundberg I, Kindel A, McLanahan S. 2020. Replication materials for "Measuring the predictability of life outcomes using a scientific mass collaboration". Harvard Dataverse, V3, UNF:6:Cj8wiioSf8JGyRLcDo5d3w== [fileUNF]. https://doi.org/10.7910/DVN/CXSECU
