Architecture — fect

Generated by scriber for run REQ-20260401-104739 on 2026-04-01.

Overview

fect is an R package for estimating causal effects in panel data using counterfactual imputation methods (Fixed Effects Counterfactual Estimators). It targets causal panel analysis with binary treatments under the parallel trends assumption, supporting treatment switching and limited carryover effects. The core abstraction is counterfactual imputation: impute missing potential outcomes Y(0) for treated units using control units, then compute the Average Treatment Effect on the Treated (ATT) as the gap between observed and imputed outcomes.

The package is an R/C++ hybrid using Rcpp and RcppArmadillo for numerically intensive linear algebra (SVD, EM iterations, matrix factorization). Key external dependencies include fixest (initial FE regression), ggplot2 (visualization), doParallel/doFuture/future.apply (parallel bootstrap), MASS (generalized inverse), and mvtnorm (multivariate normal draws).

Estimation methods include FE (fixed effects), IFE (interactive fixed effects / factor model), MC (matrix completion via nuclear norm regularization), CFE (complex fixed effects with structured covariates), and wrappers for modern DID estimators. Version 2.2.0. References: Liu, Wang, and Xu (2024); Chiu et al. (2025).
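The counterfactual imputation idea from the Overview can be illustrated with a minimal base R sketch. It does not call fect; it uses a made-up, noise-free toy panel and `lm()` as a stand-in for the plain FE estimator, so every name and number here (`panel`, `alpha`, `xi`, `true_effect`) is an assumption for illustration only.

```r
# Toy balanced panel: 6 units, 8 periods, no noise, so the FE fit is exact.
n_units <- 6
n_periods <- 8
panel <- expand.grid(id = seq_len(n_units), time = seq_len(n_periods))

alpha <- c(1, 2, 3, 4, 5, 6)                 # unit fixed effects (made up)
xi <- seq(0.5, 4, length.out = n_periods)    # time fixed effects (made up)
true_effect <- 2                             # effect built into the toy data

# Units 1 and 2 switch into treatment from period 6 onward.
panel$D <- as.integer(panel$id <= 2 & panel$time >= 6)
panel$Y <- alpha[panel$id] + xi[panel$time] + true_effect * panel$D

# Step 1: fit the two-way FE model on untreated observations only.
fit <- lm(Y ~ factor(id) + factor(time), data = panel[panel$D == 0, ])

# Step 2: impute Y(0) for every cell, including the treated ones.
panel$Y0_hat <- predict(fit, newdata = panel)

# Step 3: the ATT is the mean gap between observed and imputed outcomes.
att <- mean((panel$Y - panel$Y0_hat)[panel$D == 1])
print(att)
```

Because the toy data contain no noise, the estimate recovers the built-in effect of 2 up to floating-point error. The IFE, MC, and CFE methods replace step 1 with richer models of Y(0); steps 2 and 3 stay conceptually the same.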


Module Structure

%%{init: {'theme': 'neutral'}}%%
graph TD
    subgraph API["API Layer"]
        A1["default.R — fect() entry"]
        A2["interFE.R — interFE()"]
        A3["did_wrapper.R — DID wraps"]
        A4["fect_mspe.R — MSPE comp"]
    end

    subgraph Est["Estimation Layer"]
        E1["fe.R — fect_fe() IFE"]
        E2["mc.R — fect_mc() MC"]
        E3["cfe.R — fect_cfe() CFE"]
        E4["fect_nevertreated.R"]
    end

    subgraph CV["Cross-Validation"]
        V1["cv.R — fect_cv()"]
        V2["cv_binary.R — binary CV"]
    end

    subgraph Inf["Inference Layer"]
        I1["boot.R — fect_boot()"]
    end

    subgraph Diag["Diagnostics & Sensitivity"]
        D1["diagtest.R — pre-trend"]
        D2["fittest.R — fitness test"]
        D3["fect_sens.R — sensitivity"]
        D4["fect_iden.R — identification"]
    end

    subgraph Viz["Visualization Layer"]
        P1["plot.R — plot.fect()"]
        P2["esplot.R — esplot()"]
    end

    subgraph Cpp["C++ Core (RcppArmadillo)"]
        C1["ife.cpp / ife_sub.cpp"]
        C2["mc.cpp"]
        C3["cfe.cpp / cfe_sub.cpp"]
        C4["fe_sub.cpp — shared utils"]
        C5["auxiliary.cpp — EM helpers"]
        C6["binary_*.cpp — probit"]
    end

    subgraph Util["Support & Data"]
        U1["support.R — data helpers"]
        U2["polynomial.R — trends"]
        U3["effect.R / cumu.R — ATT"]
        U4["score.R / permutation.R"]
        U5["getcohort.R — cohorts"]
        U6["print.R — S3 print"]
    end

    A1 --> E1
    A1 --> E2
    A1 --> E3
    A1 --> E4
    A1 --> V1
    A1 --> I1
    A1 --> D1
    A1 --> P1
    V1 --> E1
    V1 --> E2
    I1 --> E1
    I1 --> E2
    I1 --> E3
    E1 --> C1
    E2 --> C2
    E3 --> C3
    C1 --> C4
    C2 --> C4
    C3 --> C4
    C1 --> C5
    E1 --> U1
    E2 --> U1
    E3 --> U1
    A1 --> U1
    A1 --> U2

    style A1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style V1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style E1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style U1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style C1 fill:#1e90ff,stroke:#1565c0,color:#fff

Module Reference

| Module / File | Layer | Purpose | Key Exports | Changed |
|---|---|---|---|---|
| R/default.R (2,919 lines) | API | Main entry point, parameter validation, method routing; added n.init parameter for multi-start initialization | fect(), fect.formula(), fect.default() | yes |
| R/interFE.R (515 lines) | API | Standalone interactive fixed effects estimator | interFE() | no |
| R/did_wrapper.R (656 lines) | API | Modern DID estimator wrappers (did, DIDmultiplegtDYN) | did_wrapper() | no |
| R/fect_mspe.R (344 lines) | API | MSPE computation for model comparison | fect_mspe() | no |
| R/fe.R (954 lines) | Estimation | Interactive Fixed Effects / factor model estimation; added convergence warning on non-convergence | fect_fe() | yes |
| R/mc.R (804 lines) | Estimation | Matrix Completion via nuclear norm regularization | fect_mc() | no |
| R/cfe.R (1,172 lines) | Estimation | Complex Fixed Effects with structured covariates | fect_cfe() | no |
| R/fect_nevertreated.R (3,166 lines) | Estimation | Never-treated comparison group variant | fect_nevertreated() | no |
| R/cv.R (1,526 lines) | Cross-Validation | Hyperparameter selection (r, lambda) via MSPE/PC; added warm-start CV, multi-start initialization, convergence warning | fect_cv() | yes |
| R/cv_binary.R (421 lines) | Cross-Validation | Cross-validation for binary/probit models | fect_cv_binary() | no |
| R/boot.R (4,884 lines) | Inference | Bootstrap/jackknife/parametric inference with parallel support | fect_boot() | no |
| R/diagtest.R (215 lines) | Diagnostics | Pre-trend F-test, equivalence (TOST), placebo, carryover tests | diagtest() | no |
| R/fittest.R (636 lines) | Diagnostics | Fitness/wild bootstrap test | fect_test() | no |
| R/fect_sens.R (232 lines) | Diagnostics | Sensitivity analysis via HonestDiDFEct | fect_sens() | no |
| R/fect_iden.R (224 lines) | Diagnostics | Identification analysis | fect_iden() | no |
| R/plot.R (5,019 lines) | Visualization | Comprehensive ggplot2 plotting (gap, equiv, status, exit, factors, loadings, calendar, counterfactual, heterogeneous) | plot.fect() | no |
| R/esplot.R (1,118 lines) | Visualization | Standalone event-study plots | esplot() | no |
| R/plot_return.R (9 lines) | Visualization | Plot return object class definition | (internal) | no |
| R/support.R (676 lines) | Utilities | Data manipulation, initial FE fit, helper functions; added perturbedFit() for multi-start initialization | get_term(), align_beta0(), perturbedFit() (internal) | yes |
| R/polynomial.R (844 lines) | Utilities | Polynomial/B-spline trend specification | fect_polynomial() | no |
| R/effect.R (397 lines) | Utilities | Treatment effect decomposition by sub-group | effect() | no |
| R/cumu.R (206 lines) | Utilities | Cumulative ATT computation | att.cumu() | no |
| R/score.R (105 lines) | Utilities | Score-based inference | (internal) | no |
| R/permutation.R (264 lines) | Utilities | Permutation test for treatment effects | (internal) | no |
| R/getcohort.R (264 lines) | Utilities | Treatment cohort identification | get.cohort() | no |
| R/print.R (111 lines) | Utilities | S3 print methods for fect and interFE objects | print.fect(), print.interFE() | no |
| R/RcppExports.R (191 lines) | Utilities | Auto-generated Rcpp function bindings | (auto-generated) | no |
| src/ife.cpp (534 lines) | C++ Core | IFE algorithm: inter_fe(), inter_fe_ub(), inter_fe_d(); added converged flag propagation | (Rcpp exports) | yes |
| src/ife_sub.cpp (577 lines) | C++ Core | IFE sub-routines: SVD factor estimation, EM iterations, alternating minimization; burn-in fix preserves converged fit, converged flag in all 5 iteration functions | (internal) | yes |
| src/mc.cpp (223 lines) | C++ Core | Matrix completion: inter_fe_mc(), nuclear norm penalization | (Rcpp exports) | no |
| src/cfe.cpp (203 lines) | C++ Core | Complex FE: complex_fe_ub() | (Rcpp exports) | no |
| src/cfe_sub.cpp (564 lines) | C++ Core | Complex FE sub-routines: cfe_iter(), structured covariate handling | (internal) | no |
| src/fe_sub.cpp (291 lines) | C++ Core | Shared FE utilities: Y_demean(), panel_beta(), panel_factor(), panel_FE(), XXinv() | (internal) | no |
| src/binary_sub.cpp (539 lines) | C++ Core | Probit model sub-routines for binary outcomes | (internal) | no |
| src/binary_qr.cpp (347 lines) | C++ Core | QR-based probit estimation | (internal) | no |
| src/binary_svd.cpp (302 lines) | C++ Core | SVD-based probit estimation | (internal) | no |
| src/auxiliary.cpp (396 lines) | C++ Core | EM helpers, matrix utilities, log-likelihood computation | (internal) | no |
| src/fect.h (60 lines) | C++ Core | Header file with all C++ function declarations | (header) | no |

Function Call Graph

Main Estimation Pipeline

%%{init: {'theme': 'neutral'}}%%
graph TD
    F1["fect()"]
    F2["fect.formula()"]
    F3["fect.default()"]
    F4["fect_cv()"]
    F5["fect_fe()"]
    F6["fect_mc()"]
    F7["fect_cfe()"]
    F8["fect_nevertreated()"]
    C1["inter_fe_ub() [C++]"]
    C2["inter_fe_mc() [C++]"]
    C3["complex_fe_ub() [C++]"]
    C4["inter_fe_d_qr_ub() [C++]"]
    S1["panel_factor() [C++]"]
    S2["panel_FE() [C++]"]
    S3["Y_demean() [C++]"]
    S4["cfe_iter() [C++]"]

    F1 --> F2
    F2 --> F3
    F3 -->|"CV=TRUE"| F4
    F3 -->|"method=ife/fe"| F5
    F3 -->|"method=mc"| F6
    F3 -->|"method=cfe"| F7
    F3 -->|"nevertreated"| F8
    F4 --> F5
    F4 --> F6
    F8 --> F5
    F8 --> F6
    F8 --> F7
    F5 --> C1
    F5 -->|"binary=TRUE"| C4
    F6 --> C2
    F7 --> C3
    C1 --> S1
    C1 --> S3
    C2 --> S2
    C2 --> S3
    C3 --> S4
    C3 --> S3
    F4 -.->|"warm-start"| C1
    U4["perturbedFit()"]
    F4 -->|"n.init > 1"| U4
    U4 --> C1

    style F1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style F3 fill:#1e90ff,stroke:#1565c0,color:#fff
    style F4 fill:#1e90ff,stroke:#1565c0,color:#fff
    style F5 fill:#1e90ff,stroke:#1565c0,color:#fff
    style C1 fill:#1e90ff,stroke:#1565c0,color:#fff
    style U4 fill:#1e90ff,stroke:#1565c0,color:#fff
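The dispatch chain at the top of this graph (generic, formula method, default method, then method routing) can be sketched in a few lines of base R. This is a hypothetical miniature, not the package's code: `toy_est` and its return structure are invented for illustration, and the routing strings only echo the function names shown in the diagram.

```r
# Hypothetical generic mirroring the fect() -> fect.formula() -> fect.default() chain.
toy_est <- function(x, ...) UseMethod("toy_est")

# Formula method: resolve variable names, then hand off to the default method.
toy_est.formula <- function(x, data, method = "fe", ...) {
  vars <- all.vars(x)   # assumed layout: outcome first, treatment second
  toy_est.default(data[[vars[1]]], D = data[[vars[2]]], method = method, ...)
}

# Default method: validate inputs and pick the method-specific estimator.
toy_est.default <- function(x, D, method = c("fe", "ife", "mc", "cfe"), ...) {
  method <- match.arg(method)
  stopifnot(is.numeric(x), length(x) == length(D), all(D %in% c(0, 1)))
  route <- switch(method,
    fe = ,                 # falls through: FE and IFE share one estimator
    ife = "fect_fe",
    mc = "fect_mc",
    cfe = "fect_cfe"
  )
  structure(list(method = method, route = route, n = length(x)),
            class = "toy_est")
}

toy <- data.frame(Y = c(1, 2, 3, 5), D = c(0, 0, 1, 1))
via_formula <- toy_est(Y ~ D, data = toy, method = "mc")
via_default <- toy_est(toy$Y, D = toy$D, method = "fe")
print(c(via_formula$route, via_default$route))
```

Both call styles end in the same default method, which is why a new estimation method would only need one more `switch()` branch in a design like this.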

Inference and Diagnostics

%%{init: {'theme': 'neutral'}}%%
graph TD
    F3["fect.default()"]
    B1["fect_boot()"]
    D1["diagtest()"]
    D2["fittest()"]
    D3["fect_sens()"]
    D4["fect_iden()"]
    F5["fect_fe()"]
    F6["fect_mc()"]
    F7["fect_cfe()"]
    PL["plot.fect()"]
    ES["esplot()"]

    F3 -->|"se=TRUE"| B1
    F3 --> D1
    F3 --> D2
    B1 --> F5
    B1 --> F6
    B1 --> F7
    F3 --> PL
    F3 --> ES
    D3 -.->|"optional"| F3
    D4 -.->|"optional"| F3
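The "resample and re-estimate" loop behind the `se=TRUE` edge can be sketched sequentially in base R. This is an illustration under stated assumptions rather than the logic of fect_boot(): the estimator `att_did()` is a simple difference-in-differences stand-in, units are resampled separately within the treated and control groups (an assumption made here so both groups are always present), and no parallel backend is used.

```r
set.seed(123)
n_t <- 8
n_u <- 10
treated <- seq_len(n_u) <= 4       # first 4 units are treated
post <- seq_len(n_t) >= 6          # treatment starts in period 6

alpha <- rnorm(n_u)                          # unit effects (made up)
xi <- seq(0, 1, length.out = n_t)            # time effects (made up)
Y <- outer(xi, alpha, "+") +
  2 * outer(as.numeric(post), as.numeric(treated)) +   # true effect of 2
  matrix(rnorm(n_t * n_u, sd = 0.1), n_t, n_u)

# Stand-in estimator: difference-in-differences on a T x N outcome matrix.
att_did <- function(Y, treated, post) {
  delta <- colMeans(Y[post, , drop = FALSE]) - colMeans(Y[!post, , drop = FALSE])
  mean(delta[treated]) - mean(delta[!treated])
}

# Unit-level bootstrap: redraw columns with replacement, then re-estimate.
boot_att <- function(Y, treated, post, n_boot = 200) {
  tr_idx <- which(treated)
  co_idx <- which(!treated)
  vapply(seq_len(n_boot), function(b) {
    draw <- c(sample(tr_idx, replace = TRUE), sample(co_idx, replace = TRUE))
    att_did(Y[, draw, drop = FALSE], treated[draw], post)
  }, numeric(1))
}

point <- att_did(Y, treated, post)
draws <- boot_att(Y, treated, post)
se <- sd(draws)
ci <- quantile(draws, c(0.025, 0.975))
print(c(att = point, se = se))
```

Each replication is independent of the others, which is what makes the loop a natural candidate for the parallel execution the package applies.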

Function Reference

| Function | Defined In | Called By | Calls | Changed | Purpose |
|---|---|---|---|---|---|
| fect() | R/default.R | user / exported | UseMethod("fect") | yes | S3 generic entry point; added n.init parameter |
| fect.formula() | R/default.R | fect() | fect.default() | yes | Parse formula, added n.init pass-through |
| fect.default() | R/default.R | fect.formula(), user | fect_cv(), fect_fe(), fect_mc(), fect_cfe(), fect_boot(), diagtest() | yes | Added n.init validation and threading |
| fect_fe() | R/fe.R | fect.default(), fect_cv(), fect_boot() | inter_fe_ub(), inter_fe_d_qr_ub() (C++) | yes | IFE estimation; added convergence warning check |
| fect_mc() | R/mc.R | fect.default(), fect_cv(), fect_boot() | inter_fe_mc() (C++) | no | Matrix completion estimation (nuclear norm regularization) |
| fect_cfe() | R/cfe.R | fect.default(), fect_boot() | complex_fe_ub() (C++) | no | Complex FE with structured covariates (Z, Q, gamma, kappa) |
| fect_nevertreated() | R/fect_nevertreated.R | fect.default() | fect_fe(), fect_mc(), fect_cfe() | no | Wrapper for never-treated-only estimation sample |
| fect_cv() | R/cv.R | fect.default() | fect_fe(), fect_mc(), perturbedFit() | yes | CV with warm-start across r/lambda candidates, multi-start init, convergence warning |
| perturbedFit() | R/support.R | fect_cv() | rnorm(), sd() | yes | Generate perturbed initial values for multi-start robustness (internal) |
| fect_boot() | R/boot.R | fect.default() | fect_fe(), fect_mc(), fect_cfe() | no | Bootstrap/jackknife inference engine with parallel support |
| interFE() | R/interFE.R | user / exported | inter_fe() (C++) | no | Standalone interactive fixed effects estimator |
| did_wrapper() | R/did_wrapper.R | user / exported | fixest::feols(), did::att_gt() | no | Modern DID estimator wrappers |
| plot.fect() | R/plot.R | user / exported | ggplot2 functions | no | Comprehensive visualization with 10+ plot types |
| esplot() | R/esplot.R | user / exported | ggplot2 functions | no | Standalone event-study plot |
| effect() | R/effect.R | user / exported | (internal helpers) | no | Treatment effect decomposition by sub-group |
| att.cumu() | R/cumu.R | user / exported | (internal helpers) | no | Cumulative ATT computation |
| diagtest() | R/diagtest.R | fect.default() | (statistical computations) | no | Pre-trend, placebo, carryover, equivalence tests |
| fect_sens() | R/fect_sens.R | user / exported | HonestDiDFEct functions | no | Sensitivity analysis |
| fect_iden() | R/fect_iden.R | user / exported | (internal helpers) | no | Identification analysis |
| inter_fe_ub() | src/ife.cpp | fect_fe() | panel_factor(), fe_ub(), Y_demean(), fe_ad_inter_iter() | yes | C++ IFE with unbalanced panels; now returns converged flag |
| inter_fe_mc() | src/mc.cpp | fect_mc() | panel_FE(), Y_demean() | no | C++ matrix completion with nuclear norm |
| complex_fe_ub() | src/cfe.cpp | fect_cfe() | cfe_iter(), Y_demean() | no | C++ complex FE estimation |
| panel_factor() | src/fe_sub.cpp | inter_fe_ub(), others | SVD routines | no | Extract latent factors via SVD |
| panel_FE() | src/fe_sub.cpp | inter_fe_mc(), others | soft-thresholding | no | Nuclear norm regularization / soft-thresholding |
| Y_demean() | src/fe_sub.cpp | most C++ estimators | (arma operations) | no | Remove unit and/or time fixed effects |

Data Flow

%%{init: {'theme': 'neutral'}}%%
graph TD
    IN["User Input (formula/data + params)"]
    FP["Formula Parsing (fect.formula)"]
    PV["Parameter Validation (fect.default)"]
    DP["Data Preprocessing (long to T x N matrices)"]
    INIT["initialFit() + perturbedFit()"]
    MI{{"n.init > 1?"}}
    MS["Multi-Start: trial runs, select best sigma2"]
    CV{{"CV=TRUE?"}}
    CVR["CV with Warm-Start (fect_cv)"]
    OPT["Optimal r/lambda selected"]
    NT{{"nevertreated?"}}
    NTW["fect_nevertreated() wrapper"]
    MR{{"Method?"}}
    IFE["fect_fe() -> inter_fe_ub() C++"]
    MC["fect_mc() -> inter_fe_mc() C++"]
    CFE["fect_cfe() -> complex_fe_ub() C++"]
    CI["Counterfactual Imputation (Y.ct)"]
    ATT["ATT = Y.obs - Y.ct"]
    SE{{"se=TRUE?"}}
    BOOT["fect_boot() — resample + re-estimate"]
    SECI["SEs, CIs, p-values"]
    DIAG["Diagnostic Tests (diagtest)"]
    OBJ["S3 Object Assembly (class fect)"]
    OUT["Output (print / plot / esplot)"]

    IN --> FP
    FP --> PV
    PV --> DP
    DP --> INIT
    INIT --> MI
    MI -- yes --> MS
    MS --> CV
    MI -- no --> CV
    CV -- yes --> CVR
    CVR --> OPT
    OPT --> NT
    CV -- no --> NT
    NT -- yes --> NTW
    NTW --> MR
    NT -- no --> MR
    MR -- ife/fe --> IFE
    MR -- mc --> MC
    MR -- cfe --> CFE
    IFE --> CC{{"converged?"}}
    CC -- no --> WARN["Emit convergence warning"]
    CC -- yes --> CI
    WARN --> CI
    MC --> CI
    CFE --> CI
    CI --> ATT
    ATT --> SE
    SE -- yes --> BOOT
    BOOT --> SECI
    SECI --> DIAG
    SE -- no --> DIAG
    DIAG --> OBJ
    OBJ --> OUT

    style INIT fill:#1e90ff,stroke:#1565c0,color:#fff
    style MI fill:#1e90ff,stroke:#1565c0,color:#fff
    style MS fill:#1e90ff,stroke:#1565c0,color:#fff
    style CVR fill:#1e90ff,stroke:#1565c0,color:#fff
    style CC fill:#1e90ff,stroke:#1565c0,color:#fff
    style WARN fill:#1e90ff,stroke:#1565c0,color:#fff
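The "converged?" branch in this flow can be illustrated with a small base R stand-in. The sketch below is not the package's C++ code: `em_rank_r()` and `fit_with_warning()` are hypothetical names, and the rank-r EM imputation via `svd()` is a simplified model of an iteration routine that reports a converged flag (1 if the change fell below `tol`, 0 if `max_iter` was reached) to a wrapper that warns.

```r
# Hypothetical stand-in for an iteration routine: rank-r EM imputation that
# reports whether the tolerance was met before max_iter ran out.
em_rank_r <- function(Y, obs, r, tol = 1e-8, max_iter = 1000) {
  fit <- matrix(0, nrow(Y), ncol(Y))
  converged <- 0L
  for (iter in seq_len(max_iter)) {
    filled <- ifelse(obs, Y, fit)             # E-step: fill unobserved cells
    s <- svd(filled, nu = r, nv = r)          # M-step: best rank-r approximation
    new_fit <- s$u %*% (s$d[seq_len(r)] * t(s$v))
    dif <- norm(new_fit - fit, "F") / max(norm(fit, "F"), 1e-12)
    fit <- new_fit
    if (dif <= tol) {
      converged <- 1L
      break
    }
  }
  list(fit = fit, niter = iter, converged = converged)
}

# Wrapper: surface non-convergence as a warning, but still return the fit.
fit_with_warning <- function(Y, obs, r, ..., quiet = FALSE) {
  out <- em_rank_r(Y, obs, r, ...)
  if (out$converged == 0L && !quiet) {
    warning("EM did not converge within max_iter")
  }
  out
}

# Noise-free rank-1 toy panel with a 2 x 2 block of treated (unobserved) cells.
time_factor <- c(1, 1.2, 0.9, 1.1, 1.0, 0.8, 1.05, 0.95)
loading <- c(1, 1.5, 0.8, 1.2, 0.9, 1.1)
Y0 <- outer(time_factor, loading)
obs <- matrix(TRUE, 8, 6)
obs[7:8, 1:2] <- FALSE
Y <- Y0
Y[!obs] <- NA

full <- fit_with_warning(Y, obs, r = 1)
short <- tryCatch(fit_with_warning(Y, obs, r = 1, max_iter = 1),
                  warning = function(w) conditionMessage(w))
print(full$converged)
print(short)
```

The `quiet` argument plays the role described for CV inner-loop calls, where non-convergence at a loose tolerance is expected and should not produce a warning.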

Architectural Patterns

  • S3 Dispatch with Formula Interface: fect() uses UseMethod() to support both formula and direct (Y, D, X) interfaces. fect.formula() parses the formula into variable names, and fect.default() does the computation. interFE() follows the same pattern.

  • R/C++ Layered Computation: All numerically intensive operations (SVD, EM iterations, demeaning, matrix factorization) are implemented in C++ via RcppArmadillo. R handles data wrangling, parameter validation, control flow, and result assembly. The boundary is at the estimation functions: R fect_fe() calls C++ inter_fe_ub().

  • Method-Agnostic Pipeline: fect.default() provides a single pipeline covering preprocessing, CV, estimation, inference, and diagnostics. Method-specific logic is encapsulated in fect_fe(), fect_mc(), and fect_cfe(). Adding a new estimation method requires only a new estimation function and a routing entry.

  • Matrix-Oriented Data Representation: Panel data is converted from long-form data frames to T x N matrices early in fect.default(). Covariates become T x N x p arrays. All downstream computation operates on these matrix forms, enabling efficient C++ computation.

  • Two-Tier Tolerance: Cross-validation uses a looser tolerance (max(tol, 1e-3)) for speed during hyperparameter search, while final estimation uses the user-specified tolerance for precision.

  • Warm-Start CV: When sweeping over consecutive r candidates (IFE) or lambda candidates (MC), the fitted values from the previous candidate are reused as the starting point for the next. Per-fold and full-data caches (warm_fit_cv, warm_fit_full) store the $fit matrix between iterations, reducing EM iteration counts for adjacent hyperparameter values. Unobserved entries are zeroed before reuse to prevent stale value leakage.

  • Multi-Start Initialization: The n.init parameter (default 1, preserving existing behavior) controls the number of perturbed starting points. When n.init > 1, perturbedFit() generates Gaussian-perturbed copies of the base Y0 (5% of data SD) and beta0 (10% of coefficient magnitude), runs trial estimations, and selects the initialization with the lowest residual variance (sigma2). This mitigates local optima sensitivity in the EM algorithm.

  • Burn-in Warm-Start: In the weighted IFE estimation, the burn-in phase progressively reduces the rank from d down to r. Previously, upon convergence during burn-in, both fit and fit_old were reset to the initial Y0, discarding the converged solution. Now only fit_old is reset (to the current fit), preserving the converged state as the starting point for the real estimation phase.

  • Convergence Diagnostics: All five C++ iteration functions (fe_ad_iter, fe_ad_covar_iter, fe_ad_inter_iter, fe_ad_inter_covar_iter, beta_iter) return a converged flag (1 if dif <= tol, 0 if max_iter reached). This flag propagates through inter_fe_ub() to R, where fect_cv() and fect_fe() emit a warning() on non-convergence. CV inner-loop calls do not warn (non-convergence at loose tolerance is expected).

  • Parallel Bootstrap via foreach: fect_boot() uses foreach with doParallel/doFuture backends for parallel bootstrap replication. It includes a trim_closure_env() optimization that reduces serialization overhead by keeping only referenced symbols in function environments.

  • Counterfactual Imputation as Core Abstraction: All methods share the same conceptual framework: impute Y(0) for treated units using untreated observations, compute ATT as the gap. FE uses additive fixed effects, IFE adds latent factors (F * L'), MC uses nuclear norm regularization, CFE adds structured covariates.

  • Never-Treated vs Not-Yet-Treated Estimation Samples: The package supports two estimation sample strategies. "notyettreated" includes not-yet-treated observations (requiring EM for missing data), whereas "nevertreated" uses only never-treated units (allowing direct SVD). The fect_nevertreated() wrapper handles the latter.

  • Comprehensive Diagnostic Suite: Built-in tests (F-test, TOST equivalence, placebo, carryover) allow users to validate the parallel trends assumption without external tools. Sensitivity analysis is available through optional HonestDiDFEct integration.
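As an illustration of the Matrix-Oriented Data Representation pattern, the following base R sketch reshapes a long-form panel into T x N matrices. The helper `to_matrix()` and the toy data frame are assumptions made for this example; the package's own preprocessing is more involved.

```r
# Long-form toy panel: one row per unit-period (made-up values).
long <- data.frame(
  id   = rep(c("a", "b", "c"), each = 4),
  time = rep(1:4, times = 3),
  Y    = c(1, 2, 3, 4,  2, 3, 4, 5,  0, 1, 2, 3),
  D    = c(0, 0, 1, 1,  0, 0, 0, 0,  0, 0, 0, 1)
)

# Hypothetical helper: place each observation at [time, unit]; cells that are
# missing from the long data stay NA, which covers unbalanced panels.
to_matrix <- function(df, value) {
  ids <- sort(unique(df$id))
  times <- sort(unique(df$time))
  m <- matrix(NA_real_, length(times), length(ids),
              dimnames = list(times, ids))
  m[cbind(match(df$time, times), match(df$id, ids))] <- df[[value]]
  m
}

Y <- to_matrix(long, "Y")   # T x N outcome matrix
D <- to_matrix(long, "D")   # T x N treatment indicator

# Only untreated cells inform the model of Y(0); treated cells are masked.
Y_ctrl <- Y
Y_ctrl[D == 1] <- NA
print(Y_ctrl)
```

Rows index time and columns index units, so a unit's full history is one column, and masking the treated cells turns counterfactual estimation into a missing-data problem on a matrix.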
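The Two-Tier Tolerance and Warm-Start CV patterns can be sketched together for a matrix-completion-style lambda sweep. Everything below is a simplified assumption: `soft_impute()` is a hypothetical base R routine built on `svd()`, a single held-out set of cells replaces the per-fold logic, and the step that zeroes unobserved entries before reuse is not reproduced.

```r
set.seed(7)
n_t <- 8
n_u <- 6
Y <- outer(seq(1, 2, length.out = n_t), seq(0.5, 1.5, length.out = n_u)) +
  matrix(rnorm(n_t * n_u, sd = 0.05), n_t, n_u)

obs <- matrix(TRUE, n_t, n_u)
obs[7:8, 1:2] <- FALSE                      # treated cells, never used for fitting
held_out <- matrix(FALSE, n_t, n_u)
held_out[sample(which(obs), 6)] <- TRUE     # validation cells for the CV score
train <- obs & !held_out

# Hypothetical soft-thresholded SVD imputation, started from fit0.
soft_impute <- function(Y, use, lambda, fit0, tol, max_iter = 500) {
  fit <- fit0
  for (iter in seq_len(max_iter)) {
    filled <- ifelse(use, Y, fit)
    s <- svd(filled)
    d <- pmax(s$d - lambda, 0)              # shrink singular values toward zero
    new_fit <- s$u %*% (d * t(s$v))
    dif <- norm(new_fit - fit, "F") / max(norm(fit, "F"), 1e-12)
    fit <- new_fit
    if (dif <= tol) break
  }
  list(fit = fit, niter = iter)
}

user_tol <- 1e-6
cv_tol <- max(user_tol, 1e-3)               # looser tolerance during the search
lambdas <- c(2, 1, 0.5, 0.1)

warm_fit <- matrix(0, n_t, n_u)             # cold start for the first candidate only
mspe <- numeric(length(lambdas))
fits <- vector("list", length(lambdas))
for (k in seq_along(lambdas)) {
  res <- soft_impute(Y, train, lambdas[k], fit0 = warm_fit, tol = cv_tol)
  mspe[k] <- mean((Y[held_out] - res$fit[held_out])^2)
  fits[[k]] <- res$fit
  warm_fit <- res$fit                       # reuse as the next starting point
}

best <- which.min(mspe)
# Final estimation: all non-treated cells, user tolerance, warm-started.
final <- soft_impute(Y, obs, lambdas[best], fit0 = fits[[best]], tol = user_tol)
print(lambdas[best])
```

Adjacent lambda values tend to give similar fits, so starting each candidate from its neighbor's solution is a cheap way to cut iterations; the loose `cv_tol` only affects the search, not the final estimate.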
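The Multi-Start Initialization pattern can be sketched as follows. The perturbation scales (5% of the data SD for Y0, 10% of coefficient magnitude for beta0) and the selection rule (lowest sigma2) come from the description above; the function names `perturb_start()`, `multi_start()`, and `rank1_trial()` are hypothetical, and the toy trial estimator ignores beta0 for brevity.

```r
set.seed(42)

# Gaussian perturbation of the starting values (scales taken from the text).
perturb_start <- function(Y0, beta0, y_sd) {
  list(
    Y0    = Y0 + matrix(rnorm(length(Y0), sd = 0.05 * y_sd), nrow(Y0)),
    beta0 = beta0 + rnorm(length(beta0), sd = 0.10 * abs(beta0))
  )
}

# Run one trial per start and keep the start with the lowest residual variance.
multi_start <- function(Y, obs, Y0, beta0, n_init, trial_fit) {
  y_sd <- sd(Y[obs])
  starts <- c(
    list(list(Y0 = Y0, beta0 = beta0)),     # the unperturbed base start
    replicate(n_init - 1, perturb_start(Y0, beta0, y_sd), simplify = FALSE)
  )
  sigma2 <- vapply(starts, function(s) trial_fit(Y, obs, s), numeric(1))
  chosen <- which.min(sigma2)
  list(best = starts[[chosen]], sigma2 = sigma2, chosen = chosen)
}

# Toy trial estimator (assumption): a few rank-1 EM steps from start$Y0,
# returning the mean squared residual on observed cells. beta0 is unused here.
rank1_trial <- function(Y, obs, start, n_iter = 3) {
  fit <- start$Y0
  for (i in seq_len(n_iter)) {
    filled <- ifelse(obs, Y, fit)
    s <- svd(filled, nu = 1, nv = 1)
    fit <- s$d[1] * s$u %*% t(s$v)
  }
  mean((Y[obs] - fit[obs])^2)
}

# Made-up panel with a block of unobserved (treated) cells.
Y <- outer(seq(1, 2, length.out = 8), seq(0.5, 1.5, length.out = 6)) +
  matrix(rnorm(48, sd = 0.1), 8, 6)
obs <- matrix(TRUE, 8, 6)
obs[7:8, 1:2] <- FALSE
Y[!obs] <- NA

Y0 <- ifelse(obs, Y, mean(Y[obs]))          # base start: mean-filled outcomes
beta0 <- c(0.5, -1)                         # hypothetical starting coefficients

res <- multi_start(Y, obs, Y0, beta0, n_init = 3, trial_fit = rank1_trial)
print(res$sigma2)
```

With `n_init = 1` the list of starts contains only the base values, which matches the stated default of preserving existing behavior.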


Notes

  • FE is internally treated as IFE with r = 0 (zero latent factors). When method = "fe" is requested, the code sets method = "ife" and r = 0.
  • The gsynth method is a compatibility alias that forces time.component.from = "nevertreated" and em = FALSE, matching the behavior of the gsynth package.
  • boot.R (4,884 lines) and plot.R (5,019 lines) are the two largest files. Both could benefit from modular decomposition in future refactors.
  • The binary option (probit models) is only available with method = "ife" and has dedicated C++ implementations (binary_qr.cpp, binary_svd.cpp, binary_sub.cpp).
  • The package uses fixest::feols() for initial OLS regression to obtain starting values for iterative estimation.
  • Vignettes are organized as a Quarto book (vignettes/_quarto.yml) with 9 chapters covering getting started, FE, IFE/MC, CFE, heterogeneous effects, plots, gsynth compatibility, panel diagnostics, and sensitivity analysis.
  • 10 bundled datasets (simdata, sim_base, sim_gsynth, sim_linear, sim_region, sim_trend, turnout, gs2020, hh2019, simgsynth) support examples and testing.
  • 11 exported functions and 8 S3 methods registered in NAMESPACE.
  • Total R source: 27,872 lines across 27 files. Total C++ source: 4,848 lines across 12 files (plus header).