exampleProbit

Three Probit Estimation Methods via Rcpp/RcppArmadillo

exampleProbit implements three classical probit estimators entirely in C++ for speed:

  • MLE -- Newton-Raphson maximum likelihood
  • Gibbs -- Bayesian data augmentation (Albert-Chib 1993)
  • Metropolis-Hastings -- Random-walk MH with Gaussian proposals

All core routines use RcppArmadillo for linear algebra, yielding fast estimation even at large sample sizes.
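As a reference for what the C++ backend computes, here is a minimal pure-R sketch of the probit MLE iteration. It uses Fisher scoring (the expected information in place of the observed Hessian), which converges to the same maximum; the function name probit_mle_r is ours, not part of the package.

```r
# Pure-R sketch of the probit MLE via Fisher scoring.
# The package's probit_mle() does the equivalent work in C++/RcppArmadillo.
probit_mle_r <- function(X, y, tol = 1e-8, max_iter = 100) {
  beta <- rep(0, ncol(X))
  for (i in seq_len(max_iter)) {
    eta <- drop(X %*% beta)
    p   <- pnorm(eta)   # P(y = 1 | x)
    phi <- dnorm(eta)
    # Score: X' [phi * (y - p) / (p * (1 - p))]
    grad <- drop(crossprod(X, phi * (y - p) / (p * (1 - p))))
    # Expected information: X' diag(phi^2 / (p * (1 - p))) X
    H    <- crossprod(X * (phi^2 / (p * (1 - p))), X)
    step <- solve(H, grad)
    beta <- beta + step
    if (max(abs(step)) < tol) break
  }
  beta
}
```

On simulated data this agrees with glm(..., family = binomial(link = "probit")) to numerical precision.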

This entire package was built by StatsClaw from a single PDF specification, probit_spec.pdf. StatsClaw's Planner comprehended the mathematics (log-likelihood, gradient, and Hessian for the MLE; Albert-Chib truncated-normal augmentation for Gibbs; a random-walk proposal with tuned acceptance for MH) and produced three independent specifications for its Builder, Tester, and Simulator pipelines. From that single prompt it delivered a complete R package with C++ backends, 28 unit tests, and a 6,000-run Monte Carlo simulation.

Installation

# install.packages("remotes")
remotes::install_github("statsclaw/example-probit")

Quick Example

library(exampleProbit)

set.seed(42)
N <- 500
X <- cbind(1, rnorm(N))
beta_true <- c(-1, 0.5)
y <- rbinom(N, 1, pnorm(X %*% beta_true))

# MLE (Newton-Raphson)
mle_fit <- probit_mle(X, y)

# Bayesian Gibbs Sampler (Albert-Chib)
gibbs_fit <- probit_gibbs(X, y, n_iter = 3500, burn_in = 500)

# Random-Walk Metropolis-Hastings
mh_fit <- probit_mh(X, y, n_iter = 10000, burn_in = 2000)
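The data augmentation behind probit_gibbs can be sketched in a few lines of R. This is a simplified single sweep under a flat prior, using inverse-CDF truncated-normal draws; the helper name gibbs_sweep is ours, and the package's C++ sampler and prior may differ in detail.

```r
# One Gibbs sweep of the Albert-Chib (1993) data augmentation, flat prior.
gibbs_sweep <- function(X, y, beta) {
  eta <- drop(X %*% beta)
  # Latent z_i ~ N(eta_i, 1) truncated to (0, Inf) if y_i = 1, (-Inf, 0) if y_i = 0
  u  <- runif(length(y))
  lo <- ifelse(y == 1, pnorm(0, eta, 1), 0)
  hi <- ifelse(y == 1, 1, pnorm(0, eta, 1))
  z  <- qnorm(lo + u * (hi - lo), eta, 1)
  # beta | z ~ N((X'X)^{-1} X'z, (X'X)^{-1})
  V  <- solve(crossprod(X))
  mu <- drop(V %*% crossprod(X, z))
  drop(mu + t(chol(V)) %*% rnorm(ncol(X)))
}
```

Iterating this sweep and discarding a burn-in yields posterior draws for beta.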

Monte Carlo Simulation Results

[Figure: Monte Carlo Results, 3 Methods Side by Side. One column per estimator; rows show |Bias|, RMSE, coverage, and time. Solid lines: intercept (β₀ = −1); dashed lines: slope (β₁ = 0.5). MLE in blue, Gibbs in red, MH in green.]

All three methods converge to the truth as N grows: bias → 0, RMSE ∝ 1/√N, and coverage → 0.95. MLE runs in microseconds; Gibbs and MH require orders of magnitude more time but provide full posterior inference. MH acceptance rates fall in 0.396–0.401, near the ~0.40 optimum for a two-dimensional random walk. 500 replications per scenario, 6,000 total estimator fits, zero failures.

Design

Parameter        Value
DGP              Pr(y = 1 | x) = Φ(β₀ + β₁x₁),  x₁ ~ N(0, 1)
True parameters  β₀ = −1,  β₁ = 0.5
Sample sizes     N ∈ {200, 500, 1000, 5000}
Replications     R = 500 per scenario
MLE              max 100 iterations, tolerance 10⁻⁸
Gibbs            3,500 iterations, 500 burn-in
MH               10,000 iterations, 2,000 burn-in, scale = 2.4

Results

Table 1: Intercept (β₀ = −1)

N     Method  Bias     RMSE    Coverage  Time (s)
200   MLE     -0.0159  0.1257  0.944     0.0001
200   Gibbs   -0.0244  0.1293  0.930     0.0219
200   MH      -0.0244  0.1293  0.928     0.0593
500   MLE     -0.0046  0.0717  0.962     0.0001
500   Gibbs   -0.0078  0.0725  0.960     0.0551
500   MH      -0.0079  0.0724  0.960     0.1469
1000  MLE     -0.0042  0.0524  0.954     0.0002
1000  Gibbs   -0.0056  0.0528  0.948     0.1114
1000  MH      -0.0059  0.0527  0.952     0.2951
5000  MLE     -0.0023  0.0237  0.950     0.0010
5000  Gibbs   -0.0026  0.0238  0.950     0.5677
5000  MH      -0.0026  0.0237  0.950     1.4849

Table 2: Slope (β₁ = 0.5)

N     Method  Bias    RMSE    Coverage  Time (s)
200   MLE     0.0178  0.1370  0.924     0.0001
200   Gibbs   0.0257  0.1409  0.920     0.0219
200   MH      0.0257  0.1409  0.924     0.0593
500   MLE     0.0120  0.0797  0.940     0.0001
500   Gibbs   0.0150  0.0808  0.940     0.0551
500   MH      0.0149  0.0807  0.942     0.1469
1000  MLE     0.0040  0.0554  0.948     0.0002
1000  Gibbs   0.0053  0.0559  0.940     0.1114
1000  MH      0.0056  0.0558  0.942     0.2951
5000  MLE     0.0011  0.0239  0.954     0.0010
5000  Gibbs   0.0014  0.0239  0.946     0.5677
5000  MH      0.0014  0.0240  0.946     1.4849

MH Acceptance Rates

N Acceptance Rate
200 0.401
500 0.398
1000 0.397
5000 0.396
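These near-0.40 rates are consistent with a proposal scaled in the spirit of the classic 2.38/√d rule. Here is a minimal pure-R sketch of one random-walk step under a flat prior; the function names are ours, and the exact C++ proposal inside probit_mh may differ.

```r
# Probit log-likelihood (flat prior, so this is the log posterior up to a constant)
probit_loglik <- function(beta, X, y) {
  eta <- drop(X %*% beta)
  sum(y * pnorm(eta, log.p = TRUE) + (1 - y) * pnorm(-eta, log.p = TRUE))
}

# One random-walk Metropolis-Hastings step with Gaussian proposal
mh_step <- function(beta, X, y, chol_prop) {
  prop <- beta + drop(t(chol_prop) %*% rnorm(length(beta)))
  # Accept with probability min(1, posterior ratio)
  if (log(runif(1)) < probit_loglik(prop, X, y) - probit_loglik(beta, X, y))
    prop
  else
    beta
}
```

Scaling the proposal covariance to the curvature of the likelihood (e.g. the inverse information at the MLE, times 2.4²/d) is what keeps the acceptance rate stable near 0.40 as N grows.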

Key Findings

  1. Consistency. All three estimators are consistent -- bias vanishes as N grows.
  2. Root-N convergence. RMSE decreases at the 1/√N rate across all methods.
  3. Nominal coverage. 95% confidence/credible interval coverage falls in [0.92, 0.96] for every scenario.
  4. MLE speed advantage. C++ Newton-Raphson is 342--564x faster than C++ Gibbs sampling.
  5. Bayesian-frequentist agreement. Gibbs and MH posterior means converge to MLE as N increases.
  6. Optimal MH tuning. Acceptance rate is stable at ~0.40, near the theoretical optimum for a 2-dimensional random walk.
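The root-N claim can be checked arithmetically from Table 1: if RMSE ≈ c/√N, then √N × RMSE should be roughly constant across sample sizes. Using the reported MLE intercept RMSEs:

```r
# MLE intercept RMSEs from Table 1; sqrt(N) * RMSE should be roughly flat
N    <- c(200, 500, 1000, 5000)
rmse <- c(0.1257, 0.0717, 0.0524, 0.0237)
round(sqrt(N) * rmse, 2)
# 1.78 1.60 1.66 1.68  -- roughly constant, consistent with 1/sqrt(N) convergence
```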

Acceptance Criteria

#   Criterion                                          Result
C1  MLE matches glm(probit) within 0.05                max 0.018
C2  Bayesian approximates MLE for N >= 500 within 0.1  max 0.003
C3  MH acceptance rate in [0.20, 0.50]                 0.396–0.401
C4  95% CI coverage in [0.90, 0.99]                    0.920–0.962
C5  MLE at least 5x faster than Gibbs                  342–564x
C6  R CMD check: 0 errors, 0 warnings                  0 errors, 0 warnings
C7  Package installs and runs                          verified

Run the Simulation Yourself

From R:

library(exampleProbit)

results <- run_probit_simulation(
  N_vec  = c(200L, 500L, 1000L, 5000L),
  R      = 500L,
  seed   = 2026L,
  verbose = TRUE
)

print(results$summary_table)

From the command line:

Rscript inst/simulation/run_simulation.R

Dependencies

Rcpp, RcppArmadillo

License

MIT
