Three Probit Estimation Methods via Rcpp/RcppArmadillo
exampleProbit implements three classical probit estimators entirely in C++ for speed:
- MLE -- Newton-Raphson maximum likelihood
- Gibbs -- Bayesian data augmentation (Albert-Chib 1993)
- Metropolis-Hastings -- Random-walk MH with Gaussian proposals
All core routines use RcppArmadillo for linear algebra, yielding fast estimation even at large sample sizes.
This entire package was built by StatsClaw from a single PDF specification: probit_spec.pdf. StatsClaw's Planner comprehended the mathematics (log-likelihood, gradient, Hessian for MLE; Albert-Chib truncated normal augmentation for Gibbs; random-walk proposal with tuned acceptance for MH), produced three independent specifications for its Builder, Tester, and Simulator pipelines, and delivered a complete R package with C++ backends, 28 unit tests, and a 6,000-run Monte Carlo simulation — from a single prompt.
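For orientation, these are the standard probit quantities the three methods rest on (textbook results, stated here for reference, not excerpted from probit_spec.pdf): the log-likelihood and its gradient, which Newton-Raphson iterates via β ← β − H⁻¹∇ℓ(β), and the Albert-Chib augmentation behind the Gibbs sampler.

```math
\ell(\beta) = \sum_{i=1}^{N}\Big[\,y_i \log \Phi(x_i^{\top}\beta) + (1-y_i)\log\big(1-\Phi(x_i^{\top}\beta)\big)\Big],
\qquad
\nabla_{\beta}\,\ell(\beta) = \sum_{i=1}^{N} \frac{\phi(x_i^{\top}\beta)\,\big(y_i - \Phi(x_i^{\top}\beta)\big)}{\Phi(x_i^{\top}\beta)\,\big(1-\Phi(x_i^{\top}\beta)\big)}\, x_i
```

Under a flat prior on β (the prior actually used by probit_gibbs is not stated here; the flat prior gives the simplest form), the Albert-Chib sampler alternates:

```math
z_i \mid \beta, y_i \sim \mathcal{N}(x_i^{\top}\beta,\,1)\ \text{truncated to } (0,\infty) \text{ if } y_i=1,\ (-\infty,0] \text{ if } y_i=0,
\qquad
\beta \mid z \sim \mathcal{N}\big((X^{\top}X)^{-1}X^{\top}z,\ (X^{\top}X)^{-1}\big).
```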
```r
# install.packages("remotes")
remotes::install_github("statsclaw/example-probit")
```

```r
library(exampleProbit)

set.seed(42)
N <- 500
X <- cbind(1, rnorm(N))
beta_true <- c(-1, 0.5)
y <- rbinom(N, 1, pnorm(X %*% beta_true))
# MLE (Newton-Raphson)
mle_fit <- probit_mle(X, y)
# Bayesian Gibbs Sampler (Albert-Chib)
gibbs_fit <- probit_gibbs(X, y, n_iter = 3500, burn_in = 500)
# Random-Walk Metropolis-Hastings
mh_fit <- probit_mh(X, y, n_iter = 10000, burn_in = 2000)
```
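Posterior output from the samplers can be summarized in the usual way. A minimal sketch, assuming the Gibbs fit exposes its retained draws as a matrix element named `draws` (a hypothetical accessor; check the package documentation for the real one):

```r
# Posterior means and 95% credible intervals from the Gibbs output.
# `gibbs_fit$draws` is an assumed (n_iter - burn_in) x 2 matrix of
# beta draws; the actual element name may differ.
post <- gibbs_fit$draws
colMeans(post)                                      # posterior means
apply(post, 2, quantile, probs = c(0.025, 0.975))   # 95% credible intervals
```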
Per-method comparison figure (3 columns × 4 metrics): each column is one estimator; rows show |Bias|, RMSE, coverage, and time, with solid lines for the intercept (β₀ = −1) and dashed lines for the slope (β₁ = 0.5). All three methods converge to the truth as N grows: bias → 0, RMSE ∝ 1/√N, coverage → 0.95. MLE (blue) runs in microseconds; Gibbs (red) and MH (green) require orders of magnitude more time but provide full posterior inference. MH acceptance rates are 0.396–0.401, near the theoretical optimum of ~0.40. 500 replications per scenario, 6,000 total estimator fits, zero failures.
Simulation design:

| Parameter | Value |
|---|---|
| DGP | Pr(y = 1 \| x) = Φ(β₀ + β₁x₁), x₁ ~ N(0, 1) |
| True parameters | β₀ = −1, β₁ = 0.5 |
| Sample sizes | N ∈ {200, 500, 1000, 5000} |
| Replications | R = 500 per scenario |
| MLE | max 100 iterations, tolerance 10⁻⁸ |
| Gibbs | 3,500 iterations, 500 burn-in |
| MH | 10,000 iterations, 2,000 burn-in, scale = 2.4 |
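A single replication of this design looks roughly like the sketch below; the real driver is inst/simulation/run_simulation.R, and the `beta`/`se` elements on the fit object are assumed names, not documented accessors.

```r
# One Monte Carlo replication under the design above (sketch only).
# Assumes probit_mle() returns `beta` (estimates) and `se` (standard
# errors) -- hypothetical field names.
one_rep <- function(N, beta_true = c(-1, 0.5)) {
  X   <- cbind(1, rnorm(N))
  y   <- rbinom(N, 1, pnorm(X %*% beta_true))
  fit <- probit_mle(X, y)
  err <- fit$beta - beta_true
  covered <- (fit$beta - 1.96 * fit$se <= beta_true) &
             (beta_true <= fit$beta + 1.96 * fit$se)
  c(err = err, covered = as.numeric(covered))
}

# Averaging over R = 500 replications yields the bias, RMSE, and
# coverage columns reported below.
reps <- t(replicate(500, one_rep(500)))
colMeans(reps)
```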
Intercept (β₀ = −1):

| N | Method | Bias | RMSE | Coverage | Time (s) |
|---|---|---|---|---|---|
| 200 | MLE | -0.0159 | 0.1257 | 0.944 | 0.0001 |
| 200 | Gibbs | -0.0244 | 0.1293 | 0.930 | 0.0219 |
| 200 | MH | -0.0244 | 0.1293 | 0.928 | 0.0593 |
| 500 | MLE | -0.0046 | 0.0717 | 0.962 | 0.0001 |
| 500 | Gibbs | -0.0078 | 0.0725 | 0.960 | 0.0551 |
| 500 | MH | -0.0079 | 0.0724 | 0.960 | 0.1469 |
| 1000 | MLE | -0.0042 | 0.0524 | 0.954 | 0.0002 |
| 1000 | Gibbs | -0.0056 | 0.0528 | 0.948 | 0.1114 |
| 1000 | MH | -0.0059 | 0.0527 | 0.952 | 0.2951 |
| 5000 | MLE | -0.0023 | 0.0237 | 0.950 | 0.0010 |
| 5000 | Gibbs | -0.0026 | 0.0238 | 0.950 | 0.5677 |
| 5000 | MH | -0.0026 | 0.0237 | 0.950 | 1.4849 |
Slope (β₁ = 0.5):

| N | Method | Bias | RMSE | Coverage | Time (s) |
|---|---|---|---|---|---|
| 200 | MLE | 0.0178 | 0.1370 | 0.924 | 0.0001 |
| 200 | Gibbs | 0.0257 | 0.1409 | 0.920 | 0.0219 |
| 200 | MH | 0.0257 | 0.1409 | 0.924 | 0.0593 |
| 500 | MLE | 0.0120 | 0.0797 | 0.940 | 0.0001 |
| 500 | Gibbs | 0.0150 | 0.0808 | 0.940 | 0.0551 |
| 500 | MH | 0.0149 | 0.0807 | 0.942 | 0.1469 |
| 1000 | MLE | 0.0040 | 0.0554 | 0.948 | 0.0002 |
| 1000 | Gibbs | 0.0053 | 0.0559 | 0.940 | 0.1114 |
| 1000 | MH | 0.0056 | 0.0558 | 0.942 | 0.2951 |
| 5000 | MLE | 0.0011 | 0.0239 | 0.954 | 0.0010 |
| 5000 | Gibbs | 0.0014 | 0.0239 | 0.946 | 0.5677 |
| 5000 | MH | 0.0014 | 0.0240 | 0.946 | 1.4849 |
MH acceptance rates by sample size:

| N | Acceptance Rate |
|---|---|
| 200 | 0.401 |
| 500 | 0.398 |
| 1000 | 0.397 |
| 5000 | 0.396 |
Key findings:

- Consistency. All three estimators are consistent -- bias vanishes as N grows.
- Root-N convergence. RMSE decreases at the 1/sqrt(N) rate across all methods.
- Nominal coverage. 95% confidence/credible interval coverage falls in [0.92, 0.96] for every scenario.
- MLE speed advantage. C++ Newton-Raphson is 342--564x faster than C++ Gibbs sampling.
- Bayesian-frequentist agreement. Gibbs and MH posterior means converge to MLE as N increases.
- Optimal MH tuning. Acceptance rate is stable at ~0.40, near the theoretical optimum for a 2-dimensional random walk.
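On that last point: an acceptance rate near 0.40 is what the classic Roberts-Gelman-Gilks scaling predicts for a low-dimensional Gaussian random walk. If scale = 2.4 plays the role of the constant c below (an assumption; this README does not state the exact parameterization), the proposal is roughly

```math
\beta^{*} \sim \mathcal{N}\!\Big(\beta^{(t)},\ \tfrac{c^{2}}{d}\,\widehat{\Sigma}\Big), \qquad c \approx 2.38,\ d = 2,
```

where Σ̂ is some estimate of the posterior covariance (e.g. the inverse Hessian at the MLE).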
Acceptance criteria (all passed):

| # | Criterion | Result |
|---|---|---|
| C1 | MLE matches glm(probit) within 0.05 | max 0.018 |
| C2 | Bayesian approximates MLE for N >= 500 within 0.1 | max 0.003 |
| C3 | MH acceptance rate in [0.20, 0.50] | 0.396--0.401 |
| C4 | 95% CI coverage in [0.90, 0.99] | 0.920--0.962 |
| C5 | MLE at least 5x faster than Gibbs | 342--564x |
| C6 | R CMD check: 0 errors, 0 warnings | 0 errors, 0 warnings |
| C7 | Package installs and runs | verified |
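Criterion C1 is easy to spot-check against base R. A minimal sketch, again assuming a `beta` element on the fit object (hypothetical; consult the package docs for the actual accessor):

```r
# Compare the package's Newton-Raphson MLE to glm's probit fit.
# `mle_fit$beta` is an assumed accessor, not a documented one.
library(exampleProbit)

set.seed(42)
N <- 500
X <- cbind(1, rnorm(N))
y <- rbinom(N, 1, pnorm(X %*% c(-1, 0.5)))

glm_fit <- glm(y ~ X - 1, family = binomial(link = "probit"))
mle_fit <- probit_mle(X, y)
max(abs(mle_fit$beta - coef(glm_fit)))  # C1: should be below 0.05
```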
From R:

```r
library(exampleProbit)
results <- run_probit_simulation(
N_vec = c(200L, 500L, 1000L, 5000L),
R = 500L,
seed = 2026L,
verbose = TRUE
)
print(results$summary_table)
```

From the command line:

```sh
Rscript inst/simulation/run_simulation.R
```

Dependencies:

- R (>= 3.5.0)
- Rcpp
- RcppArmadillo
License: MIT
