Author: Mitchell Valdes-Bobes
Institution: University of Wisconsin-Madison
Status: Working Paper
Data Coverage: 2013-2025
This project investigates why remote work adoption rose sharply during COVID-19 and then stabilized well above pre-pandemic levels instead of returning to baseline. Using a structural search-and-matching model with worker heterogeneity in remote work preferences and job heterogeneity in teleworkability, I decompose the persistence of remote work into three channels: (1) technology improvements, (2) preference shifts, and (3) sorting mechanisms.
Remote work persistence is primarily driven by technology improvements in teleworkability (60%) rather than preference changes (30%) or improved sorting (10%). This suggests policy interventions targeting workplace infrastructure may be more effective than those targeting worker preferences.
- Statistical Matching: ML-based imputation to merge SIPP remote work intensity with CPS earnings data
- Structural Estimation: Simulated Method of Moments (SMM) with genetic algorithm optimization
- Data Pipeline: Integrated processing of 73GB multi-source data (SIPP, CPS, ATUS, O*NET)
- High-Performance Computing: Parallel estimation on 128 cores using a genetic algorithm with a population of 512
Data Sources Integrated:
- SIPP (Survey of Income and Program Participation): 131MB processed, remote work intensity measures
- CPS (Current Population Survey): 65GB processed, main analysis sample with earnings
- ATUS (American Time Use Survey): Pre-2022 telework validation
- O*NET: Occupation-level teleworkability scores
Statistical Matching:
- LightGBM-based machine learning imputation
- Harmonized variables across datasets (education, age, occupation, geography)
- Cross-validation: R² = 0.68 for remote work intensity prediction
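The matching step above can be illustrated with a minimal sketch: fit a predictor of remote work intensity on the donor survey (SIPP) using harmonized covariates, score it on held-out donor rows, and then predict for the recipient survey (CPS). To keep the example dependency-free, it swaps in simple cell-mean imputation where the project uses LightGBM. The toy rows, the column names (`education`, `occupation`, `alpha`), and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(donor_rows, keys, target):
    """Average the target within each cell of harmonized covariates."""
    cells = defaultdict(list)
    for row in donor_rows:
        cells[tuple(row[k] for k in keys)].append(row[target])
    overall = mean(row[target] for row in donor_rows)
    return {cell: mean(vals) for cell, vals in cells.items()}, overall


def impute(rows, keys, cell_means, fallback):
    """Predict the target; cells unseen in the donor data get the donor mean."""
    return [cell_means.get(tuple(row[k] for k in keys), fallback) for row in rows]


def r_squared(actual, predicted):
    """Out-of-sample R^2 = 1 - SSE / SST."""
    ybar = mean(actual)
    sse = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    sst = sum((a - ybar) ** 2 for a in actual)
    return 1.0 - sse / sst


KEYS = ("education", "occupation")  # hypothetical harmonized variables

# Toy donor data (stands in for SIPP, which observes remote work intensity).
train = [
    {"education": "college", "occupation": "office", "alpha": 0.6},
    {"education": "college", "occupation": "office", "alpha": 0.4},
    {"education": "hs", "occupation": "service", "alpha": 0.0},
    {"education": "hs", "occupation": "service", "alpha": 0.1},
]
holdout = [
    {"education": "college", "occupation": "office", "alpha": 0.5},
    {"education": "hs", "occupation": "service", "alpha": 0.0},
]
# Toy recipient data (stands in for CPS, which lacks remote work intensity).
cps = [
    {"education": "college", "occupation": "office"},
    {"education": "hs", "occupation": "manual"},
]

cell_means, overall_mean = fit_cell_means(train, KEYS, "alpha")
holdout_r2 = r_squared(
    [row["alpha"] for row in holdout],
    impute(holdout, KEYS, cell_means, overall_mean),
)
cps_alpha = impute(cps, KEYS, cell_means, overall_mean)
```

The held-out R² here plays the role of the cross-validated fit statistic reported above; a gradient-boosted model would replace `fit_cell_means` and `impute` while leaving the validation logic unchanged.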
Model Features:
- Search and matching framework with worker-job heterogeneity
- Workers differ in remote work preferences (z ~ LogNormal)
- Jobs differ in teleworkability (ψ ~ distribution)
- Optimal choice of remote work intensity (α ∈ [0,1])
Estimation Method:
- Genetic algorithm optimization (100 generations, 512 population)
- Simulated Method of Moments with bootstrap variance weighting
- 11 empirical moments matched (wage distribution, remote work shares, sorting patterns)
- ~30 minutes per estimation run on HPC cluster
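As a rough illustration of the estimation criterion, the sketch below computes bootstrap variances for two moments and uses them in a weighted distance between simulated and empirical moments. It assumes that "bootstrap variance weighting" means a diagonal weighting matrix with inverse bootstrap variances on the diagonal; the repository may use a different weighting. The toy data, the two moments, and all function names are hypothetical.

```python
import random
from statistics import mean, variance


def bootstrap_variance(sample, statistic, n_boot=200, seed=0):
    """Variance of a statistic across resamples drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    draws = [
        statistic([sample[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    ]
    return variance(draws)


def smm_objective(sim_moments, data_moments, moment_variances):
    """Weighted distance sum_k (m_sim_k - m_data_k)^2 / var_k (diagonal weights)."""
    return sum(
        (s - d) ** 2 / v
        for s, d, v in zip(sim_moments, data_moments, moment_variances)
    )


def in_person_share(alphas):
    """Share of observations with zero remote work intensity."""
    return sum(1 for a in alphas if a == 0.0) / len(alphas)


# Toy "data": most workers fully in person, the rest with alpha in (0, 1).
rng = random.Random(1)
data_alpha = [0.0 if rng.random() < 0.9 else rng.random() for _ in range(500)]

statistics_used = [mean, in_person_share]
data_moments = [f(data_alpha) for f in statistics_used]
variances = [bootstrap_variance(data_alpha, f) for f in statistics_used]

# The objective is zero when simulated moments equal the data moments
# and grows as they move apart.
q_at_data = smm_objective(data_moments, data_moments, variances)
q_off = smm_objective([m + 0.01 for m in data_moments], data_moments, variances)
```

Weighting by inverse variance means precisely measured moments are penalized more heavily for a given miss than noisy ones.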
Languages Used:
- Julia: Structural model, optimization, moment computation (47 files)
- Python: Data processing, statistical matching, ML imputation (20 files)
- Stata: Wage regressions, robustness checks (18 files)
Key Modules:
- src/structural_model/: Economic model implementation (9 modules)
- src/optimization/: GA and multi-start local search
- src/empirical/: Moments estimation and statistical matching
- src/data/: Multi-source data acquisition and processing
why_remote_work_stuck/
│
├── docs/
│   ├── technical/
│   │   ├── EMPIRICAL_PIPELINE.md  # Complete data processing documentation
│   │   ├── DATA_DICTIONARY.md  # Variable definitions and sources
│   │   └── STRUCTURAL_MODEL.md  # Model specification and estimation
│   ├── development_logs/
│   │   └── optimization_oct2025.md  # Parameter estimation session notes
│   └── outputs/
│       ├── main.pdf  # Latest paper
│       └── slides.pdf  # Latest presentation
│
├── src/
│   ├── data/  # Data acquisition & processing (Python)
│   │   ├── sipp_process.py  # SIPP remote work metrics
│   │   ├── ipums_process.py  # CPS/ATUS processing
│   │   └── get_fred_q_theta.py  # Labor market tightness
│   ├── empirical/  # Moments & statistical matching
│   │   ├── stat_matching/  # SIPP→CPS ML imputation
│   │   ├── data_moments_core.jl  # Moment computation
│   │   └── wfh_wage_facts/  # Wage-remote work regressions
│   ├── structural_model/  # Economic model (Julia)
│   │   ├── ModelSetup.jl  # Parameter initialization
│   │   ├── ModelSolver.jl  # Equilibrium computation
│   │   ├── ModelEconomics.jl  # Production & matching
│   │   └── GeneticAlgorithm.jl  # Custom GA implementation
│   ├── optimization/  # Parameter estimation
│   │   ├── OptimizationObjective.jl  # SMM objective function
│   │   └── multi_start_local_search.jl
│   └── reporting/  # Results & decomposition
│
├── data/  # 73GB processed data (see below)
│   ├── processed/
│   │   ├── empirical/  # Estimation-ready datasets
│   │   ├── cps/  # CPS with imputed remote work
│   │   ├── sipp/  # SIPP person-year files
│   │   └── harmonized/  # Cross-dataset harmonization
│   └── aux/  # Crosswalks and auxiliary data
│
├── results/
│   ├── global_optimization/  # GA estimation outputs
│   │   └── final/
│   │       ├── *_3144287_best.yaml  # Best parameters (2019)
│   │       └── *_3144287.json  # Full optimization history
│   └── bootstrap_moments/  # Bootstrap variance estimates
│
├── manuscript/
│   └── final_document/
│       ├── main.pdf  # Compiled paper
│       ├── main.tex  # LaTeX source
│       └── figures/  # Publication figures
│
└── presentations/
    └── QMW_10202025/
        ├── slides.pdf  # Queen Mary Workshop slides
        └── slides.tex  # Beamer source
- 📄 Paper PDF - Latest manuscript version
- 📊 Slides PDF - Queen Mary Workshop presentation
- 📖 Technical Documentation - Pipeline and data documentation
- 💾 Estimation Results - Parameter estimates and diagnostics
Processed data (73GB) is included in this repository at data/processed/. This ensures full reproducibility without re-running expensive data acquisition steps.
| File | Size | Description |
|---|---|---|
| data/processed/empirical/simulation_scaffolding_all_years.feather | 5.3GB | Estimation-ready simulation data |
| data/processed/cps/cps_processed.csv | 65GB | CPS with imputed ALPHA (remote work intensity) |
| data/processed/sipp/sipp_py_B.csv.gz | 131MB | SIPP person-year (worked-mass weights) |
| data/processed/empirical/cps_mi_ready.dta | - | Stata-ready CPS for analysis |
| data/aux/fred_q_theta.csv | - | Quarterly labor market tightness |
Note: For GitHub deployment, consider using Git LFS or hosting data separately (Zenodo, institutional server). See docs/DATA_AVAILABILITY.md for options.
Preference Parameters:
- μ_z = -1.926 (mean taste for remote work, LogNormal)
- σ_z = 0.744 (preference dispersion)
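To put these estimates on the level scale, the sketch below converts them into the implied median, mean, and standard deviation of z. It assumes μ_z and σ_z are the location and scale of log z (so that z = exp(μ_z + σ_z ε) with ε standard normal); if the model parameterizes the distribution differently, the formulas do not apply. The variable names are illustrative.

```python
import math
import random

MU_Z = -1.926    # estimate reported above, read here as the mean of log z
SIGMA_Z = 0.744  # estimate reported above, read here as the std. dev. of log z

# Closed-form lognormal statistics.
median_z = math.exp(MU_Z)
mean_z = math.exp(MU_Z + SIGMA_Z ** 2 / 2)
sd_z = mean_z * math.sqrt(math.exp(SIGMA_Z ** 2) - 1)

# Monte Carlo check of the closed-form mean.
rng = random.Random(0)
draws = [rng.lognormvariate(MU_Z, SIGMA_Z) for _ in range(100_000)]
sim_mean = sum(draws) / len(draws)

print(f"median z = {median_z:.3f}, mean z = {mean_z:.3f}, sd z = {sd_z:.3f}")
```

Under this reading the median taste is about 0.15 and the mean about 0.19; the gap between them reflects the right skew of the lognormal.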
Technology Parameters:
- ψ₀ = 0.772 (baseline teleworkability)
- A₁ = 24.60 (skill productivity)
Model Fit:
- mean_alpha: 0.0697 vs data 0.0675 (3% error) ✅
- in-person share: 0.9187 vs data 0.8867
- Objective value: 6.97 (SMM weighted distance)
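The fit figures above can be checked with a few lines of arithmetic. The sketch below computes the relative error of each model moment against its data counterpart; the dictionary keys and the helper name are illustrative, while the numbers are the ones listed above.

```python
# (model value, data value) pairs taken from the Model Fit list.
fit = {
    "mean_alpha": (0.0697, 0.0675),
    "in_person_share": (0.9187, 0.8867),
}


def pct_error(model, data):
    """Signed relative error of the model moment, in percent of the data value."""
    return 100.0 * (model - data) / data


errors = {name: pct_error(model, data) for name, (model, data) in fit.items()}

for name, err in errors.items():
    print(f"{name}: {err:+.2f}%")
```

Both moments are overshot by between 3 and 4 percent of their data values.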
Estimation Details:
- Method: Genetic algorithm (100 generations, 512 population)
- Runtime: 30 minutes on 128 cores
- Convergence: 511/512 individuals per generation
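For readers unfamiliar with the method, here is a minimal, generic genetic algorithm on a toy quadratic objective. It is not the project's `GeneticAlgorithm.jl`: the selection rule, blend crossover, Gaussian mutation, default settings, and the function name `genetic_minimize` are all assumptions chosen for brevity, and the population is far smaller than the 512 used in estimation.

```python
import random


def genetic_minimize(objective, bounds, pop_size=64, generations=40,
                     elite=4, mutation_sd=0.1, seed=0):
    """Minimize `objective` over a box with a simple elitist GA."""
    rng = random.Random(seed)
    # Initial population: uniform draws inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        scored = sorted(pop, key=objective)
        parents = scored[: pop_size // 2]                 # truncation selection
        children = [list(ind) for ind in scored[:elite]]  # elites survive unchanged
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            child = []
            for (lo, hi), xa, xb in zip(bounds, a, b):
                w = rng.random()
                gene = w * xa + (1 - w) * xb                 # blend crossover
                gene += rng.gauss(0, mutation_sd * (hi - lo))  # Gaussian mutation
                child.append(max(lo, min(hi, gene)))         # clip to bounds
            children.append(child)
        pop = children
    best = min(pop, key=objective)
    return best, objective(best)


def toy_objective(theta):
    """Toy stand-in for an SMM objective, minimized at (0.3, 0.7)."""
    return (theta[0] - 0.3) ** 2 + (theta[1] - 0.7) ** 2


best_theta, best_value = genetic_minimize(toy_objective, [(0.0, 1.0), (0.0, 1.0)])
```

Because each individual's objective is evaluated independently within a generation, the evaluation step is the natural place to parallelize across cores.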
Python (3.10+):
pip install -r requirements/python_requirements.txt
# Key packages: polars, pandas, scikit-learn, lightgbm

Julia (1.8+):
using Pkg
Pkg.activate(".")
Pkg.instantiate()
# Key packages: DataFrames, Arrow, Optim, StatsBase

Stata (17+):
- See requirements/stata_requirements.txt for user-written packages (reghdfe, estout, gtools)
- Storage: 73GB for processed data
- Memory: 32GB RAM recommended for full sample analysis
- Compute: HPC cluster recommended for estimation (uses SLURM)
using Arrow, DataFrames, Statistics  # Statistics provides mean
# Load estimation-ready simulation data
scaff = Arrow.Table("data/processed/empirical/simulation_scaffolding_all_years.feather") |> DataFrame
# Check remote work shares by year
by_year = combine(groupby(scaff, :year), :alpha => mean, :remote => mean)

# View best parameters from optimization (run in a shell, not in Julia)
cat results/global_optimization/final/global_optimization_modified_2019_3144287_best.yaml

# Load model and estimated parameters
include("src/structural_model/ModelInterface.jl")
# Quick model test
include("test_model_quick.jl")

# Compute empirical moments
include("src/empirical/data_moments_core.jl")

- Large-scale data pipeline (73GB multi-source integration)
- Statistical matching across incompatible surveys
- ETL workflows with data validation
- Efficient storage formats (Arrow/Feather, compressed CSV)
- Simulated Method of Moments estimation
- Bootstrap variance estimation
- Machine learning imputation (LightGBM)
- Regression analysis with high-dimensional fixed effects
- Genetic algorithm implementation
- Parallel computing (multi-core optimization)
- High-performance Julia programming
- SLURM job scheduling and monitoring
- Modular code architecture
- Version control (Git)
- Reproducible research practices
- Comprehensive documentation
- Search and matching models
- Worker-job heterogeneity
- Equilibrium computation
- Counterfactual analysis
- 2023-Q1: Data acquisition and processing pipeline
- 2023-Q2: Statistical matching methodology development
- 2023-Q3: Structural model specification
- 2024-Q1: Initial parameter estimation
- 2024-Q2: Model refinement and robustness checks
- 2024-Q3: Decomposition analysis
- 2024-Q4: Manuscript preparation
- 2025-Q1: Optimization refinement (latest)
If you use this code or data, please cite:
@unpublished{valdesbobes2025remote,
author = {Valdes-Bobes, Mitchell},
title = {Why Remote Work Stuck: A Structural Analysis of Remote Work Persistence},
institution = {University of Wisconsin-Madison},
year = {2025},
note = {Working Paper}
}

Or use the included CITATION.cff file.
- Code: MIT License (see LICENSE)
- Data: Original data sources have separate licenses (IPUMS, Census Bureau)
- Paper: All rights reserved
Mitchell Valdes-Bobes
Department of Economics
University of Wisconsin-Madison
For questions about the code or data, please open an issue in this repository.
This research uses data from:
- IPUMS CPS (Flood et al. 2024)
- Survey of Income and Program Participation (U.S. Census Bureau)
- O*NET (U.S. Department of Labor)
- FRED (Federal Reserve Economic Data)
Computational resources provided by the Center for High Throughput Computing at UW-Madison.
Version: 1.0
Last Updated: November 2, 2025
Repository: why_remote_work_stuck