Why Remote Work Stuck: Structural Analysis of Remote Work Persistence

Author: Mitchell Valdes-Bobes
Institution: University of Wisconsin-Madison
Status: Working Paper
Data Coverage: 2013-2025

Research Overview

This project investigates why remote work adoption rose sharply during COVID-19 and then stabilized well above pre-pandemic levels rather than returning to baseline. Using a structural search-and-matching model with worker heterogeneity in remote work preferences and job heterogeneity in teleworkability, I decompose the persistence of remote work into three channels: (1) technology improvements, (2) preference shifts, and (3) sorting mechanisms.

Key Finding

Remote work persistence is primarily driven by technology improvements in teleworkability (60%) rather than preference changes (30%) or improved sorting (10%). This suggests policy interventions targeting workplace infrastructure may be more effective than those targeting worker preferences.
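
One common way to arrive at shares like these is a sequential counterfactual: switch on one channel at a time and attribute each step's change in the outcome to that channel. The sketch below is only an illustration of that arithmetic. The function name, the ordering of channels, and every number in `path` are hypothetical, chosen so the shares come out near 60/30/10; none of it is the paper's actual procedure or output.

```python
def sequential_shares(path):
    """Split the total change along a counterfactual path into channel shares.

    `path` is an ordered list of (label, outcome) pairs. The first entry is the
    baseline; each later entry adds one more channel to the previous scenario.
    """
    baseline = path[0][1]
    total = path[-1][1] - baseline
    if total == 0:
        raise ValueError("total change is zero; shares are undefined")
    shares = {}
    previous = baseline
    for label, outcome in path[1:]:
        shares[label] = (outcome - previous) / total
        previous = outcome
    return shares


# Hypothetical mean remote-work intensity under each scenario (not model output).
path = [
    ("baseline", 0.07),
    ("technology", 0.13),    # baseline + technology change
    ("preferences", 0.16),   # ... + preference shift
    ("sorting", 0.17),       # ... + improved sorting (full post-period model)
]
shares = sequential_shares(path)
```

A sequential decomposition depends on the order in which channels are switched on, so reported shares should always be read together with the ordering used.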

Methods

Statistical Matching: ML-based imputation to merge SIPP remote work intensity with CPS earnings data
Structural Estimation: Simulated Method of Moments (SMM) with genetic algorithm optimization
Data Pipeline: Integrated processing of 73GB multi-source data (SIPP, CPS, ATUS, O*NET)
High-Performance Computing: Parallel estimation on 128 cores with a GA population of 512

Technical Highlights

1. Large-Scale Data Engineering (73GB Pipeline)

Data Sources Integrated:

SIPP (Survey of Income and Program Participation): 131MB processed, remote work intensity measures
CPS (Current Population Survey): 65GB processed, main analysis sample with earnings
ATUS (American Time Use Survey): Pre-2022 telework validation
O*NET: Occupation-level teleworkability scores

Statistical Matching:

LightGBM-based machine learning imputation
Harmonized variables across datasets (education, age, occupation, geography)
Cross-validation: R² = 0.68 for remote work intensity prediction
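
The matching idea can be sketched without any ML library: learn the relationship between harmonized covariates and remote work intensity in the donor survey, then predict intensity for the recipient survey. In this sketch a plain cell-mean predictor stands in for LightGBM, the variable names `educ` and `occ` and all rows are made up, and the R² is computed in-sample rather than by cross-validation.

```python
from collections import defaultdict
from statistics import mean


def fit_cell_means(donors, keys, target):
    """Average the target within cells defined by the harmonized keys."""
    cells = defaultdict(list)
    for row in donors:
        cells[tuple(row[k] for k in keys)].append(row[target])
    overall = mean(row[target] for row in donors)
    return {cell: mean(values) for cell, values in cells.items()}, overall


def impute(recipients, keys, cell_means, fallback):
    """Predict the target for each recipient; unseen cells get the overall mean."""
    return [cell_means.get(tuple(r[k] for k in keys), fallback) for r in recipients]


def r_squared(actual, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    avg = mean(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - avg) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


# Toy donor file (stands in for SIPP) and recipient file (stands in for CPS).
KEYS = ("educ", "occ")
sipp = [
    {"educ": "college", "occ": "mgmt", "alpha": 0.40},
    {"educ": "college", "occ": "mgmt", "alpha": 0.60},
    {"educ": "hs", "occ": "service", "alpha": 0.00},
    {"educ": "hs", "occ": "service", "alpha": 0.10},
]
cps = [{"educ": "college", "occ": "mgmt"}, {"educ": "hs", "occ": "sales"}]

cell_means, overall = fit_cell_means(sipp, KEYS, "alpha")
cps_alpha = impute(cps, KEYS, cell_means, overall)  # imputed intensity for CPS rows
in_sample_r2 = r_squared([r["alpha"] for r in sipp], impute(sipp, KEYS, cell_means, overall))
```

A gradient-boosted model plays the same role as `fit_cell_means` and `impute` but can use many more covariates and interactions; the harmonization step matters in either case because both files must share identical key definitions.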

2. Structural Model Estimation

Model Features:

Search and matching framework with worker-job heterogeneity
Workers differ in remote work preferences (z ~ LogNormal)
Jobs differ in teleworkability (ψ ~ distribution)
Optimal choice of remote work intensity (α ∈ [0,1])
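
To make the ingredients concrete, the sketch below draws tastes z from a LogNormal distribution and picks the remote share α on a grid over [0, 1]. The payoff function is a hypothetical stand-in (linear output, a linear taste term, and a quadratic cost), not the model in the paper; it was chosen only because it has a closed-form optimum to check the grid search against. The parameter values are the ones quoted elsewhere in this README, and treating ψ₀ as a job's teleworkability is also an assumption.

```python
import random


def match_value(alpha, z, psi, cost=1.0):
    """Hypothetical value of a match at remote share alpha (illustration only)."""
    output = (1.0 - alpha) + psi * alpha   # in-person output is normalized to 1
    amenity = z * alpha                    # worker's taste for remote work
    friction = 0.5 * cost * alpha ** 2     # convex cost of working remotely
    return output + amenity - friction


def optimal_alpha(z, psi, cost=1.0, grid_size=1001):
    """Grid search for the remote share in [0, 1] that maximizes match_value."""
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    return max(grid, key=lambda a: match_value(a, z, psi, cost))


def closed_form_alpha(z, psi, cost=1.0):
    """Solve psi - 1 + z - cost * alpha = 0 and clip the result to [0, 1]."""
    return min(max((psi - 1.0 + z) / cost, 0.0), 1.0)


rng = random.Random(0)
mu_z, sigma_z, psi = -1.926, 0.744, 0.772  # values quoted in this README
tastes = [rng.lognormvariate(mu_z, sigma_z) for _ in range(5)]
choices = [optimal_alpha(z, psi) for z in tastes]
```

Under this toy payoff a worker chooses a positive remote share only when z exceeds 1 - ψ, which illustrates how taste and teleworkability interact to produce corner solutions at α = 0.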

Estimation Method:

Genetic algorithm optimization (100 generations, 512 population)
Simulated Method of Moments with bootstrap variance weighting
11 empirical moments matched (wage distribution, remote work shares, sorting patterns)
~30 minutes per estimation run on HPC cluster
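
A minimal sketch of an SMM objective with bootstrap variance weighting follows. It assumes a diagonal weight matrix whose entries are inverse bootstrap variances of the data moments; the README does not spell out the weighting scheme, so that choice, the function names, and the toy sample are all assumptions.

```python
import random
from statistics import mean, variance


def bootstrap_variance(sample, n_boot=200, seed=0):
    """Variance of the sample mean across bootstrap resamples (with replacement)."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [mean(rng.choices(sample, k=n)) for _ in range(n_boot)]
    return variance(replicates)


def smm_objective(model_moments, data_moments, moment_variances):
    """Weighted distance sum_k (model_k - data_k)^2 / var_k (diagonal weights)."""
    if not (len(model_moments) == len(data_moments) == len(moment_variances)):
        raise ValueError("moment vectors must have the same length")
    return sum(
        (m - d) ** 2 / v
        for m, d, v in zip(model_moments, data_moments, moment_variances)
    )


# Toy remote-work intensities standing in for the empirical sample.
sample_alpha = [0.0, 0.0, 0.0, 0.2, 1.0, 0.0, 0.5, 0.0]
data_moment = mean(sample_alpha)
var_moment = bootstrap_variance(sample_alpha)

# Distance between a hypothetical simulated moment (0.25) and the data moment.
distance = smm_objective([0.25], [data_moment], [var_moment])
```

Inverse-variance weights make precisely measured moments count more, so the optimizer is not dominated by noisy ones.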

3. Code Architecture

Languages Used:

Julia: Structural model, optimization, moment computation (47 files)
Python: Data processing, statistical matching, ML imputation (20 files)
Stata: Wage regressions, robustness checks (18 files)

Key Modules:

src/structural_model/: Economic model implementation (9 modules)
src/optimization/: GA and multi-start local search
src/empirical/: Moments estimation and statistical matching
src/data/: Multi-source data acquisition and processing

Repository Structure

why_remote_work_stuck/
│
├── docs/
│   ├── technical/
│   │   ├── EMPIRICAL_PIPELINE.md      # Complete data processing documentation
│   │   ├── DATA_DICTIONARY.md         # Variable definitions and sources
│   │   └── STRUCTURAL_MODEL.md        # Model specification and estimation
│   ├── development_logs/
│   │   └── optimization_oct2025.md    # Parameter estimation session notes
│   └── outputs/
│       ├── main.pdf                   # Latest paper
│       └── slides.pdf                 # Latest presentation
│
├── src/
│   ├── data/                          # Data acquisition & processing (Python)
│   │   ├── sipp_process.py            # SIPP remote work metrics
│   │   ├── ipums_process.py           # CPS/ATUS processing
│   │   └── get_fred_q_theta.py        # Labor market tightness
│   ├── empirical/                     # Moments & statistical matching
│   │   ├── stat_matching/             # SIPP→CPS ML imputation
│   │   ├── data_moments_core.jl       # Moment computation
│   │   └── wfh_wage_facts/            # Wage-remote work regressions
│   ├── structural_model/              # Economic model (Julia)
│   │   ├── ModelSetup.jl              # Parameter initialization
│   │   ├── ModelSolver.jl             # Equilibrium computation
│   │   ├── ModelEconomics.jl          # Production & matching
│   │   └── GeneticAlgorithm.jl        # Custom GA implementation
│   ├── optimization/                  # Parameter estimation
│   │   ├── OptimizationObjective.jl   # SMM objective function
│   │   └── multi_start_local_search.jl
│   └── reporting/                     # Results & decomposition
│
├── data/                              # 73GB processed data (see below)
│   ├── processed/
│   │   ├── empirical/                 # Estimation-ready datasets
│   │   ├── cps/                       # CPS with imputed remote work
│   │   ├── sipp/                      # SIPP person-year files
│   │   └── harmonized/                # Cross-dataset harmonization
│   └── aux/                           # Crosswalks and auxiliary data
│
├── results/
│   ├── global_optimization/           # GA estimation outputs
│   │   └── final/
│   │       ├── *_3144287_best.yaml    # Best parameters (2019)
│   │       └── *_3144287.json         # Full optimization history
│   └── bootstrap_moments/             # Bootstrap variance estimates
│
├── manuscript/
│   └── final_document/
│       ├── main.pdf                   # Compiled paper
│       ├── main.tex                   # LaTeX source
│       └── figures/                   # Publication figures
│
└── presentations/
    └── QMW_10202025/
        ├── slides.pdf                 # Queen Mary Workshop slides
        └── slides.tex                 # Beamer source

Quick Access

📄 Paper PDF - Latest manuscript version
📊 Slides PDF - Queen Mary Workshop presentation
📖 Technical Documentation - Pipeline and data documentation
💾 Estimation Results - Parameter estimates and diagnostics

Data Availability

Processed data (73GB) is included in this repository at data/processed/. This ensures full reproducibility without re-running expensive data acquisition steps.

Key Processed Files

File	Size	Description
`data/processed/empirical/simulation_scaffolding_all_years.feather`	5.3GB	Estimation-ready simulation data
`data/processed/cps/cps_processed.csv`	65GB	CPS with imputed ALPHA (remote work intensity)
`data/processed/sipp/sipp_py_B.csv.gz`	131MB	SIPP person-year (worked-mass weights)
`data/processed/empirical/cps_mi_ready.dta`	-	Stata-ready CPS for analysis
`data/aux/fred_q_theta.csv`	-	Quarterly labor market tightness

Note: For GitHub deployment, consider using Git LFS or hosting data separately (Zenodo, institutional server). See docs/DATA_AVAILABILITY.md for options.

Estimation Results Summary

2019 Baseline Parameters (Job 3144287)

Preference Parameters:

μ_z = -1.926 (log-scale mean of the taste for remote work, z ~ LogNormal)
σ_z = 0.744 (log-scale dispersion of preferences)
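
Because a LogNormal variable is strictly positive, the negative μ_z is most naturally read as the mean of log z rather than of z itself. Under that assumption, the implied median and mean of z follow from the standard LogNormal formulas:

```python
import math

mu_z, sigma_z = -1.926, 0.744  # estimates reported above

# Standard LogNormal identities, assuming log(z) ~ Normal(mu_z, sigma_z^2).
median_z = math.exp(mu_z)
mean_z = math.exp(mu_z + sigma_z ** 2 / 2)
```

This gives a median taste of about 0.146 and a mean of about 0.192; the gap between them reflects the right skew of the distribution.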

Technology Parameters:

ψ₀ = 0.772 (baseline teleworkability)
A₁ = 24.60 (skill productivity)

Model Fit:

mean_alpha: 0.0697 vs data 0.0675 (3.3% relative error) ✅
in-person share: 0.9187 vs data 0.8867 (3.6% relative error)
Objective value: 6.97 (SMM weighted distance)
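
The fit errors can be checked directly from the model and data moments listed above. The helper name is illustrative, and relative error is assumed to be the metric behind the quoted percentage.

```python
def relative_error(model, data):
    """Absolute gap between model and data moment, as a fraction of the data moment."""
    return abs(model - data) / abs(data)


# (model, data) pairs copied from the Model Fit list above.
fit = {
    "mean_alpha": (0.0697, 0.0675),
    "in_person_share": (0.9187, 0.8867),
}
errors = {name: relative_error(model, data) for name, (model, data) in fit.items()}
```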

Estimation Details:

Method: Genetic algorithm (100 generations, 512 population)
Runtime: 30 minutes on 128 cores
Convergence: 511/512 individuals per generation
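
For readers unfamiliar with genetic algorithms, here is a small real-coded GA with tournament selection, blend crossover, Gaussian mutation, and elitism. It is a generic sketch, not the repository's GeneticAlgorithm.jl: the operators, default settings, and test function are all assumptions, and the defaults are far smaller than the 512-individual, 100-generation runs described above.

```python
import random


def genetic_minimize(objective, bounds, pop_size=64, generations=40, seed=0):
    """Minimize `objective` over a box with a simple real-coded GA."""
    rng = random.Random(seed)

    def random_point():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def clip(point):
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(point, bounds)]

    population = [random_point() for _ in range(pop_size)]
    history = []  # best objective value at the start of each generation
    for _ in range(generations):
        ranked = sorted(population, key=objective)
        history.append(objective(ranked[0]))
        children = [ranked[0]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            # Tournament selection: each parent is the best of three random picks.
            parent_a = min(rng.sample(ranked, 3), key=objective)
            parent_b = min(rng.sample(ranked, 3), key=objective)
            weight = rng.random()
            child = [weight * a + (1 - weight) * b for a, b in zip(parent_a, parent_b)]
            # Gaussian mutation scaled to 5% of each parameter's range.
            child = [
                v + rng.gauss(0.0, 0.05 * (hi - lo))
                for v, (lo, hi) in zip(child, bounds)
            ]
            children.append(clip(child))
        population = children
    best = min(population, key=objective)
    history.append(objective(best))
    return best, history


def sphere(x):
    """Toy objective with its minimum at (0.3, 0.3)."""
    return sum((v - 0.3) ** 2 for v in x)


best, history = genetic_minimize(sphere, [(0.0, 1.0)] * 2)
```

In an SMM setting the objective would be the weighted moment distance, and because each individual is evaluated independently within a generation, the evaluations parallelize naturally across cores.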

Requirements

Software Dependencies

Python (3.10+):

pip install -r requirements/python_requirements.txt
# Key packages: polars, pandas, scikit-learn, lightgbm

Julia (1.8+):

using Pkg
Pkg.activate(".")
Pkg.instantiate()
# Key packages: DataFrames, Arrow, Optim, StatsBase

Stata (17+):

See requirements/stata_requirements.txt for user-written packages
Key packages: reghdfe, estout, gtools

System Requirements

Storage: 73GB for processed data
Memory: 32GB RAM recommended for full sample analysis
Compute: HPC cluster recommended for estimation (uses SLURM)

Quick Start

1. Explore Processed Data

using Arrow, DataFrames, Statistics  # Statistics provides mean

# Load estimation-ready simulation data
scaff = Arrow.Table("data/processed/empirical/simulation_scaffolding_all_years.feather") |> DataFrame

# Mean remote work intensity and remote share by year
by_year = combine(groupby(scaff, :year), :alpha => mean, :remote => mean)

2. View Estimation Results

# View best parameters from optimization
cat results/global_optimization/final/global_optimization_modified_2019_3144287_best.yaml

3. Run Model with Estimated Parameters

# Load model and estimated parameters
include("src/structural_model/ModelInterface.jl")

# Quick model test
include("test_model_quick.jl")

4. Reproduce Moments

# Compute empirical moments
include("src/empirical/data_moments_core.jl")

Skills Demonstrated

Data Engineering

Large-scale data pipeline (73GB multi-source integration)
Statistical matching across incompatible surveys
ETL workflows with data validation
Efficient storage formats (Arrow/Feather, compressed CSV)

Statistical & Econometric Methods

Simulated Method of Moments estimation
Bootstrap variance estimation
Machine learning imputation (LightGBM)
Regression analysis with high-dimensional fixed effects

Computational Methods

Genetic algorithm implementation
Parallel computing (multi-core optimization)
High-performance Julia programming
SLURM job scheduling and monitoring

Software Engineering

Modular code architecture
Version control (Git)
Reproducible research practices
Comprehensive documentation

Economic Modeling

Search and matching models
Worker-job heterogeneity
Equilibrium computation
Counterfactual analysis

Project Timeline

2023-Q1: Data acquisition and processing pipeline
2023-Q2: Statistical matching methodology development
2023-Q3: Structural model specification
2024-Q1: Initial parameter estimation
2024-Q2: Model refinement and robustness checks
2024-Q3: Decomposition analysis
2024-Q4: Manuscript preparation
2025-Q1: Optimization refinement (latest)

Citation

If you use this code or data, please cite:

@unpublished{valdesbobes2025remote,
  author = {Valdes-Bobes, Mitchell},
  title = {Why Remote Work Stuck: A Structural Analysis of Remote Work Persistence},
  institution = {University of Wisconsin-Madison},
  year = {2025},
  note = {Working Paper}
}

Or use the included CITATION.cff file.

License

Code: MIT License (see LICENSE)
Data: Original data sources have separate licenses (IPUMS, Census Bureau)

Contact

Mitchell Valdes-Bobes
Department of Economics
University of Wisconsin-Madison

For questions about the code or data, please open an issue in this repository.

Acknowledgments

This research uses data from:

IPUMS CPS (Flood et al. 2024)
Survey of Income and Program Participation (U.S. Census Bureau)
O*NET (U.S. Department of Labor)
FRED (Federal Reserve Economic Data)

Computational resources provided by the Center for High Throughput Computing at UW-Madison.

Version: 1.0
Last Updated: November 2, 2025
Repository: why_remote_work_stuck
