NY Post Gen Z News Consumption — Python Rebuild

A Python rebuild of a 2025 MS Business Analytics capstone analysis at St. John's University, sponsored by the New York Post. The original analysis (logistic regression on survey data, descriptive cross-tabs) was done in R; this rebuild reproduces it end-to-end in Python and adds out-of-sample evaluation along the way.

The result is a faithful replication of the original findings — plus two statistically significant effects the R analysis treated as null.

What this is

The original capstone asked one question: among Gen Z respondents in the NY Metro area, who would be open to "creator-style" news content — short, engaging, presented like their favorite social media creator? A team of four MS students surveyed 520 St. John's students aged 18–26 and modeled the binary outcome with logistic regression. The deliverable was a 16-slide executive deck for the New York Post.

This rebuild reproduces the analysis in Python from scratch. The motivation is direct: the R analysis is sound, but R skills alone don't always transfer to industry analytics roles. Rebuilding in pandas / statsmodels / scikit-learn demonstrates fluency in both languages, and the rebuild is a chance to add the engineering rigor the original timeline didn't allow.

Headline findings

Two effects the original deck missed. The Python regression flags two predictors as statistically significant that the R version reported as null:

Distrust of mainstream outlets is associated with less interest in creator-style news (OR = 0.53, p = 0.044), not more. The intuitive hypothesis is that anti-mainstream respondents would be the segment most open to creators. The data points the other way: distrust appears to be a rejection of all current formats, not a gateway to alternatives. One substantive read is that these respondents don't perceive creators as more trustworthy than traditional outlets, though with p just under 0.05 the effect deserves confirmation on new data.
Age shows a positive linear trend (OR = 1.43 per bucket, p = 0.026). Compounded over the two bucket steps from 18–20 to 24–26, older Gen Z have roughly twice the odds (1.43² ≈ 2.04) of wanting creator-style news, controlling for engagement type, barriers, and platform usage. Note that this is a ratio of odds, not of probabilities. This reframes the strategic recommendation: the under-served audience is the older end of Gen Z, not the youngest.
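The age figure follows from compounding the per-bucket odds ratio. A minimal arithmetic sketch, assuming three ordered age buckets coded 0, 1, 2 (the middle bucket is inferred, not stated above) and a hypothetical 30% baseline rate that is used only to show how a doubling of odds differs from a doubling of probability:

```python
import math

OR_PER_BUCKET = 1.43  # reported odds ratio per age bucket
BUCKET_STEPS = 2      # assumption: 18-20 -> 24-26 spans two bucket steps

# A linear trend on the log-odds scale compounds multiplicatively on the odds scale.
beta = math.log(OR_PER_BUCKET)
or_oldest_vs_youngest = math.exp(beta * BUCKET_STEPS)
print(round(or_oldest_vs_youngest, 2))  # 2.04


def shift_probability(p_base: float, odds_ratio: float) -> float:
    """Apply an odds ratio to a baseline probability and return the new probability."""
    odds = p_base / (1 - p_base) * odds_ratio
    return odds / (1 + odds)


p_young = 0.30  # hypothetical baseline, NOT a survey result
p_old = shift_probability(p_young, or_oldest_vs_youngest)
print(round(p_old, 3))  # 0.467
```

Under that made-up baseline, doubling the odds moves the probability from 0.30 to about 0.47, which is why the effect is best described in terms of odds.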

The original analysis's main story holds. The two strongest positive predictors — barrier_format ("current news is inconvenient", OR = 2.07, p = 0.001) and barrier_social ("social media already covers my news", OR = 1.56, p = 0.048) — replicate cleanly.

Short-form video dominates across every engagement archetype. Even respondents who classified themselves as "News Avoiders" prefer short-form video (56%) over any other format. This was the deck's signature insight, and the Python heatmap reproduces it cell-for-cell.

How the analysis is structured

The pipeline is three modules with strict separation of concerns:

src/
├── cleaning.py    # Load → drop text columns → filter to Gen Z → flag NY metro
├── features.py    # Recode 27 binary flags + dependent variable + engagement typology
└── modeling.py    # Design matrix → statsmodels Logit → coefficient table with ORs

Every function takes a DataFrame and returns a DataFrame. No global state. The R original repeated the same if_else(!is.na(x) & x != "" & x != "0", 1L, 0L) block 27 times; here that's a single _is_selected helper plus three dictionaries mapping output names to source columns — roughly 30 lines instead of 150.
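A minimal sketch of what the helper-plus-dictionary pattern could look like. Only the name _is_selected and the flag names barrier_format and barrier_social come from this README; the add_flags function, the BARRIER_FLAGS dictionary, and the Q9_* source column names are hypothetical stand-ins, and the real implementation in src/features.py may differ.

```python
import pandas as pd


def _is_selected(col: pd.Series) -> pd.Series:
    """Return 1 where a checkbox-style answer is present, else 0.

    Mirrors the R rule: not NA, not an empty string, not "0".
    Assumes the raw column holds strings (or missing values).
    """
    text = col.fillna("").astype(str).str.strip()
    return (~text.isin(["", "0"])).astype(int)


# Hypothetical mapping: output flag name -> raw survey column.
BARRIER_FLAGS = {
    "barrier_format": "Q9_1",
    "barrier_social": "Q9_2",
}


def add_flags(df: pd.DataFrame, mapping: dict[str, str]) -> pd.DataFrame:
    """Return a copy of df with one 0/1 column per entry in mapping."""
    out = df.copy()
    for flag, source in mapping.items():
        out[flag] = _is_selected(out[source])
    return out


# Toy input, not survey data.
raw = pd.DataFrame({
    "Q9_1": ["Inconvenient", "", None, "0"],
    "Q9_2": ["0", "Yes", " ", "Yes"],
})
flags = add_flags(raw, BARRIER_FLAGS)
print(flags[["barrier_format", "barrier_social"]])
```

Adding a new flag then means adding one dictionary entry instead of another copy of the recode block.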

The two notebooks live in notebooks/:

notebooks/
├── 01_eda.ipynb       # Reproduces deck slides 5, 6, 9 (platforms, format-by-age, heatmap)
└── 02_modeling.ipynb  # Reproduces slide 7 + adds train/test ROC curve

Both render fully inline on GitHub — open them in the browser to see the analysis without running anything.
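For readers who want the shape of the train/test ROC step without opening the notebook, here is a hedged sketch on synthetic data. The features, coefficients, split ratio, and random seeds are all made up for illustration and are not taken from 02_modeling.ipynb. Note also that scikit-learn's LogisticRegression applies L2 regularization by default, so its coefficients are not directly comparable to an unpenalized statsmodels fit.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score, roc_curve
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the design matrix: 520 rows of binary flags.
rng = np.random.default_rng(42)
n = 520
X = rng.integers(0, 2, size=(n, 5)).astype(float)
true_beta = np.array([0.7, 0.4, -0.6, 0.0, 0.3])  # arbitrary, not survey estimates
p = 1 / (1 + np.exp(-(X @ true_beta - 0.5)))
y = rng.binomial(1, p)

# Hold out 25% of rows, stratified so both splits keep the outcome balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=42
)

clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = clf.predict_proba(X_test)[:, 1]  # predicted probability of the positive class

fpr, tpr, _ = roc_curve(y_test, scores)   # points for plotting the ROC curve
auc = roc_auc_score(y_test, scores)
print(f"held-out AUC: {auc:.3f}")
```

The fpr and tpr arrays can be passed straight to matplotlib to draw the curve.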

Tech stack

Data wrangling: pandas 2.x
Modeling: statsmodels (inference: coefficients, p-values, CIs), scikit-learn (train/test split, ROC, AUC)
Visualization: matplotlib, seaborn
Environment: uv for dependency management, Python 3.12
Notebooks: Jupyter via the VS Code Jupyter extension

Reproducing this analysis

git clone https://github.com/[FILL IN: your username]/nypost-gen-z-python.git
cd nypost-gen-z-python
uv sync                              # Installs all dependencies into .venv

# Run the modeling pipeline end-to-end
uv run python -m src.modeling

# Or open the notebooks
uv run jupyter notebook

Original analysis

The 2025 capstone was a team effort by Mya Lamadrid, Mohammed Ahmed, Anthony Onwugbenu, and Paul Rodriguez, advised by the team's capstone faculty advisor at The Peter J. Tobin College of Business, St. John's University, and sponsored by the New York Post. The original R analysis and executive deck are the team's and sponsor's work; this repository is a personal Python rebuild for portfolio purposes.

What this demonstrates

Bilingual R/Python fluency on the same dataset: same model, same core findings, plus out-of-sample evaluation
Modular package structure with separation of concerns (cleaning → features → modeling)
Dictionary-driven feature engineering replacing repeated R boilerplate
Statistical inference with statsmodels (publication-quality coefficient tables with CIs)
Held-out test methodology with scikit-learn that the R version doesn't include
Analytical communication: notebooks that walk a reader through the findings with deck-faithful visualizations

Future work

K-means clustering for data-driven personas. The deck's two personas were synthesized in PowerPoint; a proper Python version would derive them with elbow + silhouette validation.
5-fold cross-validation for a more stable AUC estimate. The single-split ROC point estimate has wide uncertainty around it.
Sensitivity analysis on the inclusive outcome (top-3-box of Q11). Strict and inclusive outcomes should produce consistent coefficient signs.
Unit tests in tests/ for the _is_selected helper and the design matrix construction.
Streamlit dashboard that lets a user input a hypothetical respondent profile and see the predicted probability + the factors driving it.
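The cross-validation item above could look roughly like the sketch below. It uses a synthetic dataset from scikit-learn's make_classification in place of the survey design matrix, so the printed numbers say nothing about the real model; the fold count matches the list, and the seeds are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in with the same row count as the survey (520).
X, y = make_classification(
    n_samples=520, n_features=5, n_informative=3, n_redundant=0, random_state=0
)

# Stratified folds keep the outcome balance similar in every fold.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_auc = cross_val_score(
    LogisticRegression(max_iter=1000), X, y, cv=cv, scoring="roc_auc"
)

print("per-fold AUC:", fold_auc.round(3))
print(f"mean = {fold_auc.mean():.3f}, sd = {fold_auc.std(ddof=1):.3f}")
```

Reporting the spread across folds alongside the mean makes the uncertainty of a single-split estimate visible.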

About the author

Built by Paul Rodriguez — finance and analytics professional with a background in federal grants management, Big Four tax, and international finance consulting. LinkedIn · GitHub
