Skip to content

Conversation

@MaxGhenis
Copy link
Contributor

Summary

  • Adds ETL pipeline for state-level individual income tax collections from Census Bureau's Annual Survey of State Government Tax Collections (STC)
  • Uses hardcoded FY2023 data for all 50 states + DC ($531B total)
  • Creates calibration targets using stratum_group_id 7 for state income tax

Supersedes #493 (rebased onto main; original targeted stale db-work branch).

Test plan

  • Run make database to verify ETL executes successfully
  • Check database contains state income tax strata and targets
  • Verify validation passes (state_income_tax variable exists in policyengine-us)

Closes #492

🤖 Generated with Claude Code

Adds ETL pipeline for state-level individual income tax collections
from Census Bureau's Annual Survey of State Government Tax Collections
(STC) using FY2023 data for all 50 states + DC ($531B total).

Recreated from PR #493 rebased onto main.

Closes #492

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
@MaxGhenis
Copy link
Contributor Author

Reviewed. CI passes, code structure is clean.

⚠️ Data accuracy concern: The hardcoded FY2023 values may not match Census STC. Spot-checking via FRED (CAINCTAX, NYINCTAX):

  • CA: PR has $115.8B, FRED shows $96.4B for FY2023
  • NY: PR has $63.2B, FRED shows $58.8B for FY2023

This could be a vintage/table mismatch (STC Table 1 vs FRED series), or the PR values may be from a different fiscal year. Since this is database-only (not consumed by build_loss_matrix() yet), there's no production impact, but the values should be verified against the actual Census STC download before this feeds into calibration.

Otherwise ready to merge.

@MaxGhenis MaxGhenis merged commit 46b3609 into main Jan 31, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add state income tax revenue as calibration target

2 participants