Area Code Analysis: Demographic and Geographic Patterns in US Telecommunications Infrastructure

This project investigates the historical allocation of area codes under the 1947 North American Numbering Plan (NANP) to uncover whether the design of telecommunications infrastructure reflected systemic social and demographic biases. Using spatial joins, historical census data from the 1950s, and statistical modeling, the analysis maps original area code assignments to U.S. counties and analyzes patterns based on population size, race, and geographic distribution.

Key Findings

Population-Based Assignments: More populated areas received easier-to-dial codes, especially in states with multiple area codes to choose from.
Racial Disparities: In high-density areas, codes serving predominantly Black populations were disproportionately harder to dial, suggesting potential bias in infrastructure design.
Predictive Factors: Birth rates in the 1950s were strongly predictive of future area code splits, demonstrating how demographic indicators shaped long-term telecommunications demand.
Population and Dial Effort Correlation: A strong negative correlation (r = -0.688) was found between population and dialing effort in multi-code states, while single-code states showed a much weaker relationship (r = -0.164).

Project Components

The analysis is structured in five components:

Assigning Area Codes to Counties: Using spatial joins to map cities to counties and determine original area code assignments.
Summarizing Regional Characteristics: Aggregating demographic and telephone usage data at the area code level.
Dialing Effort and Population Analysis: Quantifying the relationship between population density and the physical effort required to dial area codes on rotary phones.
Racial Disparities in Dialing Effort: Investigating whether regions with higher Black populations received harder-to-dial codes.
Predicting Area Code Growth: Building models to predict future area code splits based on 1950s demographic and infrastructural features.

Implementation

This project has been implemented in two programming environments:

R/Quarto: The original implementation using R statistical packages, tidyverse, and spatial analysis libraries. See Project.qmd.
Python/Jupyter: A parallel implementation reproducing the same analysis using Python data science libraries (pandas, geopandas, scikit-learn, etc.). See Area_Code_Analysis.ipynb.

Both implementations produce equivalent results, confirming the robustness of the findings.

Project Structure

Project.qmd: R-based Quarto document containing the analysis code and documentation
Project.html: HTML output of the R-based analysis
Area_Code_Analysis.ipynb: Python implementation of the analysis
Area_Code_Analysis_executed.ipynb: Python implementation with execution results
final_project_data/: Directory containing datasets used in the analysis
- cities_area_codes.csv: Dataset linking cities to their area codes
- county_census_info.csv: Census information for counties
- merged_counties_since_1950.csv: Information on counties that have merged since 1950
- new_counties_since_1950.csv: Information on new counties formed since 1950
- splits_overlays.xlsx: Data on area code splits and overlays
- co99_d00_shp/: Shapefile data for county boundaries
images/: Visualizations for the README
Project_files/figure-html/: Visualizations generated from the R analysis

How to Use

R/Quarto Version

Open the Project.html file to view the complete analysis with visualizations.
The Project.qmd file can be opened and modified in RStudio or any Quarto-compatible editor to reproduce or extend the analysis.

Python Version

Ensure you have all the required dependencies installed:

pip install pandas numpy matplotlib seaborn geopandas scikit-learn statsmodels xgboost

Open Area_Code_Analysis.ipynb in Jupyter Notebook or JupyterLab to run the analysis.
For proper execution with geopandas, you may need to install it with:
```
python -m pip install geopandas
```

Technologies Used

R Version: R for statistical analysis, ggplot2 for visualization, sf for spatial operations, tidymodels for predictive modeling, and Quarto for document preparation.
Python Version: pandas for data manipulation, geopandas for spatial operations, matplotlib/seaborn for visualization, scikit-learn and statsmodels for statistical modeling.

Conclusion

This project demonstrates how design decisions—even numeric ones like area code assignments—can have long-lasting social consequences and reflect underlying societal biases. The findings provide a data-driven lens on infrastructure fairness and reveal patterns in how telecommunications resources were allocated in mid-century America.

Name		Name	Last commit message	Last commit date
Latest commit History 7 Commits
.vscode		.vscode
Project_files		Project_files
final_project_data		final_project_data
images		images
.DS_Store		.DS_Store
.Rhistory		.Rhistory
Area_Code_Analysis.ipynb		Area_Code_Analysis.ipynb
Area_Code_Analysis_executed.ipynb		Area_Code_Analysis_executed.ipynb
Project.html		Project.html
Project.qmd		Project.qmd
Project.zip		Project.zip
README.md		README.md
final_project_prompt.html		final_project_prompt.html

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Area Code Analysis: Demographic and Geographic Patterns in US Telecommunications Infrastructure

Key Findings

Project Components

Implementation

Project Structure

How to Use

R/Quarto Version

Python Version

Technologies Used

Conclusion

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Area Code Analysis: Demographic and Geographic Patterns in US Telecommunications Infrastructure

Key Findings

Project Components

Implementation

Project Structure

How to Use

R/Quarto Version

Python Version

Technologies Used

Conclusion

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages