Last Updated: 2025-11-15
Project: Access - Spatial Accessibility Analysis for Conservation Lands
Recent Completions:
- ✅ TD-001: Python 3.10 Version Lock (2025-01-XX)
- ✅ TD-002: Outdated OSMnx Version (2025-01-XX)
- ✅ TD-009: Dependency Security Scanning (2025-11-15)
- ✅ IMP-005: Code Quality Tooling (2025-11-15)
- ✅ IMP-009: Enhanced Print Layouts (2025-11-15)
- ✅ IMP-006: Webmap Enhancements (2025-11-09)
- ✅ FR-003: Mobile-Friendly Webmap (2025-11-09)
- 🔄 TD-007: Error Handling Strategy - Partial (2025-11-15)
- 🔄 IMP-004: Improved Logging and Monitoring - Partial (2025-11-15)
- 🔄 IMP-003: Documentation Improvements - Partial (2025-11-15)
This document consolidates technical debt, feature requests, and improvements identified through comprehensive project analysis. Items are categorized by type, priority, and estimated effort.
Priority: High Effort: Medium (16-24 hours) Status: ✅ COMPLETED (2025-01-XX) Category: Dependencies
Description:
The project was locked to Python 3.10 (requires-python = ">=3.10,<3.11"). This restriction prevented:
- Using Python 3.11+ performance improvements (CPython 3.11 reports 10-60% speedups over 3.10, about 1.25x on average on its standard benchmark suite)
- Access to newer language features (PEP 657 fine-grained error locations, PEP 654 exception groups)
- Security updates and bug fixes in newer Python versions
Impact:
- Missing significant performance gains for CPU-intensive walk time calculations
- Inability to leverage newer Python ecosystem features
- Potential security vulnerabilities as Python 3.10 approaches end-of-life (October 2026)
Completed Implementation:
- ✅ Updated pyproject.toml to require Python >=3.11
- ✅ Updated tool configurations (Black, Ruff, mypy) to support Python 3.11+
- ✅ Updated .python-version file to 3.11
- ✅ Updated CI/CD workflows (code-quality.yml, security.yml) to use Python 3.11
- ✅ Tested compatibility - all dependencies support Python 3.11+
- ✅ Verified with uv sync - successfully installed Python 3.11.14 and all dependencies
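The supported range now lives only in pyproject.toml, so a stale virtual environment could still run the code on 3.10. Below is a minimal sketch of a fail-fast guard. It assumes the pipeline entry point calls the guard before any heavier imports. The names MIN_PYTHON and check_python_version are hypothetical, and the bound simply mirrors requires-python:

```python
from __future__ import annotations

import sys

# Hypothetical constant; keep in sync with requires-python = ">=3.11"
MIN_PYTHON: tuple[int, int] = (3, 11)


def check_python_version(
    minimum: tuple[int, int] = MIN_PYTHON,
    current: tuple[int, int] | None = None,
) -> None:
    """Raise RuntimeError if the interpreter is older than `minimum`.

    `current` defaults to the running interpreter; it is exposed only so
    the comparison can be exercised in tests.
    """
    if current is None:
        current = (sys.version_info.major, sys.version_info.minor)
    if current < minimum:
        raise RuntimeError(
            f"Python {minimum[0]}.{minimum[1]}+ is required; "
            f"found {current[0]}.{current[1]}"
        )
```

Calling check_python_version() once at startup turns an obscure downstream failure into a one-line explanation. The current parameter exists only so the comparison can be tested without switching interpreters.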
Dependencies:
- All package dependencies support Python 3.11+ ✅
- OSMnx 2.0.6 supports Python 3.11+ ✅
References:
Priority: Medium Effort: Medium (12-16 hours) Status: ✅ COMPLETED (2025-01-XX) Category: Dependencies
Description: The project used OSMnx 1.3.0 (pinned), while the latest stable version is 2.0+ (as of 2025). Newer versions include:
- Performance optimizations for large graphs
- Better error handling and logging
- Improved graph simplification algorithms
- Enhanced coordinate system handling
- Better integration with modern GeoDataFrames
Impact:
- Missing performance improvements for graph operations
- Potential compatibility issues with newer geopandas/networkx versions
- Missing bug fixes and security updates
Completed Implementation:
- ✅ Updated the OSMnx version in pyproject.toml from ==1.3.0 to >=2.0.0
- ✅ Verified the latest version is 2.0.6 (installed successfully)
- ✅ Tested API compatibility - all functions used in codebase are available:
  - ox.load_graphml() ✅
  - ox.project_graph() ✅
  - ox.graph_from_place() ✅
  - ox.save_graphml() ✅
  - ox.graph_to_gdfs() ✅
  - ox.settings.cache_folder ✅
  - ox.settings.log_console ✅
- ✅ Verified imports work correctly with OSMnx 2.0.6
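- ✅ No code changes required - the functions listed above behave compatibly for this codebase's usage (OSMnx 2.0 did introduce breaking changes elsewhere, such as removed deprecated functions and parameters)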
Note: Existing cached .graphml files should be compatible, but may benefit from regeneration with the newer version for optimal performance.
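The compatibility check above can be kept as a repeatable smoke test. This sketch resolves each name from the list above and reports the ones that are missing. Dotted names are looked up one attribute at a time. The helper names are hypothetical, and check_osmnx_api assumes osmnx is installed. The check only proves that the names exist. It does not confirm that their signatures or defaults are unchanged.

```python
from __future__ import annotations

import importlib
from types import ModuleType

# Names from the compatibility list above; dotted names are nested attributes.
OSMNX_NAMES = [
    "load_graphml",
    "project_graph",
    "graph_from_place",
    "save_graphml",
    "graph_to_gdfs",
    "settings.cache_folder",
    "settings.log_console",
]


def missing_attributes(module: object, names: list[str]) -> list[str]:
    """Return the dotted names that cannot be resolved on `module`."""
    missing = []
    for dotted in names:
        obj = module
        for part in dotted.split("."):
            if not hasattr(obj, part):
                missing.append(dotted)
                break
            obj = getattr(obj, part)
    return missing


def check_osmnx_api() -> list[str]:
    """Import osmnx (must be installed) and list any missing names."""
    ox: ModuleType = importlib.import_module("osmnx")
    return missing_attributes(ox, OSMNX_NAMES)
```

An empty list from check_osmnx_api() after a dependency bump is a cheap signal that the import surface the pipeline relies on is still there.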
References:
Priority: Medium Effort: Small (4-8 hours) Status: ✅ COMPLETED (2025-11-15) Category: Code Quality
Description:
The src/h3/ module used an inconsistent import pattern due to naming conflict with the installed h3 library. This has been resolved by renaming the module to src/h3_utils/.
Completed Implementation:
- ✅ Renamed src/h3/ to src/h3_utils/
- ✅ Updated all imports throughout the codebase (src/run_pipeline.py, run_pipeline.sh, README.md)
- ✅ Updated pyproject.toml to include h3_utils in the packages list
- ✅ Updated documentation (README.md)
- ✅ Removed mypy exclude for h3 module (no longer needed)
- ✅ Updated pre-commit configuration
Note: Some legacy notebooks still use from h3utils import * (referring to src/h3utils.py, a separate utility file). The src/h3_utils/ package directory is properly renamed and used throughout the main codebase.
Files Modified:
- src/h3_utils/ (renamed from src/h3/)
- src/run_pipeline.py - Updated import
- run_pipeline.sh - Updated import
- README.md - Updated documentation
- pyproject.toml - Added to packages, removed exclude
- .pre-commit-config.yaml - Removed h3 exclude
Priority: High Effort: Large (40-60 hours) Category: Testing
Description: Current test suite has significant gaps:
- Only 4 test files exist (test_walk_times.py, test_merging.py, test_config.py, test_analysis.py)
- No tests for visualization module
- No tests for H3 module
- No tests for data update/validation scripts
- No integration tests for full pipeline
- No tests for PMTiles conversion
- Missing edge case testing
Current Coverage Gaps:
- src/visualization/ - 0% coverage
- src/h3_utils/ - 0% coverage
- src/update_data_sources.py - 0% coverage
- src/validate_data.py - 0% coverage
- src/convert_to_pmtiles.py - 0% coverage
- src/crop_cejst_to_state.py - 0% coverage
- src/probe_data_sources.py - 0% coverage
- Integration/end-to-end tests - 0% coverage
Impact:
- High risk of regressions when making changes
- Difficult to refactor with confidence
- Hard to validate bug fixes
- No automated quality gates for CI/CD
Solution:
- Add tests for visualization module (figures.py)
- Add tests for H3 module (relationship.py, joins.py, h3j.py)
- Add tests for data management scripts
- Add integration tests for pipeline
- Set up pytest-cov reporting
- Establish minimum coverage threshold (e.g., 80%)
- Add tests to CI/CD pipeline
Priority Tasks:
- Test critical path: walk time calculations (expand existing)
- Test data validation and schema checking
- Test H3 relationship file generation
- Integration test for run_pipeline.py
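Below is a sketch of the kind of unit test that the critical-path tasks above call for. walk_time_minutes is a hypothetical stand-in for the real graph-based code. The 4.8 km/h default walking speed is an assumption for illustration and not necessarily the project's parameter. Plain assert statements let pytest collect these tests without extra imports.

```python
import math


def walk_time_minutes(distance_m: float, speed_kmh: float = 4.8) -> float:
    """Hypothetical helper: convert a network distance in meters to minutes."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    meters_per_minute = speed_kmh * 1000 / 60
    return distance_m / meters_per_minute


def test_zero_distance_is_zero_minutes():
    assert walk_time_minutes(0) == 0


def test_known_value():
    # 4.8 km/h = 80 m/min, so 800 m takes 10 minutes
    assert math.isclose(walk_time_minutes(800), 10.0)


def test_negative_distance_rejected():
    try:
        walk_time_minutes(-1)
    except ValueError:
        return
    raise AssertionError("expected ValueError")
```

With pytest-cov installed, running pytest --cov=src --cov-fail-under=80 would fail any run that falls below the example 80% threshold. Once pytest is imported, pytest.raises is the more idiomatic way to write the last test.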
Priority: Medium Effort: Medium (16-24 hours) Category: Code Quality
Description: Many scripts contain hard-coded paths and magic strings that make them brittle and hard to maintain:
- File paths like data/graphs/maine_walk.graphml repeated across multiple files
- Maine-specific logic (should use RegionConfig)
- Magic strings for column names ("GEOID20", "osmid", "AC_10")
- No centralized configuration for defaults
Examples:
# run_pipeline.sh line 42-44
graph_path='data/graphs/maine_walk.graphml'
conserved_lands_path='data/conserved_lands/Maine_Conserved_Lands_with_nodes.shp.zip'
Impact:
- Difficult to extend to other states
- Error-prone when paths change
- Hard to test with different configurations
- Code duplication
Solution:
- Extend RegionConfig to include all data paths
- Create a configuration module for column name constants
- Remove hard-coded "Maine" references
- Use configuration throughout all scripts
- Update documentation with configuration examples
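Below is a minimal sketch of centralizing paths and column names. It assumes a small frozen dataclass that could sit alongside or inside RegionConfig. DataPaths, its properties, and the naming pattern are all hypothetical. The pattern is generalized from the two Maine paths above, and the column constants reuse the magic strings listed above.

```python
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path

# Column-name constants (values quoted in the examples above)
COL_GEOID = "GEOID20"
COL_OSMID = "osmid"
COL_AC_10 = "AC_10"


@dataclass(frozen=True)
class DataPaths:
    """Hypothetical per-region path bundle; not the existing RegionConfig API."""

    state_name: str  # e.g. "Maine"
    data_dir: Path = Path("data")

    @property
    def slug(self) -> str:
        return self.state_name.lower().replace(" ", "_")

    @property
    def graph(self) -> Path:
        return self.data_dir / "graphs" / f"{self.slug}_walk.graphml"

    @property
    def conserved_lands(self) -> Path:
        name = self.state_name.replace(" ", "_")
        return (
            self.data_dir
            / "conserved_lands"
            / f"{name}_Conserved_Lands_with_nodes.shp.zip"
        )
```

Under this sketch, DataPaths("New Hampshire").graph resolves to data/graphs/new_hampshire_walk.graphml. Adding a state then means adding data files rather than editing scripts.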
Files to Refactor:
- run_pipeline.sh
- src/run_pipeline.py
- All processing scripts (find_centroids.py, convert_to_pmtiles.py, etc.)
- Notebooks that reference specific files
Priority: Medium Effort: Large (30-40 hours) Category: Data Format / Technical Architecture
Description:
Project heavily relies on shapefile format (.shp, .shp.zip) which is:
- Legacy format with known limitations (10-character field names, 2 GB size limit per component file such as .shp or .dbf)
- Slower to read/write compared to modern formats
- Multiple files per dataset (.shp, .shx, .dbf, .prj, etc.)
- Less efficient for large datasets
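The 10-character limit applies to dBASE (.dbf) attribute names, so distinct column names can become identical once truncated. Writers typically truncate long names, and some may rename duplicates with a numeric suffix. Either way, column names change silently. Below is a hypothetical pre-flight check. It uses plain string slicing rather than any driver's exact renaming rules.

```python
from collections import defaultdict

DBF_MAX_FIELD_LEN = 10


def truncation_collisions(
    columns: list[str], limit: int = DBF_MAX_FIELD_LEN
) -> dict[str, list[str]]:
    """Map each truncated name to the original columns sharing it (2+ only)."""
    groups: dict[str, list[str]] = defaultdict(list)
    for col in columns:
        groups[col[:limit]].append(col)
    return {short: cols for short, cols in groups.items() if len(cols) > 1}
```

For example, walk_time_10min and walk_time_15min both truncate to walk_time_, so a shapefile cannot keep them apart.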
Modern alternatives exist:
- GeoParquet: Columnar format, fast, supports complex types
- GeoPackage: SQLite-based, single-file, OGC standard
- FlatGeobuf: Streaming format, cloud-optimized
Current State:
- ✅ src/migrate_to_geoparquet.py exists with conversion functionality
- ❌ Migration utility not integrated into pipeline
- ❌ All processing still uses shapefiles
- ✅ PMTiles conversion works from shapefiles
Impact:
- Slower I/O performance for large datasets
- Field name truncation issues
- Multiple files to manage per dataset
- Not cloud-optimized
Solution:
- Complete GeoParquet migration utility
- Add support for reading GeoParquet in all processing functions
- Update pipeline to use GeoParquet internally
- Keep shapefile support for backward compatibility
- Update documentation
- Benchmark performance improvements
Migration Path:
- Phase 1: Support both formats (read/write)
- Phase 2: Default to GeoParquet for new data
- Phase 3: Migrate existing datasets
- Phase 4: Deprecate shapefile as primary format
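Phase 1 (support both formats) can be as small as a dispatch on file suffix. This sketch assumes geopandas with pyarrow installed for GeoParquet. The suffix sets and function names are illustrative assumptions. gpd.read_parquet and gpd.read_file are the standard geopandas readers, and GeoDataFrame.to_parquet covers the write side.

```python
from __future__ import annotations

from pathlib import Path

GEOPARQUET_SUFFIXES = {".parquet", ".geoparquet"}
# Formats read through GDAL/OGR; ".zip" covers zipped shapefiles (.shp.zip)
OGR_SUFFIXES = {".shp", ".zip", ".gpkg", ".fgb", ".geojson"}


def detect_format(path: str | Path) -> str:
    """Classify a vector dataset path as 'geoparquet' or 'ogr' by suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in GEOPARQUET_SUFFIXES:
        return "geoparquet"
    if suffix in OGR_SUFFIXES:
        return "ogr"
    raise ValueError(f"Unrecognized vector format: {path}")


def read_vector(path: str | Path):
    """Read either format into a GeoDataFrame (requires geopandas + pyarrow)."""
    import geopandas as gpd  # lazy import so detect_format has no dependencies

    if detect_format(path) == "geoparquet":
        return gpd.read_parquet(path)
    return gpd.read_file(path)
```

Routing every processing function through one reader like this lets Phase 2 change the default output format without touching the callers.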
References:
- GeoParquet Specification
- src/migrate_to_geoparquet.py (exists with basic conversion functionality)
Priority: High Effort: Medium (20-30 hours) → 12-18 hours remaining Status: 🔄 IN PROGRESS (2025-11-15) Category: Error Handling / Logging
Description: Inconsistent error handling and logging across the codebase:
- Some functions log errors, others don't
- No centralized exception handling
- Unclear error messages for users
- No error recovery mechanisms
- Failed operations may leave partial data
Progress (2025-11-15):
- ✅ Fixed empty except blocks in changelog.py (2 locations)
- ✅ Fixed empty except blocks in probe_data_sources.py (2 locations)
- ✅ Added proper error logging with context messages
- ✅ Consistent logging patterns established (see DEVELOPMENT.md)
- ❌ Custom exception hierarchy not yet created
- ❌ Retry logic for network operations not yet implemented
- ❌ Pipeline validation checkpoints not yet added
Examples of Issues:
- What happens if OSMnx graph download fails mid-process?
- How are missing geometries handled in walk time calculations?
- What if Census API rate limit is hit?
- No validation of intermediate outputs
Impact:
- Hard to debug failures
- Users don't know why operations failed
- Data corruption risks
- Poor user experience
Remaining Work:
- ❌ Create custom exception hierarchy
- ❌ Add validation checkpoints in pipeline
- ❌ Implement retry logic for network operations
- ❌ Add data validation before/after processing steps
- ❌ Create error recovery guide for common failures
- ❌ Add structured logging (JSON) for monitoring
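Below is a sketch of two remaining items: a custom exception hierarchy and retry logic for network operations. The class names and the retry decorator are hypothetical. The default retry_on covers only built-in exceptions, so HTTP client errors (for example from requests) would need to be added. An HTTP 429 rate-limit response is retried only if the calling code turns it into one of those exceptions.

```python
from __future__ import annotations

import functools
import logging
import random
import time
from typing import Any, Callable, TypeVar

logger = logging.getLogger(__name__)
T = TypeVar("T")


class AccessPipelineError(Exception):
    """Base class for pipeline errors (hypothetical name)."""


class DataDownloadError(AccessPipelineError):
    """A network fetch failed even after retries."""


class DataValidationError(AccessPipelineError):
    """An input or intermediate dataset failed a validation check."""


def retry(
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> Callable[[Callable[..., T]], Callable[..., T]]:
    """Retry with exponential backoff and jitter; wrap the final failure.

    After failed attempt n, waits base_delay * 2**(n-1) * (1 + U[0, 1)) seconds.
    """
    if attempts < 1:
        raise ValueError("attempts must be >= 1")

    def decorator(func: Callable[..., T]) -> Callable[..., T]:
        @functools.wraps(func)
        def wrapper(*args: Any, **kwargs: Any) -> T:
            for attempt in range(1, attempts + 1):
                try:
                    return func(*args, **kwargs)
                except retry_on as exc:
                    if attempt == attempts:
                        raise DataDownloadError(
                            f"{func.__name__} failed after {attempts} attempts"
                        ) from exc
                    delay = base_delay * 2 ** (attempt - 1) * (1 + random.random())
                    logger.warning(
                        "%s failed (attempt %d/%d): %s; retrying in %.1fs",
                        func.__name__, attempt, attempts, exc, delay,
                    )
                    time.sleep(delay)
            raise AssertionError("unreachable: loop returns or raises")

        return wrapper

    return decorator
```

Decorating a download function with @retry(attempts=5) leaves its call sites unchanged. The pipeline can then catch AccessPipelineError in one place and still see the original network error through __cause__.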
Specific Improvements:
- Add transaction-like behavior for data updates
- Validate schemas before/after transformations
- Add progress checkpoints and resume capability
- Create troubleshooting guide
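One way to get the transaction-like behavior above for single-file outputs is to write to a temporary file in the same directory and then swap it in with os.replace. On POSIX filesystems, os.replace is atomic. Below is a sketch with a hypothetical helper name. It does not help multi-file shapefiles, which is one more argument for single-file formats such as GeoParquet or GeoPackage.

```python
from __future__ import annotations

import os
import tempfile
from pathlib import Path


def atomic_write_bytes(target: str | Path, data: bytes) -> None:
    """Write `data` to `target` so readers never observe a half-written file.

    The bytes go to a temporary file in the target's directory (so both are
    on the same filesystem), are flushed and fsynced, and then os.replace()
    swaps the file into place. On failure the temporary file is removed and
    any previous version of `target` is left untouched.
    """
    target = Path(target)
    target.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(
        dir=target.parent, prefix=f".{target.name}.", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())
        os.replace(tmp_name, target)
    except BaseException:
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

For pandas or geopandas outputs, the same pattern applies: write to the temporary path, then call os.replace once the write has succeeded.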
Priority: Medium Effort: Medium (16-24 hours) Category: DevOps / Automation
Description: Partial CI/CD pipeline exists but lacks critical automation:
- GitHub Actions workflow exists for webmap deployment (.github/workflows/static.yml)
- Tests must be run manually (no automated test execution)
- No automated quality checks (linting, type checking)
- No test coverage reporting
- No pre-commit hooks
Current State:
- ✅ GitHub Actions workflow for Pages deployment exists and is functional
- ❌ No automated test execution on PR/push
- ❌ No code quality checks in CI
- ❌ No pre-commit hooks configured
Impact:
- Higher risk of breaking changes
- Manual testing burden
- Inconsistent code quality
- Slower development cycle
Solution:
- Extend existing GitHub Actions workflow to include:
- Running tests on PR/push (pytest)
- Code quality checks (linting, type checking)
- Test coverage reporting (pytest-cov)
- Keep existing webmap deployment automation
- Add pre-commit hooks for:
- Code formatting (black, isort)
- Linting (ruff or pylint)
- Type checking (mypy)
- Set up branch protection rules
- Add status badges to README
Priority Tasks:
- Add test automation to existing GitHub Actions workflow (pytest on push/PR)
- ✅ Webmap deployment automation (already implemented)
- Add code quality checks to CI pipeline
- Set up pre-commit hooks
Priority: Medium Effort: Small (4-8 hours) Status: ✅ COMPLETED (2025-11-15) Category: Security
Description: No automated security scanning for dependencies:
- Old dependency versions may have vulnerabilities
- No alerts for security updates
- Manual tracking of CVEs
Completed Implementation:
- ✅ Added Dependabot configuration for automated dependency updates
- ✅ Added pip-audit for vulnerability scanning in dev dependencies
- ✅ Created security scanning GitHub Actions workflow
- ✅ Configured weekly automated scans
- ✅ Added security documentation to CONTRIBUTING.md
Files Created/Modified:
- .github/dependabot.yml - Automated dependency update configuration
- .github/workflows/security.yml - Security scanning CI/CD workflow
- pyproject.toml - Added pip-audit and bandit to dev dependencies
- CONTRIBUTING.md - Added security best practices section
Priority: Low Effort: Large (30-40 hours) Category: Code Quality
Description: Jupyter notebooks contain duplicated logic that should be in modules:
- Walk time calculation code duplicated between notebooks
- Visualization code not fully migrated to the visualization/ module
- Data loading patterns repeated
- Analysis patterns repeated
Impact:
- Harder to maintain and update
- Inconsistent results across notebooks
- Code drift between notebook and module implementations
Solution:
- Audit notebooks for duplicated code
- Extract common patterns to modules
- Update notebooks to use module functions
- Add notebook testing (nbconvert + papermill)
- Document notebook → module workflow
Note:
README mentions: "Core data processing logic has been migrated to standalone Python modules in src/ for better maintainability"
This suggests migration is ongoing but incomplete.
Priority: Medium Effort: Large (72-104 hours) Category: Architecture / Analysis Methodology
Description: H3 hexagon infrastructure was built to replace census blocks as standardized geographic units, but the original goal has not been achieved. H3 is currently only used for post-processing aggregation and visualization, while all core analysis still uses census blocks.
Current State:
- ✅ H3 relationship file generation exists (src/h3_utils/relationship.py)
- ✅ H3 join utilities exist (src/h3_utils/joins.py)
- ✅ H3 visualization functions exist
- ✅ H3J format conversion exists
- ❌ Walk times calculated at census block centroids (not H3 hexagon centroids)
- ❌ Access metrics calculated per census block (not per H3 hexagon)
- ❌ Statistical analysis uses census blocks (not H3 hexagons)
- ❌ No H3-centroid mapping to OSMnx nodes
Impact:
- Still subject to uneven census block granularity (urban vs. rural)
- Blocks don't represent meaningful geographic areas
- Blocks can be very small (parks, parking lots) or very large (rural areas)
- H3 benefits (standardized sizes, better comparisons) not realized
- Post-processing aggregation loses precision and accuracy
Root Cause: The original intent was to use H3 hexagons as standardized geographic units instead of census blocks, which have uneven granularity. However, the implementation stopped at building infrastructure for aggregation rather than making H3 the primary analysis unit.
Solution: See FR-004 for complete implementation plan. This technical debt item tracks the gap between original intent and current state.
References:
- H3_PROGRESS_ASSESSMENT.md - Detailed assessment of H3 progress
- Original goal: Use H3 hexagons instead of census blocks for standardized geographic detail
- src/h3_utils/relationship.py - Existing H3 relationship file generation
- src/h3_utils/joins.py - Existing H3 join utilities
Priority: High Effort: Large (60-80 hours) Category: Geographic Expansion
Description: Extend analysis from Maine to all New England states (NH, VT, MA, RI, CT).
Current State:
- RegionConfig class exists in src/config/regions.py
- All 6 New England states defined in the NEW_ENGLAND_STATES dict
- Most scripts still hard-coded for Maine
Benefits:
- Comparative analysis across states
- Larger dataset for statistical analysis
- Greater research impact
- Reusable framework for other regions
Requirements:
Data Acquisition:
- Conserved lands datasets for each state
- OSMnx graphs for each state
- Census data (already multi-state capable)
- CEJST data (already national)
Code Updates:
- Remove Maine-specific hard-coding
- Use RegionConfig throughout
- Update pipeline scripts for multi-state
- Parallel processing for multiple states
Webmap:
- Multi-state layer switching
- State boundary overlay
- Comparative statistics view
Documentation:
- State-specific setup guides
- Data source documentation per state
- Comparison methodology
Implementation Phases:
- Phase 1: Single additional state (NH) as proof-of-concept
- Phase 2: All New England states
- Phase 3: State comparison analysis
- Phase 4: Regional webmap
Dependencies:
- TD-005 (Hard-coded paths)
- TD-006 (Data format optimization for larger datasets)
Priority: Medium Effort: Large (50-70 hours) Category: Visualization / UI
Description: Create an interactive dashboard (Dash/Streamlit/Panel) for exploring analysis results without running notebooks.
Features:
Data Exploration:
- Filter by demographic variables
- Select trip time thresholds
- Geographic selection (state, county, tract)
Visualizations:
- Interactive maps (Folium/Plotly)
- Statistical charts and graphs
- Comparison views
Export:
- Download filtered data
- Export publication-ready figures
- Generate reports
Analysis Tools:
- Custom access calculations
- What-if scenarios
- Demographic comparisons
Technology Options:
- Streamlit: Easiest, Python-native
- Dash: More powerful, Plotly integration
- Panel: Flexible, supports Jupyter widgets
- Shiny for Python: R-like reactive programming
Implementation Phases:
- Phase 1: Basic data exploration (10-15 hours)
- Phase 2: Interactive visualizations (15-20 hours)
- Phase 3: Analysis tools (20-25 hours)
- Phase 4: Deployment and hosting (5-10 hours)
Benefits:
- Accessible to non-technical stakeholders
- Real-time data exploration
- Supports policy decision-making
- Broader research impact
Priority: Medium Effort: Medium (20-30 hours) Status: ✅ COMPLETED (2025-11-09) Category: Webmap / UI
Description: Current webmap may not be fully optimized for mobile devices.
Completed Requirements:
Responsive Design:
- ✅ Mobile-first layout (responsive CSS with media queries)
- ✅ Touch-friendly controls (minimum 44px touch targets)
- ✅ Optimized map interactions
- ✅ Responsive positioning of controls for different screen sizes
Performance:
- ✅ PMTiles format (efficient tile delivery)
- ✅ Progressive loading (tiles load as needed)
- ✅ Reduced data transfer (vector tiles, not raster)
- ❌ Smaller initial load (could be further optimized)
Features:
- ✅ Location services integration (geolocation button in search)
- ✅ "Find nearest conserved land" feature (locate button)
- ❌ Offline capability (PWA) (not implemented)
Accessibility:
- ✅ Screen reader support (ARIA labels, semantic HTML)
- ✅ High contrast mode support (CSS media queries)
- ✅ Keyboard navigation (Tab navigation, Enter/Space activation)
- ✅ Focus indicators for keyboard users
Testing:
- ✅ Cross-browser testing (basic)
- ⚠️ Device testing (iOS, Android) (recommended for production)
- ⚠️ Performance benchmarking (recommended)
- ⚠️ Accessibility audit (WCAG 2.1) (recommended for production)
Notes:
- All controls are accessible via keyboard
- Screen reader announcements implemented
- High contrast mode styles added
- Mobile-specific CSS adjustments for smaller screens
Priority: Medium Effort: Large (72-104 hours) Category: Analysis Methodology / Architecture
Description: Complete the original goal of using H3 hexagons as standardized geographic units instead of census blocks. Currently, H3 infrastructure exists but is only used for post-processing aggregation. This feature request would make H3 the primary analysis unit throughout the pipeline.
Current State:
- H3 relationship files can be generated (maps blocks to hexagons)
- H3 joins can aggregate block-level results to hexagons
- But walk times, access metrics, and analysis all still use census blocks
Benefits:
- Standardized hexagon sizes provide consistent geographic detail
- Better for cross-regional comparisons
- More intuitive for visualization and analysis
- Avoids uneven census block granularity (urban vs. rural)
- Moves away from census blocks, which don't represent meaningful geographic areas
Implementation Phases:
Phase 1: H3-Centroid Mapping (Prerequisite) - 8-12 hours
- Generate H3 hexagons for the region
- Calculate centroid for each hexagon
- Find nearest OSMnx node for each centroid (similar to find_centroids.py for blocks)
- Create h3_hexagons.shp.zip with h3id and osmid columns
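Phase 1 reduces to a nearest-neighbor lookup from hexagon centroids to graph nodes. In practice, the centroids would likely come from h3-py (in v4, h3.cell_to_latlng returns (lat, lng)), and the lookup would likely use osmnx.distance.nearest_nodes. This stdlib sketch shows the same mapping with brute-force haversine distance, so it can run and be tested without either library. The plain-dict inputs are an illustrative assumption. For an unprojected OSMnx graph, the node coordinates would be the y (lat) and x (lon) attributes.

```python
from __future__ import annotations

import math


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters on a spherical Earth (R = 6,371,008.8 m)."""
    r = 6_371_008.8
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def map_centroids_to_nodes(
    centroids: dict[str, tuple[float, float]],
    nodes: dict[int, tuple[float, float]],
) -> dict[str, tuple[int, float]]:
    """Map each hexagon id -> (lat, lon) centroid to its nearest node.

    Returns {h3id: (osmid, distance_m)}. Brute force is O(hexes x nodes);
    at state scale a spatial index (or nearest_nodes) would be needed.
    """
    if not nodes:
        raise ValueError("node table is empty")
    result = {}
    for h3id, (lat, lon) in centroids.items():
        best_id, best_d = min(
            (
                (osmid, haversine_m(lat, lon, nlat, nlon))
                for osmid, (nlat, nlon) in nodes.items()
            ),
            key=lambda pair: pair[1],
        )
        result[h3id] = (best_id, best_d)
    return result
```

Keeping the snap distance alongside the osmid makes it easy to flag hexagons whose centroid is far from any walkable node, such as hexagons that are mostly water.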
Phase 2: H3-Based Walk Time Calculations - 16-24 hours
- Add a geography_type="hexagons" option to walk time functions
- Use H3 hexagon centroids instead of block centroids
- Calculate walk times per H3 hexagon
- Output: walk_times_hexagon_df.csv
Phase 3: H3-Based Merging and Analysis - 24-32 hours
- Create a create_ejhexagons() function (H3 equivalent of create_ejblocks())
- Aggregate demographics to H3 hexagons
- Calculate access metrics per hexagon
- Join CEJST data at H3 level
Phase 4: H3-Based Statistical Analysis - 16-24 hours
- Update analysis modules to work with H3 hexagons
- Create H3-based visualization functions
- Update notebooks to use H3 as primary unit
Phase 5: Pipeline Integration - 8-12 hours
- Make H3 the default geographic unit in pipeline
- Keep block-level as optional/legacy mode
- Update documentation
Dependencies:
- TD-011 (H3 Not Used as Primary Geographic Unit) - This is the technical debt being addressed
- Existing H3 infrastructure provides foundation
Alternative Approach:
- Hybrid: Keep blocks for walk time calculations (more precise), use H3 for aggregation/visualization
- Pros: Less effort, maintains precision
- Cons: Doesn't fully achieve original goal
- Effort: ~40-50 hours
References:
- H3_PROGRESS_ASSESSMENT.md - Detailed assessment and implementation plan
- src/h3_utils/relationship.py - Existing H3 relationship file generation
- src/h3_utils/joins.py - Existing H3 join utilities
Priority: High Effort: Large (40-60 hours) Category: Performance
Description: Walk time calculations are the most computationally intensive part of the pipeline. Several optimization opportunities exist.
Current State:
- Uses rustworkx for graph operations (already optimized)
- Bounded Dijkstra algorithm implemented
- Parallel processing support added (n_jobs parameter)
- Processing ~100K+ blocks for Maine takes significant time
Optimization Opportunities:
Algorithm Improvements:
- Bidirectional search for specific source-target pairs
- A* algorithm with heuristic for targeted searches
- Precompute and cache common subgraphs
- Early termination optimization
Data Structure Optimization:
- Graph compression techniques
- Spatial indexing for node lookups
- Memory-mapped graph storage
Parallel Processing Enhancement:
- Better chunk size optimization
- Multi-level parallelism (state → county → tract)
- GPU acceleration exploration (cuGraph)
- Distributed computing (Dask)
Caching Strategy:
- Cache intermediate results
- Incremental updates only for changed data
- Persistent result caching
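The early-termination idea behind the existing bounded Dijkstra fits in a few lines: stop expanding once the closest unsettled node is beyond the time budget. The production code uses rustworkx. This pure-Python version operates on a hypothetical {node: {neighbor: cost}} mapping. It serves only as a reference for reasoning about and testing the cutoff semantics.

```python
from __future__ import annotations

import heapq
from collections.abc import Hashable, Mapping


def bounded_dijkstra(
    graph: Mapping[Hashable, Mapping[Hashable, float]],
    source: Hashable,
    cutoff: float,
) -> dict[Hashable, float]:
    """Shortest cost from `source` to every node reachable within `cutoff`.

    `graph` maps u -> {v: cost} with non-negative costs. Entries beyond the
    cutoff are never pushed, and the search stops as soon as the cheapest
    remaining entry exceeds it.
    """
    dist: dict[Hashable, float] = {source: 0.0}
    heap: list[tuple[float, int, Hashable]] = [(0.0, 0, source)]
    counter = 1  # tie-breaker so heapq never has to compare node objects
    settled: set[Hashable] = set()
    while heap:
        d, _, u = heapq.heappop(heap)
        if u in settled:
            continue  # stale entry superseded by a shorter path
        if d > cutoff:
            break  # every remaining entry is at least this far away
        settled.add(u)
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd <= cutoff and nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, counter, v))
                counter += 1
    return {n: dist[n] for n in settled}
```

With edge costs in minutes and a 15-minute cutoff, the result is exactly the set of nodes reachable within the walk threshold, which is the information a threshold-based access metric consumes. Nothing beyond the cutoff is ever expanded.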
Benchmarking:
# Current performance (approximate):
# - Maine blocks (~100K): ~4-8 hours (single core)
# - Maine blocks: ~1-2 hours (8 cores)
# Target: <30 minutes for Maine, <4 hours for all New England
Implementation:
- Benchmark current performance
- Implement and test each optimization
- Measure impact
- Document performance characteristics
- Add performance testing to CI
Dependencies:
- TD-002 (OSMnx version)
- TD-006 (Data format)
Priority: High Effort: Medium (24-32 hours) Category: Data Quality
Description: Strengthen data validation throughout the pipeline.
Current State:
- ✅ src/validate_data.py exists with basic validation checks
- ✅ Schema validation implemented
- ✅ Quality metrics calculated
- ❌ No automated validation integrated into pipeline
Enhancements:
Input Validation:
- Geometry validation (topology, area, completeness)
- CRS consistency checks
- Required fields validation
- Value range checks
- Missing data analysis
Intermediate Validation:
- Walk time reasonableness checks
- Join completeness verification
- Calculation sanity checks
- Progress checkpoints
Output Validation:
- Statistical distribution checks
- Comparison with previous runs
- Known-good test cases
- Publication-ready checks
Reporting:
- Validation reports
- Data quality dashboard
- Trend analysis
- Alert system
Validation Rules:
- Walk times should be positive and within reasonable bounds
- Access percentages should sum correctly
- Geographic coverage should be complete
- Demographic totals should match Census
- Conserved land areas should match source data
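Below is a sketch of how the first and fourth rules above could become executable checks. Each check returns a list of problems rather than raising, so a pipeline step can log them all at once. The function names and the 60-minute ceiling are assumptions. Zero is accepted as a valid walk time, on the assumption that a block centroid can already sit at a conserved-land node. Tighten the check if the rule is meant to be strictly positive.

```python
from __future__ import annotations

import math
from collections.abc import Iterable


def check_walk_times(
    minutes: Iterable[float | None], max_minutes: float = 60.0
) -> list[str]:
    """Return human-readable problems; an empty list means the check passed."""
    problems = []
    for i, m in enumerate(minutes):
        # NaN compares False to everything, so it needs an explicit test
        if m is None or (isinstance(m, float) and math.isnan(m)):
            problems.append(f"row {i}: missing walk time")
        elif m < 0:
            problems.append(f"row {i}: negative walk time {m}")
        elif m > max_minutes:
            problems.append(f"row {i}: walk time {m} exceeds {max_minutes} min")
    return problems


def check_total_matches(
    observed: float, expected: float, rel_tol: float = 1e-6
) -> list[str]:
    """Compare an aggregated total (e.g. population) to its source value."""
    if math.isclose(observed, expected, rel_tol=rel_tol):
        return []
    return [f"total {observed} does not match expected {expected}"]
```

Because both checks share one return shape, their results can be concatenated into a single validation report per pipeline step.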
Implementation:
- Define validation rule set
- Implement validation functions
- Integrate into pipeline
- Create validation reports
- Add to CI/CD
Priority: Medium Effort: Medium (20-30 hours) → 16-25 hours remaining Status: 🔄 IN PROGRESS (2025-11-15) Category: Documentation
Description: Enhance documentation for users, developers, and researchers.
Current State:
- Good README with setup instructions
- DATA_DICTIONARY.md with comprehensive data documentation
- NOTES.md with references
- README_CEJST.md for CEJST workflow
- Test README
- Notebooks demonstrate workflows
Progress (2025-11-15):
- ✅ Created DEVELOPMENT.md with developer guidelines
- ✅ Documented logging best practices with code examples
- ✅ Documented library vs entry point patterns
- ✅ Documented TQDM integration
- ❌ .env.example not yet created (mentioned but file doesn't exist)
- ❌ No API documentation yet
- ❌ No auto-generated docs yet
- ❌ Contributing guidelines not yet created
Improvements Needed:
API Documentation:
- ❌ Auto-generated API docs (Sphinx/MkDocs)
- ❌ Module documentation
- ❌ Function signatures and examples
- ❌ Type hints throughout
User Guides:
- ❌ Step-by-step tutorials
- ❌ Common workflows
- ❌ Troubleshooting guide (expand existing)
- ❌ FAQ section
Developer Guides:
- ✅ Development best practices (DEVELOPMENT.md)
- ❌ Contributing guidelines (CONTRIBUTING.md)
- ❌ Code style guide
- ❌ Testing guide
- ❌ Release process
Research Documentation:
- ❌ Methodology documentation
- ❌ Algorithm descriptions
- ❌ Validation approach
- ❌ Reproducibility guide
Architecture Documentation:
- ❌ System design
- ❌ Data flow diagrams (expand existing Mermaid)
- ❌ Module dependencies
- ❌ Extension points
Tools:
- Sphinx: Python standard, autodoc
- MkDocs: Modern, Markdown-based
- Jupyter Book: Integrate notebooks
- Mermaid: Diagrams (already used)
Remaining Work:
- Choose documentation tool
- Set up documentation structure
- Add docstrings throughout code
- Write CONTRIBUTING.md
- Write guides and tutorials
- Deploy documentation site
Priority: Medium Effort: Medium (16-24 hours) → 8-12 hours remaining Status: 🔄 IN PROGRESS (2025-11-15) Category: Observability
Description: Enhance logging for better debugging and monitoring.
Current State:
- Basic logging in most modules
- Logs to pipeline_log.txt, processing_log.txt, etc.
- No structured logging
- No centralized log aggregation
Progress (2025-11-15):
- ✅ Replaced print() statements with proper logging in library modules
- ⚠️ CLI scripts (probe_data_sources.py, changelog.py) still use print() for user-facing output (acceptable for CLI)
- ✅ Established consistent logging patterns:
  - Entry scripts use logging.basicConfig() with handlers
  - Library modules use logger = logging.getLogger(__name__)
- ✅ Created DEVELOPMENT.md with logging guidelines and examples
- ✅ Documented integration with TQDM progress bars
- ✅ Proper log levels used (DEBUG, INFO, WARNING, ERROR)
- ❌ No structured logging (JSON) yet
- ❌ No centralized log aggregation yet
- ❌ No monitoring dashboards yet
Improvements:
Structured Logging:
- ❌ JSON format for machine parsing
- ✅ Consistent log levels
- ❌ Context information (user, region, operation)
- ❌ Request IDs for tracing
Log Levels:
- ✅ Properly applied throughout codebase
- DEBUG: Detailed diagnostic info
- INFO: General informational messages
- WARNING: Warning messages (degraded but functional)
- ERROR: Error messages (operation failed)
- CRITICAL: Critical failures (system/data integrity)
Performance Logging:
- ❌ Operation timing
- ❌ Resource usage
- ❌ Progress tracking
- ❌ Bottleneck identification
Log Management:
- ❌ Log rotation
- ❌ Compression
- ❌ Retention policy
- ❌ Search and analysis
Monitoring:
- ❌ Metrics collection (Prometheus)
- ❌ Dashboards (Grafana)
- ❌ Alerting
- ❌ Health checks
Remaining Work:
- Add structlog library for structured logging
- Add performance/timing logging
- Set up log rotation and management
- Create monitoring dashboards (optional)
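The remaining work names structlog. For comparison, JSON output and operation timing can also be sketched with the standard library alone. JsonFormatter, its context keys, and log_timing are hypothetical names. The formatter plugs into the existing pattern of configuring handlers in entry scripts through logging.basicConfig(handlers=[...]).

```python
from __future__ import annotations

import json
import logging
import time
from collections.abc import Iterator
from contextlib import contextmanager


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line (stdlib only)."""

    # Optional keys callers may attach via `extra=` (hypothetical names)
    CONTEXT_KEYS = ("region", "operation", "run_id", "duration_s")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record, "%Y-%m-%dT%H:%M:%S"),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key in self.CONTEXT_KEYS:
            if hasattr(record, key):
                payload[key] = getattr(record, key)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload, default=str)


@contextmanager
def log_timing(logger: logging.Logger, operation: str, **context) -> Iterator[None]:
    """Emit an INFO record with the wall-clock duration of the wrapped block."""
    start = time.perf_counter()
    try:
        yield
    finally:
        elapsed = round(time.perf_counter() - start, 3)
        logger.info(
            "%s finished",
            operation,
            extra={"operation": operation, "duration_s": elapsed, **context},
        )
```

Library modules keep calling logging.getLogger(__name__) unchanged. Only the entry script attaches a handler that uses JsonFormatter, so each log line becomes one parseable JSON object.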
Priority: Medium Effort: Small (8-16 hours) Status: ✅ COMPLETED (2025-11-15) Category: Development Tools
Description: Set up code quality tools for consistent style and best practices.
Completed Implementation:
Formatting:
- ✅ Black: Opinionated code formatter (line length: 100)
- ✅ isort: Import sorting (Black profile)
- ✅ nbQA: Notebook formatting integration
Linting:
- ✅ Ruff: Fast modern linter with multiple rule sets
- ✅ mypy: Static type checking
- ✅ bandit: Security linting
Pre-commit Hooks:
- ✅ Automatic formatting (Black, isort)
- ✅ Linting checks (Ruff)
- ✅ Type checking (mypy)
- ✅ Security scanning (Bandit)
- ✅ File checks (trailing whitespace, EOF, YAML/JSON validation)
- ✅ Notebook formatting (nbQA integration)
IDE Configuration:
- ✅ .editorconfig for cross-IDE consistency
Configuration Files Created:
- ✅ .pre-commit-config.yaml - Pre-commit hooks configuration
- ✅ pyproject.toml - All tool configurations (Black, isort, Ruff, mypy, Bandit, coverage)
- ✅ .editorconfig - Editor configuration for multiple file types
- ✅ .github/workflows/code-quality.yml - CI/CD workflow for code quality checks
- ✅ CONTRIBUTING.md - Developer guidelines and tool usage documentation
Tools Added to Dev Dependencies:
- ✅ black>=24.0.0
- ✅ isort>=5.13.0
- ✅ ruff>=0.6.0
- ✅ mypy>=1.11.0
- ✅ pre-commit>=3.8.0
CI/CD Integration:
- ✅ Automated formatting checks on push/PR
- ✅ Linting with Ruff
- ✅ Type checking with mypy
- ✅ Test execution with coverage
- ✅ Pre-commit hook validation
Priority: Medium Effort: Large (30-40 hours) Status: ✅ COMPLETED (2025-11-09) Category: Webmap / Visualization
Description: Enhance the interactive webmap with additional features and improvements.
Completed Features:
Map Features:
- ✅ Search by address/location (Nominatim geocoding)
- ✅ Print/export functionality (print button, PNG export)
- ✅ Bookmarkable views (URL hash state)
- ❌ Measurement tools (removed - not necessary)
Data Layers:
- ✅ Toggle layers on/off (integrated into legend with eye icons)
- ✅ Remove census block outlines (reduces visual clutter)
- ✅ Collapsible legend
- ❌ Layer opacity control (removed - complicates legend interpretation)
- ❌ Base map selection (removed - not needed)
Interactive Analysis:
- ✅ Click for detailed info popup with comprehensive block data
- ✅ Only census blocks are clickable (conserved lands and CEJST are visual layers only)
- ❌ Enhanced hover tooltips (removed - only show details on click)
- ❌ Buffer analysis (not implemented)
- ❌ Demographic charts (not implemented)
- ❌ Access comparisons (not implemented)
Performance:
- ✅ PMTiles-based tile format (optimized rendering)
- ✅ Mobile optimization (see FR-003)
- ❌ Lazy loading (not implemented)
- ❌ Tile caching (handled by browser)
User Experience:
- ✅ Enhanced legend showing full spectrum of walk times (complete color scale)
- ✅ Integrated controls with MapLibre native styling
- ✅ Clean, compact popups
- ✅ Proper spacing and positioning of controls
- ❌ Tutorial/help overlay (not implemented)
- ❌ Share functionality (not implemented)
- ❌ Embed code for external sites (not implemented)
Files:
- docs/index.html
- docs/js/map.js
- docs/js/scripts.js
- docs/css/ (styles)
Notes:
- Controls integrated with MapLibre's native control system
- Search positioned in top-left, legend in bottom-left
- Print and export buttons in top-right with navigation controls
- Removed site menu for single-page site
Priority: Low Effort: Medium (12-16 hours) Category: Dependencies
Description: Improve dependency management and update strategy.
Current Issues:
- OSMnx pinned to old version (see TD-002)
- Mix of >= and == version specifications
- No automated dependency updates
- No security scanning (see TD-009)
Improvements:
Version Strategy:
- Define when to pin (==) vs allow updates (>=)
- Document version decision rationale
- Regular dependency reviews
Update Process:
- Automated dependency update PRs (Dependabot/Renovate)
- Testing strategy for updates
- Changelog for dependency changes
- Rollback procedure
Security:
- Vulnerability scanning
- Security advisories monitoring
- Timely security updates
- Security policy
Documentation:
- Dependency justification
- Known issues/workarounds
- Alternative packages considered
Tools:
- Dependabot (GitHub native)
- Renovate (more powerful)
- pip-audit or safety
- pipdeptree for dependency visualization
Priority: Medium Effort: Medium (12-16 hours) Status: ✅ COMPLETED (2025-11-15) Category: Webmap / Visualization
Description: Improve print layouts for the webmap to create publication-ready printed maps.
Current State:
- ✅ Enhanced print functionality with optimized layout
- ✅ Print styles show properly formatted legend, title, metadata, scale bar, and north arrow
- ✅ Publication-ready print layout with proper styling
- ✅ Dynamic metadata population (date, coordinates, zoom, scale)
Completed Enhancements:
Print Layout Options:
- ✅ Landscape and portrait orientation support with CSS @page rules
- ✅ Letter page size optimized
- ✅ Proper margin controls (10mm landscape, 15mm portrait)
Map Styling for Print:
- ✅ Enhanced legend for print (larger, clearer, always visible)
- ✅ Print-optimized styling with borders and shadows
- ✅ Title and metadata inclusion (map title, date, center, zoom)
- ✅ Scale bar with dynamic calculation
- ✅ North arrow indicator
- ✅ Attribution and data source information
Layout Customization:
- ✅ Title block with map name and subtitle
- ✅ Legend placement (bottom-right)
- ✅ Metadata panel (bottom-left)
- ✅ Scale bar and north arrow (top-right)
Export Formats:
- ✅ PNG export functionality (already existed, maintained)
- ✅ Browser print to PDF support
Dynamic Updates:
- ✅ Print metadata updates on print button click
- ✅ Scale calculation based on current zoom level
- ✅ Map center coordinates display
- ✅ Current date display
Potential Future Enhancements:
- Multiple page size options (A4, Legal) via print dialog
- Custom page size configuration
- Higher resolution rendering for print
- Advanced PDF export with multi-page support
- Print preview dialog before printing
- Print templates for common use cases
- Inset maps
- Custom header/footer options
Implementation Summary:
- ✅ Created comprehensive print-specific CSS styles with @media print
- ✅ Added print layout HTML elements (title, metadata, scale, north arrow, attribution)
- ✅ Implemented dynamic metadata population in JavaScript
- ✅ Added scale calculation based on map zoom and latitude
- ✅ Integrated print functionality with existing print button
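The scale bar and scale calculation depend on converting zoom level and latitude into ground distance per screen pixel. The sketch below shows the standard Web Mercator relationship. It is written in Python for consistency with the rest of the project's tooling. It is not a transcription of the print metadata functions in docs/js/map.js, and the function names are hypothetical.

It assumes:
- zoom levels are defined on 512 px tiles, as in MapLibre GL JS;
- a spherical Earth radius of 6,378,137 m;
- a 96 dpi CSS pixel when estimating a 1:N scale.

```python
import math

EARTH_RADIUS_M = 6_378_137.0     # spherical radius used by Web Mercator
TILE_SIZE_PX = 512               # assumption: zoom 0 shows the world in one 512 px tile
METERS_PER_CSS_PX = 0.0254 / 96  # assumption: a CSS pixel is nominally 1/96 inch


def meters_per_pixel(latitude_deg: float, zoom: float) -> float:
    """Ground distance spanned by one screen pixel at this latitude and zoom."""
    equator_m = 2 * math.pi * EARTH_RADIUS_M
    return equator_m * math.cos(math.radians(latitude_deg)) / (TILE_SIZE_PX * 2**zoom)


def scale_denominator(latitude_deg: float, zoom: float) -> float:
    """Approximate N in a 1:N scale, assuming the page reproduces CSS pixels at 96 dpi."""
    return meters_per_pixel(latitude_deg, zoom) / METERS_PER_CSS_PX


def scale_bar(latitude_deg: float, zoom: float, max_width_px: float = 100) -> tuple[float, float]:
    """Pick a round 1/2/5 x 10^k meter length that fits in max_width_px.

    Returns (length in meters, bar width in pixels).
    """
    mpp = meters_per_pixel(latitude_deg, zoom)
    max_m = mpp * max_width_px
    base = 10 ** math.floor(math.log10(max_m))
    # The fallback guards against log10 rounding up at exact powers of ten.
    length_m = next((step * base for step in (5, 2, 1) if step * base <= max_m), base / 10)
    return length_m, length_m / mpp
```

The cosine term is why the scale has to be recomputed from the map center rather than fixed per zoom level. At 60° latitude a pixel covers half the ground distance it covers at the equator.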
Benefits:
- ✅ Publication-ready maps with professional appearance
- ✅ Comprehensive map information for documentation
- ✅ Properly scaled and oriented print output
- ✅ Clear attribution and data sources
Files Modified:
- docs/css/styles.css (enhanced print media queries at lines 13437-13716)
- docs/js/map.js (print metadata functions at lines 1029-1128)
- docs/index.html (print-only HTML elements at lines 40-67)
Dependencies:
- IMP-006 (Webmap Enhancements) - Print functionality already existed
Priority: Low Effort: Medium (12-20 hours) Category: Performance / Data Management
Description: Implement caching for Census API calls to improve performance and reduce API usage.
Current State:
- Census API calls made each pipeline run
- No caching of Census data
- Rate limiting risks
- Dependency on API availability
Improvements:
Response Caching:
- Cache API responses locally
- TTL-based cache invalidation
- Cache key based on query parameters
- Cache versioning
Smart Updates:
- Check data freshness before API calls
- Incremental updates only
- Batch API requests
- Parallel requests with rate limiting
Cache Management:
- Cache statistics
- Cache cleaning
- Manual cache refresh
- Cache sharing across runs
Fallback Strategy:
- Use cached data if API unavailable
- Stale data warnings
- Manual data provision
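Response caching, cache versioning, TTL invalidation, and the stale-data fallback fit together in one small wrapper. The following is a minimal sketch, not the pipeline's current code. It assumes responses are JSON-serializable and that the actual HTTP call is passed in as a fetch callable (hypothetical here; for example, a thin wrapper around requests.get).

The wrapper works as follows:
- The cache key hashes the endpoint, the sorted query parameters, and a cache version number.
- Entries older than the TTL are refetched.
- If the fetch raises OSError, a stale entry is returned with a logged warning. OSError covers connection errors from both urllib and requests.

```python
import hashlib
import json
import logging
import time
from pathlib import Path
from typing import Any, Callable

logger = logging.getLogger(__name__)

CACHE_VERSION = 1  # bump to invalidate all entries when the stored format changes

Fetch = Callable[[str, dict[str, Any]], Any]  # hypothetical signature for the API call


def cache_key(endpoint: str, params: dict[str, Any]) -> str:
    """Deterministic key: same endpoint + params (in any order) + version -> same key."""
    payload = json.dumps(
        {"version": CACHE_VERSION, "endpoint": endpoint, "params": params}, sort_keys=True
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def cached_fetch(
    endpoint: str,
    params: dict[str, Any],
    fetch: Fetch,
    cache_dir: Path,
    ttl_seconds: float = 30 * 24 * 3600,  # assumed default: 30 days
    now: Callable[[], float] = time.time,
) -> Any:
    """Return cached data if fresh; otherwise fetch and store, falling back to stale data on failure."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"{cache_key(endpoint, params)}.json"
    entry = json.loads(path.read_text(encoding="utf-8")) if path.exists() else None

    if entry is not None and now() - entry["fetched_at"] < ttl_seconds:
        return entry["data"]  # fresh hit: no API call

    try:
        data = fetch(endpoint, params)
    except OSError:
        if entry is None:
            raise  # nothing cached to fall back on
        age_days = (now() - entry["fetched_at"]) / 86400
        logger.warning("Census API unavailable; using cached response %.1f days old", age_days)
        return entry["data"]

    path.write_text(json.dumps({"fetched_at": now(), "data": data}), encoding="utf-8")
    return data
```

Injecting fetch and now keeps the sketch testable without network access. Cache statistics and cleaning could then work directly on the JSON files in cache_dir.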
Benefits:
- Faster pipeline runs
- Reduced API dependency
- Lower risk of rate limiting
- Better offline capability
High Priority:
- Critical for core functionality
- Security concerns
- Blocking other work
- High user impact
Medium Priority:
- Important but not urgent
- Quality improvements
- Nice-to-have features
- Moderate user impact
Low Priority:
- Future enhancements
- Minor improvements
- Can be deferred
- Low immediate impact
- Small: 4-8 hours (< 1 day)
- Medium: 12-32 hours (1-4 days)
- Large: 40-80 hours (1-2 weeks)
- ✅ TD-009: No Dependency Security Scanning - COMPLETED (2025-11-15)
- ✅ IMP-005: Code Quality Tooling - COMPLETED (2025-11-15)
- TD-007: Error Handling Strategy - Medium effort (20-30 hours), critical for reliability
- TD-003: Mixed Import Patterns for H3 Module - Small effort (4-8 hours), improves developer experience
- ✅ TD-001: Python 3.10 Version Lock - COMPLETED
- TD-004: Incomplete Test Coverage - Large effort (40-60 hours), critical for long-term maintainability
- IMP-001: Performance Optimization - Large effort (40-60 hours), core functionality improvement
- FR-001: Multi-State Support - Large effort (60-80 hours), major feature expansion
- ✅ TD-002: Outdated OSMnx Version - COMPLETED
- TD-005: Hard-coded File Paths - Medium effort (16-24 hours), prerequisite for FR-001
- TD-008: Incomplete CI/CD Pipeline - Medium effort (16-24 hours), extends existing deployment automation
- TD-011: H3 Not Used as Primary Geographic Unit - Large effort (72-104 hours), addresses original analysis methodology goal
- IMP-002: Enhanced Data Validation - Medium effort (24-32 hours), improves data quality
- FR-002: Interactive Dashboard - Large effort (50-70 hours), improves accessibility for non-technical users
- ✅ FR-003: Mobile-Friendly Webmap - COMPLETED (2025-11-09)
- FR-004: Complete H3 Implementation - Large effort (72-104 hours), completes original H3 standardization goal
- IMP-003: Documentation Improvements - Medium effort (20-30 hours), improves maintainability
- IMP-004: Improved Logging and Monitoring - Medium effort (16-24 hours), improves debugging
- ✅ IMP-006: Webmap Enhancements - COMPLETED (2025-11-09)
Focus: Code Quality, Testing, Security
Quick Wins:
- ✅ TD-009: Dependency Security Scanning - COMPLETED (2025-11-15)
- ✅ IMP-005: Code Quality Tooling - COMPLETED (2025-11-15)
- TD-003: H3 Module Import Pattern (Small, 4-8 hours)
Core Infrastructure:
- TD-008: CI/CD Pipeline Extension (Medium, 16-24 hours) - Add test automation to existing deployment workflow
- TD-007: Error Handling Strategy (Medium, 20-30 hours)
- TD-004: Test Coverage - Priority Areas (Large, 40-60 hours) - Focus on critical path first
Total Phase 1 Effort: ~88-134 hours remaining (2-3.5 weeks full-time) Completed: ~12-16 hours (TD-009 + IMP-005)
Focus: Optimization, Reliability
Dependency Updates:
- ✅ TD-001: Python Version Upgrade (Medium, 16-24 hours) - COMPLETED
- ✅ TD-002: OSMnx Update (Medium, 12-16 hours) - COMPLETED
Core Improvements:
- IMP-001: Performance Optimization (Large, 40-60 hours) - Critical for scalability
- IMP-002: Enhanced Data Validation (Medium, 24-32 hours)
- IMP-004: Improved Logging and Monitoring (Medium, 16-24 hours)
Total Phase 2 Effort: ~110-150 hours (2.75-4 weeks full-time)
Focus: Maintainability, Multi-State Preparation
Code Quality:
- TD-005: Hard-coded Paths Refactoring (Medium, 16-24 hours) - Prerequisite for FR-001
- TD-010: Notebook Code Duplication (Large, 30-40 hours)
- IMP-003: Documentation Improvements (Medium, 20-30 hours)
Optional:
- TD-006: Data Format Migration (Large, 30-40 hours) - Can be deferred if not blocking
Total Phase 3 Effort: ~65-95 hours (1.5-2.5 weeks full-time)
Focus: New Capabilities
Major Feature:
- FR-001: Multi-State Support (Large, 60-80 hours) - Requires TD-005 completion
Webmap Improvements:
- ✅ FR-003: Mobile-Friendly Webmap (Medium, 20-30 hours) - COMPLETED
- ✅ IMP-006: Webmap Enhancements (Large, 30-40 hours) - COMPLETED
Total Phase 4 Effort: ~110-150 hours (2.75-4 weeks full-time)
Focus: Enhanced User Experience
Optional Enhancements:
- FR-002: Interactive Dashboard (Large, 50-70 hours) - Improves accessibility for non-technical users
- IMP-007: Dependency Management Improvements (Medium, 12-16 hours)
- IMP-008: Census Data Caching (Medium, 12-20 hours)
Total Phase 5 Effort: ~75-105 hours (2-2.5 weeks full-time)
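Each phase total is the sum of its items' low and high estimates. A helper like the hypothetical one sketched here could recompute the totals whenever an estimate changes. It assumes "full-time" means a 40-hour week. For the Phase 4 items it prints "Phase 4: 110-150 hours (2.75-3.75 weeks)". That matches the ~110-150 hour figure above; the roadmap rounds the upper week bound to 4.

```python
from typing import NamedTuple

HOURS_PER_WEEK = 40  # assumption: "full-time" means a 40-hour week


class Estimate(NamedTuple):
    """Effort range in hours, as written in the backlog (e.g. 'Medium, 20-30 hours')."""

    low: float
    high: float


def phase_total(items: dict[str, Estimate]) -> Estimate:
    """Sum the low and high ends of every item in a phase."""
    return Estimate(
        sum(e.low for e in items.values()),
        sum(e.high for e in items.values()),
    )


def in_weeks(total: Estimate, hours_per_week: float = HOURS_PER_WEEK) -> Estimate:
    """Convert an hour range to full-time weeks."""
    return Estimate(total.low / hours_per_week, total.high / hours_per_week)


# Phase 4 items and ranges as listed in the roadmap above.
PHASE_4 = {
    "FR-001": Estimate(60, 80),
    "FR-003": Estimate(20, 30),
    "IMP-006": Estimate(30, 40),
}

if __name__ == "__main__":
    hours = phase_total(PHASE_4)
    weeks = in_weeks(hours)
    print(f"Phase 4: {hours.low:g}-{hours.high:g} hours ({weeks.low:g}-{weeks.high:g} weeks)")
```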
Must Have (High Priority):
- Phase 1: ✅ TD-009 (COMPLETED), ✅ IMP-005 (COMPLETED), TD-007, TD-004, TD-008
- Phase 2: ✅ TD-001 (COMPLETED), IMP-001, IMP-002
- Phase 3: TD-005 (prerequisite for FR-001)
- Phase 4: FR-001
Should Have (Medium Priority):
- Phase 2: ✅ TD-002 (COMPLETED), IMP-004
- Phase 3: TD-010, IMP-003
- Phase 4: ✅ FR-003 (COMPLETED), ✅ IMP-006 (COMPLETED)
- Phase 5: FR-002
Nice to Have (Low Priority):
- Phase 3: TD-006 (can be deferred)
- Phase 5: IMP-007, IMP-008
Complexity:
- Algorithm changes
- Architecture changes
- Integration requirements
Dependencies:
- Other tasks must complete first
- External dependencies
- Team coordination
Testing:
- Test development time
- Integration testing
- User acceptance testing
Documentation:
- Code documentation
- User documentation
- Tutorial creation
Deployment:
- Migration planning
- Deployment automation
- Rollback procedures
- High Confidence: Well-understood, similar to past work
- Medium Confidence: Some unknowns, new technology
- Low Confidence: Significant unknowns, research needed
Most estimates in this document are medium confidence and should be refined during implementation planning.
Some items have dependencies and should be implemented in order:
Critical Dependencies:
- TD-005 → FR-001 (Hard-coded paths must be fixed before multi-state expansion)
- TD-004 → All refactoring work (Testing enables confident refactoring)
- TD-001 → IMP-001 (Python upgrade enables performance improvements)
Recommended Order:
- TD-008 → TD-004 (CI/CD should include test automation before expanding test coverage)
- TD-002 → IMP-001 (OSMnx update may provide performance improvements)
- TD-006 → IMP-001 (Data format affects performance, but can be deferred)
Optional Dependencies:
- FR-003 → IMP-006 (Mobile optimization can inform webmap enhancements)
- IMP-002 → FR-001 (Data validation helps ensure multi-state data quality)
- TD-011 → FR-004 (Technical debt item tracks the gap, FR-004 addresses it)
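The dependency relationships above form a directed graph, so a valid working order can be derived mechanically. The sketch below uses the standard-library graphlib module to group items into waves that can proceed in parallel. It makes two simplifications:
- It treats all three groups (critical, recommended, and optional) as hard constraints, which is stricter than the text intends.
- It omits "TD-004 → All refactoring work", because that target is not a single backlog item.

```python
from graphlib import TopologicalSorter

# item -> items that should finish first, transcribed from the dependency lists above
BACKLOG_DEPENDENCIES: dict[str, set[str]] = {
    "FR-001": {"TD-005", "IMP-002"},
    "IMP-001": {"TD-001", "TD-002", "TD-006"},
    "TD-004": {"TD-008"},
    "IMP-006": {"FR-003"},
    "FR-004": {"TD-011"},
}


def work_waves(dependencies: dict[str, set[str]]) -> list[list[str]]:
    """Group items into waves; every item's prerequisites sit in earlier waves.

    Raises graphlib.CycleError if the dependencies are circular.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # validates the graph and detects cycles
    waves: list[list[str]] = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(work_waves(BACKLOG_DEPENDENCIES), start=1):
        print(f"Wave {number}: {', '.join(wave)}")
```

With the relationships as listed, this yields two waves: all prerequisites first, then FR-001, FR-004, IMP-001, IMP-006, and TD-004. A circular dependency introduced in a future revision would surface as a CycleError rather than a silently wrong order.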
For high-risk changes:
- Create feature branch
- Implement with comprehensive tests
- Performance benchmarking
- Document rollback procedure
- Phased rollout
Recommended allocation:
- 40% New features
- 30% Technical debt
- 20% Improvements
- 10% Bug fixes / Security updates
This backlog should be reviewed and updated:
- Monthly: Priority adjustments, new items
- Quarterly: Roadmap revision, effort calibration
- Annually: Strategic direction, major initiatives
Last Review: 2025-11-09 Next Review: 2025-12-09
For questions or to contribute:
- Project Lead: Philip Mathieu (mathieu.p@northeastern.edu)
- Documentation: See README.md and DATA_DICTIONARY.md
- Issues: GitHub Issues (if repository is public)
Document Version: 1.4.1 Last Updated: 2025-11-15 Previous Version: 1.4 (2025-11-15) Analysis Method: Comprehensive codebase review, dependency analysis, and best practices research
Revision Notes:
v1.4.1 (2025-11-15):
- Accuracy verification: Reviewed all status indicators against actual codebase
- Corrected IMP-003: .env.example not yet created (was incorrectly marked as completed)
- Clarified IMP-004: Print statements in CLI scripts are acceptable for user-facing output
- Updated TD-003: Fixed completion date placeholder and added note about legacy notebooks
- Verified TD-009, IMP-005, IMP-006, FR-003, IMP-009 completion status (all accurate)
v1.4 (2025-11-15):
- Updated TD-007 (Error Handling Strategy) - marked as IN PROGRESS
- Fixed 4 empty except blocks with proper error logging
- Documented progress and remaining work
- Updated IMP-004 (Improved Logging and Monitoring) - marked as IN PROGRESS
- Replaced print() statements with proper logging in library modules
- CLI scripts still use print() for user-facing output (acceptable)
- Established consistent logging patterns
- Created DEVELOPMENT.md with logging guidelines
- Updated IMP-003 (Documentation Improvements) - marked as IN PROGRESS
- Created DEVELOPMENT.md with developer best practices
- Corrected: .env.example not yet created (was incorrectly marked as completed)
- Updated TD-003 (H3 Module Import Pattern) - corrected completion date from placeholder
- Added note about legacy notebooks using separate h3utils.py file
- Updated effort estimates for in-progress items
- Added recent completions section
- Verified accuracy of all status indicators against codebase
v1.3 (2025-11-09):
- Added TD-011: H3 Not Used as Primary Geographic Unit (technical debt)
- Added FR-004: Complete H3 Implementation as Primary Geographic Unit (feature request)
- Integrated findings from H3_PROGRESS_ASSESSMENT.md
- Updated Medium Priority Items section to include H3-related items
- Added dependency relationship between TD-011 and FR-004
v1.2 (2025-11-09):
- Rewrote prioritization sections to reflect streamlined backlog
- Removed references to deleted items (FR-004, FR-005, FR-006, FR-007, FR-008, IMP-009, IMP-010)
- Updated roadmap to focus on remaining items only
- Enhanced Quick Wins and Strategic Items sections with effort estimates
- Added Medium Priority Items section for better visibility
- Restructured roadmap with clearer phase descriptions and effort estimates
- Updated dependencies section to reflect current backlog structure
v1.1 (2025-11-09):
- Corrected TD-008 status: deployment pipeline exists, but test automation is missing
- Updated TD-002 to reflect current year (2025)
- Verified TD-006: GeoParquet migration file exists
- Updated status indicators (✅/❌/⚠️) throughout for clarity
- Clarified current state of various tools and scripts