Skip to content

feat: Add pisd-shape submodule for PFISD attendance boundary shapefil…#8

Merged
jreakin merged 1 commit intomainfrom
claude/pisd-shape-module-ENBA3
Mar 19, 2026
Merged

feat: Add pisd-shape submodule for PFISD attendance boundary shapefil…#8
jreakin merged 1 commit intomainfrom
claude/pisd-shape-module-ENBA3

Conversation

@jreakin
Copy link
Member

@jreakin jreakin commented Mar 19, 2026

…e extraction

Extracts all layers from the Pflugerville ISD Attendance Boundaries ArcGIS WebMap and saves them as ESRI Shapefiles (WGS84/EPSG:4326). Supports both remote fetch and local JSON file via --local flag.

https://claude.ai/code/session_01QADmzhMETTRRbEo8D8pjzj

Summary by CodeRabbit

Release Notes

  • New Features
    • Added Pflugerville ISD Attendance Boundary Shapefile Extractor tool for downloading and processing ArcGIS WebMap data.
    • Extract feature layers from online or local sources with automatic geometry conversion and coordinate reprojection.
    • Export processed features directly to Shapefile format for use in GIS applications.

…e extraction

Extracts all layers from the Pflugerville ISD Attendance Boundaries
ArcGIS WebMap and saves them as ESRI Shapefiles (WGS84/EPSG:4326).
Supports both remote fetch and local JSON file via --local flag.

https://claude.ai/code/session_01QADmzhMETTRRbEo8D8pjzj
Copilot AI review requested due to automatic review settings March 19, 2026 19:53
@jreakin jreakin merged commit 74bf3d8 into main Mar 19, 2026
5 of 10 checks passed
@coderabbitai
Copy link

coderabbitai bot commented Mar 19, 2026

Caution

Review failed

The pull request is closed.

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 3d03b7d1-2b37-4d65-b5ea-4b08e439adfd

📥 Commits

Reviewing files that changed from the base of the PR and between 33d5011 and 82c5131.

📒 Files selected for processing (3)
  • src/pisd_shape/__init__.py
  • src/pisd_shape/export/.gitkeep
  • src/pisd_shape/pfisd_extract_shapefiles.py

Walkthrough

This PR adds a new Python script to the pisd_shape package that downloads ArcGIS WebMap JSON data, extracts feature geometries, reprojects them from Web Mercator to WGS84, and exports them as ESRI Shapefiles to a local directory, along with a package identifier comment.

Changes

Cohort / File(s) Summary
Package Initialization
src/pisd_shape/__init__.py
Added module-level comment identifying the package as "Pflugerville ISD Attendance Boundary Shapefile Extractor".
Feature Extraction and Conversion
src/pisd_shape/pfisd_extract_shapefiles.py
New script that downloads or loads ArcGIS WebMap JSON, extracts features from operational layers, converts ESRI geometries to Shapely, reprojects from EPSG:3857 to EPSG:4326, constructs GeoDataFrames, and exports shapefiles with sanitized layer names and column truncation.

Sequence Diagram(s)

sequenceDiagram
    actor User
    participant Script as pfisd_extract_shapefiles.py
    participant DataSource as ArcGIS WebMap<br/>(JSON)
    participant Converter as Geometry Converter<br/>(Shapely/GeoPandas)
    participant FileSystem as Export Directory<br/>(Shapefiles)

    User->>Script: Execute (--local flag optional)
    Script->>DataSource: Fetch/Load WebMap JSON
    DataSource-->>Script: operationalLayers data
    
    Script->>Script: Enumerate operational layers
    
    loop For each layer's featureCollection
        Script->>Converter: Extract featureSet.features
        Converter->>Converter: Convert ESRI geometries to Shapely
        Converter->>Converter: Reproject EPSG:3857 → EPSG:4326
        Converter->>Converter: Create GeoDataFrame + sanitize columns
        Converter-->>Script: GeoDataFrame ready
        Script->>FileSystem: Write .shp file (sanitized name)
    end
    
    Script->>User: Print completion summary
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Poem

🐰 A rabbit's verse on shapefiles and maps...

Web Mercator bends, but we make it right,
Converting coordinates with GeoPandas might,
From JSON to shapes, the features take flight,
Exported to files in export's bright light,
PISD boundaries now mapped just right! 🗺️✨

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch claude/pisd-shape-module-ENBA3
📝 Coding Plan
  • Generate coding plan for human review comments

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jreakin jreakin deleted the claude/pisd-shape-module-ENBA3 branch March 19, 2026 19:54
out_path = OUTPUT_DIR / f"{fname}.shp"

# Shapefile field names are limited to 10 characters
gdf.columns = [c[:10] for c in gdf.columns]
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: Column names are truncated to 10 characters without checking for duplicates, which can cause silent data loss or corruption when writing the shapefile.
Severity: MEDIUM

Suggested Fix

Before writing the shapefile, check for duplicate column names after truncation. If duplicates are found, either raise a clear error to inform the user or implement a deduplication strategy (e.g., appending a number to duplicate names) to ensure all data is preserved correctly.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/pisd_shape/pfisd_extract_shapefiles.py#L220

Potential issue: The code at `pfisd_extract_shapefiles.py:220` truncates all
GeoDataFrame column names to 10 characters to meet the ESRI Shapefile format limit.
However, it does not check for or handle duplicate column names that may arise from this
truncation. If two or more original column names share the same first 10 characters, the
resulting shapefile may be written incorrectly. Depending on the underlying GDAL/Fiona
library version, this could cause one of the conflicting columns to be silently dropped
or renamed, leading to data loss or corruption. The surrounding `try-except` block would
mask any exceptions raised, preventing the user from being notified of the issue.

Did we get this right? 👍 / 👎 to inform future reviews.

Comment on lines +66 to +74
# ESRI uses winding order to distinguish outer vs hole rings.
# For simplicity we treat each ring as its own polygon; Shapely's
# buffer(0) trick cleans up any self-intersections.
polys = [Polygon(r) for r in reprojected if len(r) >= 3]
if not polys:
return None
if len(polys) == 1:
return polys[0].buffer(0)
return MultiPolygon([p.buffer(0) for p in polys]).buffer(0)
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bug: The code incorrectly processes ESRI polygons with holes by ignoring winding order, causing holes to be rendered as filled-in areas in the output shapefile.
Severity: HIGH

Suggested Fix

Refactor the polygon processing logic to correctly handle ESRI's winding order. This involves identifying outer rings (clockwise) and their corresponding inner rings/holes (counter-clockwise) and constructing a single valid shapely.geometry.Polygon with both the exterior and interior rings before adding it to the GeoDataFrame. This will preserve the correct topology.

Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.

Location: src/pisd_shape/pfisd_extract_shapefiles.py#L66-L74

Potential issue: The code processes ESRI polygon rings by treating each ring as an
independent polygon, ignoring the winding order convention that distinguishes outer
boundaries from inner holes. ESRI specifies outer rings as clockwise and holes as
counter-clockwise, while the Shapely library uses the opposite convention. By creating a
`Polygon` from each ring individually, the code incorrectly interprets rings that should
be holes as solid outer boundaries. This results in geometrically incorrect shapefiles
where any areas that should be holes are instead rendered as filled polygons, leading to
incorrect spatial data for users.

Did we get this right? 👍 / 👎 to inform future reviews.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new pisd_shape submodule intended to extract all layers from the PFISD Attendance Boundaries ArcGIS WebMap (inline Feature Collections) and export each layer as an ESRI Shapefile reprojected to WGS84 (EPSG:4326).

Changes:

  • Introduces a CLI script to fetch (remote) or load (local JSON) WebMap data, convert ESRI geometries, and write per-layer shapefiles.
  • Adds an export/ directory placeholder for generated artifacts.
  • Adds a minimal package initializer for the new pisd_shape module.

Reviewed changes

Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.

File Description
src/pisd_shape/pfisd_extract_shapefiles.py Implements WebMap fetch/load, ESRI geometry conversion, and shapefile export logic.
src/pisd_shape/export/.gitkeep Keeps the export/ directory tracked in git.
src/pisd_shape/__init__.py Declares the new package (currently comment-only).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@@ -0,0 +1 @@
# pisd-shape: Pflugerville ISD Attendance Boundary Shapefile Extractor
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR introduces a new pisd_shape package, but the build config currently only includes src/ryandata_address_utils in the wheel/sdist. As-is, pisd_shape won’t be distributed/installed even though the PR description suggests adding a submodule; update packaging config if this is intended to be part of the published artifact.

Copilot uses AI. Check for mistakes.
Comment on lines +22 to +28
import os
import sys
from pathlib import Path

import requests
import geopandas as gpd
from shapely.geometry import Point, Polygon, MultiPolygon, shape
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

os and shape are imported but not used. This will fail Ruff (unused-import) in CI; remove the unused imports (and any other unused imports if introduced later).

Suggested change
import os
import sys
from pathlib import Path
import requests
import geopandas as gpd
from shapely.geometry import Point, Polygon, MultiPolygon, shape
import sys
from pathlib import Path
import requests
import geopandas as gpd
from shapely.geometry import Point, Polygon, MultiPolygon

Copilot uses AI. Check for mistakes.
Comment on lines +50 to +53
def reproject_ring(ring):
"""Convert a list of [x, y] Web Mercator coords to (lon, lat) WGS84."""
return [transformer.transform(x, y) for x, y in ring]

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

All newly added functions in this module are untyped, but the repo’s mypy config sets disallow_untyped_defs = true and CI runs mypy src/. Add type annotations for these functions (or explicitly opt this module out via a targeted mypy override) to avoid failing typecheck.

Copilot uses AI. Check for mistakes.
Comment on lines +40 to +42
# Output into the export/ folder inside this package
OUTPUT_DIR = Path(__file__).resolve().parent / "export"

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OUTPUT_DIR is set inside the installed package directory (.../site-packages/.../export). That location can be read-only in many environments and also encourages generating shapefile artifacts under src/ when developing locally. Consider adding an --output-dir CLI option (defaulting to a path under the current working directory) and writing outputs there instead.

Copilot uses AI. Check for mistakes.
Comment on lines +66 to +74
# ESRI uses winding order to distinguish outer vs hole rings.
# For simplicity we treat each ring as its own polygon; Shapely's
# buffer(0) trick cleans up any self-intersections.
polys = [Polygon(r) for r in reprojected if len(r) >= 3]
if not polys:
return None
if len(polys) == 1:
return polys[0].buffer(0)
return MultiPolygon([p.buffer(0) for p in polys]).buffer(0)
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

esri_polygon_to_shapely currently treats every ring as an independent polygon. ESRI rings can represent holes (inner rings) as well as multiple outer shells, so this will produce incorrect geometries when holes are present. Use ring orientation to group holes with their outer ring and build Polygon(shell, holes) (and then MultiPolygon as needed) so boundaries are preserved correctly.

Copilot uses AI. Check for mistakes.
Comment on lines +219 to +221
# Shapefile field names are limited to 10 characters
gdf.columns = [c[:10] for c in gdf.columns]

Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Truncating column names to 10 characters can create duplicate field names (e.g., two attributes with the same first 10 chars), which can either error during write or silently drop/overwrite data. Consider implementing a de-duplication step that ensures unique <=10-char names (and avoid renaming the geometry column).

Copilot uses AI. Check for mistakes.
All geometry arrives in Web Mercator (EPSG:3857) and is reprojected to WGS84 (EPSG:4326).

Dependencies:
pip install requests geopandas shapely pyproj fiona
Copy link

Copilot AI Mar 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The docstring instructs users to pip install ... dependencies, but this repository is configured as a packaged project with dependencies managed via pyproject.toml/uv. Consider adding these GIS dependencies as an optional dependency group (e.g., pisd-shape) and documenting uv sync --extra ... instead, so installs are reproducible.

Suggested change
pip install requests geopandas shapely pyproj fiona
Managed via `pyproject.toml` as an optional extra, e.g.:
uv sync --extra pisd-shape

Copilot uses AI. Check for mistakes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants