feat: Add pisd-shape submodule for PFISD attendance boundary shapefil…#8
feat: Add pisd-shape submodule for PFISD attendance boundary shapefil…#8
Conversation
…e extraction Extracts all layers from the Pflugerville ISD Attendance Boundaries ArcGIS WebMap and saves them as ESRI Shapefiles (WGS84/EPSG:4326). Supports both remote fetch and local JSON file via --local flag. https://claude.ai/code/session_01QADmzhMETTRRbEo8D8pjzj
|
Caution Review failedThe pull request is closed. ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: ASSERTIVE Plan: Pro Run ID: 📒 Files selected for processing (3)
WalkthroughThis PR adds a new Python script to the pisd_shape package that downloads ArcGIS WebMap JSON data, extracts feature geometries, reprojects them from Web Mercator to WGS84, and exports them as ESRI Shapefiles to a local directory, along with a package identifier comment. Changes
Sequence Diagram(s)sequenceDiagram
actor User
participant Script as pfisd_extract_shapefiles.py
participant DataSource as ArcGIS WebMap<br/>(JSON)
participant Converter as Geometry Converter<br/>(Shapely/GeoPandas)
participant FileSystem as Export Directory<br/>(Shapefiles)
User->>Script: Execute (--local flag optional)
Script->>DataSource: Fetch/Load WebMap JSON
DataSource-->>Script: operationalLayers data
Script->>Script: Enumerate operational layers
loop For each layer's featureCollection
Script->>Converter: Extract featureSet.features
Converter->>Converter: Convert ESRI geometries to Shapely
Converter->>Converter: Reproject EPSG:3857 → EPSG:4326
Converter->>Converter: Create GeoDataFrame + sanitize columns
Converter-->>Script: GeoDataFrame ready
Script->>FileSystem: Write .shp file (sanitized name)
end
Script->>User: Print completion summary
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Poem
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
📝 Coding Plan
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
| out_path = OUTPUT_DIR / f"{fname}.shp" | ||
|
|
||
| # Shapefile field names are limited to 10 characters | ||
| gdf.columns = [c[:10] for c in gdf.columns] |
There was a problem hiding this comment.
Bug: Column names are truncated to 10 characters without checking for duplicates, which can cause silent data loss or corruption when writing the shapefile.
Severity: MEDIUM
Suggested Fix
Before writing the shapefile, check for duplicate column names after truncation. If duplicates are found, either raise a clear error to inform the user or implement a deduplication strategy (e.g., appending a number to duplicate names) to ensure all data is preserved correctly.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/pisd_shape/pfisd_extract_shapefiles.py#L220
Potential issue: The code at `pfisd_extract_shapefiles.py:220` truncates all
GeoDataFrame column names to 10 characters to meet the ESRI Shapefile format limit.
However, it does not check for or handle duplicate column names that may arise from this
truncation. If two or more original column names share the same first 10 characters, the
resulting shapefile may be written incorrectly. Depending on the underlying GDAL/Fiona
library version, this could cause one of the conflicting columns to be silently dropped
or renamed, leading to data loss or corruption. The surrounding `try-except` block would
mask any exceptions raised, preventing the user from being notified of the issue.
Did we get this right? 👍 / 👎 to inform future reviews.
| # ESRI uses winding order to distinguish outer vs hole rings. | ||
| # For simplicity we treat each ring as its own polygon; Shapely's | ||
| # buffer(0) trick cleans up any self-intersections. | ||
| polys = [Polygon(r) for r in reprojected if len(r) >= 3] | ||
| if not polys: | ||
| return None | ||
| if len(polys) == 1: | ||
| return polys[0].buffer(0) | ||
| return MultiPolygon([p.buffer(0) for p in polys]).buffer(0) |
There was a problem hiding this comment.
Bug: The code incorrectly processes ESRI polygons with holes by ignoring winding order, causing holes to be rendered as filled-in areas in the output shapefile.
Severity: HIGH
Suggested Fix
Refactor the polygon processing logic to correctly handle ESRI's winding order. This involves identifying outer rings (clockwise) and their corresponding inner rings/holes (counter-clockwise) and constructing a single valid shapely.geometry.Polygon with both the exterior and interior rings before adding it to the GeoDataFrame. This will preserve the correct topology.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/pisd_shape/pfisd_extract_shapefiles.py#L66-L74
Potential issue: The code processes ESRI polygon rings by treating each ring as an
independent polygon, ignoring the winding order convention that distinguishes outer
boundaries from inner holes. ESRI specifies outer rings as clockwise and holes as
counter-clockwise, while the Shapely library uses the opposite convention. By creating a
`Polygon` from each ring individually, the code incorrectly interprets rings that should
be holes as solid outer boundaries. This results in geometrically incorrect shapefiles
where any areas that should be holes are instead rendered as filled polygons, leading to
incorrect spatial data for users.
Did we get this right? 👍 / 👎 to inform future reviews.
There was a problem hiding this comment.
Pull request overview
Adds a new pisd_shape submodule intended to extract all layers from the PFISD Attendance Boundaries ArcGIS WebMap (inline Feature Collections) and export each layer as an ESRI Shapefile reprojected to WGS84 (EPSG:4326).
Changes:
- Introduces a CLI script to fetch (remote) or load (local JSON) WebMap data, convert ESRI geometries, and write per-layer shapefiles.
- Adds an
export/directory placeholder for generated artifacts. - Adds a minimal package initializer for the new
pisd_shapemodule.
Reviewed changes
Copilot reviewed 2 out of 3 changed files in this pull request and generated 7 comments.
| File | Description |
|---|---|
src/pisd_shape/pfisd_extract_shapefiles.py |
Implements WebMap fetch/load, ESRI geometry conversion, and shapefile export logic. |
src/pisd_shape/export/.gitkeep |
Keeps the export/ directory tracked in git. |
src/pisd_shape/__init__.py |
Declares the new package (currently comment-only). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| @@ -0,0 +1 @@ | |||
| # pisd-shape: Pflugerville ISD Attendance Boundary Shapefile Extractor | |||
There was a problem hiding this comment.
This PR introduces a new pisd_shape package, but the build config currently only includes src/ryandata_address_utils in the wheel/sdist. As-is, pisd_shape won’t be distributed/installed even though the PR description suggests adding a submodule; update packaging config if this is intended to be part of the published artifact.
| import os | ||
| import sys | ||
| from pathlib import Path | ||
|
|
||
| import requests | ||
| import geopandas as gpd | ||
| from shapely.geometry import Point, Polygon, MultiPolygon, shape |
There was a problem hiding this comment.
os and shape are imported but not used. This will fail Ruff (unused-import) in CI; remove the unused imports (and any other unused imports if introduced later).
| import os | |
| import sys | |
| from pathlib import Path | |
| import requests | |
| import geopandas as gpd | |
| from shapely.geometry import Point, Polygon, MultiPolygon, shape | |
| import sys | |
| from pathlib import Path | |
| import requests | |
| import geopandas as gpd | |
| from shapely.geometry import Point, Polygon, MultiPolygon |
| def reproject_ring(ring): | ||
| """Convert a list of [x, y] Web Mercator coords to (lon, lat) WGS84.""" | ||
| return [transformer.transform(x, y) for x, y in ring] | ||
|
|
There was a problem hiding this comment.
All newly added functions in this module are untyped, but the repo’s mypy config sets disallow_untyped_defs = true and CI runs mypy src/. Add type annotations for these functions (or explicitly opt this module out via a targeted mypy override) to avoid failing typecheck.
| # Output into the export/ folder inside this package | ||
| OUTPUT_DIR = Path(__file__).resolve().parent / "export" | ||
|
|
There was a problem hiding this comment.
OUTPUT_DIR is set inside the installed package directory (.../site-packages/.../export). That location can be read-only in many environments and also encourages generating shapefile artifacts under src/ when developing locally. Consider adding an --output-dir CLI option (defaulting to a path under the current working directory) and writing outputs there instead.
| # ESRI uses winding order to distinguish outer vs hole rings. | ||
| # For simplicity we treat each ring as its own polygon; Shapely's | ||
| # buffer(0) trick cleans up any self-intersections. | ||
| polys = [Polygon(r) for r in reprojected if len(r) >= 3] | ||
| if not polys: | ||
| return None | ||
| if len(polys) == 1: | ||
| return polys[0].buffer(0) | ||
| return MultiPolygon([p.buffer(0) for p in polys]).buffer(0) |
There was a problem hiding this comment.
esri_polygon_to_shapely currently treats every ring as an independent polygon. ESRI rings can represent holes (inner rings) as well as multiple outer shells, so this will produce incorrect geometries when holes are present. Use ring orientation to group holes with their outer ring and build Polygon(shell, holes) (and then MultiPolygon as needed) so boundaries are preserved correctly.
| # Shapefile field names are limited to 10 characters | ||
| gdf.columns = [c[:10] for c in gdf.columns] | ||
|
|
There was a problem hiding this comment.
Truncating column names to 10 characters can create duplicate field names (e.g., two attributes with the same first 10 chars), which can either error during write or silently drop/overwrite data. Consider implementing a de-duplication step that ensures unique <=10-char names (and avoid renaming the geometry column).
| All geometry arrives in Web Mercator (EPSG:3857) and is reprojected to WGS84 (EPSG:4326). | ||
|
|
||
| Dependencies: | ||
| pip install requests geopandas shapely pyproj fiona |
There was a problem hiding this comment.
The docstring instructs users to pip install ... dependencies, but this repository is configured as a packaged project with dependencies managed via pyproject.toml/uv. Consider adding these GIS dependencies as an optional dependency group (e.g., pisd-shape) and documenting uv sync --extra ... instead, so installs are reproducible.
| pip install requests geopandas shapely pyproj fiona | |
| Managed via `pyproject.toml` as an optional extra, e.g.: | |
| uv sync --extra pisd-shape |
…e extraction
Extracts all layers from the Pflugerville ISD Attendance Boundaries ArcGIS WebMap and saves them as ESRI Shapefiles (WGS84/EPSG:4326). Supports both remote fetch and local JSON file via --local flag.
https://claude.ai/code/session_01QADmzhMETTRRbEo8D8pjzj
Summary by CodeRabbit
Release Notes