-
Notifications
You must be signed in to change notification settings - Fork 0
feat: Add pisd-shape submodule for PFISD attendance boundary shapefil… #8
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| # pisd-shape: Pflugerville ISD Attendance Boundary Shapefile Extractor | ||
| Original file line number | Diff line number | Diff line change | ||||||||||||||||||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| @@ -0,0 +1,241 @@ | ||||||||||||||||||||||||||||
| """ | ||||||||||||||||||||||||||||
| pfisd_extract_shapefiles.py | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Extracts all layers from the Pflugerville ISD Attendance Boundaries | ||||||||||||||||||||||||||||
| ArcGIS Experience Builder app and saves them as shapefiles. | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Source: https://experience.arcgis.com/experience/0bc78994af534cd1a703c8959abeac9d | ||||||||||||||||||||||||||||
| WebMap: https://Pflugervilleisd.maps.arcgis.com/sharing/rest/content/items/bb587c1043a949cca04f1b1904c235e3/data | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| The layers are embedded as inline Feature Collections (no FeatureServer endpoint). | ||||||||||||||||||||||||||||
| All geometry arrives in Web Mercator (EPSG:3857) and is reprojected to WGS84 (EPSG:4326). | ||||||||||||||||||||||||||||
|
|
||||||||||||||||||||||||||||
| Dependencies: | ||||||||||||||||||||||||||||
| pip install requests geopandas shapely pyproj fiona | ||||||||||||||||||||||||||||
|
||||||||||||||||||||||||||||
| pip install requests geopandas shapely pyproj fiona | |
| Managed via `pyproject.toml` as an optional extra, e.g.: | |
| uv sync --extra pisd-shape |
Copilot
AI
Mar 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
os and shape are imported but not used. This will fail Ruff (unused-import) in CI; remove the unused imports (and any other unused imports if introduced later).
| import os | |
| import sys | |
| from pathlib import Path | |
| import requests | |
| import geopandas as gpd | |
| from shapely.geometry import Point, Polygon, MultiPolygon, shape | |
| import sys | |
| from pathlib import Path | |
| import requests | |
| import geopandas as gpd | |
| from shapely.geometry import Point, Polygon, MultiPolygon |
Copilot
AI
Mar 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OUTPUT_DIR is set inside the installed package directory (.../site-packages/.../export). That location can be read-only in many environments and also encourages generating shapefile artifacts under src/ when developing locally. Consider adding an --output-dir CLI option (defaulting to a path under the current working directory) and writing outputs there instead.
Copilot
AI
Mar 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All newly added functions in this module are untyped, but the repo’s mypy config sets disallow_untyped_defs = true and CI runs mypy src/. Add type annotations for these functions (or explicitly opt this module out via a targeted mypy override) to avoid failing typecheck.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: The code incorrectly processes ESRI polygons with holes by ignoring winding order, causing holes to be rendered as filled-in areas in the output shapefile.
Severity: HIGH
Suggested Fix
Refactor the polygon processing logic to correctly handle ESRI's winding order. This involves identifying outer rings (clockwise) and their corresponding inner rings/holes (counter-clockwise) and constructing a single valid shapely.geometry.Polygon with both the exterior and interior rings before adding it to the GeoDataFrame. This will preserve the correct topology.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/pisd_shape/pfisd_extract_shapefiles.py#L66-L74
Potential issue: The code processes ESRI polygon rings by treating each ring as an
independent polygon, ignoring the winding order convention that distinguishes outer
boundaries from inner holes. ESRI specifies outer rings as clockwise and holes as
counter-clockwise, while the Shapely library uses the opposite convention. By creating a
`Polygon` from each ring individually, the code incorrectly interprets rings that should
be holes as solid outer boundaries. This results in geometrically incorrect shapefiles
where any areas that should be holes are instead rendered as filled polygons, leading to
incorrect spatial data for users.
Did we get this right? 👍 / 👎 to inform future reviews.
Copilot
AI
Mar 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
esri_polygon_to_shapely currently treats every ring as an independent polygon. ESRI rings can represent holes (inner rings) as well as multiple outer shells, so this will produce incorrect geometries when holes are present. Use ring orientation to group holes with their outer ring and build Polygon(shell, holes) (and then MultiPolygon as needed) so boundaries are preserved correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bug: Column names are truncated to 10 characters without checking for duplicates, which can cause silent data loss or corruption when writing the shapefile.
Severity: MEDIUM
Suggested Fix
Before writing the shapefile, check for duplicate column names after truncation. If duplicates are found, either raise a clear error to inform the user or implement a deduplication strategy (e.g., appending a number to duplicate names) to ensure all data is preserved correctly.
Prompt for AI Agent
Review the code at the location below. A potential bug has been identified by an AI
agent.
Verify if this is a real issue. If it is, propose a fix; if not, explain why it's not
valid.
Location: src/pisd_shape/pfisd_extract_shapefiles.py#L220
Potential issue: The code at `pfisd_extract_shapefiles.py:220` truncates all
GeoDataFrame column names to 10 characters to meet the ESRI Shapefile format limit.
However, it does not check for or handle duplicate column names that may arise from this
truncation. If two or more original column names share the same first 10 characters, the
resulting shapefile may be written incorrectly. Depending on the underlying GDAL/Fiona
library version, this could cause one of the conflicting columns to be silently dropped
or renamed, leading to data loss or corruption. The surrounding `try-except` block would
mask any exceptions raised, preventing the user from being notified of the issue.
Did we get this right? 👍 / 👎 to inform future reviews.
Copilot
AI
Mar 19, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Truncating column names to 10 characters can create duplicate field names (e.g., two attributes with the same first 10 chars), which can either error during write or silently drop/overwrite data. Consider implementing a de-duplication step that ensures unique <=10-char names (and avoid renaming the geometry column).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR introduces a new
pisd_shapepackage, but the build config currently only includessrc/ryandata_address_utilsin the wheel/sdist. As-is,pisd_shapewon’t be distributed/installed even though the PR description suggests adding a submodule; update packaging config if this is intended to be part of the published artifact.