Merged
1 change: 1 addition & 0 deletions src/pisd_shape/__init__.py
@@ -0,0 +1 @@
# pisd-shape: Pflugerville ISD Attendance Boundary Shapefile Extractor
Copilot AI Mar 19, 2026
This PR introduces a new pisd_shape package, but the build config currently only includes src/ryandata_address_utils in the wheel/sdist. As-is, pisd_shape won’t be distributed/installed even though the PR description suggests adding a submodule; update packaging config if this is intended to be part of the published artifact.

Empty file added src/pisd_shape/export/.gitkeep
Empty file.
241 changes: 241 additions & 0 deletions src/pisd_shape/pfisd_extract_shapefiles.py
@@ -0,0 +1,241 @@
"""
pfisd_extract_shapefiles.py

Extracts all layers from the Pflugerville ISD Attendance Boundaries
ArcGIS Experience Builder app and saves them as shapefiles.

Source: https://experience.arcgis.com/experience/0bc78994af534cd1a703c8959abeac9d
WebMap: https://Pflugervilleisd.maps.arcgis.com/sharing/rest/content/items/bb587c1043a949cca04f1b1904c235e3/data

The layers are embedded as inline Feature Collections (no FeatureServer endpoint).
All geometry arrives in Web Mercator (EPSG:3857) and is reprojected to WGS84 (EPSG:4326).

Dependencies:
pip install requests geopandas shapely pyproj fiona
Copilot AI Mar 19, 2026
The docstring instructs users to pip install ... dependencies, but this repository is configured as a packaged project with dependencies managed via pyproject.toml/uv. Consider adding these GIS dependencies as an optional dependency group (e.g., pisd-shape) and documenting uv sync --extra ... instead, so installs are reproducible.

Suggested change
-    pip install requests geopandas shapely pyproj fiona
+    Managed via `pyproject.toml` as an optional extra, e.g.:
+        uv sync --extra pisd-shape


Output:
One .shp file per layer, written to ./export/ (relative to this script)
"""

import argparse
import json
import os
import sys
from pathlib import Path

import requests
import geopandas as gpd
from shapely.geometry import Point, Polygon, MultiPolygon, shape
Comment on lines +22 to +28
Copilot AI Mar 19, 2026
os and shape are imported but not used. This will fail Ruff (unused-import) in CI; remove the unused imports (and any other unused imports if introduced later).

Suggested change
-import os
 import sys
 from pathlib import Path
 import requests
 import geopandas as gpd
-from shapely.geometry import Point, Polygon, MultiPolygon, shape
+from shapely.geometry import Point, Polygon, MultiPolygon

from pyproj import Transformer

# ─────────────────────────────────────────────
# CONFIG
# ─────────────────────────────────────────────

WEBMAP_URL = (
"https://Pflugervilleisd.maps.arcgis.com/sharing/rest/content/items/"
"bb587c1043a949cca04f1b1904c235e3/data?f=json"
)

# Output into the export/ folder inside this package
OUTPUT_DIR = Path(__file__).resolve().parent / "export"

Comment on lines +40 to +42
Copilot AI Mar 19, 2026
OUTPUT_DIR is set inside the installed package directory (.../site-packages/.../export). That location can be read-only in many environments and also encourages generating shapefile artifacts under src/ when developing locally. Consider adding an --output-dir CLI option (defaulting to a path under the current working directory) and writing outputs there instead.

# Source CRS (Web Mercator) → Target CRS (WGS84 lat/lon)
transformer = Transformer.from_crs("EPSG:3857", "EPSG:4326", always_xy=True)

# ─────────────────────────────────────────────
# GEOMETRY HELPERS
# ─────────────────────────────────────────────

def reproject_ring(ring):
"""Convert a list of [x, y] Web Mercator coords to (lon, lat) WGS84."""
return [transformer.transform(x, y) for x, y in ring]

Comment on lines +50 to +53
Copilot AI Mar 19, 2026
All newly added functions in this module are untyped, but the repo’s mypy config sets disallow_untyped_defs = true and CI runs mypy src/. Add type annotations for these functions (or explicitly opt this module out via a targeted mypy override) to avoid failing typecheck.

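For illustration, an annotated signature along the lines the review asks for; the `transform` callable stands in for the module's `Transformer.transform` so the sketch stays dependency-free, and the alias names are assumptions:

```python
from typing import Callable

# Assumed aliases clarifying the shapes the helpers operate on.
Coord = tuple[float, float]
Ring = list[Coord]


def reproject_ring(
    ring: list[list[float]],
    transform: Callable[[float, float], Coord],
) -> Ring:
    """Typed variant of the module's helper: map [x, y] pairs through transform."""
    return [transform(x, y) for x, y in ring]
```

The other helpers would gain `Optional[...]` return annotations the same way, since they can return `None` for empty geometries.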

def esri_polygon_to_shapely(esri_geom):
"""
Convert an ESRI polygon geometry dict (with 'rings') to a Shapely geometry.
Handles single polygons and multipolygons (multiple outer rings).
"""
rings = esri_geom.get("rings", [])
if not rings:
return None

reprojected = [reproject_ring(r) for r in rings]

# ESRI uses winding order to distinguish outer vs hole rings.
# For simplicity we treat each ring as its own polygon; Shapely's
# buffer(0) trick cleans up any self-intersections.
polys = [Polygon(r) for r in reprojected if len(r) >= 3]
if not polys:
return None
if len(polys) == 1:
return polys[0].buffer(0)
return MultiPolygon([p.buffer(0) for p in polys]).buffer(0)
Comment on lines +66 to +74
Bug: The code incorrectly processes ESRI polygons with holes by ignoring winding order, causing holes to be rendered as filled-in areas in the output shapefile.
Severity: HIGH

Suggested Fix

Refactor the polygon processing logic to correctly handle ESRI's winding order. This involves identifying outer rings (clockwise) and their corresponding inner rings/holes (counter-clockwise) and constructing a single valid shapely.geometry.Polygon with both the exterior and interior rings before adding it to the GeoDataFrame. This will preserve the correct topology.

Potential issue: The code processes ESRI polygon rings by treating each ring as an independent polygon, ignoring the winding-order convention that distinguishes outer boundaries from inner holes. ESRI specifies outer rings as clockwise and holes as counter-clockwise, while Shapely uses the opposite convention. Creating a `Polygon` from each ring individually misinterprets rings that should be holes as solid outer boundaries, producing geometrically incorrect shapefiles in which holes are rendered as filled polygons.

Comment on lines +66 to +74
Copilot AI Mar 19, 2026
esri_polygon_to_shapely currently treats every ring as an independent polygon. ESRI rings can represent holes (inner rings) as well as multiple outer shells, so this will produce incorrect geometries when holes are present. Use ring orientation to group holes with their outer ring and build Polygon(shell, holes) (and then MultiPolygon as needed) so boundaries are preserved correctly.

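The ring-orientation test both comments describe can be sketched without any GIS dependencies: the shoelace formula classifies each ring, after which shells and their holes could be passed to Shapely as `Polygon(shell, holes)`. Hole-to-shell assignment by containment is elided here; the function names are illustrative, not from the PR:

```python
def signed_area(ring: list[tuple[float, float]]) -> float:
    """Shoelace formula: negative for clockwise rings, positive for counter-clockwise."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def split_esri_rings(rings: list[list[tuple[float, float]]]):
    """Split ESRI rings into outer shells and holes by orientation.

    ESRI's convention (outer rings clockwise, holes counter-clockwise) is the
    opposite of Shapely's, so orientation alone tells the two apart.
    """
    shells = [r for r in rings if signed_area(r) < 0]   # clockwise -> shell
    holes = [r for r in rings if signed_area(r) >= 0]   # counter-clockwise -> hole
    return shells, holes
```

A full fix would then attach each hole to the shell that contains it (e.g. via `shapely.Polygon(shell).contains(Point(*hole[0]))`) before constructing `Polygon(shell, holes)` or a `MultiPolygon`.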


def esri_point_to_shapely(esri_geom):
"""Convert an ESRI point geometry dict to a Shapely Point in WGS84."""
x = esri_geom.get("x")
y = esri_geom.get("y")
if x is None or y is None:
return None
lon, lat = transformer.transform(x, y)
return Point(lon, lat)


# ─────────────────────────────────────────────
# LAYER EXTRACTION
# ─────────────────────────────────────────────

def extract_layer(layer_data, layer_title):
"""
Given a single ESRI featureCollection layer dict, return a GeoDataFrame.
Handles both polygon and point geometry types.
"""
layer_def = layer_data.get("layerDefinition", {})
feature_set = layer_data.get("featureSet", {})
geom_type = layer_def.get("geometryType", "")
features = feature_set.get("features", [])

if not features:
print(f" [WARN] No features found in layer: {layer_title}")
return None

rows = []
skipped = 0

for feat in features:
esri_geom = feat.get("geometry", {})
attrs = feat.get("attributes", {})

if geom_type == "esriGeometryPolygon":
geom = esri_polygon_to_shapely(esri_geom)
elif geom_type == "esriGeometryPoint":
geom = esri_point_to_shapely(esri_geom)
else:
print(f" [WARN] Unsupported geometry type '{geom_type}' — skipping feature.")
skipped += 1
continue

if geom is None or geom.is_empty:
skipped += 1
continue

row = {"geometry": geom}
row.update(attrs)
rows.append(row)

if skipped:
print(f" [INFO] Skipped {skipped} invalid/empty features.")

if not rows:
print(f" [WARN] No valid geometries extracted from: {layer_title}")
return None

gdf = gpd.GeoDataFrame(rows, crs="EPSG:4326")
return gdf


# ─────────────────────────────────────────────
# MAIN
# ─────────────────────────────────────────────

def safe_filename(title):
"""Strip characters that are unsafe in filenames."""
keep = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789_- "
name = "".join(c if c in keep else "_" for c in title)
return name.strip().replace(" ", "_")[:60]


def main():
parser = argparse.ArgumentParser(description="Extract PFISD attendance boundary shapefiles")
parser.add_argument(
"--local", "-l",
type=str,
default=None,
help="Path to a local WebMap JSON file (skip HTTP fetch)",
)
args = parser.parse_args()

OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
print(f"Output directory: {OUTPUT_DIR.resolve()}\n")

# ------------------------------------------------------------------
# 1. Fetch the WebMap JSON (or load from local file)
# ------------------------------------------------------------------
if args.local:
local_path = Path(args.local)
print(f"Loading WebMap data from local file: {local_path}")
try:
with open(local_path) as f:
webmap = json.load(f)
except Exception as e:
print(f"[ERROR] Failed to load local file: {e}")
sys.exit(1)
else:
print("Fetching WebMap data from ArcGIS Online...")
try:
resp = requests.get(WEBMAP_URL, timeout=30)
resp.raise_for_status()
webmap = resp.json()
except Exception as e:
print(f"[ERROR] Failed to fetch WebMap: {e}")
sys.exit(1)

operational_layers = webmap.get("operationalLayers", [])
if not operational_layers:
print("[ERROR] No operationalLayers found in WebMap JSON.")
sys.exit(1)

print(f"Found {len(operational_layers)} operational layer(s).\n")

# ------------------------------------------------------------------
# 2. Iterate layers and export each as a shapefile
# ------------------------------------------------------------------
exported = 0

for layer in operational_layers:
layer_title = layer.get("title", "unnamed_layer")
feature_collection = layer.get("featureCollection", {})
sub_layers = feature_collection.get("layers", [])

if not sub_layers:
print(f"[SKIP] '{layer_title}' — no featureCollection layers found.")
continue

print(f"Processing: {layer_title}")

for i, sub_layer in enumerate(sub_layers):
suffix = f"_{i}" if len(sub_layers) > 1 else ""
fname = safe_filename(layer_title) + suffix

gdf = extract_layer(sub_layer, layer_title)
if gdf is None:
continue

out_path = OUTPUT_DIR / f"{fname}.shp"

# Shapefile field names are limited to 10 characters
gdf.columns = [c[:10] for c in gdf.columns]
Bug: Column names are truncated to 10 characters without checking for duplicates, which can cause silent data loss or corruption when writing the shapefile.
Severity: MEDIUM

Suggested Fix

Before writing the shapefile, check for duplicate column names after truncation. If duplicates are found, either raise a clear error to inform the user or implement a deduplication strategy (e.g., appending a number to duplicate names) to ensure all data is preserved correctly.

Potential issue: Column names are truncated to 10 characters to meet the ESRI Shapefile field-name limit, but duplicates arising from truncation are not detected. If two or more attributes share the same first 10 characters, the shapefile may be written incorrectly; depending on the GDAL/Fiona version, one of the conflicting columns can be silently dropped or renamed, and the surrounding `try`/`except` block would mask any resulting exception, so the user is never notified.


Comment on lines +219 to +221
Copilot AI Mar 19, 2026
Truncating column names to 10 characters can create duplicate field names (e.g., two attributes with the same first 10 chars), which can either error during write or silently drop/overwrite data. Consider implementing a de-duplication step that ensures unique <=10-char names (and avoid renaming the geometry column).

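One way to satisfy this, as a sketch: truncate first, then squeeze a numeric suffix into the 10-character budget on any clash, leaving `geometry` untouched (the function name is hypothetical):

```python
def dedupe_shapefile_columns(columns: list[str]) -> list[str]:
    """Truncate names to 10 chars for ESRI Shapefile fields, keeping them unique."""
    seen: set[str] = set()
    out: list[str] = []
    for col in columns:
        if col == "geometry":          # never rename the geometry column
            out.append(col)
            continue
        name = col[:10]
        n = 1
        while name in seen:            # on a clash, fit a counter within 10 chars
            suffix = str(n)
            name = col[: 10 - len(suffix)] + suffix
            n += 1
        seen.add(name)
        out.append(name)
    return out
```

Applied before `to_file`, this would replace the bare `gdf.columns = [c[:10] for c in gdf.columns]` line and make name collisions impossible rather than version-dependent.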
try:
gdf.to_file(out_path, driver="ESRI Shapefile")
print(f" -> Saved {len(gdf)} features -> {out_path}")
exported += 1
except Exception as e:
print(f" [ERROR] Could not write {out_path}: {e}")

print()

# ------------------------------------------------------------------
# 3. Summary
# ------------------------------------------------------------------
print("-" * 50)
print(f"Done. {exported} shapefile(s) written to: {OUTPUT_DIR.resolve()}")
print("\nProjection: WGS84 (EPSG:4326)")
print("Open in QGIS, ArcGIS Pro, or any GIS tool that reads .shp files.")


if __name__ == "__main__":
main()