diff --git a/GUARDRAILS.md b/GUARDRAILS.md
new file mode 100644
index 0000000..6244e29
--- /dev/null
+++ b/GUARDRAILS.md
@@ -0,0 +1,31 @@
+# GUARDRAILS.md — RyanData-Address-Utils
+
+
+## Always
+
+- Add type hints to all function signatures
+- Format with `ruff format` before committing
+- Write or update tests for every code change
+- Use structured logging (`logging` module) — never `print()`
+- Raise domain errors (`RyanDataAddressError`, `RyanDataValidationError`)
+
+## Ask First
+
+- Adding a new package dependency
+- Changing a Pydantic model field (may break downstream consumers)
+- Changing the public API in `__init__.py`
+- Altering shapefile schema or PISD boundary logic
+
+## Never
+
+- Store secrets, tokens, or credentials in source code
+- Use bare `except:` without specifying the exception type
+- Commit `.env` files or production data files
+- Use `print()` as a substitute for logging
+- Access `src/pisd_shape/data/` files outside the `pisd_shape` module
+
+## Data Sensitivity
+
+- Voter file data and shapefiles are **not** committed to git
+- Test fixtures use synthetic or publicly available data only
+- Production data paths are configured via environment variables
diff --git a/RUNBOOK.md b/RUNBOOK.md
new file mode 100644
index 0000000..d90146e
--- /dev/null
+++ b/RUNBOOK.md
@@ -0,0 +1,58 @@
+# RUNBOOK.md — RyanData-Address-Utils
+
+
+## Setup
+
+```bash
+git clone
+cd RyanData-Address-Utils
+uv sync
+uv run pytest  # verify install
+```
+
+## Common Operations
+
+### Parse a batch of addresses
+
+```python
+from ryandata_address_utils import AddressService
+service = AddressService()
+result = service.parse("123 Main St, Plano TX 75023")
+```
+
+### Parse a DataFrame column
+
+```python
+df = service.parse_dataframe(df, address_col="RES_STREET", prefix="addr_")
+```
+
+### Run PISD shapefile extraction
+
+```bash
+cd src/pisd_shape
+uv run python -m pisd_shape.main
+```
+
+## Linting & Formatting
+
+```bash
+uv run ruff check src tests  # lint
+uv run ruff format src tests  # format
+uv run mypy src  # type check
+```
+
+## Dependency Updates
+
+```bash
+uv lock --upgrade  # update lock file
+uv sync  # reinstall
+uv run pytest  # verify nothing broke
+```
+
+## Troubleshooting
+
+| Symptom | Fix |
+|---------|-----|
+| `ModuleNotFoundError` | Run `uv sync` |
+| Parser returns `None` | Check address format; try `usaddress` backend |
+| Shapefile import fails | Ensure `geopandas` extras installed: `uv sync --extra geo` |
diff --git a/TESTING.md b/TESTING.md
new file mode 100644
index 0000000..a58ba87
--- /dev/null
+++ b/TESTING.md
@@ -0,0 +1,42 @@
+# TESTING.md — RyanData-Address-Utils
+
+
+## Framework
+
+- **Runner:** pytest
+- **Property-based:** Hypothesis
+- **Coverage target:** 80%+ (src/)
+
+## Commands
+
+```bash
+uv run pytest  # all tests
+uv run pytest --cov=src  # with coverage
+uv run pytest -x  # stop on first failure
+uv run pytest -k "test_parse"  # filter by name
+uv run pytest tests/unit/  # unit tests only
+uv run pytest tests/property/  # Hypothesis tests only
+```
+
+## Test Layout
+
+```
+tests/
+├── unit/  # Pure unit tests — no I/O, no network
+├── integration/  # Tests that hit the filesystem or pandas
+├── property/  # Hypothesis property-based tests
+└── conftest.py  # Shared fixtures
+```
+
+## Test Standards
+
+- Every public function has at least one unit test
+- Parsers and validators get Hypothesis `@given` tests
+- Fixtures live in `conftest.py`, not in test files
+- No `print()` — use `caplog` or `capsys`
+- Mock external I/O at the boundary (file reads, HTTP)
+
+## CI
+
+Tests run automatically on every PR via GitHub Actions.
+All checks must pass before merge.
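
Reviewer note: the guardrails in this change (type hints, `logging` instead of `print()`, domain errors, no bare `except:`) could look roughly like the sketch below when applied to a single validator. This is an illustration only, using the standard library. The two exception classes are stand-ins that reuse the names from GUARDRAILS.md, since their real definitions are not part of this diff. The subclass relationship between them, the logger name, and the functions `validate_zip` and `safe_validate` are assumptions made for the example.

```python
import logging
from typing import Optional

# Logger name is an assumption; the package may organise its loggers differently.
logger = logging.getLogger("ryandata_address_utils.example")


class RyanDataAddressError(Exception):
    """Stand-in for the package's base domain error (real class not shown in this diff)."""


class RyanDataValidationError(RyanDataAddressError):
    """Stand-in validation error; the inheritance shown here is an assumption."""


def validate_zip(zip_code: str) -> str:
    """Hypothetical validator: return a cleaned 5-digit ZIP or raise a domain error."""
    cleaned = zip_code.strip()
    if len(cleaned) != 5 or not (cleaned.isascii() and cleaned.isdigit()):
        # Log through the logging module rather than print().
        logger.warning("Rejected ZIP code: %r", zip_code)
        raise RyanDataValidationError(f"Invalid ZIP code: {zip_code!r}")
    return cleaned


def safe_validate(zip_code: str) -> Optional[str]:
    """Return the cleaned ZIP, or None if validation fails."""
    try:
        return validate_zip(zip_code)
    except RyanDataValidationError:  # a named exception type, never a bare except
        return None
```

In a pytest test, the warning emitted by `validate_zip` could be asserted on through the `caplog` fixture, which keeps the test in line with the "no `print()`" standard in TESTING.md.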