A portfolio reimplementation of the ONS Supplementary Data Service (SDS), built with FastAPI, SQLAlchemy, and SQLite. SDS is a backend service used by the UK's Office for National Statistics to publish and serve supplementary reference data (survey schemas + per-period reporting-unit data) consumed by statistical survey workflows.
This project mirrors the public REST surface and domain model of SDS in a single, self-contained service that can be cloned and run in seconds — useful for learning the shape of a real production data API without the GCP footprint.
Built as a portfolio piece. Not affiliated with ONS.
The service models two core resources:
- Schema: a versioned JSON definition of a survey questionnaire. Each `survey_id` accumulates an auto-incrementing version on every publish.
- Dataset: a snapshot of supplementary unit data for one survey period. A dataset contains many unit records keyed by `identifier` (e.g. a reporting unit reference).
Clients publish schemas and datasets, then look them up by survey, period, or unit identifier.
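The two rules above (versions numbered per `survey_id`, unit records looked up by dataset and `identifier`) can be sketched with the standard-library `sqlite3` module. This is a minimal illustration under assumptions, not the project's SQLAlchemy models: the table names, column names, and the `publish_schema` / `get_unit` helpers are all hypothetical.

```python
from __future__ import annotations

import json
import sqlite3
import uuid

# Hypothetical layout; the project's app/models.py may differ.
DDL = """
CREATE TABLE schemas (
    guid      TEXT PRIMARY KEY,      -- unique id, as used by GET /v2/schema?guid=
    survey_id TEXT NOT NULL,
    version   INTEGER NOT NULL,
    body      TEXT NOT NULL,         -- schema JSON stored inline
    UNIQUE (survey_id, version)
);
CREATE TABLE unit_data (
    dataset_id TEXT NOT NULL,
    identifier TEXT NOT NULL,        -- e.g. a reporting unit reference
    data       TEXT NOT NULL,
    PRIMARY KEY (dataset_id, identifier)
);
"""


def publish_schema(conn: sqlite3.Connection, survey_id: str, body: dict) -> tuple[str, int]:
    """Store body as the next version for survey_id and return (guid, version)."""
    (next_version,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) + 1 FROM schemas WHERE survey_id = ?",
        (survey_id,),
    ).fetchone()
    guid = str(uuid.uuid4())
    with conn:  # commits the insert, or rolls back if it raises
        conn.execute(
            "INSERT INTO schemas (guid, survey_id, version, body) VALUES (?, ?, ?, ?)",
            (guid, survey_id, next_version, json.dumps(body)),
        )
    return guid, next_version


def get_unit(conn: sqlite3.Connection, dataset_id: str, identifier: str) -> dict | None:
    """Return one unit record from a dataset, or None if it is absent."""
    row = conn.execute(
        "SELECT data FROM unit_data WHERE dataset_id = ? AND identifier = ?",
        (dataset_id, identifier),
    ).fetchone()
    return json.loads(row[0]) if row else None
```

Computing `MAX(version) + 1` is not safe under concurrent publishes on its own; in this sketch the `UNIQUE (survey_id, version)` constraint turns such a race into an `sqlite3.IntegrityError` rather than a duplicate version.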
| Method | Path | Description |
|---|---|---|
| GET | `/status` | Health check + service version |
| POST | `/v1/schema?survey_id=...` | Publish a new schema (auto-versioned per survey) |
| GET | `/v1/schema?survey_id=&version=` | Fetch schema by survey (defaults to latest version) |
| GET | `/v2/schema?guid=...` | Fetch schema by its unique GUID |
| GET | `/v1/schema_metadata?survey_id=` | List schema metadata for a survey |
| GET | `/v1/all_schema_metadata` | List all schema metadata |
| POST | `/v1/dataset` | Publish a dataset with its unit records |
| GET | `/v1/dataset_metadata?survey_id=&period_id=` | Filter dataset metadata |
| GET | `/v1/all_dataset_metadata` | List all dataset metadata |
| GET | `/v1/unit_data?dataset_id=&identifier=` | Fetch one unit record from a dataset |
| GET | `/v1/survey_list` | Static survey-id → name mapping |
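To show how the query parameters in this table combine, here is a small standard-library client sketch. `sds_url` and `get_json` are hypothetical helpers, not part of the project; leaving `version` unset for `/v1/schema` relies on the documented default of returning the latest version.

```python
from __future__ import annotations

import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:3033"  # port from the quickstart command


def sds_url(path: str, **params: str | int | None) -> str:
    """Build an SDS URL, dropping any query parameter left as None."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}{path}?{query}" if query else f"{BASE_URL}{path}"


def get_json(path: str, **params: str | int | None) -> object:
    """GET an endpoint on a running instance and decode the JSON body."""
    with urllib.request.urlopen(sds_url(path, **params)) as resp:
        return json.load(resp)
```

Against a running instance, `get_json("/v1/schema", survey_id="068")` asks for the latest schema for survey `068`, while passing `version=1` pins an earlier one. `urlopen` raises `urllib.error.HTTPError` for non-2xx responses such as a 404.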
Interactive docs are auto-generated at /docs (Swagger UI) and /redoc.
```bash
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 3033
```

Open http://localhost:3033/docs.
Or run with Docker:

```bash
docker compose up --build
```

Example requests:

```bash
# Publish a schema
curl -X POST "http://localhost:3033/v1/schema?survey_id=068" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_schema.json

# Publish a dataset
curl -X POST "http://localhost:3033/v1/dataset" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_dataset.json

# List dataset metadata
curl "http://localhost:3033/v1/dataset_metadata?survey_id=068"
```

Run the test suite:

```bash
pytest -v
```

Project layout:

```
app/
├── main.py          # FastAPI app + router wiring + lifespan
├── config.py        # pydantic-settings configuration
├── database.py      # SQLAlchemy engine + session factory
├── models.py        # ORM models: Schema, Dataset, UnitData
├── schemas.py       # Pydantic request/response models
├── routers/         # HTTP layer (one module per resource)
├── services/        # Business logic
└── repositories/    # Data-access layer
tests/               # pytest suite (in-memory SQLite per test)
sample_data/         # Example JSON payloads
```
The layered structure (router → service → repository) mirrors the separation of concerns in the upstream Java/Python-based SDS while keeping the dependency graph trivial.
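The direction of dependencies can be sketched in plain Python. This is an illustration under assumptions, not the project's code: the class and function names are hypothetical, an in-memory list stands in for the SQLAlchemy repository, and the router is a plain function returning `(status, body)` instead of a FastAPI route. Only the service knows that a missing `version` means "latest"; the router only translates results to HTTP, and the repository only fetches rows.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Protocol


@dataclass
class SchemaRecord:
    survey_id: str
    version: int
    body: dict


class SchemaRepository(Protocol):
    """Repository layer: storage access only, no business rules."""

    def list_for_survey(self, survey_id: str) -> list[SchemaRecord]: ...


@dataclass
class InMemorySchemaRepository:
    """Stand-in for a SQLAlchemy-backed repository (hypothetical)."""

    rows: list[SchemaRecord] = field(default_factory=list)

    def list_for_survey(self, survey_id: str) -> list[SchemaRecord]:
        return [r for r in self.rows if r.survey_id == survey_id]


class SchemaService:
    """Service layer: owns the 'latest version by default' rule."""

    def __init__(self, repo: SchemaRepository) -> None:
        self._repo = repo

    def get(self, survey_id: str, version: int | None = None) -> SchemaRecord | None:
        rows = self._repo.list_for_survey(survey_id)
        if version is None:
            return max(rows, key=lambda r: r.version, default=None)
        return next((r for r in rows if r.version == version), None)


def get_schema(service: SchemaService, survey_id: str, version: int | None = None) -> tuple[int, dict]:
    """Router layer: map a service result to an HTTP status and body."""
    record = service.get(survey_id, version)
    if record is None:
        return 404, {"detail": "Schema not found"}  # hypothetical error body
    return 200, record.body
```

Because the service depends on the `SchemaRepository` protocol rather than a concrete class, a different storage backend could be swapped in without touching the router.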
The real SDS depends on Google Cloud Platform and several pieces of infrastructure that would obscure the API design in a portfolio context. This project keeps the domain model and HTTP contract faithful but swaps out the cloud plumbing.
| Aspect | Upstream SDS | This portfolio version |
|---|---|---|
| Persistence | Firestore (document DB) | SQLite + SQLAlchemy |
| Object storage | Google Cloud Storage | Inline JSON in SQLite |
| Dataset ingestion | CloudEvents → Cloud Function → API | Direct POST /v1/dataset |
| Authentication | GCP IAM + OAuth | None (open for local demo) |
| API gateway | GCP API Gateway with OpenAPI YAML | FastAPI's auto-generated OpenAPI |
| Survey list source | GitHub-hosted JSON, fetched on demand | Hard-coded list |
| Infra-as-code | Terraform | Dockerfile + docker-compose |
If extended further, the natural next steps would be:
- CI: GitHub Actions running `ruff`, `mypy`, and `pytest` on every PR.
- Pluggable storage backend: abstract the repository layer so Firestore (via the emulator) can be dropped in alongside SQLite.
- CloudEvents ingestion: accept `application/cloudevents+json` on a dedicated endpoint, matching the upstream SDX → SDS flow.
- Authentication: JWT bearer middleware (OAuth2 password flow in dev, IAM-equivalent in deployed environments).
- Schema validation: validate posted unit data against the registered JSON schema for that survey/version.
- Pagination: cursor-based pagination on the `*_metadata` list endpoints.
- Object storage: move large schema bodies to a blob store (GCS / S3 / MinIO) and keep only metadata in the relational store.
- Observability: structured logging, OpenTelemetry traces, and a `/metrics` endpoint for Prometheus.
- Pub/Sub fan-out: publish a message on dataset creation so downstream consumers can react.
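One way the cursor-based pagination item could work is keyset pagination: the cursor encodes the last row's sort key, and the next query resumes with `WHERE id > ?` instead of an `OFFSET`. The sketch below assumes a hypothetical `dataset` table keyed by a positive integer `id` and an opaque base64 cursor; the function names and default page size are illustrative only.

```python
from __future__ import annotations

import base64
import json
import sqlite3


def encode_cursor(last_id: int) -> str:
    """Wrap the last seen key in an opaque, URL-safe token."""
    return base64.urlsafe_b64encode(json.dumps({"after": last_id}).encode()).decode()


def decode_cursor(cursor: str | None) -> int:
    """Recover the last seen key; no cursor means start from the beginning (ids assumed positive)."""
    if not cursor:
        return 0
    return int(json.loads(base64.urlsafe_b64decode(cursor))["after"])


def page_dataset_metadata(
    conn: sqlite3.Connection, limit: int = 50, cursor: str | None = None
) -> tuple[list[tuple], str | None]:
    """Return one page of dataset rows plus the cursor for the next page, or None at the end."""
    rows = conn.execute(
        "SELECT id, survey_id, period_id FROM dataset WHERE id > ? ORDER BY id LIMIT ?",
        (decode_cursor(cursor), limit + 1),  # one extra row reveals whether more exist
    ).fetchall()
    page = rows[:limit]
    next_cursor = encode_cursor(page[-1][0]) if len(rows) > limit else None
    return page, next_cursor
```

Unlike `OFFSET`, the keyset query stays cheap on deep pages and does not skip or repeat rows when new datasets are inserted between requests.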
FastAPI · SQLAlchemy 2.x · Pydantic v2 · pytest · Docker · Python 3.13