Supplementary Data Service (FastAPI)

A portfolio reimplementation of the ONS Supplementary Data Service (SDS), built with FastAPI, SQLAlchemy, and SQLite. SDS is a backend service used by the UK's Office for National Statistics (ONS) to publish and serve supplementary reference data, namely survey schemas and per-period reporting-unit data, which statistical survey workflows consume.

This project mirrors the public REST surface and domain model of SDS in a single, self-contained service that can be cloned and run in seconds. It is a way to learn the shape of a real production data API without the GCP footprint.

Built as a portfolio piece. Not affiliated with ONS.

What it does

The service models two core resources:

Schema — a versioned JSON definition of a survey questionnaire. Every publish for a given survey_id receives the next version number for that survey.
Dataset — a snapshot of supplementary unit data for one survey period. A dataset contains many unit records keyed by identifier (e.g. a reporting unit reference).

Clients publish schemas and datasets, then look them up by survey, period, or unit identifier.
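
The versioning and lookup rules can be made concrete with a small in-memory model. This is a hedged sketch, not the project's code: `SchemaRecord`, `SchemaRegistry`, `Dataset`, and their methods are hypothetical names, and it assumes that versions start at 1 and that a lookup without a version returns the latest one, as the `GET /v1/schema` row in the endpoint table describes.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field


@dataclass
class SchemaRecord:
    survey_id: str
    version: int
    guid: str
    body: dict


class SchemaRegistry:
    """Hypothetical in-memory stand-in for published schemas."""

    def __init__(self) -> None:
        self._by_survey: dict[str, list[SchemaRecord]] = {}
        self._by_guid: dict[str, SchemaRecord] = {}

    def publish(self, survey_id: str, body: dict) -> SchemaRecord:
        history = self._by_survey.setdefault(survey_id, [])
        # Versions are per survey (assumed to start at 1): count so far plus one.
        record = SchemaRecord(survey_id, len(history) + 1, str(uuid.uuid4()), body)
        history.append(record)
        self._by_guid[record.guid] = record
        return record

    def get(self, survey_id: str, version: int | None = None) -> SchemaRecord | None:
        history = self._by_survey.get(survey_id, [])
        if version is None:
            # No version requested: return the latest, if any.
            return history[-1] if history else None
        # Versions are contiguous from 1, so version n sits at index n - 1.
        return history[version - 1] if 1 <= version <= len(history) else None

    def get_by_guid(self, guid: str) -> SchemaRecord | None:
        return self._by_guid.get(guid)


@dataclass
class Dataset:
    survey_id: str
    period_id: str
    # identifier (e.g. a reporting unit reference) -> that unit's record
    units: dict[str, dict] = field(default_factory=dict)
    dataset_id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def unit(self, identifier: str) -> dict | None:
        return self.units.get(identifier)
```

Publishing twice for survey `068` yields versions 1 and 2; `registry.get("068")` then returns version 2, while `registry.get("068", 1)` still returns the first.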

Endpoints

Method	Path	Description
GET	`/status`	Health check + service version
POST	`/v1/schema?survey_id=...`	Publish a new schema (auto-versioned per survey)
GET	`/v1/schema?survey_id=&version=`	Fetch schema by survey (defaults to latest version)
GET	`/v2/schema?guid=...`	Fetch schema by its unique GUID
GET	`/v1/schema_metadata?survey_id=`	List schema metadata for a survey
GET	`/v1/all_schema_metadata`	List all schema metadata
POST	`/v1/dataset`	Publish a dataset with its unit records
GET	`/v1/dataset_metadata?survey_id=&period_id=`	Filter dataset metadata
GET	`/v1/all_dataset_metadata`	List all dataset metadata
GET	`/v1/unit_data?dataset_id=&identifier=`	Fetch one unit record from a dataset
GET	`/v1/survey_list`	Static survey-id → name mapping

Interactive docs are auto-generated at /docs (Swagger UI) and /redoc.
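
Because the read endpoints take plain query parameters, a client needs nothing beyond the standard library. This sketch assumes the server from the quick start on port 3033 and that each endpoint returns a JSON body; `SdsClient` and its method names are hypothetical. Parameters left as `None` are dropped from the query string, so the server applies its own defaults (for `/v1/schema`, the latest version).

```python
from __future__ import annotations

import json
from urllib.parse import urlencode
from urllib.request import urlopen


class SdsClient:
    """Hypothetical read-only client for the query-string endpoints."""

    def __init__(self, base_url: str = "http://localhost:3033") -> None:
        self.base_url = base_url.rstrip("/")

    def url(self, path: str, **params: object) -> str:
        # Drop parameters left as None so the server falls back to its defaults.
        query = urlencode({name: value for name, value in params.items() if value is not None})
        return f"{self.base_url}{path}?{query}" if query else f"{self.base_url}{path}"

    def _get(self, path: str, **params: object):
        with urlopen(self.url(path, **params)) as resp:
            return json.load(resp)

    def status(self):
        return self._get("/status")

    def get_schema(self, survey_id: str, version: int | None = None):
        return self._get("/v1/schema", survey_id=survey_id, version=version)

    def get_schema_by_guid(self, guid: str):
        return self._get("/v2/schema", guid=guid)

    def dataset_metadata(self, survey_id: str, period_id: str | None = None):
        return self._get("/v1/dataset_metadata", survey_id=survey_id, period_id=period_id)

    def unit_data(self, dataset_id: str, identifier: str):
        return self._get("/v1/unit_data", dataset_id=dataset_id, identifier=identifier)
```

With the server running, `SdsClient().get_schema("068")` requests `/v1/schema?survey_id=068`, and passing `version=2` pins a specific version.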

Quick start

Option 1 — local Python

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 3033

Open http://localhost:3033/docs.

Option 2 — Docker

docker compose up --build
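
With either option, the server may take a moment before it accepts connections. A hedged sketch of polling the health check until it answers; `wait_for_status` is a hypothetical helper, and it only checks for HTTP 200 because the exact `/status` response body is not specified here.

```python
import time
from http.client import HTTPException
from urllib.request import urlopen


def wait_for_status(
    url: str = "http://localhost:3033/status",
    timeout: float = 30.0,
    interval: float = 0.5,
) -> bool:
    """Poll the health check until it answers HTTP 200 or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with urlopen(url, timeout=interval) as resp:
                if resp.status == 200:
                    return True
        except (OSError, HTTPException):
            # Covers refused connections, socket timeouts, and urllib's URLError/HTTPError.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Run it from a second terminal, or after starting the stack detached with `docker compose up --build -d`; it returns `True` once the service responds.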

Try it

# Publish a schema
curl -X POST "http://localhost:3033/v1/schema?survey_id=068" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_schema.json

# Publish a dataset
curl -X POST "http://localhost:3033/v1/dataset" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_dataset.json

# List dataset metadata
curl "http://localhost:3033/v1/dataset_metadata?survey_id=068"
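
The two publish calls can also be made from Python with `urllib.request`. This sketch assumes a server on port 3033; `load_payload`, `build_publish_request`, and `publish` are hypothetical helpers. Building the request separately from sending it keeps the payload easy to inspect.

```python
import json
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request, urlopen

BASE_URL = "http://localhost:3033"  # assumption: the port from the quick start


def load_payload(path: Path) -> dict:
    # Parse locally so a malformed sample file fails before any network call.
    return json.loads(path.read_text(encoding="utf-8"))


def build_publish_request(path: str, payload: dict, **params: str) -> Request:
    """Build the same JSON POST the curl examples send, without sending it."""
    query = f"?{urlencode(params)}" if params else ""
    return Request(
        f"{BASE_URL}{path}{query}",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def publish(path: str, payload: dict, **params: str):
    # urlopen raises urllib.error.HTTPError for 4xx/5xx responses.
    with urlopen(build_publish_request(path, payload, **params)) as resp:
        return json.load(resp)
```

`publish("/v1/schema", load_payload(Path("sample_data/sample_schema.json")), survey_id="068")` mirrors the first curl call, and `publish("/v1/dataset", load_payload(Path("sample_data/sample_dataset.json")))` mirrors the second.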

Run the tests

pytest -v

Project layout

app/
├── main.py              # FastAPI app + router wiring + lifespan
├── config.py            # pydantic-settings configuration
├── database.py          # SQLAlchemy engine + session factory
├── models.py            # ORM models: Schema, Dataset, UnitData
├── schemas.py           # Pydantic request/response models
├── routers/             # HTTP layer (one module per resource)
├── services/            # Business logic
└── repositories/        # Data-access layer
tests/                   # pytest suite (in-memory SQLite per test)
sample_data/             # Example JSON payloads

The layered structure (router → service → repository) mirrors the separation of concerns in the upstream Java/Python-based SDS while keeping the dependency graph trivial.
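
The router → service → repository split can be illustrated without FastAPI or SQLAlchemy. The following stdlib-only sketch rests on stated assumptions: the table layout, the class names `SchemaRepository` and `SchemaService`, the `handle_get_schema` function standing in for a route, and the 404 message are all hypothetical, and `sqlite3` replaces the project's SQLAlchemy session. The schema body is stored as JSON text, in the spirit of inline JSON in SQLite, and each `sqlite3.connect(":memory:")` yields a fresh, empty database, which is what lets a test start from a clean slate.

```python
from __future__ import annotations

import json
import sqlite3


class SchemaRepository:
    """Data-access layer: SQL only, no business rules."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self.conn = conn
        # The composite primary key rejects a duplicate version if two publishes race.
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS schema ("
            " survey_id TEXT NOT NULL, version INTEGER NOT NULL, body TEXT NOT NULL,"
            " PRIMARY KEY (survey_id, version))"
        )

    def max_version(self, survey_id: str) -> int:
        row = self.conn.execute(
            "SELECT COALESCE(MAX(version), 0) FROM schema WHERE survey_id = ?", (survey_id,)
        ).fetchone()
        return row[0]

    def insert(self, survey_id: str, version: int, body: dict) -> None:
        with self.conn:  # commits on success, rolls back on error
            self.conn.execute(
                "INSERT INTO schema (survey_id, version, body) VALUES (?, ?, ?)",
                (survey_id, version, json.dumps(body)),
            )

    def fetch(self, survey_id: str, version: int) -> dict | None:
        row = self.conn.execute(
            "SELECT body FROM schema WHERE survey_id = ? AND version = ?", (survey_id, version)
        ).fetchone()
        return json.loads(row[0]) if row else None


class SchemaService:
    """Business logic: versioning rules live here, not in SQL or HTTP code."""

    def __init__(self, repo: SchemaRepository) -> None:
        self.repo = repo

    def publish(self, survey_id: str, body: dict) -> int:
        version = self.repo.max_version(survey_id) + 1
        self.repo.insert(survey_id, version, body)
        return version

    def get(self, survey_id: str, version: int | None = None) -> dict | None:
        if version is None:
            version = self.repo.max_version(survey_id)  # 0 when nothing is published
        return self.repo.fetch(survey_id, version)


def handle_get_schema(service: SchemaService, survey_id: str, version: int | None = None):
    """Router stand-in: turn a lookup into an (HTTP status, body) pair."""
    body = service.get(survey_id, version)
    return (200, body) if body is not None else (404, {"detail": "Schema not found"})


def make_service() -> SchemaService:
    # A fresh in-memory database per call, as a per-test fixture might use.
    return SchemaService(SchemaRepository(sqlite3.connect(":memory:")))
```

Each layer depends only on the one below it, so a test can exercise `SchemaService` against an in-memory repository without starting an HTTP server.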

Differences from the upstream ONS SDS

The real SDS depends on Google Cloud Platform and several pieces of infrastructure that would obscure the API design in a portfolio context. This project keeps the domain model and HTTP contract faithful but swaps out the cloud plumbing.

Aspect	Upstream SDS	This portfolio version
Persistence	Firestore (document DB)	SQLite + SQLAlchemy
Object storage	Google Cloud Storage	Inline JSON in SQLite
Dataset ingestion	CloudEvents → Cloud Function → API	Direct `POST /v1/dataset`
Authentication	GCP IAM + OAuth	None (open for local demo)
API gateway	GCP API Gateway with OpenAPI YAML	FastAPI's auto-generated OpenAPI
Survey list source	GitHub-hosted JSON, fetched on demand	Hard-coded list
Infra-as-code	Terraform	Dockerfile + docker-compose
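
The direct `POST /v1/dataset` stands in for the event-driven ingestion upstream. If datasets arrived as CloudEvents instead, a handler would first unwrap the event envelope. This is a hedged sketch for the structured JSON mode of CloudEvents 1.0, which requires the `specversion`, `id`, `source`, and `type` attributes; `unwrap_structured_event` and `InvalidCloudEvent` are hypothetical names, and treating the event data as the dataset payload itself is an assumption (the upstream event contents may differ).

```python
from __future__ import annotations

import base64
import json

# Context attributes that CloudEvents 1.0 marks as REQUIRED.
REQUIRED_ATTRIBUTES = ("specversion", "id", "source", "type")


class InvalidCloudEvent(ValueError):
    """Raised when a request body is not a usable structured-mode CloudEvent."""


def unwrap_structured_event(raw: bytes | str) -> dict:
    """Validate a structured-mode JSON CloudEvent and return its payload."""
    event = json.loads(raw)
    if not isinstance(event, dict):
        raise InvalidCloudEvent("event must be a JSON object")
    missing = [name for name in REQUIRED_ATTRIBUTES if not event.get(name)]
    if missing:
        raise InvalidCloudEvent(f"missing required attributes: {', '.join(missing)}")
    if event["specversion"] != "1.0":
        raise InvalidCloudEvent(f"unsupported specversion: {event['specversion']!r}")
    if "data_base64" in event:
        # Binary payloads travel base64-encoded; assume they decode to JSON here.
        data = json.loads(base64.b64decode(event["data_base64"]))
    else:
        data = event.get("data")
    # Assumption: the payload is the dataset body that POST /v1/dataset accepts today.
    if not isinstance(data, dict):
        raise InvalidCloudEvent("expected a JSON object as the event data")
    return data
```

A dedicated endpoint would check for the `application/cloudevents+json` content type before calling this, then pass the result to the same service code as the direct POST.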

Future improvements

If extended further, the natural next steps would be:

CI — GitHub Actions running ruff, mypy, and pytest on every PR.
Pluggable storage backend — abstract the repository layer so Firestore (via the emulator) can be dropped in alongside SQLite.
CloudEvents ingestion — accept application/cloudevents+json on a dedicated endpoint, matching the upstream SDX → SDS flow.
Authentication — JWT bearer middleware (OAuth2 password flow in dev, IAM-equivalent in deployed envs).
Schema validation — validate posted unit data against the registered JSON schema for that survey/version.
Pagination — cursor-based pagination on the *_metadata list endpoints.
Object storage — move large schema bodies to a blob store (GCS / S3 / MinIO) and keep only metadata in the relational store.
Observability — structured logging, OpenTelemetry traces, /metrics endpoint for Prometheus.
Pub/Sub fan-out — publish a message on dataset creation so downstream consumers can react.
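
Cursor-based pagination, from the list above, could work along the lines of this sketch. It is an illustration under assumptions, not a committed design: the cursor is the last returned item's sort key, JSON-encoded and then URL-safe base64 encoded so clients treat it as opaque, and `paginate`, `encode_cursor`, and `decode_cursor` are hypothetical names. Because each page resumes strictly after a key rather than at a numeric offset, rows inserted earlier in the sort order do not shift or duplicate later pages.

```python
from __future__ import annotations

import base64
import json
from bisect import bisect_right
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def encode_cursor(key: tuple) -> str:
    """Opaque cursor: the JSON-encoded sort key, URL-safe base64 encoded."""
    return base64.urlsafe_b64encode(json.dumps(list(key)).encode()).decode()


def decode_cursor(cursor: str) -> tuple:
    # JSON has no tuple type, so turn the decoded list back into a tuple for comparison.
    return tuple(json.loads(base64.urlsafe_b64decode(cursor.encode())))


def paginate(
    items: Sequence[T],
    key: Callable[[T], tuple],
    limit: int = 50,
    cursor: str | None = None,
) -> tuple[list[T], str | None]:
    """Return one page ordered by a unique sort key, plus the cursor for the next page."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    ordered = sorted(items, key=key)
    keys = [key(item) for item in ordered]
    # Resume strictly after the last key the client has already received.
    start = 0 if cursor is None else bisect_right(keys, decode_cursor(cursor))
    page = ordered[start : start + limit]
    has_more = start + limit < len(ordered)
    return page, (encode_cursor(key(page[-1])) if has_more else None)
```

For schema metadata, `key=lambda m: (m["survey_id"], m["version"])` would be a natural unique key. Against SQLite, the bisect step would become a `WHERE (survey_id, version) > (?, ?)` filter with `ORDER BY` and `LIMIT`; row-value comparisons need SQLite 3.15 or later.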

Tech stack

FastAPI · SQLAlchemy 2.x · Pydantic v2 · pytest · Docker · Python 3.13
