SriHV/sds-fastapi

Supplementary Data Service (FastAPI)

A portfolio reimplementation of the ONS Supplementary Data Service (SDS), built with FastAPI, SQLAlchemy, and SQLite. SDS is a backend service used by the UK's Office for National Statistics to publish and serve supplementary reference data (survey schemas + per-period reporting-unit data) consumed by statistical survey workflows.

This project mirrors the public REST surface and domain model of SDS in a single, self-contained service that can be cloned and run in seconds — useful for learning the shape of a real production data API without the GCP footprint.

Built as a portfolio piece. Not affiliated with ONS.


What it does

The service models two core resources:

  • Schema — a versioned JSON definition of a survey questionnaire. Each survey_id accumulates an auto-incrementing version on every publish.
  • Dataset — a snapshot of supplementary unit data for one survey period. A dataset contains many unit records keyed by identifier (e.g. a reporting unit reference).

Clients publish schemas and datasets, then look them up by survey, period, or unit identifier.

Endpoints

| Method | Path | Description |
|--------|------|-------------|
| GET | /status | Health check + service version |
| POST | /v1/schema?survey_id=... | Publish a new schema (auto-versioned per survey) |
| GET | /v1/schema?survey_id=&version= | Fetch schema by survey (defaults to latest version) |
| GET | /v2/schema?guid=... | Fetch schema by its unique GUID |
| GET | /v1/schema_metadata?survey_id= | List schema metadata for a survey |
| GET | /v1/all_schema_metadata | List all schema metadata |
| POST | /v1/dataset | Publish a dataset with its unit records |
| GET | /v1/dataset_metadata?survey_id=&period_id= | Filter dataset metadata |
| GET | /v1/all_dataset_metadata | List all dataset metadata |
| GET | /v1/unit_data?dataset_id=&identifier= | Fetch one unit record from a dataset |
| GET | /v1/survey_list | Static survey-id → name mapping |

Interactive docs are auto-generated at /docs (Swagger UI) and /redoc.


Quick start

Option 1 — local Python

python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
uvicorn app.main:app --reload --port 3033

Open http://localhost:3033/docs.

Option 2 — Docker

docker compose up --build

Try it

# Publish a schema
curl -X POST "http://localhost:3033/v1/schema?survey_id=068" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_schema.json

# Publish a dataset
curl -X POST "http://localhost:3033/v1/dataset" \
  -H "Content-Type: application/json" \
  -d @sample_data/sample_dataset.json

# List dataset metadata
curl "http://localhost:3033/v1/dataset_metadata?survey_id=068"
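
The same calls can be made from Python. The sketch below uses only the standard library and assumes the service is running on port 3033 as in the Quick start; `build_request`, `send`, and `main` are hypothetical helper names, and the shape of the JSON responses is not assumed.

```python
import json
import urllib.request
from urllib.parse import urlencode

BASE_URL = "http://localhost:3033"  # port taken from the Quick start


def build_request(method, path, params=None, payload=None):
    """Build (but do not send) a request for the service."""
    url = f"{BASE_URL}{path}"
    if params:
        url += "?" + urlencode(params)
    # Only requests with a body carry a JSON Content-Type header.
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Content-Type": "application/json"} if data is not None else {}
    return urllib.request.Request(url, data=data, headers=headers, method=method)


def send(request):
    """Send a request and decode the JSON response body."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def main():
    """Mirror the curl calls: publish the sample schema, then list dataset metadata."""
    with open("sample_data/sample_schema.json") as fh:
        schema = json.load(fh)
    print(send(build_request("POST", "/v1/schema", {"survey_id": "068"}, schema)))
    print(send(build_request("GET", "/v1/dataset_metadata", {"survey_id": "068"})))
```

Call `main()` from the repository root while the service is up; nothing is sent until then.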

Run the tests

pytest -v

Project layout

app/
├── main.py              # FastAPI app + router wiring + lifespan
├── config.py            # pydantic-settings configuration
├── database.py          # SQLAlchemy engine + session factory
├── models.py            # ORM models: Schema, Dataset, UnitData
├── schemas.py           # Pydantic request/response models
├── routers/             # HTTP layer (one module per resource)
├── services/            # Business logic
└── repositories/        # Data-access layer
tests/                   # pytest suite (in-memory SQLite per test)
sample_data/             # Example JSON payloads

The layered structure (router → service → repository) mirrors the separation of concerns in the upstream Java/Python-based SDS while keeping the dependency graph trivial.
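
As a rough illustration of that layering, here is a plain-Python sketch with no FastAPI or SQLAlchemy involved. Every class and function name is hypothetical, the in-memory dict stands in for the real data-access layer, and the 200/404 mapping is an assumption about how a router might translate service errors.

```python
from dataclasses import dataclass


@dataclass
class UnitRecord:
    dataset_id: str
    identifier: str
    data: dict


class UnitDataRepository:
    """Data-access layer: knows how records are stored, nothing else."""

    def __init__(self):
        self._rows = {}

    def add(self, record: UnitRecord) -> None:
        self._rows[(record.dataset_id, record.identifier)] = record

    def get(self, dataset_id: str, identifier: str):
        return self._rows.get((dataset_id, identifier))


class UnitNotFound(Exception):
    """Domain error raised when a unit record does not exist."""


class UnitDataService:
    """Business logic: turns a missing row into a domain error."""

    def __init__(self, repository: UnitDataRepository):
        self._repository = repository

    def fetch_unit(self, dataset_id: str, identifier: str) -> dict:
        record = self._repository.get(dataset_id, identifier)
        if record is None:
            raise UnitNotFound(f"no unit {identifier!r} in dataset {dataset_id!r}")
        return record.data


def unit_data_handler(service: UnitDataService, dataset_id: str, identifier: str) -> tuple[int, dict]:
    """HTTP-layer stand-in: maps service results to a status code and body."""
    try:
        return 200, service.fetch_unit(dataset_id, identifier)
    except UnitNotFound as exc:
        return 404, {"detail": str(exc)}
```

Because the handler only talks to the service and the service only talks to the repository, each layer can be tested or replaced without touching the others.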


Differences from the upstream ONS SDS

The real SDS depends on Google Cloud Platform and several pieces of infrastructure that would obscure the API design in a portfolio context. This project keeps the domain model and HTTP contract faithful but swaps out the cloud plumbing.

| Aspect | Upstream SDS | This portfolio version |
|--------|--------------|------------------------|
| Persistence | Firestore (document DB) | SQLite + SQLAlchemy |
| Object storage | Google Cloud Storage | Inline JSON in SQLite |
| Dataset ingestion | CloudEvents → Cloud Function → API | Direct POST /v1/dataset |
| Authentication | GCP IAM + OAuth | None (open for local demo) |
| API gateway | GCP API Gateway with OpenAPI YAML | FastAPI's auto-generated OpenAPI |
| Survey list source | GitHub-hosted JSON, fetched on demand | Hard-coded list |
| Infra-as-code | Terraform | Dockerfile + docker-compose |

Future improvements

If extended further, the natural next steps would be:

  • CI — GitHub Actions running ruff, mypy, and pytest on every PR.
  • Pluggable storage backend — abstract the repository layer so Firestore (via the emulator) can be dropped in alongside SQLite.
  • CloudEvents ingestion — accept application/cloudevents+json on a dedicated endpoint, matching the upstream SDX → SDS flow.
  • Authentication — JWT bearer middleware (OAuth2 password flow in dev, IAM-equivalent in deployed envs).
  • Schema validation — validate posted unit data against the registered JSON schema for that survey/version.
  • Pagination — cursor-based pagination on the *_metadata list endpoints.
  • Object storage — move large schema bodies to a blob store (GCS / S3 / MinIO) and keep only metadata in the relational store.
  • Observability — structured logging, OpenTelemetry traces, /metrics endpoint for Prometheus.
  • Pub/Sub fan-out — publish a message on dataset creation so downstream consumers can react.
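
To make one of these concrete, the pagination item could look roughly like the sketch below. It is one possible design, not something the project implements: the opaque base64 cursor, the integer `id` sort key, and the `items` / `next_cursor` response shape are all assumptions.

```python
import base64
import json


def encode_cursor(last_id: int) -> str:
    """Pack the last seen id into an opaque, URL-safe token."""
    raw = json.dumps({"after": last_id}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor: str) -> int:
    """Recover the last seen id from a token made by encode_cursor."""
    return json.loads(base64.urlsafe_b64decode(cursor.encode("ascii")))["after"]


def paginate(rows, limit, cursor=None):
    """Return one page from rows, a list of dicts sorted by ascending integer 'id'."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    after = decode_cursor(cursor) if cursor else None
    # Take one extra row so we can tell whether another page exists.
    window = [r for r in rows if after is None or r["id"] > after][: limit + 1]
    page = window[:limit]
    next_cursor = encode_cursor(page[-1]["id"]) if len(window) > limit else None
    return {"items": page, "next_cursor": next_cursor}
```

A client keeps passing back `next_cursor` until it comes back as None. In a database-backed version the list comprehension would become a `WHERE id > :after ORDER BY id LIMIT :limit + 1` query.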

Tech stack

FastAPI · SQLAlchemy 2.x · Pydantic v2 · pytest · Docker · Python 3.13
