OpenAI Privacy Filter API

A small, inspectable FastAPI service and Next.js sandbox for running openai/privacy-filter. It detects privacy-related spans in text, applies configurable redaction, and exposes a minimal API that can be deployed behind a server-side web proxy.

This project is intentionally narrow: it is a testable wrapper around a privacy-detection model, not a production policy engine, classifier benchmark, or complete data governance system.

Features

FastAPI service exposing /health and /v1/filter.
Next.js sandbox UI for testing sample text and inspecting detected spans.
Redaction modes: mask, remove, and annotate.
Optional internal-token protection between the web proxy and API.
Docker images for the API and web app.
AWS App Runner deployment workflow with path-aware API/web redeploys.
Unit tests for schemas, redaction, API behavior, Lambda adapter, web proxy, and UI behavior.
Model files are kept outside source control and can be restored from deployment artifacts.

Architecture

flowchart LR
  User["Browser"] --> Web["Next.js sandbox"]
  Web --> Proxy["/api/filter server route"]
  Proxy --> API["FastAPI /v1/filter"]
  API --> Model["openai/privacy-filter"]
  API --> Redaction["Redaction logic"]
  Redaction --> API
  API --> Proxy
  Proxy --> Web

The browser never calls the model API directly. The Next.js app proxies requests through its server-side route, optionally adding PRIVACY_FILTER_INTERNAL_TOKEN so the API can reject direct public traffic.
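
On the API side, the token check described above might look like the following minimal sketch. It is not the project's implementation: the function name `is_request_authorized` is hypothetical, and it assumes that an empty configured token means protection is disabled, which matches the documented default but is not confirmed by the source.

```python
import hmac
from typing import Optional


def is_request_authorized(expected_token: str, presented_token: Optional[str]) -> bool:
    """Return True when a request may proceed to the filter endpoint."""
    # Assumption: an empty configured token disables the check entirely.
    if not expected_token:
        return True
    # A token is configured but the caller sent none: reject.
    if presented_token is None:
        return False
    # Constant-time comparison avoids leaking how much of the token matched.
    return hmac.compare_digest(expected_token.encode(), presented_token.encode())
```

A constant-time comparison is used because the shared token is effectively a credential between the web proxy and the API.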

Repository Layout

apps/
  api/      FastAPI service, Lambda adapter, redaction logic, and tests
  web/      Next.js App Router sandbox, API proxy, and tests
infra/
  docker/   API and web Dockerfiles
docs/       API contract and deployment notes

Requirements

Python 3.11 or newer
Node.js 24 or newer
npm
Docker, optional but recommended for deployment parity
AWS CLI, only for managing the included Shiftbloom AWS deployment

The real model runtime requires torch and transformers. Unit tests do not download or load the model unless the explicit real-model smoke test is enabled.

Quick Start

Clone the repo and copy the example environment:

git clone https://github.com/shiftbloom-studio/openai-privacy-filter-api.git
cd openai-privacy-filter-api
cp .env.example .env

Start the API without inference dependencies:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install -e "apps/api[dev]"
uvicorn privacy_filter_api.main:app --app-dir apps/api/src --reload

Start the web sandbox in another shell:

npm install
PRIVACY_FILTER_API_URL=http://localhost:8000 npm --workspace apps/web run dev

Open http://localhost:3000.

Running the Real Model Locally

Install inference dependencies:

source .venv/bin/activate
python -m pip install -e "apps/api[dev,inference]"

Run the API with a Hugging Face cache directory:

HF_HOME=.hf-cache uvicorn privacy_filter_api.main:app --app-dir apps/api/src --reload

The first request that needs inference may download model files. To run the smoke test explicitly:

RUN_REAL_MODEL_TESTS=1 python -m pytest apps/api/tests/test_real_model_smoke.py

API Usage

Health check:

curl http://localhost:8000/health

Filter text:

curl -X POST http://localhost:8000/v1/filter \
  -H "content-type: application/json" \
  -d '{
    "text": "My name is Alice Smith and my email is alice@example.com.",
    "mode": "mask",
    "mask_token": "[REDACTED]",
    "include_spans": true
  }'

Example response (only the first detected span is shown):

{
  "original_text": "My name is Alice Smith and my email is alice@example.com.",
  "filtered_text": "My name is [REDACTED] and my email is [REDACTED].",
  "spans": [
    {
      "label": "private_person",
      "start": 11,
      "end": 22,
      "text": "Alice Smith",
      "score": 0.97
    }
  ],
  "model": "openai/privacy-filter"
}
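
The same request can be sent from Python. The sketch below uses only the standard library and assumes the API is reachable at a base URL such as http://localhost:8000 with no internal token configured; the helper names `build_filter_request` and `filter_text` are illustrative and not part of the project.

```python
import json
import urllib.request


def build_filter_request(base_url: str, text: str, mode: str = "mask",
                         mask_token: str = "[REDACTED]") -> urllib.request.Request:
    """Build a POST request for /v1/filter with the fields shown in the curl example."""
    payload = {
        "text": text,
        "mode": mode,
        "mask_token": mask_token,
        "include_spans": True,
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/filter",
        data=json.dumps(payload).encode("utf-8"),
        headers={"content-type": "application/json"},
        method="POST",
    )


def filter_text(base_url: str, text: str, **options) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_filter_request(base_url, text, **options)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running locally, `filter_text("http://localhost:8000", "My name is Alice Smith.")` would return a dictionary shaped like the example response.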

Supported labels:

account_number
private_address
private_email
private_person
private_phone
private_url
private_date
secret

Supported modes:

mask: replace each accepted span with mask_token.
remove: remove each accepted span.
annotate: replace each accepted span with [label:value].
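
To make the three modes concrete, here is a minimal sketch of how they could be applied to character-offset spans. It is not the project's redaction code: the function name `apply_redaction` is hypothetical, and it assumes spans do not overlap and use the `start`, `end`, and `label` fields from the example response.

```python
from typing import Dict, List


def apply_redaction(text: str, spans: List[Dict], mode: str = "mask",
                    mask_token: str = "[REDACTED]") -> str:
    """Apply mask, remove, or annotate to each span of the text."""
    if mode not in {"mask", "remove", "annotate"}:
        raise ValueError(f"unsupported mode: {mode}")
    result = text
    # Work from the last span to the first so earlier offsets stay valid.
    for span in sorted(spans, key=lambda s: s["start"], reverse=True):
        start, end = span["start"], span["end"]
        if mode == "mask":
            replacement = mask_token
        elif mode == "remove":
            replacement = ""
        else:
            # annotate keeps the original value next to its label
            replacement = f"[{span['label']}:{text[start:end]}]"
        result = result[:start] + replacement + result[end:]
    return result
```

Processing spans from the end of the string avoids recomputing offsets after each replacement changes the string length.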

See docs/api.md for the API contract.

Configuration

| Variable | Used by | Default | Description |
| --- | --- | --- | --- |
| `PRIVACY_FILTER_MODEL_ID` | API | `openai/privacy-filter` | Hugging Face model id reported by the service and used when no model path is set. |
| `PRIVACY_FILTER_MODEL_PATH` | API | empty | Local model directory. Use this for baked or mounted model files. |
| `PRIVACY_FILTER_RUNTIME` | API | `local` | Runtime label returned by `/health`. |
| `PRIVACY_FILTER_CORS_ORIGINS` | API | `http://localhost:3000,https://privacy.shiftbloom.studio` | Comma-separated CORS allowlist. |
| `PRIVACY_FILTER_INTERNAL_TOKEN` | API and web | empty | Optional shared token. The web proxy sends it to the API. |
| `PRIVACY_FILTER_DEVICE` | API | empty | Optional Transformers device setting. |
| `PRIVACY_FILTER_REVISION` | API | empty | Optional Hugging Face model revision. |
| `PRIVACY_FILTER_TRUST_REMOTE_CODE` | API | `false` | Enables remote model code if a future revision requires it. |
| `HF_HOME` | API | `.hf-cache` locally | Hugging Face cache directory. |
| `PRIVACY_FILTER_API_URL` | web | `http://localhost:8000` | API base URL used by the Next.js server-side proxy. |
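
As an illustration of how a few of these variables could be read, the sketch below parses them from an environment mapping using the documented defaults. The `Settings` class and `load_settings` function are hypothetical names, and the accepted spellings for the boolean flag (`1`, `true`, `yes`) are an assumption rather than the service's documented behavior.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping, Optional

DEFAULT_CORS = "http://localhost:3000,https://privacy.shiftbloom.studio"


@dataclass(frozen=True)
class Settings:
    model_id: str
    model_path: Optional[str]
    cors_origins: List[str]
    internal_token: Optional[str]
    trust_remote_code: bool


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read a subset of the documented variables, falling back to their defaults."""

    def optional(name: str) -> Optional[str]:
        # Treat unset and blank values the same way: as "not configured".
        value = env.get(name, "").strip()
        return value or None

    raw_origins = env.get("PRIVACY_FILTER_CORS_ORIGINS", DEFAULT_CORS)
    # Split the comma-separated allowlist and drop empty entries.
    origins = [item.strip() for item in raw_origins.split(",") if item.strip()]
    # Assumption: these spellings count as true; anything else is false.
    trust = env.get("PRIVACY_FILTER_TRUST_REMOTE_CODE", "false").strip().lower() in {"1", "true", "yes"}
    return Settings(
        model_id=env.get("PRIVACY_FILTER_MODEL_ID", "openai/privacy-filter"),
        model_path=optional("PRIVACY_FILTER_MODEL_PATH"),
        cors_origins=origins,
        internal_token=optional("PRIVACY_FILTER_INTERNAL_TOKEN"),
        trust_remote_code=trust,
    )
```

Passing the environment as a mapping keeps the parsing logic easy to exercise in unit tests without touching the real process environment.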

Verification

API:

source .venv/bin/activate
python -m ruff check apps/api
python -m pytest apps/api

Web:

npm --workspace apps/web run lint
npm --workspace apps/web run typecheck
npm --workspace apps/web run test
npm --workspace apps/web run build

Docker smoke build:

docker build -f infra/docker/api.Dockerfile --build-arg API_EXTRAS= -t privacy-filter-api:core .
docker build -f infra/docker/web.Dockerfile -t privacy-filter-web .

Docker

Run both services locally with Docker Compose:

docker compose up --build

Build the API image:

docker build -f infra/docker/api.Dockerfile -t privacy-filter-api .

Build the web image:

docker build -f infra/docker/web.Dockerfile -t privacy-filter-web .

The API Dockerfile copies privacy-filter-model/ into /models/privacy-filter. For production offline inference, place the required model files there before building and set PRIVACY_FILTER_MODEL_PATH=/models/privacy-filter.

Required model files:

config.json
model.safetensors
tokenizer.json
tokenizer_config.json
viterbi_calibration.json

Do not commit model files to the repository.
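
Before building an offline image, it can help to confirm that the directory contains every file listed above. The following is a small sketch of such a check; the helper name `missing_model_files` is hypothetical and is not part of the repository.

```python
from pathlib import Path
from typing import List

# File names taken from the "Required model files" list.
REQUIRED_MODEL_FILES = (
    "config.json",
    "model.safetensors",
    "tokenizer.json",
    "tokenizer_config.json",
    "viterbi_calibration.json",
)


def missing_model_files(model_dir: str) -> List[str]:
    """Return the required file names that are absent from model_dir."""
    root = Path(model_dir)
    return [name for name in REQUIRED_MODEL_FILES if not (root / name).is_file()]
```

For example, `missing_model_files("privacy-filter-model")` returns an empty list when the directory is complete.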

Deployment

The project can run anywhere that supports Docker containers. The included deployment notes cover Docker, AWS Lambda container images, Google Cloud Run, Cloudflare routing, and the current AWS App Runner setup. See docs/deployment.md.

Shiftbloom AWS App Runner

This repository includes a GitHub Actions workflow for the current Shiftbloom deployment:

API App Runner service: privacy-filter-api
Web App Runner service: privacy-filter-web
AWS region: eu-central-1
ECR repositories: privacy-filter-api, privacy-filter-web
Model artifact bucket: shiftbloom-privacy-filter-build-349744179866-eu-central-1

On pushes to main, deploy-aws.yml detects changed paths and deploys only the affected surface. It can also be run manually with all, api, web, or auto.
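
The path-aware selection can be pictured with the sketch below. The path rules are hypothetical and chosen only to match the repository layout; the filters actually used by deploy-aws.yml are not shown in this README and may differ.

```python
from typing import Iterable, Set

# Hypothetical path rules; the real workflow's filters may differ.
PATH_RULES = {
    "api": ("apps/api/", "infra/docker/api.Dockerfile"),
    "web": ("apps/web/", "infra/docker/web.Dockerfile"),
}


def select_targets(mode: str, changed_paths: Iterable[str] = ()) -> Set[str]:
    """Map a run mode (all, api, web, auto) to the set of services to deploy."""
    if mode == "all":
        return {"api", "web"}
    if mode in PATH_RULES:
        return {mode}
    if mode != "auto":
        raise ValueError(f"unknown mode: {mode}")
    paths = list(changed_paths)
    # In auto mode, deploy a service only if one of its paths changed.
    return {
        target
        for target, prefixes in PATH_RULES.items()
        if any(path.startswith(prefixes) for path in paths)
    }
```

Under these assumed rules, a push that only touches docs/ would select no services, while a change under apps/web/ would redeploy only the web app.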

Forks should replace the AWS account id, service ARNs, ECR repositories, artifact bucket, and OIDC role with their own infrastructure.

Security and Privacy Notes

Treat model output as advisory. Validate behavior against your data and policy requirements.
Do not send sensitive production data to infrastructure you do not control.
Use PRIVACY_FILTER_INTERNAL_TOKEN when the API is reachable outside a private network.
Keep CORS origins narrow in production.
Keep model files, caches, credentials, and deployment artifacts out of source control.
Review upstream model terms and dependencies before production use.

Contributing

Contributions are welcome. Keep changes focused and include tests for behavior changes.

Suggested flow:

Open an issue or draft PR for larger changes.
Run the relevant verification commands locally.
Keep API contract changes documented in docs/api.md.
Keep deployment changes documented in docs/deployment.md.

Please avoid committing generated model files, local caches, credentials, or machine-specific build artifacts.

License

Apache License 2.0. See LICENSE.

Acknowledgements

This project wraps openai/privacy-filter through standard Python and web application tooling. Model behavior, supported labels, and runtime requirements may change with upstream model revisions.
