Inference Engine

Production-grade, task-agnostic ML inference backend. Serve any trained model — PyTorch, sklearn, ONNX, or anything else — over HTTP without changing the engine's core.

Why Inference Engine?

Deploying a trained model to production means writing the same glue code every time: HTTP routing, job tracking, auth, rate limiting, async queues, observability. Inference Engine handles all of that so you only write the model logic.

Plug in a trained artifact. The engine handles the rest.

Key Features

HTTP inference serving — sync, batch, and async endpoints out of the box
LLM-assisted deployment CLI — deploy a .pkl, .onnx, or PyTorch model in one command
Model versioning + routing — static, canary, and A/B routing strategies
Async job queue — arq + Redis with graceful in-process fallback
Multiple execution backends — CPU thread pool, ONNX Runtime, Triton Inference Server
Authentication + scopes — API key auth with per-tenant rate limiting
Observability — Prometheus metrics, structured JSON logs, OpenTelemetry tracing
Zero-dependency quickstart — runs with SQLite + in-process async, no Docker required
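
To make the canary routing strategy listed above concrete, here is a minimal sketch of one common way to split traffic between two model versions. It is an illustration only, not the engine's actual router: the function name `pick_version`, its parameters, and the hash-bucket approach are all assumptions introduced for this example.

```python
import hashlib


def pick_version(request_key: str, stable: str, canary: str, canary_percent: int) -> str:
    """Send roughly canary_percent of keys to the canary version (hypothetical helper).

    Hashing the key keeps a given caller on the same version across requests.
    """
    if not 0 <= canary_percent <= 100:
        raise ValueError("canary_percent must be between 0 and 100")
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % 100  # stable bucket in 0..99
    return canary if bucket < canary_percent else stable
```

With `canary_percent=10`, about one key in ten would be routed to the canary version while everyone else stays on the stable one. Keying on something like the tenant or API key (an assumption here) keeps each caller's experience consistent during a rollout.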

Architecture

Quickstart

git clone <repo>
cd inference-engine
uv sync          # or: pip install -e .

uvicorn app.adapters.http.app:app --reload

curl -X POST http://localhost:8000/predict \
  -H "X-API-Key: dev-key" \
  -H "Content-Type: application/json" \
  -d '{"model": "echo", "version": "v1", "data": "hello"}'
# → {"result": "hello"}

No Docker required. SQLite and an in-process thread pool handle everything locally.
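
The same request can be made from Python. The sketch below mirrors the curl call above using only the standard library; the endpoint, header names, and payload fields are taken from that example, while the helper names `build_predict_request` and `predict` are made up for illustration and are not part of the engine.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # quickstart address from the curl example


def build_predict_request(model, version, data, api_key="dev-key"):
    """Build (but do not send) the POST /predict request from the quickstart."""
    body = json.dumps({"model": model, "version": version, "data": data}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/predict",
        data=body,
        headers={"X-API-Key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


def predict(model, version, data, api_key="dev-key", timeout=10.0):
    """Send the request to a running server and return the decoded JSON body."""
    req = build_predict_request(model, version, data, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

With the uvicorn server from the quickstart running, `predict("echo", "v1", "hello")` sends the same body as the curl command.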

Deploy a Model in One Command

uv sync --extra cli
export GROQ_API_KEY=<your-key>

inference-engine deploy ./sentiment.pkl

The CLI inspects the artifact, generates load() and predict() via LLM, validates the pipeline, and writes the definition file — no boilerplate required.
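
For orientation, a load() and predict() pair for a pickled scikit-learn-style model might look roughly like the sketch below. This is an assumption about the general shape only: the real definition file format, function signatures, and generated code are whatever the CLI produces, and nothing here is copied from it.

```python
import pickle
from pathlib import Path
from typing import Any


def load(artifact_path: str) -> Any:
    """Deserialize the trained artifact once, at startup (hypothetical signature)."""
    with Path(artifact_path).open("rb") as f:
        return pickle.load(f)  # only unpickle artifacts you trust


def predict(model: Any, data: Any) -> Any:
    """Run a single input through the model and return a JSON-friendly value."""
    # Assumes an sklearn-style estimator whose predict() takes a list of inputs.
    prediction = model.predict([data])[0]
    # NumPy scalars expose .item(); plain Python values are returned unchanged.
    return prediction.item() if hasattr(prediction, "item") else prediction
```

Keeping load() separate from predict() means the artifact is read from disk once rather than on every request.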

Non-interactive (CI):

inference-engine deploy ./sentiment.pkl \
  --name sentiment --version v1 \
  --device cpu --routing static \
  --sample-input "this movie was great"

Full Stack (Postgres + Redis)

cp .env.example .env
bash dev.sh

Starts Docker services, runs the DB migration, launches the arq worker, and starts uvicorn — all in one command.

Documentation


Quickstart	Install, run, first request
Guides	Task-based workflows
CLI	Deploy and fix commands
API Reference	Endpoint schemas
Concepts	Architecture and design
Configuration	Environment variables
Integrations	Redis, Postgres, Triton, ONNX
Observability	Metrics, logs, tracing
Development	Contributing and testing

How It Compares

	Inference Engine	BentoML	Ray Serve	SageMaker
Self-hosted	✓	✓	✓	✗
LLM-assisted deploy	✓	✗	✗	✗
Zero-dependency quickstart	✓	✗	✗	✗
Built-in auth + rate limiting	✓	✗	✗	✓
Async job queue	✓	partial	✓	✓
