agent-axiom/aximo

Aximo

aximo is a CPU-first speech-to-text (STT) microservice for Russian and English, built as a Rust Cargo workspace. It exposes:

  • POST /v1/transcriptions for short audio
  • GET /v1/realtime for realtime WebSocket streaming
  • GET /openapi.json for the OpenAPI schema
  • GET /docs/ for Swagger UI
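A client usually needs both the HTTP base URL and the matching WebSocket URL. The helper below is a hypothetical sketch (aximoEndpoints is not part of aximo) that derives the four endpoint URLs from one base such as http://127.0.0.1:8080, switching http/https to ws/wss for /v1/realtime.

```javascript
// Hypothetical helper (not part of aximo): build endpoint URLs from a base URL.
// The paths are absolute, so any path already in baseUrl is replaced, not extended.
function aximoEndpoints(baseUrl) {
  const base = new URL(baseUrl);
  if (base.protocol !== "http:" && base.protocol !== "https:") {
    throw new Error(`expected an http(s) base URL, got ${base.protocol}`);
  }
  // http -> ws, https -> wss (the regex only touches the scheme prefix).
  const wsBase = new URL(base.href.replace(/^http/, "ws"));
  return {
    transcriptions: new URL("/v1/transcriptions", base).href,
    realtime: new URL("/v1/realtime", wsBase).href,
    openapi: new URL("/openapi.json", base).href,
    docs: new URL("/docs/", base).href,
  };
}
```

For example, aximoEndpoints("http://127.0.0.1:8080").realtime yields ws://127.0.0.1:8080/v1/realtime, the URL used in the realtime example.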

Workspace

  • crates/aximo: HTTP and WebSocket service binary
  • crates/aximo-core: scheduler and shared STT domain types
  • crates/aximo-inference: transcribe-rs adapters for local CPU models
  • crates/aximo-audio: audio helpers

Architecture and protocol details live in:

Models

Models are runtime artifacts and must be kept out of git. The service expects a model root directory, configured in config/aximo.example.toml.

Compatible model bundles for the current transcribe-rs integration:

Example layout:

/var/lib/aximo/models/
├── parakeet-tdt-0.6b-v3-int8/
└── giga-am-v3/

Quick Start

Download the default Parakeet model bundle:

just setup-models

or directly:

./scripts/fetch-models.sh

Docker Compose

After the model is downloaded to ./var/models:

docker compose up --build

This uses docker-compose.yml, mounts ./var/models into the container, and serves the API on http://127.0.0.1:8080.

Local Run

For local non-Docker usage, use config/aximo.local.toml, which points to ./var/models:

AXIMO_CONFIG=config/aximo.local.toml cargo run -p aximo

For containerized usage, config/aximo.example.toml remains the default and expects models at /var/lib/aximo/models.

Short Audio Example

The short-audio endpoint currently accepts these content types:

  • audio/wav
  • audio/pcm
  • application/octet-stream

audio/pcm and application/octet-stream bodies are interpreted as raw, headerless pcm_s16le (signed 16-bit little-endian) audio at 16 kHz, mono.
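Browser and Node audio APIs often produce Float32 samples in [-1, 1] rather than 16-bit integers. As a minimal sketch, assuming the audio has already been resampled to 16 kHz mono, such samples can be converted to the pcm_s16le byte layout like this (floatToPcmS16le is a hypothetical helper, not part of aximo):

```javascript
// Hypothetical helper: convert Float32 samples in [-1, 1] to pcm_s16le bytes.
// It only changes the sample format; resampling to 16 kHz mono is assumed done.
function floatToPcmS16le(samples) {
  const bytes = new Uint8Array(samples.length * 2);
  const view = new DataView(bytes.buffer);
  for (let i = 0; i < samples.length; i++) {
    // Clamp so out-of-range input saturates instead of wrapping around.
    const s = Math.max(-1, Math.min(1, samples[i]));
    // Asymmetric scaling: -1 maps to -32768, +1 maps to 32767.
    const v = s < 0 ? Math.round(s * 32768) : Math.round(s * 32767);
    view.setInt16(i * 2, v, true); // true = little-endian
  }
  return bytes;
}
```

The resulting bytes can be sent as the body of a request with content-type: audio/pcm, or as binary frames on the realtime socket.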

curl -X POST http://127.0.0.1:8080/v1/transcriptions \
  -H 'content-type: audio/wav' \
  --data-binary @sample.wav
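The curl call above sends an existing WAV file. If you only have raw pcm_s16le bytes, one option is to wrap them in a standard 44-byte RIFF/WAVE header and send them as audio/wav. This is a sketch: pcmS16leToWav is a hypothetical helper, and it assumes the service accepts ordinary 16-bit PCM WAV at 16 kHz mono; sending the bytes directly as audio/pcm avoids the question entirely.

```javascript
// Hypothetical helper: prepend a canonical 44-byte PCM WAV header to pcm_s16le bytes.
function pcmS16leToWav(pcm, sampleRate = 16000, channels = 1) {
  const headerSize = 44;
  const bytesPerSample = 2; // 16-bit samples
  const out = new Uint8Array(headerSize + pcm.length);
  const view = new DataView(out.buffer);
  const writeAscii = (offset, text) => {
    for (let i = 0; i < text.length; i++) out[offset + i] = text.charCodeAt(i);
  };
  writeAscii(0, "RIFF");
  view.setUint32(4, 36 + pcm.length, true); // RIFF chunk size: file size minus 8
  writeAscii(8, "WAVE");
  writeAscii(12, "fmt ");
  view.setUint32(16, 16, true); // fmt chunk size for plain PCM
  view.setUint16(20, 1, true); // audio format 1 = integer PCM
  view.setUint16(22, channels, true);
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * channels * bytesPerSample, true); // byte rate
  view.setUint16(32, channels * bytesPerSample, true); // block align
  view.setUint16(34, 16, true); // bits per sample
  writeAscii(36, "data");
  view.setUint32(40, pcm.length, true); // data chunk size
  out.set(pcm, headerSize);
  return out;
}
```

The returned bytes can be POSTed to /v1/transcriptions with content-type: audio/wav, in the same way curl sends sample.wav.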

Example response:

{
  "text": "hello world",
  "segments": [],
  "detected_language": "en",
  "engine": "fake",
  "duration_ms": 0,
  "processing_ms": 0
}
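The example response reports engine "fake" with zero durations. With a real model, a client may want to sanity-check the response and compute a real-time factor (processing time divided by audio duration). The sketch below relies only on the field names shown above; summarizeTranscription, its validation rules, and the RTF metric are illustrative assumptions, not part of the aximo API.

```javascript
// Hypothetical helper: validate the fields from the example response and
// derive a real-time factor (RTF). Only the field names come from the README.
function summarizeTranscription(resp) {
  if (typeof resp.text !== "string") throw new Error("text should be a string");
  if (!Array.isArray(resp.segments)) throw new Error("segments should be an array");
  for (const key of ["duration_ms", "processing_ms"]) {
    if (typeof resp[key] !== "number") throw new Error(`${key} should be a number`);
  }
  // With duration_ms = 0 (as in the example) the ratio is undefined, so report null.
  const rtf = resp.duration_ms > 0 ? resp.processing_ms / resp.duration_ms : null;
  return {
    text: resp.text,
    language: resp.detected_language ?? null,
    engine: resp.engine ?? null,
    rtf,
  };
}
```

An RTF below 1 would mean the engine transcribed the clip faster than its playback duration.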

Realtime Example

Realtime streaming uses a WebSocket connection: the client sends JSON control messages and binary frames containing raw pcm_s16le, 16 kHz, mono audio.

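const ws = new WebSocket("ws://127.0.0.1:8080/v1/realtime");
// Expose incoming binary frames as ArrayBuffer rather than Blob.
ws.binaryType = "arraybuffer";

// Log every message the server sends.
ws.addEventListener("message", (event) => {
  console.log("server:", event.data);
});

ws.addEventListener("open", () => {
  // Start the session with a JSON control message.
  ws.send(JSON.stringify({ event: "start" }));

  // Four pcm_s16le samples (values 0, 1, 2, 3), little-endian byte order.
  const pcmChunk = new Uint8Array([0, 0, 1, 0, 2, 0, 3, 0]);
  ws.send(pcmChunk);

  // Signal that no more audio will be sent.
  ws.send(JSON.stringify({ event: "stop" }));
});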
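The example above sends a single four-sample chunk. For a real recording, the pcm_s16le stream has to be split into chunks: at 16 kHz mono with 2 bytes per sample that is 32,000 bytes per second, so 100 ms is 3,200 bytes. The sketch below splits a buffer on sample boundaries; chunkPcm is hypothetical, and its 100 ms default is an arbitrary choice rather than a documented aximo setting.

```javascript
// Hypothetical helper: split pcm_s16le, 16 kHz, mono bytes into fixed-duration chunks.
const SAMPLE_RATE = 16000;
const BYTES_PER_SAMPLE = 2;

function chunkPcm(pcm, chunkMs = 100) {
  const samplesPerChunk = Math.floor((SAMPLE_RATE * chunkMs) / 1000);
  if (samplesPerChunk < 1) throw new Error("chunkMs is too small");
  // chunkBytes is even, so chunk boundaries never split a 16-bit sample.
  const chunkBytes = samplesPerChunk * BYTES_PER_SAMPLE;
  const chunks = [];
  for (let offset = 0; offset < pcm.length; offset += chunkBytes) {
    // subarray shares memory with pcm; the last chunk may be shorter.
    chunks.push(pcm.subarray(offset, Math.min(offset + chunkBytes, pcm.length)));
  }
  return chunks;
}
```

Inside the open handler, each chunk would be passed to ws.send(chunk) between the start and stop messages. Whether the server expects chunks to be paced at real-time speed is not specified here; with live microphone input the pacing happens naturally.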

Expected server events:

  • session_started
  • partial
  • final
  • error
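A client typically branches on these event types. The sketch below assumes that each server text frame is a JSON object whose event field holds one of the names above, mirroring the client's {"event": "start"} messages. That message shape and the dispatchServerMessage helper are assumptions, since the payload format is not documented in this README.

```javascript
// Hypothetical dispatcher. ASSUMES text frames are JSON objects with an "event"
// field; the full parsed object is handed to the matching handler.
const KNOWN_EVENTS = new Set(["session_started", "partial", "final", "error"]);

// Returns true when the frame carried a known event, false otherwise.
function dispatchServerMessage(data, handlers) {
  if (typeof data !== "string") return false; // ignore binary frames
  let msg;
  try {
    msg = JSON.parse(data);
  } catch {
    return false; // not JSON: leave it to the caller
  }
  if (!msg || !KNOWN_EVENTS.has(msg.event)) return false;
  const handler = handlers[msg.event];
  if (handler) handler(msg); // a known event without a handler is simply skipped
  return true;
}
```

In the realtime example, the message listener could call dispatchServerMessage(event.data, { partial: ..., final: ... }) instead of logging raw data.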

API Docs

After the service starts:

  • Swagger UI: http://127.0.0.1:8080/docs/
  • OpenAPI schema: http://127.0.0.1:8080/openapi.json

Development

Common development commands:

just fmt
just lint
just test
just coverage
just setup-models

crates.io

The publishable library crates are:

  • aximo-core
  • aximo-audio
  • aximo-inference

The aximo service crate is intentionally marked publish = false.

Use just package-libs for the local pre-publish check of aximo-core and aximo-audio. A dry-run publish of aximo-inference only works once aximo-core is already available in the crates.io index, because Cargo resolves that dependency from the registry when packaging.

Release workflow notes are documented in docs/publishing.md.

About

Offline-first speech-to-text API for local and private deployments
