
parakeet-realtime-server-linux (Linux x64 / CUDA)

Linux port of parakeet-realtime-server. Same Rust source, two packaging paths so you can pick whichever fits your host.

| Path | Best for | Setup commands |
| --- | --- | --- |
| Tarball | Bare-metal servers, NixOS, distros without apt | tar -xzf …; ./bin/parakeet-realtime-server |
| .deb package | Ubuntu/Debian with apt | sudo apt install ./parakeet-realtime-server_*.deb |

Both paths run the same binary against CUDA 12.x + cuDNN 9.x.


Prerequisites (host)

The NVIDIA kernel driver must be installed. On a fresh Ubuntu 26 install:

sudo ubuntu-drivers install
sudo reboot

That's the only system-level requirement.

Verify:

nvidia-smi   # should print a table of GPUs

Install (pre-built release)

Pre-built artifacts are published on the GitHub Releases page with stable filenames, so the URLs below always point at the latest version. To pin a specific version, substitute latest with tag/vX.Y.Z in the URL.

One-line install (recommended)

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash

That one command downloads + sha256-verifies the .deb, runs sudo apt install, and downloads the model weights. Add --start to also enable + start the systemd service:

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash -s -- --start

Other flags: --no-models (skip the 480 MB download), --version vX.Y.Z (pin), --dry-run (print what would happen). Run with --help for the full list.

The script does not run itself as root — it calls sudo only for the steps that need it. Don't pipe it to sudo bash. If you'd rather inspect before running:

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh -o install.sh
less install.sh
bash install.sh

Manual .deb install

If you'd rather drive each step yourself:

curl -fL -o /tmp/parakeet.deb \
    https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server_amd64.deb
sudo apt install -y /tmp/parakeet.deb

# One-time: download the model weights (~480 MB, sha256-verified):
sudo env OUT_DIR=/var/lib/parakeet-realtime-server/models \
    /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

sudo systemctl enable --now parakeet-realtime-server
curl http://127.0.0.1:9005/health    # {"ready":true} once warm

Tarball (any glibc-≥2.36 distro)

curl -fL -o /tmp/parakeet.tgz \
    https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server-linux-x64.tar.gz
mkdir -p ~/parakeet && tar -xzf /tmp/parakeet.tgz -C ~/parakeet
cd ~/parakeet/parakeet-realtime-server-v*-linux-x64
OUT_DIR=./models ./scripts/download-models.sh
./bin/parakeet-realtime-server --model-dir ./models

Verify the download (optional)

Each release also ships a SHA256SUMS file:

curl -fLO https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
# parakeet-realtime-server-linux-x64.tar.gz: OK
# parakeet-realtime-server_amd64.deb: OK
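On hosts without coreutils' sha256sum, the same check can be done from Python. A stdlib-only sketch — the function name is ours, and it assumes the standard sha256sum manifest format (hex digest, whitespace, filename):

```python
import hashlib
from pathlib import Path

def verify_sha256sums(sums_file: str, base_dir: str = ".") -> dict:
    """Check files listed in a sha256sum-style manifest.

    Returns {filename: True/False}. Files missing on disk are skipped,
    mirroring sha256sum --ignore-missing.
    """
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        path = Path(base_dir) / name
        if not path.exists():
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        results[name] = (actual == expected.lower())
    return results
```

Anything other than an all-True result means a truncated or tampered download; re-fetch before installing.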

Build from source

The rest of this section is for anyone producing the artifacts themselves (forking, patching, or running an unreleased commit). End users should use Install above.

Build-host prerequisites

These are only needed on the machine producing the artifacts (tarball / .deb / image), not on hosts that consume them.

# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# uv — used by scripts/fetch-cuda-deps.sh to fetch NVIDIA wheels.
# Self-contained, no system python required.
curl -LsSf https://astral.sh/uv/install.sh | sh

# cargo-deb — only for the .deb path
cargo install cargo-deb

# jq + curl — used by scripts/test.sh
sudo apt install -y jq curl

Path 0: Run from a clone (local dev)

Fastest path to iterate on the source. Produces dist/bin/parakeet-realtime-server and populates dist/lib/ + dist/models/. Each fetch-* script is skip-if-present, so re-running is cheap.

git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux

# Build the binary.
scripts/build.sh

# Stage the runtime libs (onnxruntime + CUDA/cuDNN). One-time per repo.
scripts/fetch-onnxruntime.sh
scripts/fetch-cuda-deps.sh

# Download the model weights (~480 MB, sha256-verified). One-time.
scripts/download-models.sh    # writes dist/models/ by default

# Run it. /health flips to {"ready": true} after ~1 s warm-up.
./dist/bin/parakeet-realtime-server --model-dir ./dist/models

After the first run, the iterate-edit-rerun loop is just scripts/build.sh && ./dist/bin/parakeet-realtime-server --model-dir ./dist/models — the libs and models are already on disk.

The bundled smoke test (scripts/test.sh) and the examples/ demos all assume this layout (binary in dist/bin/, models in dist/models/).


Path 1: Build your own tarball

# Once: clone the repo and produce the asset.
git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux
scripts/release-tarball.sh v0.1.0
# -> ./parakeet-realtime-server-v0.1.0-linux-x64.tar.gz  (~1.4 GB)

# On any Linux x64 host with an NVIDIA driver:
tar -xzf parakeet-realtime-server-v0.1.0-linux-x64.tar.gz
cd parakeet-realtime-server-v0.1.0-linux-x64
OUT_DIR=./models ./scripts/download-models.sh   # fetches ~480 MB, sha256-verified
./bin/parakeet-realtime-server --model-dir ./models --port 9005

The binary has RPATH=$ORIGIN/../lib baked in, so it finds the bundled libonnxruntime.so + cuDNN/cuBLAS automatically. No LD_LIBRARY_PATH needed — the binary also pre-dlopens its bundled libs by SONAME at startup, so it's robust against a host's LD_LIBRARY_PATH pointing at a different CUDA toolkit (see How CUDA libs get pinned).


Path 2: Build your own .deb

# Once: build the .deb on a build host.
cargo install cargo-deb
scripts/build-deb.sh
# -> ./target/debian/parakeet-realtime-server_0.1.0-1_amd64.deb

# On any Ubuntu/Debian host:
sudo apt install ./parakeet-realtime-server_0.1.0-1_amd64.deb

# Download models (one-time, ~480 MB, sha256-verified):
sudo OUT_DIR=/var/lib/parakeet-realtime-server/models \
    /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

# Start the service:
sudo systemctl enable --now parakeet-realtime-server
sudo systemctl status parakeet-realtime-server

What the package installs:

  • /usr/bin/parakeet-realtime-server — the binary
  • /usr/lib/parakeet-realtime-server/*.so* — bundled onnxruntime + cuDNN
  • /lib/systemd/system/parakeet-realtime-server.service — systemd unit
  • /etc/default/parakeet-realtime-server — env-var overrides
  • /var/lib/parakeet-realtime-server/models/ — empty, populate per above

The package creates a parakeet system user in the video and render groups so the service can talk to /dev/nvidia*.

To change host/port/device, edit /etc/default/parakeet-realtime-server and:

sudo systemctl restart parakeet-realtime-server

Health check (all paths)

curl http://127.0.0.1:9005/health
# {"ready": false}  during ~5–10s warm-up
# {"ready": true}   once the model has finished loading
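Automation that depends on the server (CI jobs, deploy hooks) can gate on this endpoint instead of sleeping. A minimal Python poller — a sketch, not a script shipped with the repo:

```python
import json
import time
import urllib.request
from urllib.error import URLError

def wait_ready(base_url: str, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll GET {base_url}/health until {"ready": true} or the timeout expires.

    Connection errors are treated as "not ready yet", since the server may
    still be binding its port during warm-up.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
                if json.load(resp).get("ready"):
                    return True
        except (URLError, OSError, ValueError):
            pass  # not up yet, or a partial response
        time.sleep(poll_s)
    return False
```

Usage: wait_ready("http://127.0.0.1:9005") or bail out with an error.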

CLI

parakeet-realtime-server --model-dir <DIR> [--host 127.0.0.1] [--port 9005]
                         [--device cuda|cpu] [--cuda-device-id N]
                         [--chunk-ms 160]
parakeet-realtime-server --version
| Flag | Default | Notes |
| --- | --- | --- |
| --model-dir | required | dir holding encoder.onnx, decoder_joint.onnx, tokenizer.json |
| --host | 127.0.0.1 | bind address. Use 0.0.0.0 only behind a trusted network or reverse proxy — see CORS below. |
| --port | 9005 | TCP port |
| --device | cuda | cuda or cpu. The CUDA path runs a driver pre-flight at startup. |
| --cuda-device-id | unset | pin to a specific GPU on multi-GPU hosts. Sets CUDA_VISIBLE_DEVICES; ignored if the user already set that env var. |
| --chunk-ms | 160 | streaming chunk size in ms |

Logging

Log level is controlled by RUST_LOG. ORT's default INFO logging is verbose; suppress it with:

RUST_LOG=ort=warn,parakeet_realtime_server=info \
    parakeet-realtime-server --model-dir ./models

For the systemd service, set RUST_LOG in /etc/default/parakeet-realtime-server.

API

  • GET /health — readiness check. {"ready": false} while warming up, {"ready": true} once the model is loaded.
  • POST /v1/audio/transcriptions — multipart WAV upload (file=@audio.wav).
    • Success: 200 + {"text": "..."}.
    • Missing file field: 400 + {"error": "..."}.
    • Server warming up: 503 + {"error": "..."}.
    • Inference failure: 500 + {"error": "..."}.
    • Concurrent requests are serialised by an internal semaphore (default 1) to bound GPU memory; extra requests queue rather than OOM the device.
  • WS /stream — binary frames of int16 LE PCM at 16 kHz; server emits {"type": "partial"|"final", "text": "..."} JSON frames; client sends {"type":"end"} to flush. Each WebSocket connection loads its own model.
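The one-shot endpoint is easy to drive without curl. A stdlib-only Python sketch (the function name and defaults are ours, not part of the repo) that builds the multipart body by hand:

```python
import json
import urllib.request
import uuid
from pathlib import Path

def transcribe(wav_path: str, base_url: str = "http://127.0.0.1:9005") -> str:
    """POST a WAV file to /v1/audio/transcriptions and return the transcript.

    Builds multipart/form-data manually so no third-party HTTP lib is needed.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{Path(wav_path).name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=head + Path(wav_path).read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

A non-200 status (503 while warming up, 400/500 with an {"error": ...} body) surfaces as an HTTPError from urlopen; catch it if you want the error JSON.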

CORS

The server returns permissive CORS headers (Access-Control-Allow-Origin: *) on all routes so the bundled browser demos work from file:// or a Vite dev server without proxying. For production deployments exposed beyond 127.0.0.1, put a reverse proxy in front and tighten the policy there.

See examples/mic-streaming.html and examples/web-app/ for working clients — covered in the Examples section below.


Examples

Both demos hit the server at http://127.0.0.1:9005 by default. Edit the URLs in the source if your server runs elsewhere.

The server commands below assume you've completed Path 0: Run from a clone — i.e. the binary is at dist/bin/parakeet-realtime-server and models are at dist/models/. If you installed the .deb, the binary is at /usr/bin/parakeet-realtime-server and models live under /var/lib/parakeet-realtime-server/models; substitute accordingly.

examples/mic-streaming.html

Single self-contained HTML file. Records the mic via Web Audio + AudioWorklet, opens WS /stream, and renders partial/final transcripts. No build step — but the file must be served over HTTP, not opened directly. Browsers reject AudioWorklet.addModule() calls from file:// documents.

# Terminal 1: parakeet server
./dist/bin/parakeet-realtime-server --model-dir ./dist/models

# Terminal 2: tiny static server for the HTML, bound to localhost only
python3 -m http.server -d examples 5000 --bind 127.0.0.1
# then open http://127.0.0.1:5000/mic-streaming.html in any modern browser
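The demo's AudioWorklet converts Float32 samples to the int16 LE frames that WS /stream expects. The same framing in Python, e.g. for replaying pre-recorded audio over the socket — a sketch with our own helper names, chunked at the server's 160 ms default:

```python
import struct

SAMPLE_RATE = 16_000  # /stream expects 16 kHz int16 LE PCM
CHUNK_MS = 160        # matches the server's --chunk-ms default

def float_to_pcm16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to int16 little-endian bytes."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clamped))

def frames(pcm: bytes, chunk_ms: int = CHUNK_MS):
    """Yield binary WebSocket frames holding chunk_ms worth of audio each."""
    step = SAMPLE_RATE * chunk_ms // 1000 * 2  # 2 bytes per int16 sample
    for i in range(0, len(pcm), step):
        yield pcm[i : i + step]
```

Each yielded frame would be sent as a binary WebSocket message, followed by a {"type":"end"} text message to flush the final transcript.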

examples/web-app/ (Vite + React + TypeScript)

Slightly fancier demo with mic-device selection, an event log panel, and a health badge.

cd examples/web-app
npm install
npm run dev   # serves on http://localhost:5173

vite.config.ts ships a built-in proxy for /health, /v1/*, and /stream to 127.0.0.1:9005, so you can flip src/config.ts to use same-origin paths if you'd rather not rely on CORS. Edit upstream in vite.config.ts to point at a different host/port.


Build matrix

| Script | Output | Use case |
| --- | --- | --- |
| scripts/build.sh | dist/bin/parakeet-realtime-server | Local dev / iteration |
| scripts/release-tarball.sh vX.Y.Z | parakeet-realtime-server-vX.Y.Z-linux-x64.tar.gz | Distro-agnostic release asset |
| scripts/build-deb.sh | target/debian/parakeet-realtime-server_X.Y.Z-1_amd64.deb | Ubuntu/Debian packages |
| scripts/release.sh vX.Y.Z | GitHub Release with both artifacts + SHA256SUMS | One-shot publish to the Releases page |

scripts/release.sh is the maintainer's "ship it" command — it builds the tarball and .deb, stages stable-named copies (so the URLs in Install keep working across releases), generates SHA256SUMS, and uploads everything to a new GitHub Release at the given tag. Pre-flight checks: gh authenticated, working tree clean, tag exists, Cargo.toml version matches, no existing release.

The fetch scripts (fetch-onnxruntime.sh, fetch-cuda-deps.sh) skip work if the SOs are already in dist/lib/, so iterating on the build doesn't re-download every time.


Tests

# After build + fetch + download-models, run all scenarios:
scripts/test.sh                                  # all scenarios, --device cuda
DEVICE=cpu scripts/test.sh                       # all scenarios, CPU
ONLY=baseline scripts/test.sh                    # just one
ONLY=multi-request scripts/test.sh
ONLY=error-response scripts/test.sh
ONLY=hostile-LD_LIBRARY_PATH scripts/test.sh

Each scenario spawns the binary against the bundled tests/fixtures/jfk.wav (~11 s, public-domain JFK inaugural excerpt). Non-zero exit on any assertion failure or server timeout.

| Scenario | What it verifies |
| --- | --- |
| baseline | Single transcription: model loads, /health flips to ready, transcript contains expected phrases. |
| multi-request | Three back-to-back transcribes against the same warmed server. Catches state leaks across requests (each request must see fresh model state regardless of prior calls). |
| error-response | POST /v1/audio/transcriptions without a file field returns 400 + {"error": ...} (not 200 + {"text": ""}). |
| hostile-LD_LIBRARY_PATH | Server starts with LD_LIBRARY_PATH pointing at zero-byte stub .so files for libonnxruntime.so, libcudart.so.12, etc. Confirms the in-process SONAME preload (src/main.rs::preload) wins over LD_LIBRARY_PATH — without it the loader picks the stubs and the server hangs on warm-up. |

MODEL_DIR, PORT, DEVICE, and ONLY env vars override defaults.

How CUDA libs get pinned

A user's host often has its own CUDA toolkit, a pip-installed onnxruntime-gpu, or LD_LIBRARY_PATH=/usr/local/cuda/lib64 set globally. By default, the Linux dynamic loader's dlopen search order puts LD_LIBRARY_PATH above a binary's RUNPATH — so a system CUDA can shadow the libs we ship. To stay deterministic about ABI:

  1. Bundled SOs ship as real files in the lib dir: dist/lib/ (tarball) or /usr/lib/parakeet-realtime-server/ (.deb).
  2. Binary RUNPATH = $ORIGIN/../lib:/usr/lib/parakeet-realtime-server so the binary finds libonnxruntime.so at process start.
  3. libonnxruntime_providers_cuda.so is patchelf'd to RUNPATH=$ORIGIN so its DT_NEEDED CUDA libs resolve from the same dir when ORT dlopens it. (Microsoft ships it bare.)
  4. In-process preload at main() start (src/main.rs::preload):
    • Sets ORT_DYLIB_PATH to the absolute path of our bundled libonnxruntime.so.1.25.1, so the ort crate's load-dynamic skips the loader's search.
    • dlopens every bundled lib{cudart,cublas,cudnn,nvrtc,nvrtc-builtins,cufft,curand,onnxruntime}* by absolute path with RTLD_LAZY|RTLD_GLOBAL. Once a SONAME is registered in the process, glibc's loader reuses the existing object for any subsequent DT_NEEDED resolution — LD_LIBRARY_PATH and ld.so.cache no longer apply.
  5. Driver pre-flight when --device cuda: dlopen("libcuda.so.1") and call cuDriverGetVersion. If the kernel driver is missing or supports < CUDA 12.0, refuse to start with an actionable message rather than a deep ORT error stack.
  6. Audit log: after the preload, walk /proc/self/maps and WARN if any of our watched libs resolved from outside the bundle dir. This surfaces the residual cases SONAME pinning genuinely can't defeat — LD_PRELOAD, /etc/ld.so.preload, LD_AUDIT — instead of letting them silently corrupt inference.
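Steps 4 and 6 rely on standard glibc loader behaviour that can be demonstrated from any language. A Python illustration of the same mechanism (the real implementation is Rust, in src/main.rs::preload; libm stands in here for the bundled CUDA libs):

```python
import ctypes

def resolved_paths(name_fragment: str):
    """Scan /proc/self/maps (Linux-only) for mapped libraries whose path
    contains name_fragment — the same audit the server runs after preload."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            parts = line.split()
            # Lines with a backing file have the pathname as the last field.
            if len(parts) >= 6 and name_fragment in parts[-1]:
                paths.add(parts[-1])
    return sorted(paths)

# Step 4: dlopen with RTLD_GLOBAL registers the SONAME process-wide, so any
# later DT_NEEDED lookup reuses this object instead of searching again.
ctypes.CDLL("libm.so.6", mode=ctypes.RTLD_GLOBAL)

# Step 6: audit where the library actually resolved from. In the server this
# warns when a watched lib lives outside the bundle dir (LD_PRELOAD etc.).
print(resolved_paths("libm"))
```

A lib resolving from an unexpected directory here is exactly the condition the server's WARN-level audit log reports.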

Verified by the hostile-LD_LIBRARY_PATH test scenario above.


Troubleshooting

| Symptom | Fix |
| --- | --- |
| error while loading shared libraries: libonnxruntime.so.X | Bundle didn't ship; run scripts/fetch-onnxruntime.sh and check readelf -d ./bin/parakeet-realtime-server \| grep RUNPATH. |
| Error: NVIDIA driver supports CUDA up to ... (process exits) | Driver too old. sudo ubuntu-drivers install to get R525+, then reboot. Or run with --device cpu. |
| Error: libcuda.so.1 is not loadable (process exits) | Driver missing entirely. nvidia-smi should work; if not, install the driver as above. |
| /health stuck at {"ready": false} | Model files missing or --model-dir wrong. Check the directory has encoder.onnx, decoder_joint.onnx, tokenizer.json; run scripts/download-models.sh if not — it sha256-verifies each file. |
| WARN: libcudart.so.12 resolved outside the bundled lib dir | Something on the host (LD_PRELOAD, /etc/ld.so.preload, LD_AUDIT) is overriding the bundled libs, so transcription may use a different CUDA ABI than expected. Unset the override or document the expected resolution. |
| Service starts but no GPU | Check the parakeet user is in the video + render groups: id parakeet. |
| ORT logging is noisy | Set RUST_LOG=ort=warn,parakeet_realtime_server=info (in the env file for the systemd service). |
| Concurrent transcribes appear to queue | Intentional — the semaphore limits in-flight requests to one to bound GPU memory. Bump the transcribe_permit size in src/main.rs if you have GPU memory to burn. |
| sha256 mismatch from download-models.sh | Truncated download or upstream republish. Re-run; if it persists, update the expected hash in the script after verifying the new file. |

License

MIT. See LICENSE.

About

Streaming HTTP + WebSocket ASR server (NVIDIA Parakeet Realtime EOU 120M) — Linux x64 + CUDA
