
parakeet-realtime-server-linux (Linux x64 / CUDA)

Linux port of parakeet-realtime-server. Same Rust source, two packaging paths so you can pick whichever fits your host.

| Path | Best for | Setup commands |
| --- | --- | --- |
| Tarball | Bare-metal servers, NixOS, distros without apt | tar -xzf …; ./bin/parakeet-realtime-server |
| .deb package | Ubuntu/Debian with apt | sudo apt install ./parakeet-realtime-server_*.deb |

Both paths run the same binary against CUDA 12.x + cuDNN 9.x.


Prerequisites (host)

The NVIDIA kernel driver must be installed. On a fresh Ubuntu 26 install:

sudo ubuntu-drivers install
sudo reboot

That's the only system-level requirement.

Verify:

nvidia-smi   # should print a table of GPUs

Install (pre-built release)

Pre-built artifacts are published on the GitHub Releases page with stable filenames, so the URLs below always point at the latest version. To pin a specific version, substitute latest with tag/vX.Y.Z in the URL.

One-line install (recommended)

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash

That one command downloads + sha256-verifies the .deb, runs sudo apt install, and downloads the model weights. Add --start to also enable + start the systemd service:

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash -s -- --start

Other flags: --no-models (skip the 480 MB download), --version vX.Y.Z (pin), --dry-run (print what would happen). Run with --help for the full list.

The script does not run itself as root — it calls sudo only for the steps that need it. Don't pipe it to sudo bash. If you'd rather inspect before running:

curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh -o install.sh
less install.sh
bash install.sh

Manual .deb install

If you'd rather drive each step yourself:

curl -fL -o /tmp/parakeet.deb \
    https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server_amd64.deb
sudo apt install -y /tmp/parakeet.deb

# One-time: download the model weights (~480 MB, sha256-verified):
sudo env OUT_DIR=/var/lib/parakeet-realtime-server/models \
    /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

sudo systemctl enable --now parakeet-realtime-server
curl http://127.0.0.1:9005/health    # {"ready":true} once warm

Tarball (any glibc-≥2.36 distro)

curl -fL -o /tmp/parakeet.tgz \
    https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server-linux-x64.tar.gz
mkdir -p ~/parakeet && tar -xzf /tmp/parakeet.tgz -C ~/parakeet
cd ~/parakeet/parakeet-realtime-server-v*-linux-x64
OUT_DIR=./models ./scripts/download-models.sh
./bin/parakeet-realtime-server --model-dir ./models

Verify the download (optional)

Each release also ships a SHA256SUMS file:

curl -fLO https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
# parakeet-realtime-server-linux-x64.tar.gz: OK
# parakeet-realtime-server_amd64.deb: OK
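On hosts without coreutils' sha256sum, the same check can be done from Python. A stdlib-only sketch — the function name is ours, and it assumes the standard sha256sum manifest format (hex digest, whitespace, filename):

```python
import hashlib
from pathlib import Path

def verify_sha256sums(sums_file: str, base_dir: str = ".") -> dict:
    """Check files listed in a sha256sum-style manifest.

    Returns {filename: True/False}. Files missing on disk are skipped,
    mirroring sha256sum --ignore-missing.
    """
    results = {}
    for line in Path(sums_file).read_text().splitlines():
        if not line.strip():
            continue
        expected, name = line.split(maxsplit=1)
        name = name.lstrip("*")  # sha256sum marks binary mode with a leading '*'
        path = Path(base_dir) / name
        if not path.exists():
            continue
        actual = hashlib.sha256(path.read_bytes()).hexdigest()
        results[name] = (actual == expected.lower())
    return results
```

Anything other than an all-True result means a truncated or tampered download; re-fetch before installing.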

Build from source

The rest of this section is for anyone producing the artifacts themselves (forking, patching, or running an unreleased commit). End users should use Install above.

Build-host prerequisites

These are only needed on the machine producing the artifacts (tarball / .deb / image), not on hosts that consume them.

# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# uv — used by scripts/fetch-cuda-deps.sh to fetch NVIDIA wheels.
# Self-contained, no system python required.
curl -LsSf https://astral.sh/uv/install.sh | sh

# cargo-deb — only for the .deb path
cargo install cargo-deb

# jq + curl — used by scripts/test.sh
sudo apt install -y jq curl

Path 0: Run from a clone (local dev)

Fastest path to iterate on the source. Produces dist/bin/parakeet-realtime-server and populates dist/lib/ + dist/models/. Each fetch-* script is skip-if-present, so re-running is cheap.

git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux

# Build the binary.
scripts/build.sh

# Stage the runtime libs (onnxruntime + CUDA/cuDNN). One-time per repo.
scripts/fetch-onnxruntime.sh
scripts/fetch-cuda-deps.sh

# Download the model weights (~480 MB, sha256-verified). One-time.
scripts/download-models.sh    # writes dist/models/ by default

# Run it. /health flips to {"ready": true} after ~1 s warm-up.
./dist/bin/parakeet-realtime-server --model-dir ./dist/models

After the first run, the iterate-edit-rerun loop is just scripts/build.sh && ./dist/bin/parakeet-realtime-server --model-dir ./dist/models — the libs and models are already on disk.

The bundled smoke test (scripts/test.sh) and the examples/ demos all assume this layout (binary in dist/bin/, models in dist/models/).


Path 1: Build your own tarball

# Once: clone the repo and produce the asset.
git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux
scripts/release-tarball.sh v0.1.0
# -> ./parakeet-realtime-server-v0.1.0-linux-x64.tar.gz  (~1.4 GB)

# On any Linux x64 host with an NVIDIA driver:
tar -xzf parakeet-realtime-server-v0.1.0-linux-x64.tar.gz
cd parakeet-realtime-server-v0.1.0-linux-x64
OUT_DIR=./models ./scripts/download-models.sh   # fetches ~480 MB, sha256-verified
./bin/parakeet-realtime-server --model-dir ./models --port 9005

The binary has RPATH=$ORIGIN/../lib baked in, so it finds the bundled libonnxruntime.so + cuDNN/cuBLAS automatically. No LD_LIBRARY_PATH needed — the binary also pre-dlopens its bundled libs by SONAME at startup, so it's robust against a host's LD_LIBRARY_PATH pointing at a different CUDA toolkit (see How CUDA libs get pinned).


Path 2: Build your own .deb

# Once: build the .deb on a build host.
cargo install cargo-deb
scripts/build-deb.sh
# -> ./target/debian/parakeet-realtime-server_0.1.0-1_amd64.deb

# On any Ubuntu/Debian host:
sudo apt install ./parakeet-realtime-server_0.1.0-1_amd64.deb

# Download models (one-time, ~480 MB, sha256-verified):
sudo OUT_DIR=/var/lib/parakeet-realtime-server/models \
    /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

# Start the service:
sudo systemctl enable --now parakeet-realtime-server
sudo systemctl status parakeet-realtime-server

What the package installs:

  • /usr/bin/parakeet-realtime-server — the binary
  • /usr/lib/parakeet-realtime-server/*.so* — bundled onnxruntime + cuDNN
  • /lib/systemd/system/parakeet-realtime-server.service — systemd unit
  • /etc/default/parakeet-realtime-server — env-var overrides
  • /var/lib/parakeet-realtime-server/models/ — empty, populate per above

The package creates a parakeet system user in the video and render groups so the service can talk to /dev/nvidia*.

To change host/port/device, edit /etc/default/parakeet-realtime-server and:

sudo systemctl restart parakeet-realtime-server

Health check (all paths)

curl http://127.0.0.1:9005/health
# {"ready": false}  during ~5–10s warm-up
# {"ready": true}   once the model has finished loading
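Automation that depends on the server (CI jobs, deploy hooks) can gate on this endpoint instead of sleeping. A minimal Python poller — a sketch, not a script shipped with the repo:

```python
import json
import time
import urllib.request
from urllib.error import URLError

def wait_ready(base_url: str, timeout_s: float = 30.0, poll_s: float = 0.5) -> bool:
    """Poll GET {base_url}/health until {"ready": true} or the timeout expires.

    Connection errors are treated as "not ready yet", since the server may
    still be binding its port during warm-up.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(base_url + "/health", timeout=5) as resp:
                if json.load(resp).get("ready"):
                    return True
        except (URLError, OSError, ValueError):
            pass  # not up yet, or a partial response
        time.sleep(poll_s)
    return False
```

Usage: wait_ready("http://127.0.0.1:9005") or bail out with an error.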

CLI

parakeet-realtime-server --model-dir <DIR> [--host 127.0.0.1] [--port 9005]
                         [--device cuda|cpu] [--cuda-device-id N]
                         [--chunk-ms 160]
parakeet-realtime-server --version
| Flag | Default | Notes |
| --- | --- | --- |
| --model-dir | required | dir holding encoder.onnx, decoder_joint.onnx, tokenizer.json |
| --host | 127.0.0.1 | bind address. Use 0.0.0.0 only behind a trusted network or reverse proxy — see CORS below. |
| --port | 9005 | TCP port |
| --device | cuda | cuda or cpu. The CUDA path runs a driver pre-flight at startup. |
| --cuda-device-id | unset | pin to a specific GPU on multi-GPU hosts. Sets CUDA_VISIBLE_DEVICES; ignored if the user already set that env var. |
| --chunk-ms | 160 | streaming chunk size in ms |

Logging

Log level is controlled by RUST_LOG. ORT's default INFO logging is verbose; suppress it with:

RUST_LOG=ort=warn,parakeet_realtime_server=info \
    parakeet-realtime-server --model-dir ./models

For the systemd service, set RUST_LOG in /etc/default/parakeet-realtime-server.

API

  • GET /health — readiness check. {"ready": false} while warming up, {"ready": true} once the model is loaded.
  • POST /v1/audio/transcriptions — multipart WAV upload (file=@audio.wav).
    • Success: 200 + {"text": "..."}.
    • Missing file field: 400 + {"error": "..."}.
    • Server warming up: 503 + {"error": "..."}.
    • Inference failure: 500 + {"error": "..."}.
    • Concurrent requests are serialised by an internal semaphore (default 1) to bound GPU memory; extra requests queue rather than OOM the device.
  • WS /stream — binary frames of int16 LE PCM at 16 kHz; server emits {"type": "partial"|"final", "text": "..."} JSON frames; client sends {"type":"end"} to flush. Each WebSocket connection loads its own model.
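The one-shot endpoint is easy to drive without curl. A stdlib-only Python sketch (the function name and defaults are ours, not part of the repo) that builds the multipart body by hand:

```python
import json
import urllib.request
import uuid
from pathlib import Path

def transcribe(wav_path: str, base_url: str = "http://127.0.0.1:9005") -> str:
    """POST a WAV file to /v1/audio/transcriptions and return the transcript.

    Builds multipart/form-data manually so no third-party HTTP lib is needed.
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{Path(wav_path).name}"\r\n'
        "Content-Type: audio/wav\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    req = urllib.request.Request(
        base_url + "/v1/audio/transcriptions",
        data=head + Path(wav_path).read_bytes() + tail,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["text"]
```

A non-200 status (503 while warming up, 400/500 with an {"error": ...} body) surfaces as an HTTPError from urlopen; catch it if you want the error JSON.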

CORS

The server returns permissive CORS headers (Access-Control-Allow-Origin: *) on all routes so the bundled browser demos work from file:// or a Vite dev server without proxying. For production deployments exposed beyond 127.0.0.1, put a reverse proxy in front and tighten the policy there.

See examples/mic-streaming.html and examples/web-app/ for working clients — covered in the Examples section below.


Examples

Both demos hit the server at http://127.0.0.1:9005 by default. Edit the URLs in the source if your server runs elsewhere.

The server commands below assume you've completed Path 0: Run from a clone — i.e. the binary is at dist/bin/parakeet-realtime-server and models are at dist/models/. If you installed the .deb, the binary is at /usr/bin/parakeet-realtime-server and models live under /var/lib/parakeet-realtime-server/models; substitute accordingly.

examples/mic-streaming.html

Single self-contained HTML file. Records the mic via Web Audio + AudioWorklet, opens WS /stream, and renders partial/final transcripts. No build step — but the file must be served over HTTP, not opened directly. Browsers reject AudioWorklet.addModule() calls from file:// documents.

# Terminal 1: parakeet server
./dist/bin/parakeet-realtime-server --model-dir ./dist/models

# Terminal 2: tiny static server for the HTML, bound to localhost only
python3 -m http.server -d examples 5000 --bind 127.0.0.1
# then open http://127.0.0.1:5000/mic-streaming.html in any modern browser
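The demo's AudioWorklet converts Float32 samples to the int16 LE frames that WS /stream expects. The same framing in Python, e.g. for replaying pre-recorded audio over the socket — a sketch with our own helper names, chunked at the server's 160 ms default:

```python
import struct

SAMPLE_RATE = 16_000  # /stream expects 16 kHz int16 LE PCM
CHUNK_MS = 160        # matches the server's --chunk-ms default

def float_to_pcm16(samples) -> bytes:
    """Convert float samples in [-1.0, 1.0] to int16 little-endian bytes."""
    clamped = (max(-1.0, min(1.0, s)) for s in samples)
    return struct.pack(f"<{len(samples)}h", *(int(s * 32767) for s in clamped))

def frames(pcm: bytes, chunk_ms: int = CHUNK_MS):
    """Yield binary WebSocket frames holding chunk_ms worth of audio each."""
    step = SAMPLE_RATE * chunk_ms // 1000 * 2  # 2 bytes per int16 sample
    for i in range(0, len(pcm), step):
        yield pcm[i : i + step]
```

Each yielded frame would be sent as a binary WebSocket message, followed by a {"type":"end"} text message to flush the final transcript.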

examples/web-app/ (Vite + React + TypeScript)

Slightly fancier demo with mic-device selection, an event log panel, and a health badge.

cd examples/web-app
npm install
npm run dev   # serves on http://localhost:5173

vite.config.ts ships a built-in proxy for /health, /v1/*, and /stream to 127.0.0.1:9005, so you can flip src/config.ts to use same-origin paths if you'd rather not rely on CORS. Edit upstream in vite.config.ts to point at a different host/port.


Build matrix

| Script | Output | Use case |
| --- | --- | --- |
| scripts/build.sh | dist/bin/parakeet-realtime-server | Local dev / iteration |
| scripts/release-tarball.sh vX.Y.Z | parakeet-realtime-server-vX.Y.Z-linux-x64.tar.gz | Distro-agnostic release asset |
| scripts/build-deb.sh | target/debian/parakeet-realtime-server_X.Y.Z-1_amd64.deb | Ubuntu/Debian packages |
| scripts/release.sh vX.Y.Z | GitHub Release with both artifacts + SHA256SUMS | One-shot publish to the Releases page |

scripts/release.sh is the maintainer's "ship it" command — it builds the tarball and .deb, stages stable-named copies (so the URLs in Install keep working across releases), generates SHA256SUMS, and uploads everything to a new GitHub Release at the given tag. Pre-flight checks: gh authenticated, working tree clean, tag exists, Cargo.toml version matches, no existing release.

The fetch scripts (fetch-onnxruntime.sh, fetch-cuda-deps.sh) skip work if the SOs are already in dist/lib/, so iterating on the build doesn't re-download every time.


Tests

# After build + fetch + download-models, run all scenarios:
scripts/test.sh                                  # all scenarios, --device cuda
DEVICE=cpu scripts/test.sh                       # all scenarios, CPU
ONLY=baseline scripts/test.sh                    # just one
ONLY=multi-request scripts/test.sh
ONLY=error-response scripts/test.sh
ONLY=hostile-LD_LIBRARY_PATH scripts/test.sh

Each scenario spawns the binary against the bundled tests/fixtures/jfk.wav (~11 s, public-domain JFK inaugural excerpt). Non-zero exit on any assertion failure or server timeout.

| Scenario | What it verifies |
| --- | --- |
| baseline | Single transcription: model loads, /health flips to ready, transcript contains expected phrases. |
| multi-request | Three back-to-back transcribes against the same warmed server. Catches state leaks across requests (each request must see fresh model state regardless of prior calls). |
| error-response | POST /v1/audio/transcriptions without a file field returns 400 + {"error": ...} (not 200 + {"text": ""}). |
| hostile-LD_LIBRARY_PATH | Server starts with LD_LIBRARY_PATH pointing at zero-byte stub .so files for libonnxruntime.so, libcudart.so.12, etc. Confirms the in-process SONAME preload (src/main.rs::preload) wins over LD_LIBRARY_PATH — without it the loader picks the stubs and the server hangs on warm-up. |

MODEL_DIR, PORT, DEVICE, and ONLY env vars override defaults.

How CUDA libs get pinned

A user's host often has its own CUDA toolkit, a pip-installed onnxruntime-gpu, or LD_LIBRARY_PATH=/usr/local/cuda/lib64 set globally. By default, the Linux dynamic loader's dlopen search order puts LD_LIBRARY_PATH above a binary's RUNPATH — so a system CUDA can shadow the libs we ship. To stay deterministic about ABI:

  1. Bundled SOs ship as real files in the lib dir: dist/lib/ (tarball) or /usr/lib/parakeet-realtime-server/ (.deb).
  2. Binary RUNPATH = $ORIGIN/../lib:/usr/lib/parakeet-realtime-server so the binary finds libonnxruntime.so at process start.
  3. libonnxruntime_providers_cuda.so is patchelf'd to RUNPATH=$ORIGIN so its DT_NEEDED CUDA libs resolve from the same dir when ORT dlopens it. (Microsoft ships it bare.)
  4. In-process preload at main() start (src/main.rs::preload):
    • Sets ORT_DYLIB_PATH to the absolute path of our bundled libonnxruntime.so.1.25.1, so the ort crate's load-dynamic skips the loader's search.
    • dlopens every bundled lib{cudart,cublas,cudnn,nvrtc,nvrtc-builtins,cufft,curand,onnxruntime}* by absolute path with RTLD_LAZY|RTLD_GLOBAL. Once a SONAME is registered in the process, glibc's loader reuses the existing object for any subsequent DT_NEEDED resolution — LD_LIBRARY_PATH and ld.so.cache no longer apply.
  5. Driver pre-flight when --device cuda: dlopen("libcuda.so.1") and call cuDriverGetVersion. If the kernel driver is missing or supports < CUDA 12.0, refuse to start with an actionable message rather than a deep ORT error stack.
  6. Audit log: after the preload, walk /proc/self/maps and WARN if any of our watched libs resolved from outside the bundle dir. This surfaces the residual cases SONAME pinning genuinely can't defeat — LD_PRELOAD, /etc/ld.so.preload, LD_AUDIT — instead of letting them silently corrupt inference.
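Steps 4 and 6 rely on standard glibc loader behaviour that can be demonstrated from any language. A Python illustration of the same mechanism (the real implementation is Rust, in src/main.rs::preload; libm stands in here for the bundled CUDA libs):

```python
import ctypes

def resolved_paths(name_fragment: str):
    """Scan /proc/self/maps (Linux-only) for mapped libraries whose path
    contains name_fragment — the same audit the server runs after preload."""
    paths = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            parts = line.split()
            # Lines with a backing file have the pathname as the last field.
            if len(parts) >= 6 and name_fragment in parts[-1]:
                paths.add(parts[-1])
    return sorted(paths)

# Step 4: dlopen with RTLD_GLOBAL registers the SONAME process-wide, so any
# later DT_NEEDED lookup reuses this object instead of searching again.
ctypes.CDLL("libm.so.6", mode=ctypes.RTLD_GLOBAL)

# Step 6: audit where the library actually resolved from. In the server this
# warns when a watched lib lives outside the bundle dir (LD_PRELOAD etc.).
print(resolved_paths("libm"))
```

A lib resolving from an unexpected directory here is exactly the condition the server's WARN-level audit log reports.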

Verified by the hostile-LD_LIBRARY_PATH test scenario above.


Troubleshooting

| Symptom | Fix |
| --- | --- |
| error while loading shared libraries: libonnxruntime.so.X | Bundle didn't ship; run scripts/fetch-onnxruntime.sh and check readelf -d ./bin/parakeet-realtime-server \| grep RUNPATH. |
| Error: NVIDIA driver supports CUDA up to ... (process exits) | Driver too old. sudo ubuntu-drivers install to get R525+, then reboot. Or run with --device cpu. |
| Error: libcuda.so.1 is not loadable (process exits) | Driver missing entirely. nvidia-smi should work; if not, install the driver as above. |
| /health stuck at {"ready": false} | Model files missing or --model-dir wrong. Check the directory has encoder.onnx, decoder_joint.onnx, tokenizer.json; run scripts/download-models.sh if not — it sha256-verifies each file. |
| WARN: libcudart.so.12 resolved outside the bundled lib dir | Something on the host (LD_PRELOAD, /etc/ld.so.preload, LD_AUDIT) is overriding the bundled libs, so transcription may use a different CUDA ABI than expected. Unset the override or document the expected resolution. |
| Service starts but no GPU | Check the parakeet user is in the video + render groups: id parakeet. |
| ORT logging is noisy | Set RUST_LOG=ort=warn,parakeet_realtime_server=info (in the env file for the systemd service). |
| Concurrent transcribes appear to queue | Intentional — the semaphore limits in-flight requests to one to bound GPU memory. Bump the transcribe_permit size in src/main.rs if you have GPU memory to burn. |
| sha256 mismatch from download-models.sh | Truncated download or upstream republish. Re-run; if it persists, update the expected hash in the script after verifying the new file. |

License

MIT. See LICENSE.

About

Streaming HTTP + WebSocket ASR server (NVIDIA Parakeet Realtime EOU 120M) — Linux x64 + CUDA
