Linux port of parakeet-realtime-server. Same Rust source, two packaging paths so you can pick whichever fits your host.
| Path | Best for | Setup commands |
|---|---|---|
| Tarball | Bare-metal servers, NixOS, distros without apt | `tar -xzf …; ./bin/parakeet-realtime-server` |
| .deb package | Ubuntu/Debian with apt | `sudo apt install ./parakeet-realtime-server_*.deb` |
Both paths run the same binary against CUDA 12.x + cuDNN 9.x.
The NVIDIA kernel driver must be installed. On a fresh Ubuntu 26 install:

```sh
sudo ubuntu-drivers install
sudo reboot
```

That's the only system-level requirement. Verify:

```sh
nvidia-smi   # should print a table of GPUs
```

Pre-built artifacts are published on the GitHub Releases page with stable filenames, so the URLs below always point at the latest version. To pin a specific version, substitute `latest/download` with `download/vX.Y.Z` in the URL.
```sh
curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash
```

That one command downloads and sha256-verifies the .deb, runs `sudo apt install`, and downloads the model weights. Add `--start` to also enable and start the systemd service:

```sh
curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh | bash -s -- --start
```

Other flags: `--no-models` (skip the 480 MB model download), `--version vX.Y.Z` (pin a release), `--dry-run` (print what would happen). `--help` lists them all.
The script does not run itself as root — it calls sudo only for the steps that need it. Don't pipe it to sudo bash. If you'd rather inspect before running:
```sh
curl -fL https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/install.sh -o install.sh
less install.sh
bash install.sh
```

If you'd rather drive each step yourself:
```sh
curl -fL -o /tmp/parakeet.deb \
  https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server_amd64.deb
sudo apt install -y /tmp/parakeet.deb

# One-time: download the model weights (~480 MB, sha256-verified):
sudo env OUT_DIR=/var/lib/parakeet-realtime-server/models \
  /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

sudo systemctl enable --now parakeet-realtime-server
curl http://127.0.0.1:9005/health   # {"ready":true} once warm
```

For the tarball instead:

```sh
curl -fL -o /tmp/parakeet.tgz \
  https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/parakeet-realtime-server-linux-x64.tar.gz
mkdir -p ~/parakeet && tar -xzf /tmp/parakeet.tgz -C ~/parakeet
cd ~/parakeet/parakeet-realtime-server-v*-linux-x64
OUT_DIR=./models ./scripts/download-models.sh
./bin/parakeet-realtime-server --model-dir ./models
```

Each release also ships a SHA256SUMS file:
```sh
curl -fLO https://github.com/pauldaywork/parakeet-realtime-server-linux/releases/latest/download/SHA256SUMS
sha256sum -c SHA256SUMS --ignore-missing
# parakeet-realtime-server-linux-x64.tar.gz: OK
# parakeet-realtime-server_amd64.deb: OK
```

The rest of this section is for anyone producing the artifacts themselves (forking, patching, or running an unreleased commit). End users should use Install above.
These are only needed on the machine producing the artifacts (tarball / .deb / image), not on hosts that consume them.
```sh
# Rust toolchain
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh

# uv — used by scripts/fetch-cuda-deps.sh to fetch NVIDIA wheels.
# Self-contained, no system Python required.
curl -LsSf https://astral.sh/uv/install.sh | sh

# cargo-deb — only for the .deb path
cargo install cargo-deb

# jq + curl — used by scripts/test.sh
sudo apt install -y jq curl
```

Fastest path to iterate on the source. Produces `dist/bin/parakeet-realtime-server` and populates `dist/lib/` + `dist/models/`. Each `fetch-*` script is skip-if-present, so re-running is cheap.
```sh
git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux

# Build the binary.
scripts/build.sh

# Stage the runtime libs (onnxruntime + CUDA/cuDNN). One-time per repo.
scripts/fetch-onnxruntime.sh
scripts/fetch-cuda-deps.sh

# Download the model weights (~480 MB, sha256-verified). One-time.
scripts/download-models.sh   # writes dist/models/ by default

# Run it. /health flips to {"ready": true} after ~1 s warm-up.
./dist/bin/parakeet-realtime-server --model-dir ./dist/models
```

After the first run, the edit-rebuild-rerun loop is just `scripts/build.sh && ./dist/bin/parakeet-realtime-server --model-dir ./dist/models` — the libs and models are already on disk.
The bundled smoke test (scripts/test.sh) and the examples/ demos all assume this layout (binary in dist/bin/, models in dist/models/).
```sh
# Once: clone the repo and produce the asset.
git clone https://github.com/pauldaywork/parakeet-realtime-server-linux
cd parakeet-realtime-server-linux
scripts/release-tarball.sh v0.1.0
# -> ./parakeet-realtime-server-v0.1.0-linux-x64.tar.gz (~1.4 GB)

# On any Linux x64 host with an NVIDIA driver:
tar -xzf parakeet-realtime-server-v0.1.0-linux-x64.tar.gz
cd parakeet-realtime-server-v0.1.0-linux-x64
OUT_DIR=./models ./scripts/download-models.sh   # fetches ~480 MB, sha256-verified
./bin/parakeet-realtime-server --model-dir ./models --port 9005
```

The binary has `RPATH=$ORIGIN/../lib` baked in, so it finds the bundled `libonnxruntime.so` + cuDNN/cuBLAS automatically. No `LD_LIBRARY_PATH` needed — the binary also pre-dlopens its bundled libs by SONAME at startup, so it's robust against a host's `LD_LIBRARY_PATH` pointing at a different CUDA toolkit (see How CUDA libs get pinned).
```sh
# Once: build the .deb on a build host.
cargo install cargo-deb
scripts/build-deb.sh
# -> ./target/debian/parakeet-realtime-server_0.1.0-1_amd64.deb

# On any Ubuntu/Debian host:
sudo apt install ./parakeet-realtime-server_0.1.0-1_amd64.deb

# Download models (one-time, ~480 MB, sha256-verified):
sudo OUT_DIR=/var/lib/parakeet-realtime-server/models \
  /usr/share/parakeet-realtime-server/download-models.sh
sudo chown -R parakeet:parakeet /var/lib/parakeet-realtime-server/models

# Start the service:
sudo systemctl enable --now parakeet-realtime-server
sudo systemctl status parakeet-realtime-server
```

What the package installs:

- `/usr/bin/parakeet-realtime-server` — the binary
- `/usr/lib/parakeet-realtime-server/*.so*` — bundled onnxruntime + cuDNN
- `/lib/systemd/system/parakeet-realtime-server.service` — systemd unit
- `/etc/default/parakeet-realtime-server` — env-var overrides
- `/var/lib/parakeet-realtime-server/models/` — empty; populate per above
System user parakeet is created in the video and render groups so the service can talk to /dev/nvidia*.
To change host/port/device, edit /etc/default/parakeet-realtime-server and:
```sh
sudo systemctl restart parakeet-realtime-server
```

```sh
curl http://127.0.0.1:9005/health
# {"ready": false} during the ~5–10 s warm-up
# {"ready": true} once the model has finished loading
```

```
parakeet-realtime-server --model-dir <DIR> [--host 127.0.0.1] [--port 9005]
                         [--device cuda|cpu] [--cuda-device-id N]
                         [--chunk-ms 160]
parakeet-realtime-server --version
```
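In scripts and CI it is handy to block until /health reports ready before sending audio. A minimal stdlib-only Python sketch; `wait_until_ready` is a hypothetical helper, not part of the repo, and the `fetch`/`sleep` parameters exist only to make it testable:

```python
import json
import time
import urllib.request

def wait_until_ready(url="http://127.0.0.1:9005/health", timeout=60.0,
                     fetch=None, sleep=time.sleep):
    """Poll /health until it reports {"ready": true} or the timeout elapses."""
    if fetch is None:
        fetch = lambda: json.load(urllib.request.urlopen(url))
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if fetch().get("ready"):
            return True
        sleep(0.5)  # warm-up takes ~5–10 s; poll at 2 Hz
    return False
```

Returns `False` rather than raising on timeout, so callers can decide whether a cold server is fatal.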
| Flag | Default | Notes |
|---|---|---|
| `--model-dir` | required | dir holding `encoder.onnx`, `decoder_joint.onnx`, `tokenizer.json` |
| `--host` | `127.0.0.1` | Bind address. Use `0.0.0.0` only behind a trusted network or reverse proxy — see CORS below. |
| `--port` | `9005` | TCP port |
| `--device` | `cuda` | `cuda` or `cpu`. The CUDA path runs a driver pre-flight at startup. |
| `--cuda-device-id` | unset | Pin to a specific GPU on multi-GPU hosts. Sets `CUDA_VISIBLE_DEVICES`; ignored if the user already set that env var. |
| `--chunk-ms` | `160` | Streaming chunk size in ms |
Log level is controlled by RUST_LOG. ORT's default INFO logging is verbose; suppress it with:
```sh
RUST_LOG=ort=warn,parakeet_realtime_server=info \
  parakeet-realtime-server --model-dir ./models
```

For the systemd service, set `RUST_LOG` in `/etc/default/parakeet-realtime-server`.
- `GET /health` — readiness check. `{"ready": false}` while warming up, `{"ready": true}` once the model is loaded.
- `POST /v1/audio/transcriptions` — multipart WAV upload (`file=@audio.wav`).
  - Success: `200` + `{"text": "..."}`.
  - Missing `file` field: `400` + `{"error": "..."}`.
  - Server warming up: `503` + `{"error": "..."}`.
  - Inference failure: `500` + `{"error": "..."}`.
  - Concurrent requests are serialised by an internal semaphore (default 1) to bound GPU memory; extra requests queue rather than OOM the device.
- `WS /stream` — binary frames of int16 LE PCM at 16 kHz; the server emits `{"type": "partial"|"final", "text": "..."}` JSON frames; the client sends `{"type":"end"}` to flush. Each WebSocket connection loads its own model.
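For a non-browser client, the WS /stream input framing (int16 LE PCM at 16 kHz, sent as binary frames) reduces to a few lines. A Python sketch; `pcm16le_frames` is a hypothetical helper, and matching the frame size to the server's default `--chunk-ms` of 160 ms is a convenience here, not a stated protocol requirement:

```python
import struct

def pcm16le_frames(samples, sample_rate=16000, chunk_ms=160):
    """Convert float samples in [-1, 1] to int16 LE PCM, sliced into
    chunk_ms-sized binary frames ready to send over WS /stream."""
    step = sample_rate * chunk_ms // 1000  # samples per frame (2560 at defaults)
    pcm = b"".join(
        struct.pack("<h", max(-32768, min(32767, int(s * 32767))))
        for s in samples
    )
    # Two bytes per sample, so each full frame is 2 * step bytes (5120).
    return [pcm[i:i + 2 * step] for i in range(0, len(pcm), 2 * step)]
```

Send each frame as a binary WebSocket message, then the `{"type":"end"}` JSON frame to flush the final transcript.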
The server returns permissive CORS headers (Access-Control-Allow-Origin: *) on all routes so the bundled browser demos work from file:// or a Vite dev server without proxying. For production deployments exposed beyond 127.0.0.1, put a reverse proxy in front and tighten the policy there.
See examples/mic-streaming.html and examples/web-app/ for working clients — covered in the Examples section below.
Both demos hit the server at http://127.0.0.1:9005 by default. Edit the URLs in the source if your server runs elsewhere.
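The status contract above (200 with text, 503 while warming up, other 4xx/5xx with an error body) lends itself to a retry-on-503 client. A sketch with the HTTP transport injected so it stays transport-agnostic; `transcribe_with_retry` and `send` are hypothetical names, not repo code:

```python
def transcribe_with_retry(send, retries=5):
    """send() performs one POST /v1/audio/transcriptions and returns
    (status_code, body_dict). Retry only on 503 (server still warming up);
    any other non-200 status is a hard failure."""
    for attempt in range(retries):
        status, body = send()
        if status == 200:
            return body["text"]
        if status == 503 and attempt < retries - 1:
            continue  # a real client would sleep/back off here
        raise RuntimeError(f"{status}: {body.get('error', 'unknown error')}")
```

Retrying a 400 or 500 would be wasted work: a missing `file` field or an inference failure will not fix itself on resubmit.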
The server commands below assume you've completed Path 0: Run from a clone — i.e. the binary is at dist/bin/parakeet-realtime-server and models are at dist/models/. If you installed the .deb, the binary is at /usr/bin/parakeet-realtime-server and models live under /var/lib/parakeet-realtime-server/models; substitute accordingly.
Single self-contained HTML file. Records the mic via Web Audio + AudioWorklet, opens WS /stream, and renders partial/final transcripts. No build step — but the file must be served over HTTP, not opened directly. Browsers reject AudioWorklet.addModule() calls from file:// documents.
```sh
# Terminal 1: parakeet server
./dist/bin/parakeet-realtime-server --model-dir ./dist/models

# Terminal 2: tiny static server for the HTML, bound to localhost only
python3 -m http.server -d examples 5000 --bind 127.0.0.1

# then open http://127.0.0.1:5000/mic-streaming.html in any modern browser
```

Slightly fancier demo with mic-device selection, an event log panel, and a health badge.
```sh
cd examples/web-app
npm install
npm run dev   # serves on http://localhost:5173
```

vite.config.ts ships a built-in proxy for /health, /v1/*, and /stream to 127.0.0.1:9005, so you can flip src/config.ts to use same-origin paths if you'd rather not rely on CORS. Edit the upstream in vite.config.ts to point at a different host/port.
| Script | Output | Use case |
|---|---|---|
| `scripts/build.sh` | `dist/bin/parakeet-realtime-server` | Local dev / iteration |
| `scripts/release-tarball.sh vX.Y.Z` | `parakeet-realtime-server-vX.Y.Z-linux-x64.tar.gz` | Distro-agnostic release asset |
| `scripts/build-deb.sh` | `target/debian/parakeet-realtime-server_X.Y.Z-1_amd64.deb` | Ubuntu/Debian packages |
| `scripts/release.sh vX.Y.Z` | GitHub Release with both artifacts + SHA256SUMS | One-shot publish to the Releases page |
scripts/release.sh is the maintainer's "ship it" command — it builds the tarball and .deb, stages stable-named copies (so the URLs in Install keep working across releases), generates SHA256SUMS, and uploads everything to a new GitHub Release at the given tag. Pre-flight checks: gh authenticated, working tree clean, tag exists, Cargo.toml version matches, no existing release.
The fetch scripts (fetch-onnxruntime.sh, fetch-cuda-deps.sh) skip work if the SOs are already in dist/lib/, so iterating on the build doesn't re-download every time.
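The skip-if-present plus sha256-verify behaviour those scripts share can be expressed compactly. A Python sketch of the same contract (`fetch_verified` is a hypothetical helper, not one of the repo's scripts):

```python
import hashlib
import pathlib
import urllib.request

def fetch_verified(url, dest, sha256_hex):
    """Skip-if-present download with sha256 verification: a no-op when the
    file already exists and hashes correctly, an error on any mismatch."""
    dest = pathlib.Path(dest)
    if dest.exists() and hashlib.sha256(dest.read_bytes()).hexdigest() == sha256_hex:
        return dest  # already present and intact — re-running is cheap
    data = urllib.request.urlopen(url).read()
    actual = hashlib.sha256(data).hexdigest()
    if actual != sha256_hex:
        raise ValueError(f"sha256 mismatch: expected {sha256_hex}, got {actual}")
    dest.write_bytes(data)
    return dest
```

Hashing before writing means a truncated download never leaves a bad file behind; re-running just retries the fetch.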
```sh
# After build + fetch + download-models, run all scenarios:
scripts/test.sh                               # all scenarios, --device cuda
DEVICE=cpu scripts/test.sh                    # all scenarios, CPU
ONLY=baseline scripts/test.sh                 # just one
ONLY=multi-request scripts/test.sh
ONLY=error-response scripts/test.sh
ONLY=hostile-LD_LIBRARY_PATH scripts/test.sh
```

Each scenario spawns the binary against the bundled `tests/fixtures/jfk.wav` (~11 s, public-domain JFK inaugural excerpt). Non-zero exit on any assertion failure or server timeout.
| Scenario | What it verifies |
|---|---|
| `baseline` | Single transcription: model loads, /health flips to ready, transcript contains expected phrases. |
| `multi-request` | Three back-to-back transcribes against the same warmed server. Catches state leaks across requests (each request must see fresh model state regardless of prior calls). |
| `error-response` | POST /v1/audio/transcriptions without a `file` field returns 400 + `{"error": ...}` (not 200 + `{"text": ""}`). |
| `hostile-LD_LIBRARY_PATH` | Server starts with `LD_LIBRARY_PATH` pointing at zero-byte stub .so files for `libonnxruntime.so`, `libcudart.so.12`, etc. Confirms the in-process SONAME preload (`src/main.rs::preload`) wins over `LD_LIBRARY_PATH` — without it the loader picks the stubs and the server hangs on warm-up. |
MODEL_DIR, PORT, DEVICE, and ONLY env vars override defaults.
A user's host often has its own CUDA toolkit, a pip-installed onnxruntime-gpu, or LD_LIBRARY_PATH=/usr/local/cuda/lib64 set globally. By default, the Linux dynamic loader's dlopen search order puts LD_LIBRARY_PATH above a binary's RUNPATH — so a system CUDA can shadow the libs we ship. To stay deterministic about ABI:
- Bundled SOs ship as real files in the lib dir: `dist/lib/` (tarball) or `/usr/lib/parakeet-realtime-server/` (.deb).
- Binary RUNPATH = `$ORIGIN/../lib:/usr/lib/parakeet-realtime-server`, so the binary finds `libonnxruntime.so` at process start.
- `libonnxruntime_providers_cuda.so` is patchelf'd to `RUNPATH=$ORIGIN` so its DT_NEEDED CUDA libs resolve from the same dir when ORT dlopens it. (Microsoft ships it bare.)
- In-process preload at `main()` start (`src/main.rs::preload`):
  - Sets `ORT_DYLIB_PATH` to the absolute path of our bundled `libonnxruntime.so.1.25.1`, so the `ort` crate's load-dynamic skips the loader's search.
  - dlopens every bundled `lib{cudart,cublas,cudnn,nvrtc,nvrtc-builtins,cufft,curand,onnxruntime}*` by absolute path with `RTLD_LAZY|RTLD_GLOBAL`. Once a SONAME is registered in the process, glibc's loader reuses the existing object for any subsequent DT_NEEDED resolution — `LD_LIBRARY_PATH` and `ld.so.cache` no longer apply.
- Driver pre-flight when `--device cuda`: `dlopen("libcuda.so.1")` and call `cuDriverGetVersion`. If the kernel driver is missing or supports less than CUDA 12.0, refuse to start with an actionable message rather than a deep ORT error stack.
- Audit log: after the preload, walk `/proc/self/maps` and WARN if any of our watched libs resolved from outside the bundle dir. This surfaces the residual cases SONAME pinning genuinely can't defeat — `LD_PRELOAD`, `/etc/ld.so.preload`, `LD_AUDIT` — instead of letting them silently corrupt inference.
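The audit step can be illustrated standalone. A Python sketch of the same /proc/self/maps walk (the bundle dir and watch list below are assumptions for illustration; the real logic and list live in src/main.rs):

```python
BUNDLE = "/usr/lib/parakeet-realtime-server"           # assumed bundle dir
WATCHED = ("libonnxruntime", "libcudart", "libcudnn")  # assumed watch list

def audit_maps(maps_text, bundle_dir=BUNDLE, watched=WATCHED):
    """Return watched libraries that resolved from outside bundle_dir.
    maps_text is the content of /proc/self/maps; the last whitespace-separated
    field of each line is the mapped path (or a pseudo-name like [heap])."""
    offenders = set()
    for line in maps_text.splitlines():
        fields = line.split()
        path = fields[-1] if fields else ""
        name = path.rsplit("/", 1)[-1]
        if path.startswith("/") and any(name.startswith(w) for w in watched):
            if not path.startswith(bundle_dir + "/"):
                offenders.add(path)
    return sorted(offenders)
```

Anything this returns is exactly what the server's WARN line reports: a watched SONAME served from somewhere other than the bundle.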
Verified by the hostile-LD_LIBRARY_PATH test scenario above.
| Symptom | Fix |
|---|---|
| `error while loading shared libraries: libonnxruntime.so.X` | Bundle didn't ship; run `scripts/fetch-onnxruntime.sh` and check `readelf -d ./bin/parakeet-realtime-server \| grep RUNPATH`. |
| `Error: NVIDIA driver supports CUDA up to ...` (process exits) | Driver too old. `sudo ubuntu-drivers install` to get R525+, then reboot. Or run with `--device cpu`. |
| `Error: libcuda.so.1 is not loadable` (process exits) | Driver missing entirely. `nvidia-smi` should work; if not, install the driver as above. |
| `/health` stuck at `{"ready": false}` | Model files missing or `--model-dir` wrong. Check the directory has `encoder.onnx`, `decoder_joint.onnx`, `tokenizer.json`; run `scripts/download-models.sh` if not — it sha256-verifies each file. |
| `WARN: libcudart.so.12 resolved outside the bundled lib dir` | Something on the host (`LD_PRELOAD`, `/etc/ld.so.preload`, `LD_AUDIT`) is overriding our libs; transcription may run against a different CUDA ABI than expected. Unset the override or document the expected resolution. |
| Service starts but no GPU | Check the `parakeet` user is in the `video` + `render` groups: `id parakeet`. |
| ORT logging is noisy | Set `RUST_LOG=ort=warn,parakeet_realtime_server=info` (in the env file for the systemd service). |
| Concurrent transcribes appear to queue | Intentional — the semaphore limits in-flight requests to one to bound GPU memory. Bump the `transcribe_permit` size in `src/main.rs` if you have GPU memory to burn. |
| sha256 mismatch from `download-models.sh` | Truncated download or upstream republish. Re-run; if it persists, update the expected hash in the script after verifying the new file. |
MIT. See LICENSE.