Feature: Audio Voice Enhancement Filter for Recordings (G.711 → AAC pipeline) #395

origin2000 · 2026-04-20T23:45:07Z

Overview

Most consumer CCTV cameras deliver audio as G.711 (µ-law/A-law, 8 kHz
sampling, essentially telephone quality). LightNVR already transcodes this to AAC
for MP4 recording. Since the stream must be decoded to raw PCM during this
step anyway, inserting DSP filters into the pipeline adds negligible CPU
overhead and can noticeably improve voice clarity in recordings.

This is particularly relevant for outdoor cameras where background noise
(wind, birds, ambient hum) can make recorded speech difficult to
understand.

Why the Overhead is Negligible

The recording pipeline for a typical IP camera in LightNVR looks like this:

Camera
│
├── Video: H.264 ──► go2rtc ──► -c:v copy ──► MP4 (no transcode)
│
└── Audio: G.711 ──► decode to PCM ──► [filters] ──► AAC encode ──► MP4

Video is passed through with -c:v copy: no decode, no encode, and
essentially no CPU cost. The G.711 → AAC transcode for audio is mandatory regardless, and
the proposed filters operate on the already-decoded PCM data.
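
To make the pipeline concrete, here is a minimal sketch of how the equivalent FFmpeg command line could be assembled. It is an illustration only: the function name, the camera URL and the output path are hypothetical, and how LightNVR actually drives FFmpeg internally may differ from a plain CLI invocation.

```python
from typing import List, Optional


def build_record_command(input_url: str, output_path: str,
                         audio_filters: Optional[str] = None) -> List[str]:
    """Assemble an FFmpeg argv that copies video and transcodes audio to AAC.

    The helper name and arguments are hypothetical; only the FFmpeg
    options themselves (-i, -c:v, -af, -c:a) are standard.
    """
    cmd = ["ffmpeg", "-i", input_url, "-c:v", "copy"]
    if audio_filters:
        # Filters run on the decoded PCM, between G.711 decode and AAC encode.
        cmd += ["-af", audio_filters]
    cmd += ["-c:a", "aac", output_path]
    return cmd


if __name__ == "__main__":
    # Hypothetical camera URL and output file, for illustration only.
    print(build_record_command("rtsp://camera.example/stream",
                               "recording.mp4", "highpass=f=80"))
```

The only difference between the filtered and unfiltered command is the optional -af argument, which is why the video path stays untouched.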

Proposed Filter Chain

Filter	Default	Purpose
`highpass=f=80`	80 Hz	Removes wind rumble, low-frequency hum, vibration noise
`lowpass=f=8000`	8000 Hz	Attenuates bird chirping, high-frequency ambient noise
`afftdn=nf=-25`	-25 dB	FFT-based noise reduction, suppresses continuous background noise

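afftdn is likely the most impactful filter: it estimates the noise floor and
subtracts it, separating voice from ambient background. The
nf (noise floor) parameter controls aggressiveness. FFmpeg accepts values from
-80 to -20 dB (default -50); the closer nf is to -20, the more of the signal is
treated as noise, so -25 is already fairly aggressive and -35 is gentler.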

Note: afftdn requires ~1-2 seconds to profile the noise floor at the
start of each recording. For NVR recordings this is irrelevant.
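
A minimal sketch of how the three defaults from the table could be turned into a single filter string. The function and parameter names are hypothetical. The range check on nf reflects the limits I recall from the FFmpeg afftdn documentation (-80 to -20 dB) and should be verified against the FFmpeg version in use.

```python
def build_voice_filter(highpass_hz: int = 80,
                       lowpass_hz: int = 8000,
                       noise_floor_db: int = -25) -> str:
    """Return an FFmpeg -af string for the proposed voice enhancement chain.

    Defaults mirror the proposal's table. The nf bounds are an assumption
    based on the documented afftdn range and may need adjusting.
    """
    if not -80 <= noise_floor_db <= -20:
        raise ValueError("afftdn nf is expected to lie between -80 and -20 dB")
    return (f"highpass=f={highpass_hz},"
            f"lowpass=f={lowpass_hz},"
            f"afftdn=nf={noise_floor_db}")


if __name__ == "__main__":
    print(build_voice_filter())  # highpass=f=80,lowpass=f=8000,afftdn=nf=-25
```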

Proposed UI

In the stream configuration panel, alongside the existing audio recording
toggle:

Record Audio: [✓ On]
Voice Enhancement: [✓ On]
Reduces wind, birds and background noise to improve voice clarity

An advanced view exposing the high-pass, low-pass and afftdn values is conceivable, but I'd rather wait to see whether there is demand and start with just a simple toggle.

Audio Codec Coverage

For context, the distribution of audio codecs found in consumer CCTV
cameras:

Codec	Prevalence	Notes
G.711 (µ-law/a-law)	~70%	Dominant in consumer cameras
G.726	~15%	Older Hikvision, Axis, Bosch
AAC	~10%	Newer premium cameras, UniFi
G.722	~3%	Some Axis, Bosch
Others (MP3, Opus)	~2%	Rare

Supporting G.711 → AAC with filters covers the large majority of
real-world deployments. G.726 input and AAC passthrough (filter only,
no re-encode) would be natural follow-ups.

Implementation Scope

This is intentionally a small, self-contained change:

Store filter parameters per stream in the existing SQLite schema
Pass filter string to the FFmpeg invocation for recording
Add toggle + optional advanced panel to the stream config UI
No changes to the video pipeline, go2rtc config, or WebRTC path
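
As a rough sketch of the storage step, the per-stream setting could be added with two columns. Everything here is an assumption for illustration: the stand-in streams table, the column names voice_enhancement and audio_filter, and the choice to default the toggle to off so existing streams keep their current behavior.

```python
import sqlite3


def add_voice_enhancement_columns(conn: sqlite3.Connection) -> None:
    """Add hypothetical per-stream columns; existing rows default to off."""
    conn.execute(
        "ALTER TABLE streams "
        "ADD COLUMN voice_enhancement INTEGER NOT NULL DEFAULT 0"
    )
    # NULL means "use the built-in default filter chain".
    conn.execute("ALTER TABLE streams ADD COLUMN audio_filter TEXT")


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    # Stand-in for the real table; the actual LightNVR schema has more columns.
    conn.execute("CREATE TABLE streams (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    conn.execute("INSERT INTO streams (name) VALUES ('driveway')")
    add_voice_enhancement_columns(conn)
    row = conn.execute(
        "SELECT voice_enhancement, audio_filter FROM streams WHERE name = 'driveway'"
    ).fetchone()
    print(row)  # (0, None)
```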

The filter only applies to RECORDINGS — the live WebRTC audio path
via go2rtc is unaffected, keeping that path simple and low-latency.

As far as I understand, no major open-source NVR project (Home Assistant,
Frigate) currently offers built-in audio enhancement for recordings.
Home Assistant largely ignores audio processing; Frigate removes audio
by default and focuses on AI-based audio classification rather than
voice clarity improvement. This could be a small differentiator without compromising
the project's lightweight spirit.

Happy to discuss and refine this, so please challenge it. I will also need some help, at least with the GUI part.

matteius · 2026-04-21T05:28:31Z

matteius
Apr 21, 2026
Maintainer

This is a solid proposal — well-scoped, technically sound, and the author has done their homework. A few thoughts, organized by what I'd push back on vs. what I'd green-light:

What's strong:

The CPU-overhead argument is correct and worth emphasizing in the PR description. Since you're already decoding G.711 → PCM → AAC, the filter graph runs on data that's already in PCM form. On a Pi-class device recording 8 streams, highpass/lowpass are essentially free (biquad IIR, handful of multiplies per sample at 8kHz), and afftdn at 8kHz is maybe 1-2% per stream. The framing "negligible overhead" holds up.

The scope discipline is the best part: recordings only, no touching the WebRTC/live path, no go2rtc changes. That's exactly the right call and it's what makes this mergeable vs. a multi-month rewrite.

What I'd push back on:

lowpass=f=8000 on 8kHz G.711 input is a no-op. G.711 is already bandlimited to ~3.4kHz (telephone band) and sampled at 8kHz, so Nyquist is 4kHz. A lowpass at 8000Hz does literally nothing on this input. You probably want lowpass=f=3400 or just drop the filter entirely for G.711 sources. For the G.722 / AAC / future wideband cases, 8kHz makes sense — so the filter chain should probably be codec-aware, or the default should be conditional.
afftdn noise-floor profiling on segmented recordings. LightNVR uses segmented MP4 recording (the typical NVR pattern of rolling segments). If each segment is a fresh FFmpeg invocation, you pay the 1-2s profiling window per segment, not per stream. On 60s segments that's a meaningful chunk of each file with degraded denoising. Worth checking whether the segmenter keeps the filter graph alive across segment boundaries (ffmpeg's -f segment muxer does; separate invocations per segment don't). If it's the latter, afftdn with nt=w (white noise type) or arnndn with a pre-trained model file might behave better.
arnndn as an alternative worth mentioning. RNNoise (via arnndn) is specifically trained on speech vs. non-speech and handles non-stationary noise (bird chirps, intermittent wind gusts) better than afftdn, which assumes a roughly stationary noise floor. Downside: needs a model file shipped with the binary, and it's trained at 48kHz so you're resampling 8kHz → 48kHz → 8kHz. CPU is still modest but not as trivial as afftdn. Might be worth a sentence acknowledging the tradeoff rather than defaulting to it.
The codec prevalence numbers should be sourced or hedged. The ~70/15/10/3/2 split feels plausible but I'd bet it's from the author's gut rather than a dataset. Not a blocker — just phrase as "in our experience" rather than presenting as measured.
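
The Nyquist point in the first item can be sketched as a small helper that drops the lowpass stage when the requested cutoff is at or above half the sample rate. The function names are hypothetical and the sample rates would have to come from the actual stream metadata.

```python
from typing import List, Optional


def lowpass_cutoff(sample_rate_hz: int, requested_hz: int = 8000) -> Optional[int]:
    """Return the cutoff to use, or None if the filter would have no effect."""
    nyquist_hz = sample_rate_hz / 2
    if requested_hz >= nyquist_hz:
        return None  # nothing above Nyquist exists in the signal to remove
    return requested_hz


def build_chain(sample_rate_hz: int, requested_lowpass_hz: int = 8000) -> str:
    """Build a sample-rate-aware filter string (hypothetical helper)."""
    stages: List[str] = ["highpass=f=80"]
    cutoff = lowpass_cutoff(sample_rate_hz, requested_lowpass_hz)
    if cutoff is not None:
        stages.append(f"lowpass=f={cutoff}")
    stages.append("afftdn=nf=-25")
    return ",".join(stages)


if __name__ == "__main__":
    print(build_chain(8000))   # G.711: lowpass stage is dropped
    print(build_chain(48000))  # wideband source: lowpass=f=8000 is kept
```

With this check, an 8 kHz G.711 source only gets a lowpass stage if the requested cutoff is below 4000 Hz, for example 3400.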

What I'd ask for before merging:

A short A/B sample (10s of the same camera clip, raw vs. filtered) attached to the PR. This sells the feature to users and also surfaces edge cases — e.g., if afftdn is too aggressive it produces audible "musical noise" artifacts that are arguably worse than the original hiss.
Per-stream toggle stored in the existing streams table, defaulting off. Don't change behavior for existing recordings on upgrade.
A way to disable via config/setting for users who want deterministic bit-exact recordings (some legal/forensic use cases need the unmodified audio preserved — this is a real concern for NVR specifically since recordings sometimes end up as evidence).
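
One possible way to produce the requested A/B pair from an existing clip is sketched below. The helper name, the input file and the output names are hypothetical, and the filter string is simply the chain from the proposal.

```python
from typing import List, Tuple


def ab_commands(clip_path: str, filters: str,
                seconds: int = 10) -> Tuple[List[str], List[str]]:
    """Return (raw, filtered) FFmpeg argvs that export audio-only samples."""
    base = ["ffmpeg", "-i", clip_path, "-t", str(seconds), "-vn"]
    raw = base + ["-c:a", "aac", "sample_raw.m4a"]
    filtered = base + ["-af", filters, "-c:a", "aac", "sample_filtered.m4a"]
    return raw, filtered


if __name__ == "__main__":
    raw_cmd, filtered_cmd = ab_commands(
        "driveway_clip.mp4",  # hypothetical source recording
        "highpass=f=80,lowpass=f=8000,afftdn=nf=-25",
    )
    print(" ".join(raw_cmd))
    print(" ".join(filtered_cmd))
```

Both samples go through the same AAC encode, so any audible difference should come from the filters rather than from the codec.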

2 replies

origin2000 Apr 21, 2026
Author

Fair point — the right defaults will come from testing against real-world data.
This is actually a critical concern. In driveway scenarios we're looking at roughly 10s prebuffer, a couple of seconds of live stream, and 30s postbuffer. If those are three separate ffmpeg invocations, each with 1–2s of afftdn noise profiling, the overhead clearly becomes noticeable on a ~60s clip. Needs further investigation. Any ideas?
arnndn is worth considering, but only if there are pretrained speech models available (e.g. https://github.com/GregorR/rnnoise-models) that deliver a meaningful quality improvement over afftdn. For UX simplicity, I'd advise against user-supplied models as a default — though exposing it as an opt-in option seems reasonable.
To clarify: this was desk research — market leaders in the consumer space, their codec choices, plus a few related signals. More than gut feel, but not strictly validated data.
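
To put a number on the profiling concern in the second point, a back-of-the-envelope sketch using only the figures quoted there (three invocations, 1–2s each, a ~60s clip); the function name is made up for illustration.

```python
def profiling_share(clip_seconds: float, invocations: int,
                    profile_seconds: float) -> float:
    """Fraction of a clip spent in the afftdn profiling window."""
    return invocations * profile_seconds / clip_seconds


if __name__ == "__main__":
    low = profiling_share(60, 3, 1.0)
    high = profiling_share(60, 3, 2.0)
    print(f"{low:.0%} to {high:.0%} of the clip")  # 5% to 10% of the clip
```

If the filter graph survived across the three parts, the same clip would pay the profiling window only once, roughly 2% to 3%.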

"A way to disable via config/setting for users who want to…" — could you elaborate? If the feature is opt-in by default and requires explicit activation, what's the disable path you're referring to?

matteius Apr 23, 2026
Maintainer

I mean that it should be opt-in and not always on, per stream probably.
