Feature: Audio Voice Enhancement Filter for Recordings (G.711 → AAC pipeline) #395
This is a solid proposal: well-scoped, technically sound, and the author has done their homework. A few thoughts, organized by what I'd push back on vs. what I'd green-light:

What's strong:

The CPU-overhead argument is correct and worth emphasizing in the PR description. Since you're already decoding G.711 → PCM → AAC, the filter graph runs on data that's already in PCM form. On a Pi-class device recording 8 streams, […]

The scope discipline is the best part: recordings only, no touching the WebRTC/live path, no go2rtc changes. That's exactly the right call and it's what makes this mergeable vs. a multi-month rewrite.

What I'd push back on:

What I'd ask for before merging:
Overview
Most consumer CCTV cameras deliver audio as G.711 (µ-law/A-law, 8 kHz,
essentially telephone quality). LightNVR already transcodes this to AAC
for MP4 recording. Since the stream must be decoded to raw PCM during this
step anyway, inserting DSP filters into the pipeline adds negligible CPU
overhead while significantly improving voice clarity in recordings.
This is particularly relevant for outdoor cameras where background noise
(wind, birds, ambient hum) can make recorded speech difficult to
understand.
Why the Overhead is Negligible
The recording pipeline for a typical IP camera in LightNVR looks like this:
Camera
│
├── Video: H.264 ──► go2rtc ──► -c:v copy ──► MP4 (no transcode)
│
└── Audio: G.711 ──► decode to PCM ──► [filters] ──► AAC encode ──► MP4
Video is passed through with -c:v copy: no decode, no encode, near-zero
CPU. The G.711 → AAC transcode for audio is mandatory regardless, and
the proposed filters operate on the already-decoded PCM data.
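To make the pipeline above concrete, here is a minimal sketch of how the same idea could be expressed as an ffmpeg command line. The helper name `build_record_args` and the example URL are hypothetical, and LightNVR may well drive this through library calls rather than the CLI, so treat it as an illustration only:

```python
from typing import List, Optional


def build_record_args(input_url: str, output_path: str,
                      audio_filters: Optional[str] = None) -> List[str]:
    """Build an ffmpeg argument list: video copied, audio transcoded to AAC.

    Hypothetical helper for illustration; not LightNVR's actual code.
    """
    # Video is stream-copied, so it is never decoded or re-encoded.
    args = ["ffmpeg", "-i", input_url, "-c:v", "copy"]
    # Filters only make sense on the audio leg, which is decoded to PCM anyway.
    if audio_filters:
        args += ["-af", audio_filters]
    # The AAC encode happens with or without filters.
    args += ["-c:a", "aac", output_path]
    return args


if __name__ == "__main__":
    print(" ".join(build_record_args("rtsp://camera.example/stream",
                                     "recording.mp4",
                                     "highpass=f=80")))
```

The only difference between the filtered and unfiltered case is the `-af` argument, which is the point of the overhead argument: the decode and encode steps are there either way.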
Proposed Filter Chain
highpass=f=80
lowpass=f=8000
afftdn=nf=-25

afftdn is the most impactful filter: it learns the noise floor and
subtracts it, cleanly separating voice from ambient background. The
nf (noise floor) parameter controls aggressiveness: -15 is gentle,
-35 is aggressive.

Note: afftdn requires ~1-2 seconds to profile the noise floor at the
start of each recording. For NVR recordings this is irrelevant.
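A rough sketch of how the three filters could be joined into one ffmpeg filter string. The function name and the `sample_rate_hz` parameter are my own assumptions. Two checks are included that are worth verifying against the FFmpeg version LightNVR links: as I read the FFmpeg filter documentation, `afftdn` accepts `nf` only between -80 and -20 dB, and a low-pass cutoff at or above half the sample rate (4 kHz for 8 kHz G.711) has nothing left to remove, so it is skipped when the rate is known:

```python
from typing import Optional


def voice_filter_chain(highpass_hz: int = 80,
                       lowpass_hz: int = 8000,
                       noise_floor_db: int = -25,
                       sample_rate_hz: Optional[int] = None) -> str:
    """Return a comma-separated ffmpeg audio filter chain (illustrative only)."""
    # Assumption: afftdn's documented nf range is -80..-20 dB.
    if not -80 <= noise_floor_db <= -20:
        raise ValueError("noise_floor_db must be between -80 and -20")

    parts = [f"highpass=f={highpass_hz}"]
    # A cutoff at or above Nyquist (sample_rate / 2) cannot affect the signal,
    # so only emit the low-pass when the rate is unknown or high enough.
    if sample_rate_hz is None or lowpass_hz < sample_rate_hz / 2:
        parts.append(f"lowpass=f={lowpass_hz}")
    parts.append(f"afftdn=nf={noise_floor_db}")
    # Filters in a single chain are separated by commas.
    return ",".join(parts)


if __name__ == "__main__":
    print(voice_filter_chain())
    print(voice_filter_chain(sample_rate_hz=8000))
```

With the defaults this yields the chain proposed above. If the PCM is still at 8 kHz when the filters run, the 8000 Hz low-pass only becomes meaningful after resampling to a higher rate.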
Proposed UI
In the stream configuration panel, alongside the existing audio recording
toggle:
Record Audio: [✓ On]
Voice Enhancement: [✓ On]
Reduces wind, birds and background noise to improve voice clarity
One could consider an advanced view that exposes the low-pass, high-pass and afftdn values, but I'd rather wait to see whether there is demand and start with just a simple toggle.
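As a sketch of how the two toggles could map onto the recording settings: the class and field names below are hypothetical, not existing LightNVR configuration keys. The idea is simply that the filter string is only produced when both switches are on:

```python
from dataclasses import dataclass
from typing import Optional

# The chain proposed in this post, used as the single preset behind the toggle.
DEFAULT_VOICE_CHAIN = "highpass=f=80,lowpass=f=8000,afftdn=nf=-25"


@dataclass
class StreamAudioSettings:
    """Per-stream audio options (hypothetical names, for illustration)."""
    record_audio: bool = True
    voice_enhancement: bool = False

    def audio_filter(self) -> Optional[str]:
        # Enhancement is meaningless without an audio track, so it is
        # ignored when audio recording is off.
        if self.record_audio and self.voice_enhancement:
            return DEFAULT_VOICE_CHAIN
        return None


if __name__ == "__main__":
    print(StreamAudioSettings(voice_enhancement=True).audio_filter())
    print(StreamAudioSettings(record_audio=False,
                              voice_enhancement=True).audio_filter())
```

Keeping a single preset behind the toggle leaves room for an advanced view later without changing the stored setting.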
Audio Codec Coverage
For context, the distribution of audio codecs found in consumer CCTV
cameras:
Supporting G.711 → AAC with filters covers the large majority of
real-world deployments. G.726 input and AAC passthrough (filter only,
no re-encode) would be natural follow-ups.
Implementation Scope
This is intentionally a small, self-contained change:
The filter only applies to RECORDINGS — the live WebRTC audio path
via go2rtc is unaffected, keeping that path simple and low-latency.
Happy to discuss and refine this, so please challenge it. I will also need some help, at least with the GUI part.