Skip to content

Serialize recreateEncoder with encode goroutine#157

Merged
jayli-nuro merged 1 commit into
mainfrom
jayli/recreate-encoder-race
Jun 3, 2026
Merged

Serialize recreateEncoder with encode goroutine#157
jayli-nuro merged 1 commit into
mainfrom
jayli/recreate-encoder-race

Conversation

@jayli-nuro

Copy link
Copy Markdown
Contributor

Why

After #156 (event-driven encode), SetBitrateAllocation calling recreateEncoder on a 0→positive transition can race the per-track encode goroutine:

  1. The encode goroutine snapshots track.encodedReader under tracksMu.RLock, releases the lock, then calls Read() on the snapshotted pointer.
  2. recreateEncoder runs under tracksMu.Lock() and does:
    • fb.ResetInitialized() (temporarily flips FrameBuffer's Read semantics)
    • mediaTrack.NewEncodedReader(...) (creates a second encoder consuming from the same FrameBuffer)
    • track.encodedReader.Close() (destroys the old encoder)
    • swaps track.encodedReader

Race vectors:

  • release() after Close(): the goroutine's in-flight Read() returns (encoded, release, err). By the time it invokes release(), the old encoder may have been destroyed. Whether that's a no-op, double-free, or UAF depends on pion/mediadevices internals; not a documented contract.
  • Concurrent FrameBuffer access: during NewEncodedReader's codec-detection reads, the old encoder is still consuming from the same FrameBuffer. Real frames can be stolen between the two encoders.
  • Black-frame emission: while fb is uninitialized, the old encoder's pending Read falls through the uninitialized path and can emit a black frame as if real.

vpx.encoder.mu serializes Read vs Close at the vpx layer, so the bare crash window inside Close-mid-Read is narrow — but the release()-after-Close() window is not protected by it.

Fix

Per-track encoderMu sync.RWMutex.

  • The encode goroutine takes encoderMu.RLock() immediately before encodedReader.Read() and holds it through release(). The lock is released regardless of success/error via defer.
  • recreateEncoder takes encoderMu.Lock() for the duration of the swap (FrameBuffer reset, NewEncodedReader, Close, pointer assignment).

Lock ordering (consistent, no deadlock):

  • Writers: tracksMu.Lock then encoderMu.Lock.
  • Readers: tracksMu.RLock for a brief snapshot, then tracksMu.RUnlock, then encoderMu.RLock — locks are never nested.

Cost: encoderMu.Lock blocks until the in-flight encode cycle (Read + WriteSample + release) completes — tens of ms typically. recreateEncoder is infrequent (0→positive bitrate transitions only), so the latency cost is acceptable.

Tests

  • go test -race -count=1 ./sender/... — clean.
  • go test -race -bench BenchmarkUpdateBitrate -benchtime=500ms — clean, no regression vs main.

Reported by @jcui on a downstream consumer that calls SetBitrateAllocation immediately after AddVideoTrack during init, triggering 0→positive transitions on every track while encode goroutines are already running.

@codecov

codecov Bot commented Jun 3, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 45.52%. Comparing base (ba9115c) to head (5d33db7).

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #157      +/-   ##
==========================================
+ Coverage   45.43%   45.52%   +0.09%     
==========================================
  Files          18       18              
  Lines        1774     1777       +3     
==========================================
+ Hits          806      809       +3     
  Misses        901      901              
  Partials       67       67              
Flag Coverage Δ
go 45.52% <100.00%> (+0.09%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@jayli-nuro jayli-nuro requested a review from hanguyen-nuro June 3, 2026 20:50
Per-track encoderMu (sync.RWMutex). The encode goroutine holds
RLock from encodedReader.Read through release(); recreateEncoder
takes Lock to swap the reader without racing the goroutine.

Before this, SetBitrateAllocation calling recreateEncoder on a
0->positive transition could close the old encoder, reset the
shared FrameBuffer, or create a second NewEncodedReader while the
per-track encode goroutine was inside Read() or holding a release()
callback on the old encoder. Lock order: tracksMu (write) then
encoderMu (write); readers take them sequentially and never nest.
@jayli-nuro jayli-nuro force-pushed the jayli/recreate-encoder-race branch from fdd41e0 to 5d33db7 Compare June 3, 2026 21:06
@jayli-nuro jayli-nuro merged commit 6bdcbdc into main Jun 3, 2026
19 checks passed
@jayli-nuro jayli-nuro deleted the jayli/recreate-encoder-race branch June 3, 2026 21:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants