fix: guard file unlink in audio extraction to prevent crash in library mode #1637
Open
enwaiax wants to merge 3 commits into NVIDIA:main from
…y mode

In library mode, audio content arrives as base64-encoded binary data (not a file path). PR NVIDIA#1119 added file-path support for Dataloader but left Path.unlink() unconditional, causing OSError (ENAMETOOLONG) when the base64 string (~2MB) is treated as a filename.

Use a `source_file_path` sentinel so unlink only runs when content was actually resolved from an on-disk file (Dataloader/V2 API path).

Fixes: NVBug 5984261
Made-with: Cursor
jperez999
approved these changes
Mar 16, 2026
Summary
Fix `OSError: [Errno 36] File name too long` crash in `_extract_from_audio()` when processing audio files in library mode.

Problem
PR #1119 (commit `f511b315`, 2025-11-13, "Dataloader video ingest pipeline support") introduced a regression that breaks all audio extraction in library mode.

The root cause is in `api/src/nv_ingest_api/internal/extract/audio/audio_extraction.py` lines 61-80. PR #1119 added logic to support content being either:

- `content` = base64(file_path_string) → decode path → read file → delete temp file
- `content` = base64(audio_binary) → use directly

However, `Path(base64_file_path).unlink(missing_ok=True)` was placed unconditionally outside both branches. When the `except` catches `UnicodeDecodeError` (audio binary can't decode as UTF-8), `base64_file_path` still holds the original base64 string (~2MB for a 1.5MB WAV file), and `unlink()` triggers `OSError: [Errno 36] File name too long`.

Why only library mode is affected
- RestClient (default): `v2/ingest.py:963-1000` writes to temp file, passes file path as content: `base64("/tmp/chunk_0001.mp3")`
- SimpleClient: content is `base64(<1.5MB WAV binary>)`; `except: pass` → unlink on 2MB string → OSError ❌

Fix
Replace the unconditional `unlink` with a `source_file_path` sentinel variable that is only set when content is actually resolved from an on-disk file.

Reproduction
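A minimal, self-contained sketch of the failure and of the guarded variant, assuming a Linux-like filesystem. This is not the code in `audio_extraction.py`: the function names `resolve_audio_unguarded` and `resolve_audio_guarded` are hypothetical, and only `source_file_path` and the `Path(...).unlink(missing_ok=True)` call are taken from the description above.

```python
import base64
import os
import tempfile
from pathlib import Path


def resolve_audio_unguarded(content: str) -> bytes:
    """Hypothetical pre-fix shape: unlink runs regardless of which branch ran."""
    base64_file_path = content
    audio = base64.b64decode(content)
    try:
        # Case 1: content is base64(file path) -> decode the path, read the file.
        base64_file_path = audio.decode("utf-8")
        audio = Path(base64_file_path).read_bytes()
    except UnicodeDecodeError:
        # Case 2: content is base64(audio binary) -> use the decoded bytes.
        # base64_file_path still holds the whole base64 string here.
        pass
    # Unconditional cleanup: in case 2 this treats the base64 string as a path.
    Path(base64_file_path).unlink(missing_ok=True)
    return audio


def resolve_audio_guarded(content: str) -> bytes:
    """Hypothetical post-fix shape: unlink only when a file was actually read."""
    source_file_path = None  # sentinel: set only for the on-disk case
    audio = base64.b64decode(content)
    try:
        candidate = audio.decode("utf-8")
        audio = Path(candidate).read_bytes()
        source_file_path = candidate
    except UnicodeDecodeError:
        pass
    if source_file_path is not None:
        Path(source_file_path).unlink(missing_ok=True)
    return audio


if __name__ == "__main__":
    # Stand-in for WAV data: 0xff bytes are never valid UTF-8.
    fake_wav = b"RIFF" + b"\xff\xfe" * 50_000
    binary_content = base64.b64encode(fake_wav).decode("ascii")

    try:
        resolve_audio_unguarded(binary_content)
    except OSError as exc:
        print("unguarded raised:", type(exc).__name__, exc.errno)

    assert resolve_audio_guarded(binary_content) == fake_wav

    # File-path case: the temp file is read and then removed.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".wav") as tmp:
        tmp.write(fake_wav)
    path_content = base64.b64encode(tmp.name.encode("utf-8")).decode("ascii")
    assert resolve_audio_guarded(path_content) == fake_wav
    assert not os.path.exists(tmp.name)
```

`missing_ok=True` only suppresses `FileNotFoundError`, so it does not help when the path itself is too long for the OS to accept. Using a sentinel that starts as `None` keeps the cleanup decision tied to the branch that actually read a file, rather than to whatever string happens to be in the variable.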
Test plan
- api_tests/internal/extract/audio/test_audio_extraction.py)
- `infer()` called with correct data, no OSError
- `infer()` called → temp file deleted
- data/multimodal_test.wav): pipeline reaches RIVA ASR

Fixes: NVBug 5984261
Description
Checklist