feat: replace GPAC with FFmpeg for MP4 demuxing #2170
feat: replace GPAC with FFmpeg for MP4 demuxing #2170DhanushVarma-2 wants to merge 5 commits intoCCExtractor:masterfrom
Conversation
006130e to
4039844
Compare
|
The 9 Windows test failures (autoprogram, spupng, startcreditstext) are unrelated to MP4 demuxing — they involve subtitle encoding and credits detection, not the MP4 code path. These same tests pass on the Linux CI run. The Windows build itself was also delayed by a Chocolatey 503 outage when installing GPAC. All 237 Linux tests pass, including the 3 MP4-specific tests. The Linux CI bot also notes this PR fixes 9 previously-broken tests that had never passed before. |
87f2178 to
9e95cd7
Compare
|
Before going deep into this. It would be a lot more readable to separate both implementations. Have process_mp4_ffmpeg and process_mp4_gpac functions, possibly in separate files, so we only need a few #ifdef guards. Minimize the changes in the existing code (which already is not super well organized)... Even (much better), do the ffmpeg part in rust. We don't want to add more C to the code - we really want to switch to rust. |
yeah sure. |
513276d to
aa75430
Compare
|
@cfsmp3 Addressed all review feedback. |
aa75430 to
814de82
Compare
|
@cfsmp3 can you review it. |
|
Hi @DhanushVarma-2 — it looks like this branch was force-pushed and now only contains the MKV KATE extension fix (3 commits all with the same message). The original FFmpeg MP4 demuxer work is gone. The KATE fix is already merged via #2250, so this PR currently has no diff against master. Did you accidentally force-push the wrong branch? You'll need to restore the FFmpeg work or open a new PR. Let us know. |
814de82 to
aa75430
Compare
|
HI @cfsmp3 Sorry about that , I accidentally force-pushed the wrong branch and overwrote the FFmpeg work. Restored it now and merged with upstream master. Should be good to review. |
cfsmp3
left a comment
There was a problem hiding this comment.
Deep Review
Default build (no FFmpeg) — CLEAN
We built the PR without -DWITH_FFMPEG=ON and ran all 27 CI-flagged tests. Every single output file is byte-identical to master. The default GPAC path has zero regressions.
FFmpeg build — DOES NOT BUILD
We tried to build with -DWITH_FFMPEG=ON:
mkdir build_ffmpeg && cd build_ffmpeg
cmake ../src -DWITH_FFMPEG=ON
make -j$(nproc)Error 1 (pre-existing, not your fault): ffmpeg_intgr.c:100 — av_find_best_stream incompatible pointer type. Our FFmpeg 7.x changed AVCodec** to const AVCodec**. We suppressed this with -Wno-incompatible-pointer-types to continue.
Error 2 (this PR): Linker fails with:
undefined reference to 'ccx_mp4_process_hevc_sample'
The symbol exists in mp4_rust_bridge.c → compiled into libccx.a. But libccx_rust.a (which needs it) comes LAST in the link order. The linker already consumed libccx.a and won't go back.
The existing codebase handles this with --undefined= flags (see --undefined=decode_vbi, --undefined=do_cb, --undefined=store_hdcc in the CMakeLists). You need the same for your new bridge symbols:
set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--undefined=ccx_mp4_process_hevc_sample -Wl,--undefined=ccx_mp4_process_avc_sample")Error 3 (this PR): lib_ccx/CMakeLists.txt replaces ENABLE_HARDSUBX with ENABLE_FFMPEG_MP4:
- set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_HARDSUBX")
+ set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_FFMPEG_MP4")This means -DWITH_FFMPEG=ON no longer defines ENABLE_HARDSUBX, breaking hardsubx when FFmpeg is enabled. It should be added alongside, not replaced.
Code concerns
-
ccextractor.cchanges the default MP4 processing loop — it now iterates over all input files inside the MP4 case. On master, onlyctx->inputfile[ctx->current_file]is processed here. This changes behavior in the default (GPAC) path even without FFmpeg. Our tests didn't catch a regression (single-file MP4 tests pass), but this is a behavioral change that needs justification. -
Unrelated cosmetic changes:
ccx_decoders_isdb.creformats the CLUT array (one value per line),teletext.rsreformats code from #2240. These don't belong in this PR — please remove them. -
Test files:
tests/popon_test.sccandtests/popon_test.srt— what are these for? Not mentioned in the PR description.
Summary
Since the FFmpeg build doesn't link, we couldn't test the actual new feature. Please fix:
- Add
--undefinedlinker flags for the bridge symbols - Keep
ENABLE_HARDSUBXalongsideENABLE_FFMPEG_MP4 - Remove unrelated cosmetic changes
- Explain or remove the test files
Once it builds, we'll do a full test with MP4 samples comparing GPAC vs FFmpeg output.
3b702c0 to
e3e0a27
Compare
|
@cfsmp3 can you review it again |
Follow-up Review (Apr 6)Thanks for the quick turnaround on the review feedback — the commit message shows you addressed every point we raised. However, we found new issues. isdb changes introduce bugsThe 1. Removed malloc NULL check (line 348-352 deleted) text->buf = malloc(128);
-if (!text->buf)
-{
- free(text);
- return NULL;
-}If 2. Bounds check changed from -if (i >= sizeof(arg) - 1)
+if (i >= (sizeof(arg)) + 1)This allows writing 2 extra bytes past the end of 3. Removed bounds guard on write (line 735) -if (i < sizeof(arg))
- arg[i] = *buf++;
+arg[i] = *buf++;Unconditional write without bounds check → potential OOB write. These must be reverted. The isdb file should not be modified at all in this PR. teletext.rs still has unrelated changesThe reformatting of code from #2240 (multi-line → single-line) is still present. Please revert — it doesn't belong in this PR. FFmpeg buildWe attempted to build with Your What's still needed
The default (non-FFmpeg) build remains clean — 27 CI tests all byte-identical to master. |
Replace GPAC with FFmpeg for MP4 demuxing via a Rust implementation using rsmpeg. Supports AVC/H.264, HEVC/H.265, CEA-608, CEA-708, and tx3g subtitle tracks. Includes chapter extraction via ccxr_dumpchapters. VobSub tracks are detected but not yet supported. Selected at compile time via ENABLE_FFMPEG_MP4 / enable_mp4_ffmpeg Cargo feature.
- mp4_rust_bridge.c/.h: C-callable functions for AVC/HEVC/CC processing - ccx_gpac_types.h: compat defines for GF_4CC, GF_ISOM_SUBTYPE_C708, etc. - mp4.c: dispatch to Rust demuxer when ENABLE_FFMPEG_MP4 is set - ccextractor.c: iterate all input files in MP4 mode - Remove dead processmp4_ffmpeg/dumpchapters_ffmpeg declarations
- src/CMakeLists.txt: link swresample, re-add libs after ccx_rust for circular dependencies, --no-as-needed on Linux, ENABLE_HARDSUBX - lib_ccx/CMakeLists.txt: target_compile_definitions for ENABLE_HARDSUBX - rust/CMakeLists.txt: enable_mp4_ffmpeg Cargo feature
- Add --undefined= linker flags for bridge symbols (ccx_mp4_process_avc_sample, ccx_mp4_process_hevc_sample, ccx_mp4_process_cc_packet, ccx_mp4_flush_tx3g) - Restore ENABLE_HARDSUBX alongside ENABLE_FFMPEG_MP4 in both CMakeLists - Guard MP4 multi-file loop with #ifdef ENABLE_FFMPEG_MP4 (GPAC path unchanged) - Revert unrelated cosmetic changes (isdb CLUT formatting, teletext.rs) - Remove unrelated test files (popon_test.scc/srt) - Revert .gitignore local dev additions
e3e0a27 to
864f86e
Compare
|
@cfsmp3 , all three issues from the follow-up review are fixed now. Rebased onto latest master which also cleaned up the root cause. |
CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 5fdd9b8...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 5fdd9b8...:
Your PR breaks these cases:
NOTE: The following tests have been failing on the master branch as well as the PR:
Congratulations: Merging this PR would fix the following tests:
It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you). Check the result page for more info. |
|
Hi @cfsmp3 is there anything that needs to be updated in the code or is there any problem in testing? |
cfsmp3
left a comment
There was a problem hiding this comment.
Comprehensive Comparison Review (Apr 18)
We tested this PR alongside #2191 (gaurav02081), which implements the same FFmpeg MP4 demuxer feature. Both were built with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and tested against all 36 MP4/MOV samples in our collection.
Key finding: both PRs produce byte-identical output
#2170 and #2191 use the same rsmpeg backend and produce exactly the same output on all 36 samples. Neither is functionally superior to the other.
Critical: caption extraction gaps vs GPAC
The FFmpeg path is NOT a drop-in replacement for GPAC. 7 samples lose captions:
| Sample | GPAC | FFmpeg (#2170 and #2191) | Issue |
|---|---|---|---|
132d7df7e993.mov |
108KB | 104B | CEA-608 in H.264 SEI — no separate subtitle track |
1974a299f050.mov |
127KB | 104B | Same — captions embedded in video NALs |
99e5eaafdc55.mov |
164KB | 104B | Same |
8849331ddae9.mp4 |
48KB (2352 lines, clean) | 6KB (227 lines, garbled ÷ chars) |
c608 track exists but decoded as raw bytes, not through CEA-608 decoder |
b2771c84c2a3.mp4 |
2.6KB (130 lines) | 284B (11 lines) | Same c608 garbling |
5df914ce773d.mp4 |
1KB | 0B | c608 track present but entirely missed |
1f3e951d516b.mp4 |
5KB | 0B | dvdsub (bitmap) in MP4 — not supported |
One sample works BETTER with FFmpeg:
| ad9f9e03240e.m4v | 0B | 51KB | FFmpeg extracts what GPAC can't |
The remaining 28 samples produce identical output (including 0B on both).
Three bugs to fix
Bug 1 — CEA-608 in H.264 SEI (3 MOV samples): These files have no separate subtitle track — captions are embedded as SEI user data in the H.264 video stream NAL units. GPAC handles this via process_avc_sample() which parses NALs. The FFmpeg demuxer needs to do the same — scan AVC samples for SEI-embedded CEA-608 data.
Bug 2 — c608 track garbled (2 samples): The c608 subtitle track exists and is found, but the data comes out as raw CC bytes (÷÷ ÷ ÷HI÷, ÷TH÷ER÷E.÷) instead of decoded text. The data needs to go through the CEA-608 character decoder, not be dumped as raw bytes.
Bug 3 — c608 track missed (1 sample): 5df914ce773d.mp4 has a c608 track visible to ffprobe but FFmpeg extracts nothing. Investigate why.
(dvdsub/bitmap in MP4 can be documented as a known limitation for now.)
PR-specific issues for #2170
Compared to #2191:
- Code is leaner (437 lines vs 1019) — this is a plus
- isdb bugs introduced in review fix (removed malloc NULL check, changed bounds check from
-1to+1, removed bounds guard) — we flagged this Apr 7, please confirm these are reverted - Unrelated changes (teletext.rs reformatting) — should be clean now after rebase
- No CI additions — #2191 adds a
cmake_ffmpeg_mp4CI job which is useful. Consider adding one. - Linker approach (
--no-as-needed+ re-add libs) works but is less clean than #2191'sINTERFACE_LINK_LIBRARIESapproach - Still has ENABLE_HARDSUBX coupling in
lib_ccx/CMakeLists.txt— our #2259 fixed this on master but your branch still has the old code. Rebase needed.
Samples tested
All 36 MP4/MOV/M4V files from ~/media_samples/completed/ and ~/media_samples/failed/, including files up to 4.5GB. No files were skipped.
|
Closing — #2191 (gaurav02081) has been merged, which implements the same FFmpeg MP4 demuxer feature. Both PRs produced identical output on all 36 samples we tested. #2191 addressed all caption extraction bugs we reported (CEA-608 in SEI, c608 garbling, dvdsub), cleaned up to 3 commits, and included CI workflow additions. Thanks for the work on this — the original FFmpeg demuxer concept came from your PR. |

In raising this pull request, I confirm the following (please check boxes):
My familiarity with the project is as follows (check one):
Adds a Rust FFmpeg-based MP4 demuxer (rsmpeg) as a compile-time alternative to GPAC. Enabled with
cmake -DWITH_FFMPEG=ON.Supported: AVC/H.264, HEVC/H.265, CEA-608, CEA-708, tx3g, chapter extraction
Known gap: VobSub detected but not decoded
GPAC path: Unchanged, still the default
Note:
mp4_ffmpeg_exports.rsshows as binary in the diff due to null-terminated C string literals for FFI — the file is clean UTF-8.