Skip to content

feat: replace GPAC with FFmpeg for MP4 demuxing #2170

Closed
DhanushVarma-2 wants to merge 5 commits intoCCExtractor:masterfrom
DhanushVarma-2:feat/replace-gpac-with-ffmpeg
Closed

feat: replace GPAC with FFmpeg for MP4 demuxing #2170
DhanushVarma-2 wants to merge 5 commits intoCCExtractor:masterfrom
DhanushVarma-2:feat/replace-gpac-with-ffmpeg

Conversation

@DhanushVarma-2
Copy link
Copy Markdown
Contributor

@DhanushVarma-2 DhanushVarma-2 commented Mar 4, 2026

In raising this pull request, I confirm the following (please check boxes):

  • I have read and understood the contributors guide.
  • I have checked that another pull request for this purpose does not exist.
  • I have considered, and confirmed that this submission will be valuable to others.
  • I accept that this submission may not be used, and the pull request closed at the will of the maintainer.
  • I give this submission freely, and claim no ownership to its content.
  • I have mentioned this change in the changelog.

My familiarity with the project is as follows (check one):

  • I have never used CCExtractor.
  • I have used CCExtractor just a couple of times.
  • I absolutely love CCExtractor, but have not contributed previously.
  • I am an active contributor to CCExtractor.

Adds a Rust FFmpeg-based MP4 demuxer (rsmpeg) as a compile-time alternative to GPAC. Enabled with cmake -DWITH_FFMPEG=ON.

Supported: AVC/H.264, HEVC/H.265, CEA-608, CEA-708, tx3g, chapter extraction
Known gap: VobSub detected but not decoded
GPAC path: Unchanged, still the default

Note: mp4_ffmpeg_exports.rs shows as binary in the diff due to null-terminated C string literals for FFI — the file is clean UTF-8.


@DhanushVarma-2 DhanushVarma-2 changed the title Feat/replace gpac with ffmpeg feat: replace GPAC with FFmpeg for MP4 demuxing Mar 5, 2026
@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch from 006130e to 4039844 Compare March 6, 2026 06:23
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

The 9 Windows test failures (autoprogram, spupng, startcreditstext) are unrelated to MP4 demuxing — they involve subtitle encoding and credits detection, not the MP4 code path. These same tests pass on the Linux CI run. The Windows build itself was also delayed by a Chocolatey 503 outage when installing GPAC.

All 237 Linux tests pass, including the 3 MP4-specific tests. The Linux CI bot also notes this PR fixes 9 previously-broken tests that had never passed before.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch 13 times, most recently from 87f2178 to 9e95cd7 Compare March 7, 2026 06:59
@cfsmp3
Copy link
Copy Markdown
Contributor

cfsmp3 commented Mar 7, 2026

Before going deep into this.

It would be a lot more readable to separate both implementations. Have process_mp4_ffmpeg and process_mp4_gpac functions, possibly in separate files, so we only need a few #ifdef guards.

Minimize the changes in the existing code (which already is not super well organized)...

Even (much better), do the ffmpeg part in rust. We don't want to add more C to the code - we really want to switch to rust.

@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

Before going deep into this.

It would be a lot more readable to separate both implementations. Have process_mp4_ffmpeg and process_mp4_gpac functions, possibly in separate files, so we only need a few #ifdef guards.

Minimize the changes in the existing code (which already is not super well organized)...

Even (much better), do the ffmpeg part in rust. We don't want to add more C to the code - we really want to switch to rust.

yeah sure.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch 2 times, most recently from 513276d to aa75430 Compare March 10, 2026 11:03
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

@cfsmp3 Addressed all review feedback.
Rust FFmpeg demuxer supports AVC/H.264, HEVC/H.265, CEA-608/708, tx3g, and chapter extraction via rsmpeg. Build system fixes (ENABLE_HARDSUBX, link ordering, compat defines, swresample, bindgen allowlist), code cleanup (dead declarations, variable naming, match arm allocations), squashed to 3 commits. ccxr_dumpchapters is fully implemented shows as binary in diff due to FFI null terminators. VobSub detected but not decoded (known gap). Tested locally on MP4 files, all CI green.

@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

Screenshot 2026-03-10 at 6 15 03 PM

@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

@cfsmp3 can you review it.

@cfsmp3
Copy link
Copy Markdown
Contributor

cfsmp3 commented Apr 4, 2026

Hi @DhanushVarma-2 — it looks like this branch was force-pushed and now only contains the MKV KATE extension fix (3 commits all with the same message). The original FFmpeg MP4 demuxer work is gone.

The KATE fix is already merged via #2250, so this PR currently has no diff against master.

Did you accidentally force-push the wrong branch? You'll need to restore the FFmpeg work or open a new PR. Let us know.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch from 814de82 to aa75430 Compare April 4, 2026 22:04
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

HI @cfsmp3 Sorry about that , I accidentally force-pushed the wrong branch and overwrote the FFmpeg work. Restored it now and merged with upstream master. Should be good to review.

Copy link
Copy Markdown
Contributor

@cfsmp3 cfsmp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deep Review

Default build (no FFmpeg) — CLEAN

We built the PR without -DWITH_FFMPEG=ON and ran all 27 CI-flagged tests. Every single output file is byte-identical to master. The default GPAC path has zero regressions.

FFmpeg build — DOES NOT BUILD

We tried to build with -DWITH_FFMPEG=ON:

mkdir build_ffmpeg && cd build_ffmpeg
cmake ../src -DWITH_FFMPEG=ON
make -j$(nproc)

Error 1 (pre-existing, not your fault): ffmpeg_intgr.c:100av_find_best_stream incompatible pointer type. Our FFmpeg 7.x changed AVCodec** to const AVCodec**. We suppressed this with -Wno-incompatible-pointer-types to continue.

Error 2 (this PR): Linker fails with:

undefined reference to 'ccx_mp4_process_hevc_sample'

The symbol exists in mp4_rust_bridge.c → compiled into libccx.a. But libccx_rust.a (which needs it) comes LAST in the link order. The linker already consumed libccx.a and won't go back.

The existing codebase handles this with --undefined= flags (see --undefined=decode_vbi, --undefined=do_cb, --undefined=store_hdcc in the CMakeLists). You need the same for your new bridge symbols:

set(CMAKE_EXE_LINKER_FLAGS "${CMAKE_EXE_LINKER_FLAGS} -Wl,--undefined=ccx_mp4_process_hevc_sample -Wl,--undefined=ccx_mp4_process_avc_sample")

Error 3 (this PR): lib_ccx/CMakeLists.txt replaces ENABLE_HARDSUBX with ENABLE_FFMPEG_MP4:

-  set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_HARDSUBX")
+  set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -DENABLE_FFMPEG_MP4")

This means -DWITH_FFMPEG=ON no longer defines ENABLE_HARDSUBX, breaking hardsubx when FFmpeg is enabled. It should be added alongside, not replaced.

Code concerns

  1. ccextractor.c changes the default MP4 processing loop — it now iterates over all input files inside the MP4 case. On master, only ctx->inputfile[ctx->current_file] is processed here. This changes behavior in the default (GPAC) path even without FFmpeg. Our tests didn't catch a regression (single-file MP4 tests pass), but this is a behavioral change that needs justification.

  2. Unrelated cosmetic changes: ccx_decoders_isdb.c reformats the CLUT array (one value per line), teletext.rs reformats code from #2240. These don't belong in this PR — please remove them.

  3. Test files: tests/popon_test.scc and tests/popon_test.srt — what are these for? Not mentioned in the PR description.

Summary

Since the FFmpeg build doesn't link, we couldn't test the actual new feature. Please fix:

  1. Add --undefined linker flags for the bridge symbols
  2. Keep ENABLE_HARDSUBX alongside ENABLE_FFMPEG_MP4
  3. Remove unrelated cosmetic changes
  4. Explain or remove the test files

Once it builds, we'll do a full test with MP4 samples comparing GPAC vs FFmpeg output.

@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch from 3b702c0 to e3e0a27 Compare April 5, 2026 13:06
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

@cfsmp3 can you review it again

@cfsmp3
Copy link
Copy Markdown
Contributor

cfsmp3 commented Apr 7, 2026

Follow-up Review (Apr 6)

Thanks for the quick turnaround on the review feedback — the commit message shows you addressed every point we raised. However, we found new issues.

isdb changes introduce bugs

The ccx_decoders_isdb.c diff introduces three bugs that were NOT in the previous version of this PR:

1. Removed malloc NULL check (line 348-352 deleted)

 text->buf = malloc(128);
-if (!text->buf)
-{
-    free(text);
-    return NULL;
-}

If malloc fails, *text->buf = 0 on the next line dereferences NULL → crash.

2. Bounds check changed from - 1 to + 1 (line 727)

-if (i >= sizeof(arg) - 1)
+if (i >= (sizeof(arg)) + 1)

This allows writing 2 extra bytes past the end of arg before the check triggers. Buffer overflow.

3. Removed bounds guard on write (line 735)

-if (i < sizeof(arg))
-    arg[i] = *buf++;
+arg[i] = *buf++;

Unconditional write without bounds check → potential OOB write.

These must be reverted. The isdb file should not be modified at all in this PR.

teletext.rs still has unrelated changes

The reformatting of code from #2240 (multi-line → single-line) is still present. Please revert — it doesn't belong in this PR.

FFmpeg build

We attempted to build with -DWITH_FFMPEG=ON but hit a pre-existing issue: master itself fails with -DWITH_FFMPEG=ON because it defines ENABLE_HARDSUBX without linking tesseract/OCR libs. This is NOT your fault — it's a pre-existing broken coupling in the build system. We can't verify the FFmpeg code path builds correctly until this is resolved separately.

Your --undefined linker flag additions and the ENABLE_HARDSUBX preservation look correct from code inspection, but we can't confirm at link time.

What's still needed

  1. Revert ALL changes to ccx_decoders_isdb.c — the file should not be in this PR
  2. Revert teletext.rs changes — unrelated reformatting
  3. Once those are clean, we'll work on getting the FFmpeg build path testable

The default (non-FFmpeg) build remains clean — 27 CI tests all byte-identical to master.

Dhanush Varma added 5 commits April 7, 2026 12:20
Replace GPAC with FFmpeg for MP4 demuxing via a Rust implementation
using rsmpeg. Supports AVC/H.264, HEVC/H.265, CEA-608, CEA-708,
and tx3g subtitle tracks. Includes chapter extraction via
ccxr_dumpchapters. VobSub tracks are detected but not yet supported.

Selected at compile time via ENABLE_FFMPEG_MP4 / enable_mp4_ffmpeg
Cargo feature.
- mp4_rust_bridge.c/.h: C-callable functions for AVC/HEVC/CC processing
- ccx_gpac_types.h: compat defines for GF_4CC, GF_ISOM_SUBTYPE_C708, etc.
- mp4.c: dispatch to Rust demuxer when ENABLE_FFMPEG_MP4 is set
- ccextractor.c: iterate all input files in MP4 mode
- Remove dead processmp4_ffmpeg/dumpchapters_ffmpeg declarations
- src/CMakeLists.txt: link swresample, re-add libs after ccx_rust for
  circular dependencies, --no-as-needed on Linux, ENABLE_HARDSUBX
- lib_ccx/CMakeLists.txt: target_compile_definitions for ENABLE_HARDSUBX
- rust/CMakeLists.txt: enable_mp4_ffmpeg Cargo feature
- Add --undefined= linker flags for bridge symbols (ccx_mp4_process_avc_sample,
  ccx_mp4_process_hevc_sample, ccx_mp4_process_cc_packet, ccx_mp4_flush_tx3g)
- Restore ENABLE_HARDSUBX alongside ENABLE_FFMPEG_MP4 in both CMakeLists
- Guard MP4 multi-file loop with #ifdef ENABLE_FFMPEG_MP4 (GPAC path unchanged)
- Revert unrelated cosmetic changes (isdb CLUT formatting, teletext.rs)
- Remove unrelated test files (popon_test.scc/srt)
- Revert .gitignore local dev additions
@DhanushVarma-2 DhanushVarma-2 force-pushed the feat/replace-gpac-with-ffmpeg branch from e3e0a27 to 864f86e Compare April 7, 2026 07:03
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

@cfsmp3 , all three issues from the follow-up review are fixed now. Rebased onto latest master which also cleaned up the root cause.

@ccextractor-bot
Copy link
Copy Markdown
Collaborator

CCExtractor CI platform finished running the test files on linux. Below is a summary of the test results, when compared to test for commit 5fdd9b8...:
Report Name Tests Passed
Broken 10/13
CEA-708 1/14
DVB 4/7
DVD 3/3
DVR-MS 2/2
General 25/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 79/86
Teletext 20/21
WTV 13/13
XDS 34/34

Your PR breaks these cases:

  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 56c9f34548..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 5d3a29f9f8..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27..., Last passed: Never
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b..., Last passed: Never
  • ccextractor --out=spupng c83f765c66..., Last passed: Never
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 4e56e88ba4..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 c0d2fba8c0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 27d7a43dd6..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 e2e2b501e0..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@ccextractor-bot
Copy link
Copy Markdown
Collaborator

CCExtractor CI platform finished running the test files on windows. Below is a summary of the test results, when compared to test for commit 5fdd9b8...:
Report Name Tests Passed
Broken 10/13
CEA-708 1/14
DVB 3/7
DVD 3/3
DVR-MS 2/2
General 25/27
Hardsubx 1/1
Hauppage 3/3
MP4 3/3
NoCC 10/10
Options 75/86
Teletext 20/21
WTV 13/13
XDS 34/34

Your PR breaks these cases:

  • ccextractor --autoprogram --out=srt --latin1 --quant 0 85271be4d2...
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla dab1c1bd65...
  • ccextractor --out=srt --latin1 --autoprogram 29e5ffd34b...
  • ccextractor --out=spupng c83f765c66...
  • ccextractor --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotbefore 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsnotafter 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatleast 1 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...
  • ccextractor --startcreditsforatmost 2 --startcreditstext "CCextractor Start crdit Testing" c4dd893cb9...

NOTE: The following tests have been failing on the master branch as well as the PR:

Congratulations: Merging this PR would fix the following tests:

  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 8e8229b88b..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 132d7df7e9..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 99e5eaafdc..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla 7aad20907e..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 01509e4d27..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --xds --latin1 --ucla 85058ad37e..., Last passed: Never
  • ccextractor --autoprogram --out=srt --latin1 --ucla b22260d065..., Last passed: Never
  • ccextractor --autoprogram --out=ttxt --latin1 --ucla --xds 7f41299cc7..., Last passed: Never

It seems that not all tests were passed completely. This is an indication that the output of some files is not as expected (but might be according to you).

Check the result page for more info.

@DhanushVarma-2 DhanushVarma-2 requested a review from cfsmp3 April 7, 2026 20:37
@DhanushVarma-2
Copy link
Copy Markdown
Contributor Author

Hi @cfsmp3 is there anything that needs to be updated in the code or is there any problem in testing?

Copy link
Copy Markdown
Contributor

@cfsmp3 cfsmp3 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comprehensive Comparison Review (Apr 18)

We tested this PR alongside #2191 (gaurav02081), which implements the same FFmpeg MP4 demuxer feature. Both were built with -DWITH_FFMPEG=ON -DWITH_OCR=ON -DWITH_HARDSUBX=ON and tested against all 36 MP4/MOV samples in our collection.

Key finding: both PRs produce byte-identical output

#2170 and #2191 use the same rsmpeg backend and produce exactly the same output on all 36 samples. Neither is functionally superior to the other.

Critical: caption extraction gaps vs GPAC

The FFmpeg path is NOT a drop-in replacement for GPAC. 7 samples lose captions:

Sample GPAC FFmpeg (#2170 and #2191) Issue
132d7df7e993.mov 108KB 104B CEA-608 in H.264 SEI — no separate subtitle track
1974a299f050.mov 127KB 104B Same — captions embedded in video NALs
99e5eaafdc55.mov 164KB 104B Same
8849331ddae9.mp4 48KB (2352 lines, clean) 6KB (227 lines, garbled ÷ chars) c608 track exists but decoded as raw bytes, not through CEA-608 decoder
b2771c84c2a3.mp4 2.6KB (130 lines) 284B (11 lines) Same c608 garbling
5df914ce773d.mp4 1KB 0B c608 track present but entirely missed
1f3e951d516b.mp4 5KB 0B dvdsub (bitmap) in MP4 — not supported

One sample works BETTER with FFmpeg:
| ad9f9e03240e.m4v | 0B | 51KB | FFmpeg extracts what GPAC can't |

The remaining 28 samples produce identical output (including 0B on both).

Three bugs to fix

Bug 1 — CEA-608 in H.264 SEI (3 MOV samples): These files have no separate subtitle track — captions are embedded as SEI user data in the H.264 video stream NAL units. GPAC handles this via process_avc_sample() which parses NALs. The FFmpeg demuxer needs to do the same — scan AVC samples for SEI-embedded CEA-608 data.

Bug 2 — c608 track garbled (2 samples): The c608 subtitle track exists and is found, but the data comes out as raw CC bytes (÷÷ ÷ ÷HI÷, ÷TH÷ER÷E.÷) instead of decoded text. The data needs to go through the CEA-608 character decoder, not be dumped as raw bytes.

Bug 3 — c608 track missed (1 sample): 5df914ce773d.mp4 has a c608 track visible to ffprobe but FFmpeg extracts nothing. Investigate why.

(dvdsub/bitmap in MP4 can be documented as a known limitation for now.)

PR-specific issues for #2170

Compared to #2191:

  • Code is leaner (437 lines vs 1019) — this is a plus
  • isdb bugs introduced in review fix (removed malloc NULL check, changed bounds check from -1 to +1, removed bounds guard) — we flagged this Apr 7, please confirm these are reverted
  • Unrelated changes (teletext.rs reformatting) — should be clean now after rebase
  • No CI additions#2191 adds a cmake_ffmpeg_mp4 CI job which is useful. Consider adding one.
  • Linker approach (--no-as-needed + re-add libs) works but is less clean than #2191's INTERFACE_LINK_LIBRARIES approach
  • Still has ENABLE_HARDSUBX coupling in lib_ccx/CMakeLists.txt — our #2259 fixed this on master but your branch still has the old code. Rebase needed.

Samples tested

All 36 MP4/MOV/M4V files from ~/media_samples/completed/ and ~/media_samples/failed/, including files up to 4.5GB. No files were skipped.

@cfsmp3
Copy link
Copy Markdown
Contributor

cfsmp3 commented Apr 20, 2026

Closing — #2191 (gaurav02081) has been merged, which implements the same FFmpeg MP4 demuxer feature. Both PRs produced identical output on all 36 samples we tested. #2191 addressed all caption extraction bugs we reported (CEA-608 in SEI, c608 garbling, dvdsub), cleaned up to 3 commits, and included CI workflow additions. Thanks for the work on this — the original FFmpeg demuxer concept came from your PR.

@cfsmp3 cfsmp3 closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants