
demux/packet_pool: add fast quit mode #12566

Open
kasper93 wants to merge 1 commit into mpv-player:master from kasper93:demux_d

Conversation

@kasper93
Member

@kasper93 kasper93 commented Oct 5, 2023

The cache is stored as a linked list, containing possibly hundreds of thousands of small cached packets. Iterating over all of them and deallocating takes a significant amount of time, especially if this memory gets swapped out.

When exiting the player, there is no need to clear all of this, let the kernel reclaim the memory after the program exits.

This change significantly speeds up the closing time when a large demuxer cache is used.

Fixes: #12294, #12563
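A minimal sketch of the mechanism (illustrative names only, not mpv's actual packet_pool API): on a normal shutdown the pool walks its linked list and frees every packet, while in fast-quit mode it skips the walk entirely and leaves reclamation to the kernel at process exit.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-ins for the pool's linked list of cached packets. */
struct packet {
    void *data;
    struct packet *next;
};

struct packet_pool {
    struct packet *head;
};

static void pool_uninit(struct packet_pool *pool, bool quit_fast)
{
    if (quit_fast)
        return; /* skip the walk; the kernel reclaims everything on exit */
    struct packet *p = pool->head;
    while (p) {
        struct packet *next = p->next;
        free(p->data);
        free(p);
        p = next;
    }
    pool->head = NULL;
}
```

With hundreds of thousands of nodes, the `while` loop is what touches (and potentially swaps in) every page the cache occupies; the `quit_fast` early return avoids all of that.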

@llyyr
Contributor

llyyr commented Oct 5, 2023

I'm not a big fan of doing this, this isn't really a problem outside of stupid settings like having a bigger demuxer cache than physical memory available. This will also ignore all leaks in demuxer.c, and in general just doesn't feel like the right thing to do.

mpv closes fast enough when it's not swapped already, even with 12GiB demuxer cache on my machine.

@Dudemanguy
Member

You'd have to differentiate between libmpv and the "normal player" at the least.

@Andarwinux
Contributor

It seems to be platform related: deallocating the cache on Windows is significantly slower than on Linux; even 8GB takes about 10 seconds on my machine.

@kasper93
Member Author

kasper93 commented Oct 5, 2023

I'm not a big fan of doing this, this isn't really a problem outside of stupid settings like having a bigger demuxer cache than physical memory available. This will also ignore all leaks in demuxer.c, and in general just doesn't feel like the right thing to do.

mpv closes fast enough when it's not swapped already, even with 12GiB demuxer cache on my machine.

Allocating hundreds of thousands of small packets is bad enough; iterating over a linked list just to deallocate them for nothing is unneeded. Not doing that is simply the pragmatic thing to do.

Also, the overhead is clearly visible in a profile, whether on Linux or Windows, though on Windows the overhead of the actual memory free is quite large in comparison. It doesn't matter, because even without deallocating, traversing the whole list is significant. And swapped memory is only one factor that exposes the completely unneeded clearing that we do. Whether it is 150MB, 1GB or 100GB, touching all this memory on exit is just unneeded overhead.

You'd have to differentiate between libmpv and the "normal player" at the least.

I expected that libmpv doesn't use PT_QUIT, but it does. Sure, we need to handle this too.

@kasper93 kasper93 force-pushed the demux_d branch 2 times, most recently from a78808e to 1ac142c on October 5, 2023 19:26
@kasper93
Member Author

kasper93 commented Oct 5, 2023

You'd have to differentiate between libmpv and the "normal player" at the least.

Added a flag now that is set only from the player, and could possibly be set by a libmpv user if they want it.

@llyyr
Contributor

llyyr commented Oct 5, 2023

This also breaks MPV_LEAK_REPORT=1

@Jules-A

Jules-A commented Oct 5, 2023

What about making this user-configurable for now and not default?

@Dudemanguy
Member

This also breaks MPV_LEAK_REPORT=1

Not directly related to this, but is mpv's leak report even useful at all anymore? I just use ASan, since it's plainly better.

@kasper93
Member Author

kasper93 commented Oct 5, 2023

What about making this user-configurable for now and not default?

No. Internal implementation details should not be exposed to users.

@kasper93
Member Author

kasper93 commented Oct 5, 2023

This also breaks MPV_LEAK_REPORT=1

I know, I was hoping no one was using this anymore. Oh well.

@Dudemanguy
Member

I was hoping no one is using this anymore, oh well.

I'll probably remove it in a PR later. It was probably not commonly used before and is definitely pretty obsolete now.

@N-R-K
Contributor

N-R-K commented Oct 5, 2023

When exiting the player, there is no need to clear all of this, let the kernel reclaim the memory after the program exits.

Ideally this should be done everywhere, but this is a good start. Are there any other low hanging fruits? (I.e where cleanup is noticeably slow but isn't needed).

What about making this user-configurable for now and not default?

This is a change that's strictly better. There's no reason (that I'm aware of) why anyone would want to disable it.

It seems to be platform related

Hopping lots of pointers to do basically nothing worthwhile is wasteful on any platform.

This also breaks MPV_LEAK_REPORT=1

Another way of looking at it is that it's MPV_LEAK_REPORT that's broken. Even setting this aside, it cannot differentiate between reachable and unreachable "leaks" - which LeakSanitizer can.

@kasper93
Member Author

kasper93 commented Oct 5, 2023

Ideally this should be done everywhere, but this is a good start. Are there any other low hanging fruits? (I.e where cleanup is noticeably slow but isn't needed).

Yes, that's why I have now added a quit_fast flag that can be used anywhere it's needed. But from my quick tests, the demuxer cache is the worst offender; the rest is really not noticeable. It is a good start and improves the status quo a lot; we can build upon it if more issues are identified.

@Dudemanguy
Member

Are there any other low hanging fruits? (I.e where cleanup is noticeably slow but isn't needed).

There is no way any other part of the player is slow on cleanup. And this particular section is only slow in extreme cases where you have a massive cache. Even then, with massive caches, it seems not reliably reproducible (at least with my naive attempts there were no hangups). Considering that mpv is also a library, we should still be cleaning up properly basically all the time. As for this specific change, I kind of personally don't like it because it feels like a sledgehammer for an edge case. But I dunno how everyone else feels.

@kasper93
Member Author

kasper93 commented Oct 5, 2023

I kind of personally don't like it because it feels like a sledgehammer for an edge case. But I dunno how everyone else feels.

I was personally triggered when we started suggesting how to set up a keybind to SIGKILL mpv... because we cannot close in reasonable time ourselves. I'm not saying this is an elegant solution, but it is a solution that works and, as far as I can tell, will not have any side effects.

@kasper93 kasper93 force-pushed the demux_d branch 2 times, most recently from 7f68997 to 586ebd2 on October 6, 2023 00:30
@llyyr
Contributor

llyyr commented Oct 6, 2023

we started suggesting how to set up keybind to SIGKILL mpv

We did it for a user who explicitly asked about it, my recommendation to that user would always be to use a sane demuxer cache value or buy more RAM.

@N-R-K
Contributor

N-R-K commented Oct 6, 2023

I initially suggested this here but forgot about it: instead of an LSan-specific suppression, you can just avoid setting quit_fast in debug builds.

This is arguably better since it will always exercise the cleanup code under debug builds since...

Considering that mpv is also a library, we should still be cleaning up properly basically all the time

... but still avoid doing unnecessary work on release builds.

(You can probably avoid breaking MPV_LEAK_REPORT as well if anyone cares about that.)

@N-R-K
Contributor

N-R-K commented Oct 6, 2023

Mostly bikeshedding, but I feel quite strongly about the following comment, so I'll leave a response:

buy more RAM

This might be appropriate if the software actually needed/used additional memory to do useful things, but in this case it's doing what the OS would've done anyway (more efficiently), except significantly slower and wasting more resources (reading from disk if the memory was swapped, evicting other useful data from the shared L2 & L3 caches, etc).

@Obegg

This comment was marked as off-topic.

@N-R-K
Contributor

N-R-K commented Oct 6, 2023

This is arguably better since it will always exercise the cleanup code under debug builds

Looks like this is definitely the better way, since it already found cases where shutdown_dealloc was unintentionally false. EDIT: Never mind, it was harmless. But the proposed diff might still make sense.

Proposed fix: instead of "shutdown_dealloc", change it to "shutdown_fast" so that the default value (i.e. 0) does the sensible thing of deallocating by default. Also remove the suppression entirely and just always deallocate when ASan is present (could do the same for MPV_LEAK_REPORT if there's interest in it):

dealloc.diff
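The proposed polarity flip could be sketched roughly like this (illustrative function; `__SANITIZE_ADDRESS__` is the macro GCC defines under `-fsanitize=address`, while Clang would need a `__has_feature` check instead):

```c
#include <stdbool.h>

/* Sketch of the flipped flag: the zero-initialized default deallocates,
 * and sanitizer builds always clean up so LeakSanitizer sees all frees. */
static bool should_dealloc(bool shutdown_fast)
{
#ifdef __SANITIZE_ADDRESS__
    return true;           /* always exercise the cleanup path under ASan */
#else
    return !shutdown_fast; /* default value 0 means full deallocation */
#endif
}
```

This way a forgotten or zeroed flag errs on the side of cleaning up, and debug/sanitizer builds keep exercising the cleanup code unconditionally.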

@kasper93
Member Author

kasper93 commented Jan 7, 2026

Will merge this "memory leak" if there are no objections.

@sfan5
Member

sfan5 commented Jan 7, 2026

I am still not a fan of this approach.
Have there been recent reports of mpv being slow to quit?

@na-na-hi
Contributor

na-na-hi commented Jan 8, 2026

IMO this should be an optional argument of the quit command. A memory leak by default is not good in the context of requiring exceptions for external memory leak checkers (which might be impossible when symbols are stripped).

The argument approach also allows more aggressive fast-quit modes, for example calling exit() directly and letting the OS clean up everything.

@kasper93
Member Author

kasper93 commented Jan 8, 2026

IMO this should be an optional argument of the quit command. A memory leak by default is not good in the context of requiring exceptions for external memory leak checkers (which might be impossible when symbols are stripped).

Are you debugging memory leaks without symbols? Honestly, I called this a memory leak as a joke, because it's not a leak. It's reachable memory that is purposefully not cleaned up, because there is no technical reason to do so.

The argument approach also allows more aggressive fast-quit modes, for example calling exit() directly and letting the OS clean up everything.

No, we want to save watch-later data, close scripts gracefully, and allow threads to join, since dependent threads may want to do something like saving the cache. The OS cleans up after us regardless; you don't have to call exit().

@llyyr
Contributor

llyyr commented Jan 8, 2026

If, as I understand, the goal is just to make the perceived mpv exit faster, then, as I suggested before, it'd be better to destroy the VO window first thing. I understand this could be a big change, but if we want to do it, then that'd be the best way.

As an aside, in at least one of the issues, one potential cause of slow exit could be that mp_shutdown_clients waits on all scripts, and the issue mentions using autoload.lua on a large network directory. There's a good chance this PR doesn't solve that issue at all, because we're not waiting on the demuxer there, but on clients instead.

@kasper93
Member Author

kasper93 commented Jan 8, 2026

If, as I understand, the goal is just to make the perceived mpv exit faster, then, as I suggested before, it'd be better to destroy the VO window first thing. I understand this could be a big change, but if we want to do it, then that'd be the best way.

As an aside, in at least one of the issues, one potential cause of slow exit could be that mp_shutdown_clients waits on all scripts, and the issue mentions using autoload.lua on a large network directory. There's a good chance this PR doesn't solve that issue at all, because we're not waiting on the demuxer there, but on clients instead.

Those are all separate improvements that can be made. This PR focuses on the significant delay caused by traversing a linked list of hundreds of thousands of packets. I do not claim to fix other issues.

We can block this PR forever on an "I don't like it" basis, but do we have any technical or other issues with this approach? Except perhaps debugging mpv without symbols... with leak checkers that don't recognize reachable memory... which is really no issue.

@na-na-hi
Contributor

Are you debugging memory leaks without symbols? Honestly, I called this a memory leak as a joke, because it's not a leak.

Requiring exceptions is already bad enough, especially when mpv is the only "bad kid on the block". Because mpv currently cleans up memory completely, automated memory leak tests can check mpv at strict settings without needing to add exceptions.

For example, valgrind is strict by default, and it will only report no leak if all memory is freed, even reachable memory.

As you mentioned, the benefit of this PR is situational for users with large demux cache settings. By not making this behavior the default and making it opt-in, users with such cache settings can opt into the fast quitting behavior.

Can you demonstrate any significant measurable benefit for users with default cache settings and with enough memory to not swap significantly? If not, then this should not be enabled by default.

@N-R-K
Contributor

N-R-K commented Jan 10, 2026

the benefit of this PR is situational for users with large demux cache settings.

If the argument is that defaults should be set based on the majority of users (a valid argument), then...

For example, valgrind is strict by default, and it will only report no leak if all memory are freed, even reachable memory.

...this is inconsistent. Memory debugging is NOT a common user scenario at all.

Can you demonstrate any significant measurable benefit for users with default cache settings and with enough memory to not swap significantly? If not, then this should not be enabled by default.

Let's assume it only matters for a small portion of the users, but are the rest of them being harmed by this change? No. So there's no downside for any user whatsoever if this behavior is the default, while having an upside for some.

The only downside would be for devs trying to debug memory, which again is not the common use case, so it shouldn't dictate the default, according to your own argument.

@na-na-hi
Contributor

If the argument is that defaults should be set based on the majority of users (a valid argument), then...

No, this was never the implication. Cleaning up all memory is good programming practice and hygiene, and not cleaning it up needs a good reason. You present it as if not cleaning up is as good as cleaning up in the absence of performance issues, which is not the case.

The rest of your argument builds on this assumption, so it is moot.

@N-R-K
Contributor

N-R-K commented Jan 10, 2026

Cleaning up all memory is good programming practice and hygiene

This again just boils down to "I don't like it". It's not backed up by a technical reason. In fact, there's a technical reason NOT to do needless cleanup ourselves when the OS will do it anyway, and much faster, since it doesn't have to hop individual pointers.

@sfan5
Member

sfan5 commented Jan 10, 2026

If pointer chasing is the bottleneck in clearing the demux pool then we could consider a different data structure.
I thought it was the libc's malloc implementation, since (afaik) so far the reports of this problem were exclusively on Windows.

@na-na-hi
Contributor

na-na-hi commented Jan 10, 2026

This again just boils down to "I don't like it". It's not backed up by a technical reason.

This is simply not true. Even in the current situation with talloc, the practice is already lenient enough that without carefully freeing memory manually at the right place, unbounded memory growth can occur even though the memory is not technically leaked.

The PR would also not help any libmpv clients at all, since it is disabled there, and it cannot afford to leak as a library. So it is more like a partial workaround.

If pointer chasing is the bottleneck in clearing the demux pool then we could consider a different data structure.

Agreed. If they are allocated in a few large chunks of contiguous memory, freeing will not be a bottleneck. This is a better solution than this PR.

@N-R-K
Contributor

N-R-K commented Jan 10, 2026

[...] without carefully freeing memory manually at the right place, unbounded memory growth can occur even though the memory is not technically leaked.

Holding heavy resources that are not needed is obviously bad. But no one is advocating for that, and I fail to see how it's relevant to this PR.

The PR would also not help any libmpv clients at all, since it is disabled there, and it cannot afford to leak as a library. So it is more like a partial workaround.

It's been a while since I've looked at this, but I think clients can set mpctx->quit_fast if they want.

Agreed. If they are allocated in a few large chunks of contiguous memory, freeing will not be a bottleneck. This is a better solution than this PR.

Again, it's been a while so I might remember wrong, but using a better allocation pattern/data structure was suggested; however, we don't control the data structures/allocation functions, ffmpeg does. And every packet needs to be allocated and freed individually.

@kasper93
Member Author

Cleaning up all memory is good programming practice and hygiene, and not cleaning it up needs a good reason.

Managing and keeping track of object lifetimes is good practice, and we do that. "Good reason" is arguable here.

If pointer chasing is the bottleneck in clearing the demux pool then we could consider a different data structure.

Such as? If this was a possibility I would do that. See, we only need the linked list to track allocated packets; currently we pool them, access from the front, and append to the end. At no point is there a need to iterate over or access other packets.

Except when we are clearing them at the end, which I argue is pointless work. The only way around that would be to clear all packets at once, which we cannot do; whatever structure you use to store pointers to AVPackets, you still need to iterate over this million-entry storage and call free individually.

And sure, some packets may be backed by further resources that we could avoid clearing manually.

I thought it was the libc's malloc implementation, since (afaik) so far the reports of this problem were exclusively on Windows.

I've reproduced the issue on linux primarily. The fact that mpv's biggest user base is on Windows causes bias towards reporting issues on Windows. Also linux is likely faster, but not immune to the issue, especially if memory is swapped.

The PR would also not help any libmpv clients at all, since it is disabled there, and it cannot afford to leak as a library. So it is more like a partial workaround.

It doesn't harm them either. If any libmpv user wants such a feature, they can request or add an API to set this flag.

Let's assume it only matters for a small portion of the users, but are the rest of them being harmed by this change? No.

Exactly, this is a recurring pattern. There is a change that improves A and doesn't degrade anything else, but people review it on an "I don't like A" basis and try to move the bar to some imaginary scenarios that effectively stall the change.

I don't claim this PR fixes all mpv slowdowns, and I don't claim it fixes libmpv. I fix one specific scenario, which, to the best of my knowledge, doesn't have negative effects.

I agree that debugging/leak checking is affected, but it is trivial to adjust that as needed.

Agreed. If they are allocated in a few large chunks of contiguous memory, freeing will not be a bottleneck. This is a better solution than this PR.

How would you achieve that? May I remind you:

-------- 8< --------- FFmpeg 4.4 was cut here -------- 8< ---------

2021-03-17 - f7db77bd87 - lavc 58.133.100 - codec.h
  Deprecated av_init_packet(). Once removed, sizeof(AVPacket) will
  no longer be a part of the public ABI.
  Deprecated AVPacketList.

Also, AVPackets themselves have child allocations, so you still have to clear them all individually.

It's been a while since I've looked at this, but I think clients can set mpctx->quit_fast if they want.

There is no public API currently; that is out of scope for this PR, as first we have to agree on the change itself.

Again, it's been a while so I might remember wrong, but using a better allocation pattern/data structure was suggested; however, we don't control the data structures/allocation functions, ffmpeg does. And every packet needs to be allocated and freed individually.

Correct.

@kasper93 kasper93 added the priority:on-ice may be revisited later label Jan 10, 2026
@voidpointertonull

I thought it was the libc's malloc implementation, since (afaik) so far the reports of this problem were exclusively on Windows.

I've reproduced the issue on linux primarily. The fact that mpv's biggest user base is on Windows causes bias towards reporting issues on Windows. Also linux is likely faster, but not immune to the issue, especially if memory is swapped.

It's definitely not just Windows; I can confirm it, since I regularly use 8-32 GiB cache sizes for scrubbing through large videos without being bottlenecked by HDD performance.

If pointer chasing is the bottleneck in clearing the demux pool then we could consider a different data structure.

Such as? If this was a possibility I would do that. See, we only need the linked list to track allocated packets; currently we pool them, access from the front, and append to the end. At no point is there a need to iterate over or access other packets.

There are definitely other improvements that could be made, such as:

  • Using madvise with MADV_HUGEPAGE on the cached data to opportunistically reduce page table cache thrashing, which would also improve playback performance
  • Explicit prefetching (__builtin_prefetch) of all known pointers ahead of time to reduce being memory-latency bound
  • Grouping packets, so that instead of each packet pointing at the next, a group of, say, up to 256 packets would point to the next group, reducing indirection and memory allocator pressure. This may also help with some really silly files; I've seen a couple of oddities that seemed packet-decode bound, where just pulling data into the cache was really slow even with playback stopped, without being I/O bound
  • Freeing can be kicked off to an extra thread(pool), but this is really getting niche, with drawbacks even for some other use cases
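The prefetching idea from the list above could look roughly like this (a sketch, not mpv code): hint the next node to the prefetcher before touching the current one, so some of the pointer-chasing latency overlaps with useful work. `__builtin_prefetch` is a GCC/Clang builtin; other compilers would need a no-op fallback.

```c
#include <stddef.h>

/* Illustrative linked-list node, standing in for the packet list. */
struct node {
    struct node *next;
    int payload;
};

static long sum_with_prefetch(const struct node *n)
{
    long sum = 0;
    while (n) {
        if (n->next)
            __builtin_prefetch(n->next, 0 /* read */, 1 /* low locality */);
        sum += n->payload;
        n = n->next;
    }
    return sum;
}
```

Whether this helps in practice depends on how scattered the nodes are; with one node per cache line the prefetch distance of one hop may be too short to fully hide the latency.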

Generally I'm a fan of the work done here, even if people are afraid of using some of the capabilities of the OS.
The only changes I'd make are keeping the pointer zeroing of demux_packet_pool_clear for safety, and maybe renaming quit_fast to something more dangerous-sounding, as a principle-of-least-surprise consideration.

But even as-is, this is useful work for use cases with large caches, and I also believe that shortcomings for the "I don't like it" "use cases" weren't actually even proven to exist.

@sfan5
Member

sfan5 commented Jan 11, 2026

Such as? If this was a possibility I would do that.

@voidpointertonull already said it but: something like std::deque, which is basically

struct chunk {
    T content[512]; // T = AVPacket*
    struct chunk *next;
};

I've reproduced the issue on linux primarily. The fact that mpv's biggest user base is on Windows causes bias towards reporting issues on Windows. Also linux is likely faster, but not immune to the issue, especially if memory is swapped.

The reason I said this is that I remember the Windows CRT malloc being known to be slow.

How would you achieve that? May I remind you:

Really, I don't see why ffmpeg couldn't add a function that returns sizeof(AVPacket).
Or something more involved, if you want to preclude obvious misuse:

/// Initializes an AVPacket in-place in the buffer given by the caller.
/// @param buf buffer (must be aligned to XXX)
/// @param buflen number of available bytes in buffer
/// @param consumed number of bytes consumed by the AVPacket will be written here
/// @return 0 if success: the AVPacket will live at the beginning of the buffer
int av_packet_alloc_inplace(char *buf, size_t buflen, size_t *consumed);

@N-R-K
Contributor

N-R-K commented Jan 11, 2026

Using madvise with MADV_HUGEPAGE [...]

We don't control where the allocation is placed, and it may be scattered all over the address space.

Grouping packets

I don't mind this. But you still need to needlessly free all the packets (and their nested allocations) individually.

Freeing can be kicked off to an extra thread(pool)

This might be faster than doing it single threaded, but I'm going to bet that it won't be faster than the OS.

So we'd be doing more work, using more resources, just to get worse results than what doing no work would give.

Generally I'm a fan of the work done here, even if people are afraid of using some of the capabilities of the OS.

Agreed. There's no need for us to step in and do a shoddy job when the OS would do a much better job anyway. And I just don't buy the argument that it's "best practice" to waste resources doing absolutely nothing worthwhile.

@N-R-K
Contributor

N-R-K commented Jan 11, 2026

av_packet_alloc_inplace

This is interesting. If we can place all the packets in one (or a few) large array(s) and call free only a couple times on the few arrays rather than individual packets, then that's not so bad. But @kasper93 said earlier that packets also have nested allocations, so you'd need to iterate and free individually anyways.

@sfan5
Member

sfan5 commented Jan 11, 2026

But @kasper93 said earlier that packets also have nested allocations, so you'd need to iterate and free individually anyways.

They can and do in practice. If we want to fix this in a different way we need to know which of...

  • calls to free for the AVPacket itself
  • calls to free for AVPacket contents (what exactly?)
  • pointer chasing in linked list of pool
  • something else

...is the bottleneck and how to best address it.

@voidpointertonull

struct chunk {
    T content[512]; // T = AVPacket*
    struct chunk *next;
};

Something like that, although packets don't seem to be particularly fat, so I thought of something more like this:

struct chunk
{
    struct demux_packet packets[256];
    uint8_t begin;
    uint8_t end;
    struct chunk *next;
};

This would reduce the pressure on the memory allocator and cut down on indirection; there's enough of that inside demux_packet anyway.

Oh, just noticed T = AVPacket* .
There's definitely more info to be stored than that.

Using boundary indices is not necessarily best if packets are often removed from the middle, but if that's a rare occasion, then such an operation could split the chunk into 2 chunks; they don't have to be full.

Still, I'm not joining the bikeshedding committee; these improvements can be considered elsewhere.
This commit is good as-is; the safety pointer zeroing and a more "here be dragons" naming would already make it perfect. No need to stall on anything else here.

Using madvise with MADV_HUGEPAGE [...]

We don't control where the allocation is placed, and it may be scattered all over the address space.

I definitely didn't go deep enough to confidently say it's easy to apply in this case; I just know that the performance gains are typically quite neat, so it could be worth a try somewhere.
It kind of feels like mpv is the kind of program where using hugepages elsewhere is unlikely to go too wrong to begin with.

Freeing can be kicked off to an extra thread(pool)

This might be faster than doing it single threaded, but I'm going to bet that it won't be faster than the OS.

So we'd be doing more work, using more resources, just to get worse results than what doing no work would give.

Technically it can be faster; at least I'm guessing that process cleanup is single-threaded in the kernel, and I've seen some kernel work on page table lock improvements, but the issue is mostly the extra resource usage you mentioned.

I don't think it's worth it in this case; I just mentioned it as one potential alternative, as this PR seems quite stuck and I've been looking forward to any kind of improvement.

av_packet_alloc_inplace

This is interesting. If we can place all the packets in one (or a few) large array(s) and call free only a couple times on the few arrays rather than individual packets, then that's not so bad. But @kasper93 said earlier that packets also have nested allocations, so you'd need to iterate and free individually anyways.

Apparently this intentionally doesn't exist, as AVPacket* is desired to be an opaque pointer: "sizeof(AVPacket) being a part of the public ABI is deprecated".

I believe this is rather silly, but then this is yet another reason why the current PR is fine as-is; chasing perfection would just leave the known-good improvement wasting away here for more years.

@obscenelyvague

Before this gets stalled for another year @kasper93 can you rebase this over f8577d1 pls thx

A new field, MPContext::quit_fast, has been introduced. Setting this
field to true bypasses the time-consuming packet deallocation process.
This is particularly useful when exiting the process, as there's no need
to iterate over and free all packets. The cache, stored as a linked list
with potentially millions of small packets, requires significant time to
iterate through and deallocate, especially if the memory is swapped out.

This change significantly speeds up the player closure, especially when
a large demuxer cache is used.

Fixes: mpv-player#12294
Fixes: mpv-player#12563
@kasper93
Member Author

But @kasper93 said earlier that packets also have nested allocations, so you'd need to iterate and free individually anyways.

They can and do in practice. If we want to fix this in a different way we need to know which of...

* calls to `free` for the `AVPacket` itself

* calls to `free` for `AVPacket` contents (what exactly?)

* pointer chasing in linked list of pool

* something else

...is the bottleneck and how to best address it.

All of the above. The AVPacket structure itself is just a wrapper over the underlying data. The main issue is that the output from the demuxer is a huge number of small packets. So when you request a cache that is multiple GB in size, the number of packets grows very quickly.

Many small allocations invite fragmentation, which was already an issue with glibc (#12076), but that's not what we are fixing here anyway.

Now, any access to an AVPacket will (can) force a page fault and bring a whole page into memory. This affects our linked list, the AVPacket itself and the underlying buffer allocation, all of which can be small allocations in different pages. Hence this whole exercise of iterating all packets and doing a "dummy" free() is expensive. While free itself doesn't have to read the released memory, we have to dig down to that pointer through 3 or more layers of indirection (linked list -> AVPacket -> AVBuffer -> ...), where the AVBuffer may have a custom deallocator function.
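The indirection chain described here can be sketched with stand-in types (the real chain goes through mpv's demux_packet and FFmpeg's AVPacket/AVBufferRef; these names are illustrative): freeing one cached packet touches several separately allocated objects, each of which may sit in a different, possibly swapped-out, page.

```c
#include <stdlib.h>

static int bufs_freed; /* instrumentation for the example only */

struct buf_ref   { void *data; void (*free_cb)(void *); };      /* ~AVBufferRef */
struct packet    { struct buf_ref *buf; };                      /* ~AVPacket    */
struct list_node { struct packet *pkt; struct list_node *next; };

static void counting_free(void *p) { free(p); bufs_freed++; }

static void free_one(struct list_node *n)
{
    /* Reaching the data means chasing node -> pkt -> buf -> data:
     * four separately allocated objects in up to four distinct pages,
     * and the buffer may carry a custom deallocator callback. */
    n->pkt->buf->free_cb(n->pkt->buf->data);
    free(n->pkt->buf);
    free(n->pkt);
    free(n);
}
```

Multiply `free_one` by hundreds of thousands of cached packets and the page faults from merely reaching each pointer dominate the cost, independent of how fast `free` itself is.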

Even if we patch ffmpeg to allow allocating struct AVPacket in bigger chunks, this does not fix the underlying allocations. While for video (and audio) the packets are generally bigger, for demuxed data they are small, and backing GBs of cache with them just doesn't scale. But again, it doesn't have to, because we don't ever need to iterate over all packets; any random access happens after a seek, and it will naturally page-fault some packets, but only in the seek target area, not the whole cache.

To summarize, I don't see a solution to those problems in changing the allocation scheme, when we do not even need to access those packets in most cases. The only reliable solution would be a custom memory allocator that groups all those allocations in bigger chunks, to allow page-faulting this memory efficiently when needed. But ffmpeg doesn't allow any external allocators, and like I said, there is no reason to complicate this.

Before this gets stalled for another year @kasper93 can you rebase this over f8577d1 pls thx

Pushed.

@sfan5
Member

sfan5 commented Jan 16, 2026

To summarize, I don't see a solution to those problems in changing the allocation scheme, when we do not even need to access those packets in most cases.

To be clear, it's obvious that "not freeing" will always be the most efficient way to quit an mpv process. Allowing custom allocators for packets and their data on the ffmpeg side is still a worthwhile goal IMO, to reduce fragmentation at runtime/in general.

My personal opinion is not that we shouldn't merge this PR but it would be nice if we could get ffmpeg to consider the alternatives.

@kasper93
Member Author

kasper93 commented Jan 16, 2026

To be clear, it's obvious that "not freeing" will always be the most efficient way to quit an mpv process. Allowing custom allocators for packets and their data on the ffmpeg side is still a worthwhile goal IMO, to reduce fragmentation at runtime/in general.

Packets are not cleared during runtime already; they are returned to the packet pool, where they wait to be reused.
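The runtime recycling described here could be sketched as follows (illustrative names, not the actual demux_packet_pool interface): a consumed packet is pushed onto the pool's free list instead of being freed, and allocation pops from the pool before falling back to malloc.

```c
#include <stdlib.h>

/* Minimal free-list pool, standing in for mpv's packet pool. */
struct pkt { struct pkt *next; };

struct pool { struct pkt *free_list; };

static void pool_release(struct pool *pool, struct pkt *p)
{
    p->next = pool->free_list; /* recycle instead of free() */
    pool->free_list = p;
}

static struct pkt *pool_acquire(struct pool *pool)
{
    if (pool->free_list) {
        struct pkt *p = pool->free_list;
        pool->free_list = p->next;
        return p; /* reuse a previously released packet */
    }
    return malloc(sizeof(struct pkt)); /* pool empty: allocate fresh */
}
```

Since packets only ever move between "in use" and the free list during playback, the full walk-and-free pass exists solely for shutdown, which is exactly what the fast-quit mode skips.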

My personal opinion is not that we shouldn't merge this PR but it would be nice if we could get ffmpeg to consider the alternatives.

It's still not clear to me what the alternatives are. Especially on the ffmpeg side, any changes in this area would require huge refactoring, and it's not even clear to me how we could make this work differently in a way that would resolve the issue. The only feasible change would be to implement a custom allocator and use it throughout ffmpeg; this would allow us to manage the memory better, but that is a big project in itself.

