
fit-params : refactor + add option to output estimated memory per device#22171

Open
ggerganov wants to merge 6 commits intomasterfrom
gg/fit-params-estimate

Conversation


@ggerganov ggerganov commented Apr 20, 2026

Overview

cont #16653
ref #19070 (review)

  • Refactor the fit-params logic, moving it from libllama to libcommon
  • Add the CLI argument -fite, --fit-estimate to the llama-fit-params tool. This is useful for third-party applications to estimate the required memory for a model.

Additional information

Example:

```
llama-fit-params -m ~/models/gemma-3-4b-it/ggml-model-f16.gguf -c 32768 --fit-estimate on
```

```
0.00.196.882 I main: printing estimated memory in MiB to stdout (device, model, context, compute) ...
MTL0 7401 814 517
host 1280 0 154
```


@ggerganov ggerganov requested a review from a team as a code owner April 20, 2026 14:07
Comment thread src/llama-ext.h Outdated
```cpp
#include "llama.h"

#include <cstdint>
#include <vector>
```
Contributor


We are making llama-ext.h a C++ only header then?

Member Author


This header is not required to be C-style. It is for staging new APIs and can be C++ if needed.

Once an API is ready to become public, it has to become C-style.

Comment thread tools/fit-params/fit-params.cpp Outdated
Comment on lines 77 to 89
```cpp
for (size_t id = 0; id < devs.size(); id++) {
    printf("%s ",  ggml_backend_dev_name(devs[id]));
    printf("%zu ", dmd[id].mb.model  /1024/1024);
    printf("%zu ", dmd[id].mb.context/1024/1024);
    printf("%zu ", dmd[id].mb.compute/1024/1024);
    printf("\n");
}

printf("Host ");
printf("%zu ", dmd.back().mb.model  /1024/1024);
printf("%zu ", dmd.back().mb.context/1024/1024);
printf("%zu ", dmd.back().mb.compute/1024/1024);
printf("\n");
```
Contributor


If the intent is to parse this output programmatically I think it would be preferable to use a well-defined format like JSON.

Member Author


For me, JSON is a lot of overhead, and in this specific case I don't think it is warranted because the data we want to output is very simple.

Comment thread common/arg.cpp Outdated
Comment on lines +2430 to +2431
```cpp
{ "-fite", "--fit-estimate" }, "[on|off]",
string_format("estimate the required memory to run the model ('on' or 'off', default: '%s')", params.fit_params_est ? "on" : "off"),
```
Contributor


I think this description is confusing. It is true that with this option enabled the program will estimate the required memory, but that is what it is already doing anyway. Maybe it would be better to call this something like --fit-print (or --fit-print-json, depending on my other comment).

Comment thread tools/fit-params/fit-params.cpp Outdated
```cpp
LOG_INF("%s: printing fitted CLI arguments to stdout...\n", __func__);
common_log_flush(common_log_main());

printf("-c %" PRIu32 " -ngl %" PRIi32, cparams.n_ctx, mparams.n_gpu_layers);

if (!params.fit_params_est) {
```
Contributor


In principle, if the only goal is to disable the fitting, this can already be done by manually setting e.g. -c 0 -ngl 999. The code should then recognize that these have been set manually and will not alter them. The only downside vs. the current approach would be that it's slightly slower than retrieving llama_get_device_memory_data only once. But if we're already messing around with internal headers anyway, we may as well make llama_params_fit_impl return the device memory data instead. Then all that would need to be done is change what is being printed.

@ggerganov ggerganov requested review from a team, CISC and ngxson as code owners April 20, 2026 17:16
@ggerganov
Member Author

I took the opportunity to refactor the implementation and move the param fitting logic outside of libllama. It's something that I suggested in the original PR (#16653 (review)), and I think it makes sense to have this logic in libcommon for now. IMO the param fitting API on master is too narrow and does not allow flexibility for the user code. The idea is to eventually expose a much more lightweight and flexible API that would allow applications to implement more sophisticated logic for param fitting and memory queries.

The prototype of this API is now in llama-ext.h:

```cpp
//
// device memory querying
//

// "memory" as in physical memory for a buffer type, in bytes
struct llama_memory_breakdown_data {
    size_t model   = 0; // memory allocated for the model
    size_t context = 0; // memory allocated for the context
    size_t compute = 0; // memory allocated for temporary compute buffers

    size_t total() const {
        return model + context + compute;
    }
};

struct llama_device_memory_data {
    int64_t total;
    int64_t free;

    llama_memory_breakdown_data mb;
};

// TODO: convert to C-style data structure
using llama_memory_breakdown = std::map<ggml_backend_buffer_type_t, llama_memory_breakdown_data>;

int32_t llama_model_n_expert (const struct llama_model * model);
int32_t llama_model_n_devices(const struct llama_model * model);

ggml_backend_dev_t llama_model_get_device(const struct llama_model * model, int i);

llama_memory_breakdown llama_get_memory_breakdown(const struct llama_context * ctx);
```

@ggerganov ggerganov added the breaking change Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility. label Apr 20, 2026
@ggerganov ggerganov changed the title fit-params : add option to output estimated memory per device fit-params : refactor + add option to output estimated memory per device Apr 20, 2026
Contributor

@JohannesGaessler JohannesGaessler left a comment


Please also add me as a code owner for fit.cpp.


Labels

breaking change (Changes that break ABIs, APIs, file formats, or other forms of backwards compatibility.), examples, server
