I just noticed that the reported context usage is significantly overstated, well beyond what the model instance actually supports.
Despite this, the model behaves well and produces sensible, coherent output.
Also note that the max length is reported as 200k, even though the /v1/models endpoint reports a 256k context (n_ctx is 262144):
curl http://[::1]:8078/v1/models
{"models":[{"name":"unsloth/gemma-4-31B-it-UD-Q8_K_XL","model":"unsloth/gemma-4-31B-it-UD-Q8_K_XL","modified_at":"","size":"","digest":"","type":"model","description":"","tags":[""],"capabilities":["completion","multimodal"],"parameters":"","details":{"parent_model":"","format":"gguf","family":"","families":[""],"parameter_size":"","quantization_level":""}}],"object":"list","data":[{"id":"unsloth/gemma-4-31B-it-UD-Q8_K_XL","aliases":["unsloth/gemma-4-31B-it-UD-Q8_K_XL"],"tags":[],"object":"model","created":1783253170,"owned_by":"llamacpp","meta":{"vocab_type":2,"n_vocab":262144,"n_ctx":262144,"n_ctx_train":262144,"n_embd":5376,"n_params":30697345596,"size":35004205296,"ftype":"Q8_0"}}]}
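For reference, the context size can be read straight out of that response. The following is only a minimal sketch of how a client might do it, not how jcode actually works: the function name `context_window` is hypothetical, the sample is a trimmed copy of the response above, and the `meta.n_ctx` field is assumed to be specific to this llama.cpp server rather than something every OpenAI-compatible endpoint provides.

```python
import json

# Trimmed copy of the /v1/models response shown above (only the fields used here).
SAMPLE_RESPONSE = """
{"object": "list",
 "data": [{"id": "unsloth/gemma-4-31B-it-UD-Q8_K_XL",
           "object": "model",
           "owned_by": "llamacpp",
           "meta": {"n_ctx": 262144, "n_ctx_train": 262144}}]}
"""


def context_window(models_json, model_id, fallback=None):
    """Return meta.n_ctx for model_id, or fallback when it is not reported.

    Hypothetical helper: the name and the fallback behaviour are assumptions.
    """
    payload = json.loads(models_json)
    for entry in payload.get("data", []):
        if entry.get("id") == model_id:
            # "meta" may be missing on servers other than llama.cpp.
            meta = entry.get("meta") or {}
            return meta.get("n_ctx", fallback)
    return fallback


n_ctx = context_window(SAMPLE_RESPONSE, "unsloth/gemma-4-31B-it-UD-Q8_K_XL")
print(n_ctx, n_ctx // 1024)  # prints "262144 256"
```

With the response above this yields 262144 tokens, i.e. 256k, rather than the 200k shown as the max length.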
The provider was added with:
jcode provider add local --base-url http://[::1]:8078/v1 --no-api-key -m "unsloth/gemma-4-31B-it-UD-Q8_K_XL" --set-default
commit: 7d5e840