Name and Version
The client is repeatedly sending the same request
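The client (OpenHands) sends the request and then, apparently because of a client-side timeout, cancels the task.
llama-server does not reuse the prompt cache when the same request arrives again, so prompt processing starts over, the timeout fires again, and the cycle repeats indefinitely, causing a deadlock.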
The hardware is producing unnecessary CO₂ emissions.
Is it by design that stopping (cancelling) a task means its processed prompt is not cached?
IMO, by default, the cache should be used whenever possible [1].
Deadlock logs
Apr 20 04:49:31 | slot 0 | task 16004 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 04:53:46 | slot 0 | task 16004 | n_tokens = 43476, batch.n_tokens = 256, progress = 0.551453
Apr 20 04:53:47 | srv stop: cancel task, id_task = 16004
Apr 20 04:54:34 | slot 0 | task 16511 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 04:58:47 | slot 0 | task 16511 | n_tokens = 43220, batch.n_tokens = 256, progress = 0.548206
Apr 20 04:58:48 | srv stop: cancel task, id_task = 16511
Apr 20 04:59:33 | slot 0 | task 16762 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 05:03:47 | slot 0 | task 16762 | n_tokens = 43220, batch.n_tokens = 256, progress = 0.548206
Apr 20 05:03:49 | srv stop: cancel task, id_task = 16762
Apr 20 05:04:35 | slot 0 | task 17728 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 05:08:55 | slot 0 | task 17728 | n_tokens = 43732, batch.n_tokens = 256, progress = 0.554700
Apr 20 05:08:57 | srv stop: cancel task, id_task = 17728
Tested with glm-4.7-flash and the latest version of llama.cpp.
[1] As an old "cachero" would say.