
BUG - Deadlock by Design #22160

@kripper

Description


Name and Version

The client is repeatedly sending the same request

The client (OpenHands) sends the request and apparently aborts the task due to a client-side timeout.
llama-server does not reuse the prompt cache, so each retry restarts prompt processing from scratch, the timeout fires again, and the cycle never terminates — a deadlock.
The hardware generates unnecessary CO₂ emissions in the process.

By design, does “stop task” imply no cache?

IMO: By default, we should use the cache whenever possible [1].
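
To make the failure mode concrete, here is a minimal sketch (not llama.cpp code; the token counts and speeds are illustrative assumptions, not measurements from the logs) of why a client timeout shorter than the full prompt-processing time loops forever unless processed prompt tokens are cached across retries:

```python
# Hypothetical simulation of the retry loop described above.
# All constants are assumptions for illustration only.

PROMPT_TOKENS = 78_000       # assumed full prompt size
TOKENS_PER_SEC = 170         # assumed prefill speed
CLIENT_TIMEOUT_SEC = 300     # client cancels after 5 minutes

def attempts_until_done(reuse_cache: bool, max_attempts: int = 100) -> int:
    """Count retries until the prompt is fully processed, or -1 if it never is."""
    processed = 0
    for attempt in range(1, max_attempts + 1):
        # Without cache reuse, every retry restarts prefill from token 0.
        start = processed if reuse_cache else 0
        processed = start + TOKENS_PER_SEC * CLIENT_TIMEOUT_SEC
        if processed >= PROMPT_TOKENS:
            return attempt
    return -1  # never finishes: the "deadlock" from this report

print(attempts_until_done(reuse_cache=False))  # → -1 (loops forever)
print(attempts_until_done(reuse_cache=True))   # → 2  (progress accumulates)
```

With these assumed numbers a single attempt processes 51,000 of 78,000 tokens, so without cache reuse the server repeats the same partial prefill indefinitely, while with reuse it finishes on the second attempt.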

Deadlogs

Apr 20 04:49:31 | slot  0 | task 16004 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 04:53:46 | slot  0 | task 16004 | n_tokens = 43476, batch.n_tokens = 256, progress = 0.551453
Apr 20 04:53:47 | srv          stop: cancel task, id_task = 16004

Apr 20 04:54:34 | slot  0 | task 16511 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 04:58:47 | slot  0 | task 16511 | n_tokens = 43220, batch.n_tokens = 256, progress = 0.548206
Apr 20 04:58:48 | srv          stop: cancel task, id_task = 16511

Apr 20 04:59:33 | slot  0 | task 16762 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 05:03:47 | slot  0 | task 16762 | n_tokens = 43220, batch.n_tokens = 256, progress = 0.548206
Apr 20 05:03:49 | srv          stop: cancel task, id_task = 16762

Apr 20 05:04:35 | slot  0 | task 17728 | n_tokens = 15316, batch.n_tokens = 256, progress = 0.194269
...
Apr 20 05:08:55 | slot  0 | task 17728 | n_tokens = 43732, batch.n_tokens = 256, progress = 0.554700
Apr 20 05:08:57 | srv          stop: cancel task, id_task = 17728

Tested with glm-4.7-flash and the latest version of llama.cpp.


[1] As an old "cachero" would say.
