Slow response times #28

### Which model?

Gemma 4 31B

### Mac model + RAM

M1 Max 64GB

### What happened?

Thanks for creating this repository with the capability I have been wanting for some time. I have one issue that I have not been able to resolve (yet). Any assistance you can provide would be greatly appreciated.

When prompting via the claude CLI, responses are very slow (78.6s / 3m 23s), and the response does not return to the claude CLI prompt immediately, even though I can see it in the server logs. BTW, to get things running without being prompted to log in to Anthropic, I added the following to my ~/.claude.json:

"primaryApiKey": "sk-local"

Any ideas why the responses are very slow? I tried adding the following setting to my ~/.claude/settings.json:

"CLAUDE_CODE_ATTRIBUTION_HEADER" : "0"
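
In case it helps with diagnosis, here is a minimal sketch for checking where that key sits in the settings file. It assumes (I have not confirmed this) that Claude Code only picks up environment variables placed under an `"env"` object in settings.json, so a top-level entry might be silently ignored. The helper names `find_setting` and `check_file` are made up for illustration.

```python
import json
from pathlib import Path


def find_setting(settings: dict, name: str) -> str:
    """Report where `name` appears: under "env", at the top level, or nowhere."""
    env = settings.get("env")
    if isinstance(env, dict) and name in env:
        return "env"
    if name in settings:
        return "top-level"
    return "missing"


def check_file(path: Path, name: str) -> str:
    """Load a JSON settings file and locate `name`; returns "no file" if absent."""
    if not path.exists():
        return "no file"
    return find_setting(json.loads(path.read_text()), name)


# Path taken from the report above; the "env" placement is an assumption.
settings_path = Path.home() / ".claude" / "settings.json"
print(check_file(settings_path, "CLAUDE_CODE_ATTRIBUTION_HEADER"))
```

If this prints `top-level`, moving the entry under `"env"` may be worth trying before concluding that the setting has no effect.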

This did not change the slowness of the responses. In the posted screenshot you can see that the overall response time was 3m 23s. In the logs, you can see that the response was generated at 06:55:23; however, the last log entry shows that it was not returned to the CLI until 06:57:03. I also notice the fan on my Mac M1 going crazy. I wonder if this machine has enough power to run the model.
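
To put numbers on that gap, here is a small sketch of the arithmetic using only the two timestamps and the total time quoted above. It assumes both log entries fall on the same day; `seconds_between` is just a helper name chosen for this example.

```python
from datetime import datetime


def seconds_between(start: str, end: str) -> int:
    """Whole seconds from start to end, both given as HH:MM:SS on the same day."""
    fmt = "%H:%M:%S"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds())


generated_at = "06:55:23"    # response appears in the server log
returned_at = "06:57:03"     # response is returned to the claude CLI
total_seconds = 3 * 60 + 23  # overall time shown in the screenshot (3m 23s)

gap = seconds_between(generated_at, returned_at)
print(f"gap after generation: {gap}s")                # 100s
print(f"share of total: {gap / total_seconds:.0%}")   # 49%
```

So roughly half of the 3m 23s elapsed after the server had already logged the response, which suggests the delay is not only raw generation speed.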

<img width="1934" height="1301" alt="Image" src="https://github.com/user-attachments/assets/831a9fb7-3bd5-4317-ac77-2e5cc4535b72" />

### Relevant logs or error output

```shell

```
