Which model?
Gemma 4 31B
Mac model + RAM
M1 Max 64GB
What happened?
Thanks for creating this repository with the capability I have been desiring for some time. I have one issue that I have not been able to resolve (yet). Any assistance you can provide would be greatly appreciated.
When prompting via the claude CLI, the responses are very slow (78.6s / 3m 23s), and the response doesn't return to the claude CLI prompt immediately even though I can see it in the server logs. BTW, in order to get things to run properly without being prompted to log in to Anthropic, I added the following to my ~/.claude.json:
"primaryApiKey": "sk-local"
Any ideas why the responses are very slow? I tried adding the following setting to my ~/.claude/settings.json:
"CLAUDE_CODE_ATTRIBUTION_HEADER" : "0"
This did not change the slowness of the responses. In the posted screenshot you can see that the response took 3m 23s overall. In the logs, you can see that the response was generated at 06:55:23; however, the last log line shows that it was not returned to the CLI until 06:57:03. I also notice the fan on my M1 Mac going crazy. I wonder if this machine has enough power to run the model.
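To put a number on that gap, here is a quick sketch of the arithmetic using only the two timestamps and the total time quoted above. The helper name `gap_seconds` is my own and not part of any tool; it assumes both timestamps fall on the same day.

```python
from datetime import datetime


def gap_seconds(start: str, end: str, fmt: str = "%H:%M:%S") -> float:
    """Return the number of seconds between two same-day HH:MM:SS timestamps."""
    return (datetime.strptime(end, fmt) - datetime.strptime(start, fmt)).total_seconds()


# Values copied from the report above.
generated_at = "06:55:23"    # response appears in the server logs
returned_at = "06:57:03"     # response reaches the claude CLI
total_seconds = 3 * 60 + 23  # 3m 23s overall, per the screenshot

delay = gap_seconds(generated_at, returned_at)
share = delay / total_seconds

print(f"Delay after generation: {delay:.0f}s")
print(f"Share of total time: {share:.0%}")
```

So roughly half of the wall-clock time appears to be spent after the response shows up in the server logs, which is the part I can't explain.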
Relevant logs or error output