openalchemy

Here are 2 public repositories matching this topic...

openalchemy / whisper.cpp

OpenAlchemy fork of whisper.cpp — RTX 50-series Blackwell (sm_120) NULL-slot guards on top of upstream. Powers /v1/audio/transcriptions

cuda speech-to-text whisper blackwell distributed-inference sm-120 openalchemy

Updated Jun 1, 2026
C++

openalchemy / llama.cpp

🧪 OpenAlchemy fork of llama.cpp — TurboQuant KV-cache compression (3-bit / 2-bit). ~4 GB VRAM saved on Qwen2.5-Coder-14B Q4_K_M @ 32k ctx, +47% gen speed.

compression fork cuda quantization kv-cache llama-cpp llm-inference gguf turboquant openalchemy

Updated Jun 2, 2026
C++
