Official documentation website for llcuda v2.2.0, a CUDA 12 inference backend for Unsloth with Graphistry network visualization on Kaggle's dual Tesla T4 GPUs.
🌐 Live Documentation: https://llcuda.github.io/
llcuda is a CUDA 12 inference backend specifically designed for deploying Unsloth-fine-tuned models on Kaggle's dual Tesla T4 GPUs (30GB total VRAM).
- 🚀 Dual T4 Support: Run on Kaggle's 2× Tesla T4 GPUs (15GB each)
- 🔥 Split-GPU Architecture: LLM on GPU 0, Graphistry on GPU 1
- ⚡ Native CUDA tensor-split: llama.cpp layer distribution (NOT NCCL)
- 🎯 70B Model Support: Run Llama-70B IQ3_XS on 30GB VRAM
- 📦 29 GGUF Quantization Formats: K-quants and I-quants
- 🔧 OpenAI-compatible API: Drop-in replacement via llama-server
- 🌐 Graphistry Integration: Extract and visualize knowledge graphs
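The tensor-split mode above can be sketched as a plain llama-server launch. This is a hedged sketch, not llcuda's own launcher: the flags (`-ngl`, `--split-mode`, `--tensor-split`) are standard llama.cpp options, and the model path is a placeholder.

```python
import subprocess

# Placeholder path: substitute whatever GGUF you downloaded to Kaggle.
MODEL = "/kaggle/working/llama-70b-iq3_xs.gguf"

cmd = [
    "llama-server",
    "-m", MODEL,
    "-ngl", "99",              # offload all layers to GPU
    "--split-mode", "layer",   # distribute whole layers across GPUs (no NCCL)
    "--tensor-split", "1,1",   # ~50/50 layer split between GPU 0 and GPU 1
    "--host", "127.0.0.1",
    "--port", "8080",
]

# subprocess.Popen(cmd)  # uncomment on an actual dual-T4 machine
print(" ".join(cmd))
```

For the split-GPU layout (LLM on GPU 0, Graphistry on GPU 1), the same launch with `CUDA_VISIBLE_DEVICES=0` in the environment pins the server to the first T4 instead.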
| Model | Quantization | VRAM | Speed | Platform |
|---|---|---|---|---|
| Gemma 2-2B | Q4_K_M | ~3 GB | ~60 tok/s | Single T4 |
| Llama-3.2-3B | Q4_K_M | ~4 GB | ~45 tok/s | Single T4 |
| Qwen-2.5-7B | Q4_K_M | ~7 GB | ~25 tok/s | Single T4 |
| Llama-70B | IQ3_XS | ~28 GB | ~12 tok/s | Dual T4 |
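Once a server from the table above is running, the OpenAI-compatible endpoint takes a plain chat-completions POST. A stdlib-only sketch, assuming a default local launch on port 8080 (the `model` field is typically ignored by a single-model llama-server):

```python
import json
import urllib.request

payload = {
    "model": "local",  # placeholder; a single-model server serves its loaded GGUF
    "messages": [{"role": "user", "content": "Explain tensor-split in one line."}],
    "max_tokens": 64,
}

req = urllib.request.Request(
    "http://127.0.0.1:8080/v1/chat/completions",
    data=json.dumps(payload).encode(),
    headers={"Content-Type": "application/json"},
)
# The call below needs a running server, so it is left commented out:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
print(req.full_url)
```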
- 📚 Documentation
- 🚀 Quick Start Guide
- 📖 10 Kaggle Tutorial Notebooks
- 🔧 API Reference
- 💻 Main Repository
- 📦 PyPI Package
- Getting Started: Installation, quick start, Kaggle setup
- Kaggle Dual T4: Multi-GPU inference, tensor-split, large models
- Tutorial Notebooks: 10 comprehensive Kaggle notebooks
- Architecture: Split-GPU design, LLM + Graphistry
- Unsloth Integration: Fine-tuning → GGUF → Deployment
- Graphistry & Visualization: Knowledge graph extraction
- Performance: Benchmarks, optimization, memory management
- GGUF & Quantization: K-quants, I-quants, selection guide
- API Reference: ServerManager, MultiGPU, GGUF tools
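As a rough aid for the quantization selection guide, the weight footprint of a GGUF can be estimated from bits-per-weight. The bpw figures below are approximate community values for llama.cpp quants, not llcuda-specific numbers; real VRAM use adds KV cache and runtime overhead on top.

```python
# Approximate bits-per-weight for a few llama.cpp quant formats (assumed values).
APPROX_BPW = {"Q4_K_M": 4.8, "IQ3_XS": 3.3, "Q8_0": 8.5}

def approx_gguf_gib(params_billion: float, quant: str) -> float:
    """Rough quantized model size in GiB (weights only, no KV cache)."""
    bits = params_billion * 1e9 * APPROX_BPW[quant]
    return bits / 8 / 2**30

# 70B at IQ3_XS lands near 27 GiB of weights, which is why the table
# above budgets ~28 GB VRAM for it on dual T4s.
print(round(approx_gguf_gib(70, "IQ3_XS"), 1))
```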
```bash
# Install dependencies
pip install mkdocs-material mkdocs-minify-plugin

# Serve locally (view at http://127.0.0.1:8000)
mkdocs serve

# Deploy to GitHub Pages
mkdocs gh-deploy
```

Keywords: llcuda, CUDA 12, Tesla T4, Kaggle, dual GPU, LLM inference, Unsloth, GGUF, quantization, llama.cpp, multi-GPU, tensor-split, Graphistry, knowledge graphs, FlashAttention, 70B models, split-GPU architecture, Kaggle notebooks, RAPIDS, cuGraph, PyGraphistry
llcuda v2.2.0 - CUDA 12 Inference Backend for Unsloth
Released: January 2025
MIT License - Copyright © 2024-2026 Waqas Muhammad