
LLMSched

A tiny LLM runtime microcore for KV cache, token budget, and batch scheduling.

LLMSched is not a full inference engine. It is a scheduler simulator that captures the core resource control plane of LLM serving: who gets KV cache, how many tokens to allocate, when to batch, and when to reject.
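The decision surface is small enough to sketch. A minimal illustration in Rust, with every name invented here for exposition rather than taken from LLMSched's code:

// Hypothetical sketch of the control-plane decisions described above.
enum Decision {
    // Grant KV cache pages and a token allowance, then join a batch.
    Admit { kv_pages: u32, token_grant: u32 },
    // Hold the request until budget or cache pages free up.
    Queue,
    // Refuse outright, e.g. when admission would overflow the token budget.
    Reject { reason: RejectReason },
}

enum RejectReason {
    TokenBudgetExhausted,
    KvCacheFull,
    DeadlineInfeasible,
}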

Power

Inference resource scheduling is the control point of LLM serving: whoever controls token allocation, KV cache leases, and batch merging holds the key to inference performance.

First-Edition Scope

This edition performs no real model inference; it is a pure simulator:

requests.jsonl
  ↓
token budget
  ↓
KV cache allocation
  ↓
batch scheduling
  ↓
mock decode step
  ↓
trace.jsonl + scheduling report
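The mock decode step is where the simulator advances without touching a model: each request in the batch "emits" one token and a trace event is recorded. A minimal sketch, with all types and fields assumed for illustration:

// Hypothetical types; LLMSched's real ones may differ.
struct Request { id: u64, generated: u32, max_tokens: u32 }
struct TraceEvent { step: u64, request_id: u64, tokens_so_far: u32 }

/// One mock decode step over the current batch. No model is invoked;
/// the simulator only does the bookkeeping a real engine would do.
fn mock_decode_step(step: u64, batch: &mut Vec<Request>, trace: &mut Vec<TraceEvent>) {
    for req in batch.iter_mut() {
        req.generated += 1;
        trace.push(TraceEvent { step, request_id: req.id, tokens_so_far: req.generated });
    }
    // Finished requests leave the batch, releasing their KV cache lease.
    batch.retain(|r| r.generated < r.max_tokens);
}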

Quick Start

cargo run -- run --requests examples/requests.jsonl
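The request schema is not documented here, so the lines below are a purely hypothetical illustration of what examples/requests.jsonl could contain (every field name is assumed):

{"id": 1, "prompt_tokens": 512, "max_new_tokens": 128, "priority": 0, "deadline_ms": 2000}
{"id": 2, "prompt_tokens": 64, "max_new_tokens": 32, "priority": 1}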

Key Concepts

  • Token Budget — global and per-request token allocation with overflow protection (see the sketch after this list)
  • KV Cache Lease — allocate/free/evict semantics for GPU memory pages
  • Batch Scheduler — priority queue + deadline-aware continuous batching
  • Telemetry — structured trace output (JSONL) for downstream training
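A minimal sketch of the first two concepts, with every name and signature assumed for illustration rather than taken from LLMSched:

/// Global token budget with overflow protection: a grant either fits
/// entirely or is refused, so the running total never wraps past the cap.
struct TokenBudget { cap: u64, used: u64 }

impl TokenBudget {
    fn try_grant(&mut self, tokens: u64) -> bool {
        match self.used.checked_add(tokens) {
            Some(total) if total <= self.cap => { self.used = total; true }
            _ => false, // arithmetic overflow or over-cap: reject the grant
        }
    }
    fn release(&mut self, tokens: u64) {
        self.used = self.used.saturating_sub(tokens);
    }
}

/// KV cache lease semantics: pages are allocated to a request, freed on
/// completion, and evictable under memory pressure.
trait KvLease {
    fn allocate(&mut self, request_id: u64, pages: u32) -> Result<(), ()>;
    fn free(&mut self, request_id: u64);
    fn evict(&mut self, pages_needed: u32) -> Vec<u64>; // ids of evicted requests
}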

Protocols

Input                        Output
plan.json (from Apeinx-IR)   trace.jsonl (to ApexTrain-Core)
requests.jsonl               scheduling report (Markdown)
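The trace format is identified only as JSONL, so this single event is a hypothetical illustration of the kind of record trace.jsonl might carry:

{"step": 42, "event": "decode", "request_id": 1, "batch_size": 8, "kv_pages_in_use": 512}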

Tech Stack

Layer                Choice
Control logic        Rust
CLI                  clap
Config               TOML
Trace format         JSONL
Underlying kernels   C ABI → KernelLab
Logging              tracing
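Because kernels are reached over a C ABI, the bindings in ffi/ presumably declare extern functions. The symbol name and signature below are invented for illustration only:

// Hypothetical KernelLab binding; the real declarations live in ffi/.
extern "C" {
    // Assumed signature: run one fused decode kernel over a batch,
    // returning a status code.
    fn kernellab_decode_step(batch_ids: *const u64, batch_len: usize) -> i32;
}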

Project Structure

llmsched-core/
├── crates/llmsched-core/src/   # Core: queue, budget, kv, scheduler, telemetry
├── crates/llmsched-cli/src/    # CLI entry point
├── ffi/                        # KernelLab C ABI bindings
├── examples/requests.jsonl     # Sample workload
└── reports/                    # Output trace + report

Relationship to Other Apeinx Projects

KernelLab → (C ABI) → LLMSched → (trace.jsonl) → ApexTrain-Core
                          ↑
                    Apeinx-IR → (plan.json)

License

TBD
