Torch Inference - Enterprise ML Inference Server

High-performance PyTorch inference framework in Rust with production-grade testing and monitoring.

🎯 Features

Production-Ready Testing: 147+ unit tests, integration tests, and benchmarks
Enterprise Resilience: Circuit breaker, bulkhead isolation, request deduplication
High Performance: Multi-level caching, dynamic batching, concurrent processing
Comprehensive Monitoring: Real-time metrics, health checks, endpoint statistics
Type-Safe: Full Rust type safety with zero-cost abstractions

Quick Start

Build the Server

cargo build --release

Run Tests

# Run all tests (147+ unit tests + integration tests)
cargo test

# Show stdout/stderr from tests (not captured)
cargo test -- --nocapture

# Run specific test suites
cargo test cache::tests      # Cache system tests (38 tests)
cargo test batch::tests      # Batch processing tests (28 tests)
cargo test monitor::tests    # Monitoring tests (28 tests)
cargo test resilience::      # Resilience pattern tests (16 tests)

# Run integration tests only
cargo test --test integration_test

# Run benchmarks
cargo bench

Run the Server

cargo run --bin torch-inference-server

📊 Test Coverage (Enterprise-Grade)

Core Infrastructure (91 tests)

Cache System (38 tests)
- Basic CRUD operations
- TTL-based expiration
- Concurrent access (10+ threads)
- Unicode keys support
- Boundary conditions
- Memory efficiency
- Large value handling
- Stress testing (20 threads × 100 ops)
Batch Processing (28 tests)
- Dynamic batching
- Timeout handling
- Priority management
- Concurrent additions
- Large input handling
- Stress testing (20 producers × 100 items)
Monitoring (28 tests)
- Request tracking
- Latency metrics (min/max/avg)
- Throughput calculation
- Health status
- Endpoint statistics
- Concurrent recording (10 threads × 100 ops)
- High-frequency updates (10k ops/sec)
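The cache behaviours listed above (CRUD operations, TTL-based expiration, Unicode keys, cleanup) can be illustrated with a minimal TTL cache. This is a sketch under assumptions: the type `TtlCache` and its methods are hypothetical and do not reflect the actual API in `src/cache.rs`. The test module follows the co-located `#[cfg(test)]` convention this project uses.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Hypothetical TTL cache for illustration; not the project's `cache.rs` API.
pub struct TtlCache<V> {
    entries: HashMap<String, (V, Instant)>, // value + expiry deadline
    ttl: Duration,
}

impl<V: Clone> TtlCache<V> {
    pub fn new(ttl: Duration) -> Self {
        Self { entries: HashMap::new(), ttl }
    }

    /// Insert or overwrite a value; its expiry is reset to now + ttl.
    pub fn set(&mut self, key: &str, value: V) {
        self.entries
            .insert(key.to_string(), (value, Instant::now() + self.ttl));
    }

    /// Return a clone of the value if it is present and not yet expired.
    pub fn get(&self, key: &str) -> Option<V> {
        self.entries
            .get(key)
            .filter(|(_, expires)| Instant::now() < *expires)
            .map(|(v, _)| v.clone())
    }

    /// Remove a key regardless of expiry, returning its value if it existed.
    pub fn remove(&mut self, key: &str) -> Option<V> {
        self.entries.remove(key).map(|(v, _)| v)
    }

    /// Drop all expired entries; returns how many were removed.
    pub fn cleanup(&mut self) -> usize {
        let now = Instant::now();
        let before = self.entries.len();
        self.entries.retain(|_, (_, expires)| now < *expires);
        before - self.entries.len()
    }

    pub fn len(&self) -> usize {
        self.entries.len()
    }
}

#[cfg(test)]
mod tests {
    use super::*;
    use std::thread::sleep;

    #[test]
    fn expired_entries_are_invisible_and_cleaned_up() {
        let mut cache = TtlCache::new(Duration::from_millis(50));
        cache.set("ключ", 1); // Unicode keys behave like any other String
        assert_eq!(cache.get("ключ"), Some(1));
        sleep(Duration::from_millis(80));
        assert_eq!(cache.get("ключ"), None); // expired but not yet purged
        assert_eq!(cache.cleanup(), 1);
        assert_eq!(cache.len(), 0);
    }
}
```

Expired entries are hidden by `get` immediately but only freed by `cleanup`, which is the operation the `cache_cleanup` benchmark category measures in spirit.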

Resilience Patterns (16 tests)

Circuit Breaker (10 tests)
- State transitions (Closed → Open → HalfOpen)
- Failure threshold detection
- Automatic recovery
- Reset functionality
Bulkhead (6 tests)
- Permit acquisition
- Capacity management
- Resource isolation
- Concurrent operations
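The circuit breaker transitions tested above (Closed → Open → HalfOpen, threshold detection, automatic recovery, reset) can be sketched as a small state machine. This is a hypothetical illustration, not the code in `src/resilience/circuit_breaker.rs`; the names `CircuitBreaker`, `failure_threshold` and `reset_timeout` are assumptions.

```rust
use std::time::{Duration, Instant};

/// Breaker states, mirroring the Closed → Open → HalfOpen transitions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Closed,
    Open,
    HalfOpen,
}

/// Hypothetical breaker; not the project's actual API.
pub struct CircuitBreaker {
    state: State,
    failures: u32,
    failure_threshold: u32,  // failures since the last success that trip the breaker
    reset_timeout: Duration, // time spent Open before trial requests are allowed
    opened_at: Option<Instant>,
}

impl CircuitBreaker {
    pub fn new(failure_threshold: u32, reset_timeout: Duration) -> Self {
        Self {
            state: State::Closed,
            failures: 0,
            failure_threshold,
            reset_timeout,
            opened_at: None,
        }
    }

    /// Current state; an Open breaker moves to HalfOpen once `reset_timeout` has elapsed.
    pub fn state(&mut self) -> State {
        if self.state == State::Open
            && self.opened_at.map_or(false, |t| t.elapsed() >= self.reset_timeout)
        {
            self.state = State::HalfOpen;
        }
        self.state
    }

    /// Closed allows traffic, HalfOpen allows trial requests, Open rejects.
    pub fn allow_request(&mut self) -> bool {
        self.state() != State::Open
    }

    /// A success closes the breaker and clears the failure count.
    pub fn record_success(&mut self) {
        self.state = State::Closed;
        self.failures = 0;
        self.opened_at = None;
    }

    /// A failure while HalfOpen, or reaching the threshold, (re)opens the breaker.
    pub fn record_failure(&mut self) {
        let current = self.state();
        self.failures = self.failures.saturating_add(1);
        if current == State::HalfOpen || self.failures >= self.failure_threshold {
            self.state = State::Open;
            self.opened_at = Some(Instant::now());
        }
    }

    /// Manual reset back to Closed.
    pub fn reset(&mut self) {
        self.record_success();
    }
}
```

For simplicity this sketch lets every request through while HalfOpen; production breakers often cap the number of concurrent trial requests.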

Additional Coverage (40+ tests)

Error handling and propagation
Configuration management
Request deduplication
API endpoints
Core ML components

Integration Tests (6 tests)

End-to-end request flow
Concurrent system load (100 concurrent requests)
Batch processing pipeline
Cache + Monitor integration
Error condition handling

🚀 Performance Benchmarks

Run benchmarks to measure performance:

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench cache_get

# Run only the cache_bench target (Criterion writes its reports to target/criterion)
cargo bench --bench cache_bench

Benchmark categories:

cache_set: Insertion performance at various scales (100, 1K, 10K)
cache_get: Retrieval performance with populated cache
cache_cleanup: Expiration cleanup performance

🏗️ Architecture

Test Structure

tests/
├── integration_test.rs     # Integration tests
└── ...

benches/
└── cache_bench.rs          # Performance benchmarks

src/
├── cache.rs               # 38 unit tests
├── batch.rs               # 28 unit tests
├── monitor.rs             # 28 unit tests
├── dedup.rs               # 9 unit tests
├── error.rs               # 11 unit tests
├── config.rs              # 7 unit tests
└── resilience/
    ├── circuit_breaker.rs # 10 unit tests
    └── bulkhead.rs        # 6 unit tests

Enterprise Testing Features

✅ Concurrency Testing: All components tested with 10-50 concurrent threads
✅ Stress Testing: High-load scenarios (10K+ operations)
✅ Boundary Conditions: Edge cases, zero values, max values
✅ Performance Testing: Criterion benchmarks for critical paths
✅ Integration Testing: End-to-end workflows
✅ Error Scenarios: Failure injection and recovery
✅ Memory Safety: No unsafe code, all tests thread-safe
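The concurrency and stress tests described above share a common shape: spawn N threads, have each perform M operations on shared state, join them, then assert on the aggregate. A std-only sketch, assuming a `Mutex<HashMap>` stands in for the component under test (the real suites may use different primitives); `concurrent_insert_stress` is a hypothetical helper.

```rust
use std::collections::HashMap;
use std::sync::{Arc, Mutex};
use std::thread;

/// Hammer a shared map from `threads` threads doing `ops` inserts each,
/// then return the final number of entries.
pub fn concurrent_insert_stress(threads: usize, ops: usize) -> usize {
    let map: Arc<Mutex<HashMap<String, usize>>> = Arc::new(Mutex::new(HashMap::new()));

    let handles: Vec<_> = (0..threads)
        .map(|t| {
            let map = Arc::clone(&map);
            thread::spawn(move || {
                for i in 0..ops {
                    // Keys are unique per (thread, op), so no insert overwrites another.
                    map.lock().unwrap().insert(format!("t{t}-k{i}"), i);
                }
            })
        })
        .collect();

    for h in handles {
        h.join().expect("worker thread panicked");
    }
    // Bind the length first so the MutexGuard is dropped before `map`.
    let len = map.lock().unwrap().len();
    len
}

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn twenty_threads_by_hundred_ops() {
        // Mirrors the "20 threads × 100 ops" cache stress scenario.
        assert_eq!(concurrent_insert_stress(20, 100), 2_000);
    }
}
```

Because every key is unique, any lost update or data race would show up as a count below `threads × ops`.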

📈 Continuous Testing

# Watch mode: re-run tests on file changes (requires cargo-watch)
cargo watch -x test

# Coverage report (requires cargo-tarpaulin)
cargo tarpaulin --out Html

# Run tests in parallel
cargo test -- --test-threads=8

# Run tests sequentially (for debugging)
cargo test -- --test-threads=1

🔬 Test Quality Standards

All tests follow enterprise standards:

Isolation: Each test is independent and can run in any order
Determinism: Tests produce consistent results
Performance: Fast execution (<30s for full suite)
Readability: Clear test names and assertions
Coverage: Critical paths have multiple test scenarios
Documentation: Comments explain complex test logic

🛠️ Development

Features

Optional Backend Support

# Enable PyTorch backend
cargo build --features torch

# Enable ONNX backend (requires ONNX Runtime)
cargo build --features onnx

# Enable Candle backend
cargo build --features candle

# Enable all backends
cargo build --features all-backends

CUDA Support

cargo build --features cuda
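Cargo features such as these are typically consumed through conditional compilation. A hypothetical sketch, assuming the feature names `torch`, `onnx`, `candle` and `cuda` from the commands above are declared in `Cargo.toml`; `enabled_backends` and `cuda_enabled` are illustrative helpers, not part of the project.

```rust
/// Report which optional backends were compiled in.
/// Each push only exists in the binary when the matching Cargo feature is enabled.
pub fn enabled_backends() -> Vec<&'static str> {
    let mut backends = Vec::new();
    #[cfg(feature = "torch")]
    backends.push("torch");
    #[cfg(feature = "onnx")]
    backends.push("onnx");
    #[cfg(feature = "candle")]
    backends.push("candle");
    #[cfg(feature = "cuda")]
    backends.push("cuda");
    backends
}

/// `cfg!` evaluates to a compile-time bool instead of removing code.
pub fn cuda_enabled() -> bool {
    cfg!(feature = "cuda")
}
```

`#[cfg]` removes code that depends on an optional crate, so a build without `--features onnx` never needs ONNX Runtime; `cfg!` suits cases where both branches still compile.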

Project Structure

src/
├── lib.rs              # Library exports for testing
├── main.rs             # Server entry point
├── api/                # REST API endpoints
├── auth/               # Authentication
├── batch.rs            # Batch processing
├── cache.rs            # Caching system
├── config.rs           # Configuration
├── core/               # ML inference engines
├── dedup.rs            # Request deduplication
├── error.rs            # Error handling
├── middleware/         # HTTP middleware
├── models/             # Model management
├── monitor.rs          # Monitoring & metrics
├── resilience/         # Resilience patterns
├── security/           # Security features
└── telemetry/          # Logging & tracing

Running Specific Tests

# Test caching system
cargo test cache::tests

# Test batch processing
cargo test batch::tests

# Test circuit breaker
cargo test circuit_breaker::tests

# Test monitoring
cargo test monitor::tests

# Show test output and run tests sequentially
cargo test -- --nocapture --test-threads=1

Development

Code Style

All tests follow Rust best practices:

Tests are co-located with implementation using #[cfg(test)]
Async tests use #[tokio::test]
Tests are isolated and can run in parallel
No external dependencies for core tests

Adding New Tests

Add test modules at the bottom of implementation files:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_my_feature() {
        // Test code here
    }

    #[tokio::test]
    async fn test_async_feature() {
        // Async test code here
    }
}

Performance

The server includes several performance optimizations:

Request batching for improved throughput
Multi-level caching (in-memory + request deduplication)
Circuit breaker pattern for fault tolerance
Bulkhead pattern for resource isolation
Comprehensive monitoring and metrics
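Request batching commonly flushes a batch when it reaches a maximum size or when a timeout expires, whichever comes first. A std-only sketch of that policy, assuming requests arrive on a channel; `collect_batch`, `max_size` and `max_wait` are hypothetical names, not the API of `src/batch.rs`.

```rust
use std::sync::mpsc::{Receiver, RecvTimeoutError};
use std::time::{Duration, Instant};

/// Collect up to `max_size` items (must be at least 1), waiting at most
/// `max_wait` after the first item arrives. Returns an empty Vec only if
/// the channel is disconnected with nothing pending.
pub fn collect_batch<T>(rx: &Receiver<T>, max_size: usize, max_wait: Duration) -> Vec<T> {
    let mut batch = Vec::with_capacity(max_size);

    // Block until the first item arrives (or all senders are gone).
    match rx.recv() {
        Ok(item) => batch.push(item),
        Err(_) => return batch,
    }

    let deadline = Instant::now() + max_wait;
    while batch.len() < max_size {
        let remaining = deadline.saturating_duration_since(Instant::now());
        match rx.recv_timeout(remaining) {
            Ok(item) => batch.push(item),
            // Timeout: flush a partial batch. Disconnected: flush what we have.
            Err(RecvTimeoutError::Timeout) | Err(RecvTimeoutError::Disconnected) => break,
        }
    }
    batch
}
```

Flushing on whichever limit is hit first keeps latency bounded under light load while still forming full batches under heavy load.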

License

Testing

Comprehensive testing has been completed for all endpoints and features.

Quick Test

./test_quick.sh

Full Test Suite

./test_final_report.sh

Test Results

See docs/TEST_RESULTS.md for detailed test results and coverage.

Latest Test Results:

✅ 47/47 tests passed (100% success rate)
✅ All 6 TTS engines operational
✅ All 22 SOTA models available for download
✅ Stress tested with 20+ concurrent requests
✅ System monitoring and performance metrics verified

📚 Documentation

Complete documentation is available in the docs/ directory:

Getting Started

RUN_NOW.md - Quick start guide
BUILDING_WITH_TORCH.md - Build with PyTorch support
COMPLETE_TESTING_GUIDE.md - Complete testing guide

SOTA Models

SOTA_IMAGE_MODELS_SUMMARY.md - Model catalog
API_SOTA_MODELS.md - API documentation
IMAGE_MODELS_STATUS.md - Status & roadmap

Benchmarking

BENCHMARK_SUMMARY.md - Quick overview
BENCHMARK_GUIDE.md - User guide
BENCHMARK_README.md - Complete reference

Testing & Fixes

TEST_RESULTS.md - Test results
TEST_FIXES.md - Issues resolved
GIT_CLEANUP_SUMMARY.md - Repository cleanup

KolosalAI/torch-inference