Skip to content

KolosalAI/torch-inference

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

123 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Torch Inference - Enterprise ML Inference Server

High-performance PyTorch inference framework in Rust with production-grade testing and monitoring.

🎯 Features

  • Production-Ready Testing: 147+ unit tests, integration tests, and benchmarks
  • Enterprise Resilience: Circuit breaker, bulkhead isolation, request deduplication
  • High Performance: Multi-level caching, dynamic batching, concurrent processing
  • Comprehensive Monitoring: Real-time metrics, health checks, endpoint statistics
  • Type-Safe: Full Rust type safety with zero-cost abstractions

Quick Start

Build the Server

cargo build --release

Run Tests

# Run all tests (147+ unit tests + integration tests)
cargo test

# Run with verbose output
cargo test -- --nocapture

# Run specific test suites
cargo test cache::tests      # Cache system tests (38 tests)
cargo test batch::tests      # Batch processing tests (28 tests)
cargo test monitor::tests    # Monitoring tests (28 tests)
cargo test resilience::      # Resilience pattern tests (16 tests)

# Run integration tests only
cargo test --test integration_test

# Run benchmarks
cargo bench

Run the Server

cargo run --bin torch-inference-server

📊 Test Coverage (Enterprise-Grade)

Core Infrastructure (91 tests)

  • Cache System (38 tests)

    • Basic CRUD operations
    • TTL-based expiration
    • Concurrent access (10+ threads)
    • Unicode keys support
    • Boundary conditions
    • Memory efficiency
    • Large value handling
    • Stress testing (20 threads × 100 ops)
  • Batch Processing (28 tests)

    • Dynamic batching
    • Timeout handling
    • Priority management
    • Concurrent additions
    • Large input handling
    • Stress testing (20 producers × 100 items)
  • Monitoring (28 tests)

    • Request tracking
    • Latency metrics (min/max/avg)
    • Throughput calculation
    • Health status
    • Endpoint statistics
    • Concurrent recording (10 threads × 100 ops)
    • High-frequency updates (10k ops/sec)

Resilience Patterns (16 tests)

  • Circuit Breaker (10 tests)

    • State transitions (Closed → Open → HalfOpen)
    • Failure threshold detection
    • Automatic recovery
    • Reset functionality
  • Bulkhead (6 tests)

    • Permit acquisition
    • Capacity management
    • Resource isolation
    • Concurrent operations

Additional Coverage (40+ tests)

  • Error handling and propagation
  • Configuration management
  • Request deduplication
  • API endpoints
  • Core ML components

Integration Tests (6 tests)

  • End-to-end request flow
  • Concurrent system load (100 concurrent requests)
  • Batch processing pipeline
  • Cache + Monitor integration
  • Error condition handling

🚀 Performance Benchmarks

Run benchmarks to measure performance:

# Run all benchmarks
cargo bench

# Run specific benchmark
cargo bench cache_get

# Generate benchmark reports (in target/criterion)
cargo bench --bench cache_bench

Benchmark categories:

  • cache_set: Insertion performance at various scales (100, 1K, 10K)
  • cache_get: Retrieval performance with populated cache
  • cache_cleanup: Expiration cleanup performance

🏗️ Architecture

Test Structure

tests/
├── integration_test.rs     # Integration tests
└── ...

benches/
└── cache_bench.rs          # Performance benchmarks

src/
├── cache.rs               # 38 unit tests
├── batch.rs               # 28 unit tests
├── monitor.rs             # 28 unit tests
├── dedup.rs               # 9 unit tests
├── error.rs               # 11 unit tests
├── config.rs              # 7 unit tests
└── resilience/
    ├── circuit_breaker.rs # 10 unit tests
    └── bulkhead.rs        # 6 unit tests

Enterprise Testing Features

Concurrency Testing: All components tested with 10-50 concurrent threads ✅ Stress Testing: High-load scenarios (10K+ operations) ✅ Boundary Conditions: Edge cases, zero values, max values ✅ Performance Testing: Criterion benchmarks for critical paths ✅ Integration Testing: End-to-end workflows ✅ Error Scenarios: Failure injection and recovery ✅ Memory Safety: No unsafe code, all tests thread-safe

📈 Continuous Testing

# Watch mode - run tests on file change
cargo watch -x test

# Coverage report (requires cargo-tarpaulin)
cargo tarpaulin --out Html

# Run tests in parallel
cargo test -- --test-threads=8

# Run tests sequentially (for debugging)
cargo test -- --test-threads=1

🔬 Test Quality Standards

All tests follow enterprise standards:

  1. Isolation: Each test is independent and can run in any order
  2. Determinism: Tests produce consistent results
  3. Performance: Fast execution (<30s for full suite)
  4. Readability: Clear test names and assertions
  5. Coverage: Critical paths have multiple test scenarios
  6. Documentation: Comments explain complex test logic

🛠️ Development

Features

Optional Backend Support

# Enable PyTorch backend
cargo build --features torch

# Enable ONNX backend (requires ONNX Runtime)
cargo build --features onnx

# Enable Candle backend
cargo build --features candle

# Enable all backends
cargo build --features all-backends

CUDA Support

cargo build --features cuda

Project Structure

src/
├── lib.rs              # Library exports for testing
├── main.rs             # Server entry point
├── api/                # REST API endpoints
├── auth/               # Authentication
├── batch.rs            # Batch processing
├── cache.rs            # Caching system
├── config.rs           # Configuration
├── core/               # ML inference engines
├── dedup.rs            # Request deduplication
├── error.rs            # Error handling
├── middleware/         # HTTP middleware
├── models/             # Model management
├── monitor.rs          # Monitoring & metrics
├── resilience/         # Resilience patterns
├── security/           # Security features
└── telemetry/          # Logging & tracing

Running Specific Tests

# Test caching system
cargo test cache::tests

# Test batch processing
cargo test batch::tests

# Test circuit breaker
cargo test circuit_breaker::tests

# Test monitoring
cargo test monitor::tests

# Run with verbose output
cargo test -- --nocapture --test-threads=1

Development

Code Style

All tests follow Rust best practices:

  • Tests are co-located with implementation using #[cfg(test)]
  • Async tests use #[tokio::test]
  • Tests are isolated and can run in parallel
  • No external dependencies for core tests

Adding New Tests

Add test modules at the bottom of implementation files:

#[cfg(test)]
mod tests {
    use super::*;

    #[test]
    fn test_my_feature() {
        // Test code here
    }

    #[tokio::test]
    async fn test_async_feature() {
        // Async test code here
    }
}

Performance

The server includes several performance optimizations:

  • Request batching for improved throughput
  • Multi-level caching (in-memory + request deduplication)
  • Circuit breaker pattern for fault tolerance
  • Bulkhead pattern for resource isolation
  • Comprehensive monitoring and metrics

License

Copyright © 2024 Genta Dev Team

Testing

Comprehensive testing has been completed for all endpoints and features.

Quick Test

./test_quick.sh

Full Test Suite

./test_final_report.sh

Test Results

See docs/TEST_RESULTS.md for detailed test results and coverage.

Latest Test Results:

  • ✅ 47/47 tests passed (100% success rate)
  • ✅ All 6 TTS engines operational
  • ✅ All 22 SOTA models available for download
  • ✅ Stress tested with 20+ concurrent requests
  • ✅ System monitoring and performance metrics verified

📚 Documentation

Complete documentation is available in the docs/ directory:

Getting Started

SOTA Models

Benchmarking

Testing & Fixes

About

Torch model inference serving

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Contributors 2

  •  
  •