High-performance PyTorch inference framework in Rust with production-grade testing and monitoring.
- Production-Ready Testing: 147+ unit tests, integration tests, and benchmarks
- Enterprise Resilience: Circuit breaker, bulkhead isolation, request deduplication
- High Performance: Multi-level caching, dynamic batching, concurrent processing
- Comprehensive Monitoring: Real-time metrics, health checks, endpoint statistics
- Type-Safe: Full Rust type safety with zero-cost abstractions
cargo build --release# Run all tests (147+ unit tests + integration tests)
cargo test
# Run with verbose output
cargo test -- --nocapture
# Run specific test suites
cargo test cache::tests # Cache system tests (38 tests)
cargo test batch::tests # Batch processing tests (28 tests)
cargo test monitor::tests # Monitoring tests (28 tests)
cargo test resilience:: # Resilience pattern tests (16 tests)
# Run integration tests only
cargo test --test integration_test
# Run benchmarks
cargo benchcargo run --bin torch-inference-server-
Cache System (38 tests)
- Basic CRUD operations
- TTL-based expiration
- Concurrent access (10+ threads)
- Unicode keys support
- Boundary conditions
- Memory efficiency
- Large value handling
- Stress testing (20 threads × 100 ops)
-
Batch Processing (28 tests)
- Dynamic batching
- Timeout handling
- Priority management
- Concurrent additions
- Large input handling
- Stress testing (20 producers × 100 items)
-
Monitoring (28 tests)
- Request tracking
- Latency metrics (min/max/avg)
- Throughput calculation
- Health status
- Endpoint statistics
- Concurrent recording (10 threads × 100 ops)
- High-frequency updates (10k ops/sec)
-
Circuit Breaker (10 tests)
- State transitions (Closed → Open → HalfOpen)
- Failure threshold detection
- Automatic recovery
- Reset functionality
-
Bulkhead (6 tests)
- Permit acquisition
- Capacity management
- Resource isolation
- Concurrent operations
- Error handling and propagation
- Configuration management
- Request deduplication
- API endpoints
- Core ML components
- End-to-end request flow
- Concurrent system load (100 concurrent requests)
- Batch processing pipeline
- Cache + Monitor integration
- Error condition handling
Run benchmarks to measure performance:
# Run all benchmarks
cargo bench
# Run specific benchmark
cargo bench cache_get
# Generate benchmark reports (in target/criterion)
cargo bench --bench cache_benchBenchmark categories:
- cache_set: Insertion performance at various scales (100, 1K, 10K)
- cache_get: Retrieval performance with populated cache
- cache_cleanup: Expiration cleanup performance
tests/
├── integration_test.rs # Integration tests
└── ...
benches/
└── cache_bench.rs # Performance benchmarks
src/
├── cache.rs # 38 unit tests
├── batch.rs # 28 unit tests
├── monitor.rs # 28 unit tests
├── dedup.rs # 9 unit tests
├── error.rs # 11 unit tests
├── config.rs # 7 unit tests
└── resilience/
├── circuit_breaker.rs # 10 unit tests
└── bulkhead.rs # 6 unit tests
✅ Concurrency Testing: All components tested with 10-50 concurrent threads ✅ Stress Testing: High-load scenarios (10K+ operations) ✅ Boundary Conditions: Edge cases, zero values, max values ✅ Performance Testing: Criterion benchmarks for critical paths ✅ Integration Testing: End-to-end workflows ✅ Error Scenarios: Failure injection and recovery ✅ Memory Safety: No unsafe code, all tests thread-safe
# Watch mode - run tests on file change
cargo watch -x test
# Coverage report (requires cargo-tarpaulin)
cargo tarpaulin --out Html
# Run tests in parallel
cargo test -- --test-threads=8
# Run tests sequentially (for debugging)
cargo test -- --test-threads=1All tests follow enterprise standards:
- Isolation: Each test is independent and can run in any order
- Determinism: Tests produce consistent results
- Performance: Fast execution (<30s for full suite)
- Readability: Clear test names and assertions
- Coverage: Critical paths have multiple test scenarios
- Documentation: Comments explain complex test logic
# Enable PyTorch backend
cargo build --features torch
# Enable ONNX backend (requires ONNX Runtime)
cargo build --features onnx
# Enable Candle backend
cargo build --features candle
# Enable all backends
cargo build --features all-backendscargo build --features cudasrc/
├── lib.rs # Library exports for testing
├── main.rs # Server entry point
├── api/ # REST API endpoints
├── auth/ # Authentication
├── batch.rs # Batch processing
├── cache.rs # Caching system
├── config.rs # Configuration
├── core/ # ML inference engines
├── dedup.rs # Request deduplication
├── error.rs # Error handling
├── middleware/ # HTTP middleware
├── models/ # Model management
├── monitor.rs # Monitoring & metrics
├── resilience/ # Resilience patterns
├── security/ # Security features
└── telemetry/ # Logging & tracing
# Test caching system
cargo test cache::tests
# Test batch processing
cargo test batch::tests
# Test circuit breaker
cargo test circuit_breaker::tests
# Test monitoring
cargo test monitor::tests
# Run with verbose output
cargo test -- --nocapture --test-threads=1All tests follow Rust best practices:
- Tests are co-located with implementation using
#[cfg(test)] - Async tests use
#[tokio::test] - Tests are isolated and can run in parallel
- No external dependencies for core tests
Add test modules at the bottom of implementation files:
#[cfg(test)]
mod tests {
use super::*;
#[test]
fn test_my_feature() {
// Test code here
}
#[tokio::test]
async fn test_async_feature() {
// Async test code here
}
}The server includes several performance optimizations:
- Request batching for improved throughput
- Multi-level caching (in-memory + request deduplication)
- Circuit breaker pattern for fault tolerance
- Bulkhead pattern for resource isolation
- Comprehensive monitoring and metrics
Copyright © 2024 Genta Dev Team
Comprehensive testing has been completed for all endpoints and features.
./test_quick.sh./test_final_report.shSee docs/TEST_RESULTS.md for detailed test results and coverage.
Latest Test Results:
- ✅ 47/47 tests passed (100% success rate)
- ✅ All 6 TTS engines operational
- ✅ All 22 SOTA models available for download
- ✅ Stress tested with 20+ concurrent requests
- ✅ System monitoring and performance metrics verified
Complete documentation is available in the docs/ directory:
- RUN_NOW.md - Quick start guide
- BUILDING_WITH_TORCH.md - Build with PyTorch support
- COMPLETE_TESTING_GUIDE.md - Complete testing guide
- SOTA_IMAGE_MODELS_SUMMARY.md - Model catalog
- API_SOTA_MODELS.md - API documentation
- IMAGE_MODELS_STATUS.md - Status & roadmap
- BENCHMARK_SUMMARY.md - Quick overview
- BENCHMARK_GUIDE.md - User guide
- BENCHMARK_README.md - Complete reference
- TEST_RESULTS.md - Test results
- TEST_FIXES.md - Issues resolved
- GIT_CLEANUP_SUMMARY.md - Repository cleanup