-
core/event.rs- ProcessorEvent wrapper -
core/timestamp.rs- Timestamp utilities -
core/key.rs- GroupingKey implementation - Unit tests for core types
-
error.rs- ProcessorError types - Error conversion implementations
- Result type aliases
-
config.rs- Configuration structures - Config validation
- Config loading from YAML
- Environment variable overrides
Deliverables: Core types, error handling, configuration loading
-
window/types.rs- Window, WindowSpec, WindowType -
window/assigner.rs- WindowAssigner trait -
window/tumbling.rs- TumblingWindowAssigner -
window/sliding.rs- SlidingWindowAssigner -
window/session.rs- SessionWindowAssigner - Unit tests for all window types
-
window/trigger.rs- WindowTrigger trait - EventTimeTrigger implementation
- ProcessingTimeTrigger implementation
- CountTrigger implementation
- Unit tests for triggers
Deliverables: Complete window management system with tests
-
watermark/generator.rs- Watermark and WatermarkGenerator trait -
watermark/bounded.rs- BoundedOutOfOrdernessWatermark -
watermark/periodic.rs- PeriodicWatermarkGenerator -
watermark/alignment.rs- Timestamp alignment utilities - Unit tests for watermark generators
- LateEventHandler implementation
- LateEvent type
- Side output mechanism
- Integration tests for late events
Deliverables: Watermark system with late event handling
-
aggregation/aggregator.rs- Aggregator trait -
aggregation/count.rs- CountAggregator -
aggregation/sum.rs- SumAggregator -
aggregation/average.rs- AverageAggregator -
aggregation/minmax.rs- Min/MaxAggregators - Unit tests for basic aggregators
-
aggregation/percentile.rs- PercentileAggregator -
aggregation/stddev.rs- StdDevAggregator -
aggregation/composite.rs- CompositeAggregator - AggregationState structure
- AggregationEngine coordinator
- Unit tests for advanced aggregators
Deliverables: Complete aggregation system with all functions
-
state/backend.rs- StateBackend trait -
state/manager.rs- WindowStateManager - Serialization/deserialization logic
- Unit tests for state abstractions
-
state/memory.rs- MemoryStateBackend -
state/rocksdb.rs- RocksDBStateBackend - Integration tests for backends
-
state/checkpoint.rs- Checkpointing logic -
state/recovery.rs- Recovery logic - End-to-end checkpoint/recovery tests
Deliverables: State management with persistence and recovery
-
kafka/consumer.rs- KafkaEventSource - Offset management
- Consumer group handling
- Error handling and retries
-
kafka/producer.rs- KafkaAggregationSink - Batching and compression
- Delivery guarantees
- Error handling
-
kafka/config.rs- Kafka configuration - Connection pooling
- Integration tests with test Kafka
Deliverables: Complete Kafka integration
-
pipeline/builder.rs- StreamPipelineBuilder - Fluent API implementation
- Validation logic
- Builder tests
-
pipeline/executor.rs- StreamPipeline - Event processing loop
- Watermark propagation
- Window triggering
- Result emission
-
pipeline/operator.rs- Stream operators (map, filter, etc.) -
pipeline/sink.rs- Output sink trait and implementations - Operator chaining
- Unit tests for operators
Deliverables: Complete processing pipeline
-
metrics/collector.rs- ProcessorMetrics - Throughput metrics
- Latency metrics (p50, p95, p99)
- State size metrics
- Error rate metrics
- Structured logging integration
- Span creation for operations
- Distributed tracing support
- Health check endpoint
- Readiness checks
- Liveness checks
Deliverables: Complete observability stack
- Comprehensive unit test coverage (>80%)
- Edge case testing
- Property-based testing
- End-to-end processing tests
- Late event handling tests
- State recovery tests
- Multi-window tests
- Error handling tests
- Throughput benchmarks
- Latency benchmarks
- Memory usage tests
- State backend performance tests
- API documentation
- Usage examples
- Configuration guide
- Deployment guide
- Performance tuning guide
Deliverables: Production-ready with comprehensive tests and docs
- Profile hot paths
- Optimize memory allocations
- Reduce lock contention
- Batch state updates
- Circuit breakers
- Graceful shutdown
- Backpressure handling
- Resource limits
- Docker image
- Kubernetes manifests
- Monitoring dashboards
- Alerting rules
- Runbooks
Deliverables: Production-ready deployment artifacts
[dependencies]
# Core
tokio = { version = "1.40", features = ["full"] }
tokio-stream = "0.1"
async-trait = "0.1"
futures = "0.3"
# Kafka
rdkafka = { version = "0.36", features = ["tokio"] }
# Serialization
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
bincode = "1.3"
# State backends
rocksdb = "0.22"
sled = "0.34"
# Statistics
statrs = "0.17"
# Utilities
chrono = { version = "0.4", features = ["serde"] }
uuid = { version = "1.10", features = ["v4", "serde"] }
thiserror = "1.0"
anyhow = "1.0"
dashmap = "6.0"
# Observability
tracing = "0.1"
tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
# Testing
mockall = "0.13"
criterion = { version = "0.5", features = ["async_tokio"] }
tempfile = "3.8"llm-optimizer-types- Core types and eventsllm-optimizer-config- Configuration management
- Kafka cluster (for integration tests and production)
- PostgreSQL (for integration with other components)
- Prometheus (for metrics)
- Grafana (for dashboards)
- Process 10,000+ events/sec
- p99 processing latency < 100ms
- Support all three window types
- Support all aggregation functions
- Handle late events up to configured lateness
- Persistent state with recovery
- Zero data loss
- 99.9% uptime
- Graceful degradation under load
- Memory usage < 2GB for 100k windows
- State recovery < 30 seconds
- Checkpoint creation < 5 seconds
- Comprehensive error handling
- Full observability
- Unit test coverage > 80%
- Integration test coverage > 70%
- All public APIs documented
- Zero clippy warnings
- Zero unsafe code (except FFI)
Risk: RocksDB performance degradation with large state
- Mitigation: Implement state TTL and compaction strategies
- Fallback: Use Sled or Redis as alternative backend
Risk: Memory growth from unbounded percentile calculation
- Mitigation: Implement T-Digest or HdrHistogram for streaming percentiles
- Fallback: Sample values or use fixed-size reservoir
Risk: Kafka consumer lag under high load
- Mitigation: Implement backpressure and load shedding
- Fallback: Scale horizontally with consumer groups
Risk: State corruption during checkpoint
- Mitigation: Atomic checkpointing with versioning
- Fallback: Multiple checkpoint retention
Risk: Resource exhaustion
- Mitigation: Memory limits, circuit breakers, rate limiting
- Fallback: Graceful degradation mode
# Run with in-memory state
cargo run --bin processor -- --config config/processor-dev.yaml# Run with RocksDB state
docker-compose up -d kafka
cargo run --bin processor -- --config config/processor-test.yaml# Kubernetes deployment
apiVersion: apps/v1
kind: Deployment
metadata:
name: llm-optimizer-processor
spec:
replicas: 3
template:
spec:
containers:
- name: processor
image: llm-optimizer-processor:latest
resources:
limits:
memory: "4Gi"
cpu: "2000m"
requests:
memory: "2Gi"
cpu: "1000m"
volumeMounts:
- name: state
mountPath: /var/lib/optimizer/state
volumes:
- name: state
persistentVolumeClaim:
claimName: processor-state- Events processed per second
- Processing latency (p50, p95, p99)
- Kafka consumer lag
- State size (MB)
- Checkpoint duration
- Error rate
- Memory usage
- CPU usage
- Consumer lag > 10,000 messages
- p99 latency > 1 second
- Error rate > 1%
- State size > 10GB
- Memory usage > 80%
- Checkpoint failures
| Phase | Duration | Deliverable |
|---|---|---|
| Phase 1 | Week 1-2 | Core foundation |
| Phase 2 | Week 2-3 | Window management |
| Phase 3 | Week 3-4 | Watermarking |
| Phase 4 | Week 4-5 | Aggregation engine |
| Phase 5 | Week 5-6 | State management |
| Phase 6 | Week 6-7 | Kafka integration |
| Phase 7 | Week 7-8 | Processing pipeline |
| Phase 8 | Week 8 | Metrics and observability |
| Phase 9 | Week 9-10 | Testing and documentation |
| Phase 10 | Week 10-11 | Production hardening |
Total Duration: 11 weeks (2.5 months)
- Review and Approve: Stakeholder review of architecture and plan
- Setup Environment: Development environment and infrastructure
- Phase 1 Kickoff: Begin core foundation implementation
- Weekly Checkpoints: Progress reviews and adjustments
- Integration Planning: Coordinate with Analyzer and Collector teams