This guide explains how to interpret the validation_report.html generated by SQLTraceBench after a benchmark run.
The report provides a comprehensive view of the benchmark results, comparing the performance of the candidate database (or configuration) against a baseline.
- Status Card: Immediate visual feedback on whether the validation passed or failed.
- Performance Metrics: Charts visualizing QPS (Queries Per Second) and Latency distributions.
- Statistical Validation: Detailed table of statistical tests performed (e.g., KS Test, Chi-Square) and their results.
Definition: QPS Deviation = ((Actual QPS - Baseline QPS) / Baseline QPS) × 100%, where Actual QPS is the throughput measured for the candidate.
Interpretation:
- Green (|Deviation| < 5%): Excellent match. The candidate performs similarly to the baseline.
- Yellow (5% ≤ |Deviation| < 15%): Acceptable variance. Minor tuning may be required.
- Red (|Deviation| ≥ 15%): Significant deviation. Requires investigation.
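The definition and color bands above can be sketched in a few lines of Python. This is a minimal illustration, not SQLTraceBench's own code: the function names `qps_deviation` and `deviation_band` are hypothetical, and only the formula and the 5% / 15% thresholds come from this guide.

```python
def qps_deviation(actual_qps: float, baseline_qps: float) -> float:
    """Return the QPS deviation of the candidate from the baseline, in percent."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    return (actual_qps - baseline_qps) / baseline_qps * 100.0


def deviation_band(deviation_pct: float) -> str:
    """Map a deviation (in percent) to the color band described in this guide."""
    magnitude = abs(deviation_pct)  # the bands depend on magnitude, not sign
    if magnitude < 5.0:
        return "green"
    if magnitude < 15.0:
        return "yellow"
    return "red"


if __name__ == "__main__":
    # Hypothetical numbers: baseline at 1000 QPS, candidate at 800 QPS.
    deviation = qps_deviation(actual_qps=800.0, baseline_qps=1000.0)
    print(f"{deviation:+.1f}% -> {deviation_band(deviation)}")
```

Note that the band uses the absolute value, so a candidate that is 20% faster lands in the same band as one that is 20% slower; the sign tells you which of the scenarios applies.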
Common Scenarios:
- Negative Deviation (e.g., -20%): Candidate is slower. Check resource utilization (CPU, IO), index usage, or locking issues.
- Positive Deviation (e.g., +20%): Candidate is faster. This is usually welcome, but if the goal is to replicate baseline behavior, it may indicate that the candidate is skipping work or caching more aggressively.
Purpose (KS Test): Checks whether the latency distribution of the candidate matches that of the baseline.
p-value:
- p > 0.05: PASS. No statistically significant difference between the distributions was detected.
- p ≤ 0.05: FAIL. The distributions differ significantly at the 5% level.
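To make the p-value rule concrete, here is a rough sketch of a two-sample KS test in plain Python. It is an assumption-laden illustration: the function names are hypothetical, the p-value uses the asymptotic Kolmogorov series (reasonable for large samples only), and SQLTraceBench may compute its value differently, so the numbers need not match the report exactly.

```python
import math


def ks_two_sample(baseline, candidate):
    """Return (D, approximate p-value) for a two-sample KS test."""
    a, b = sorted(baseline), sorted(candidate)
    n, m = len(a), len(b)
    if n == 0 or m == 0:
        raise ValueError("both samples must be non-empty")
    i = j = 0
    d = 0.0
    # Walk both sorted samples and track the largest gap between the
    # two empirical cumulative distribution functions.
    while i < n and j < m:
        x = min(a[i], b[j])
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    # Asymptotic approximation: p ~ 2 * sum((-1)^(k-1) * exp(-2 k^2 lam^2)).
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return d, 1.0  # the p-value is essentially 1 in this range
    p = 2.0 * sum((-1) ** (k - 1) * math.exp(-2.0 * k * k * lam * lam)
                  for k in range(1, 101))
    return d, min(1.0, max(0.0, p))


def ks_verdict(p_value, alpha=0.05):
    """Apply the PASS/FAIL rule from this guide."""
    return "PASS" if p_value > alpha else "FAIL"


if __name__ == "__main__":
    # Hypothetical latency samples in milliseconds.
    base = [float(x) for x in range(100)]
    cand = [x + 50.0 for x in base]  # candidate shifted by 50 ms
    d, p = ks_two_sample(base, cand)
    print(f"D={d:.3f}, p={p:.3g}, {ks_verdict(p)}")
```

A PASS here only means the test found no evidence of a difference; with very small samples the test has little power, so it is worth confirming that enough latency samples were collected.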
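Purpose (Chi-Square Test): Checks uniformity or goodness-of-fit for categorical data or binned distributions, for example whether the candidate's counts per category match the counts expected from the baseline.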
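As an illustration of a Chi-Square goodness-of-fit check, the sketch below compares observed counts per category against expected counts. The categories and counts are made up, the function names are hypothetical, and the p-value routine is a simple closed-form evaluation intended for small degrees of freedom; it is not taken from SQLTraceBench.

```python
import math


def chi_square_statistic(observed, expected):
    """Pearson chi-square statistic: sum of (O - E)^2 / E over all bins."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e <= 0 for e in expected):
        raise ValueError("expected counts must be positive")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def chi_square_p_value(statistic, dof):
    """Upper-tail probability of the chi-square distribution.

    Uses the recurrence Q(a + 1, y) = Q(a, y) + y^a * exp(-y) / Gamma(a + 1)
    with a = dof / 2 and y = statistic / 2.
    """
    if dof < 1:
        raise ValueError("dof must be at least 1")
    y = statistic / 2.0
    if dof % 2 == 0:
        a, q = 1.0, math.exp(-y)              # Q(1, y)
    else:
        a, q = 0.5, math.erfc(math.sqrt(y))   # Q(1/2, y)
    while a < dof / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return min(1.0, q)


if __name__ == "__main__":
    # Hypothetical query counts per type (SELECT, INSERT, UPDATE).
    expected = [700.0, 200.0, 100.0]   # derived from the baseline mix
    observed = [650.0, 230.0, 120.0]   # seen in the candidate run
    stat = chi_square_statistic(observed, expected)
    p = chi_square_p_value(stat, dof=len(expected) - 1)
    print(f"chi2={stat:.2f}, p={p:.4f}")
```

The degrees of freedom are taken as the number of bins minus one, which is the usual choice when the expected counts are fixed in advance.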
If Status is FAIL:
- Check QPS Deviation. Is the system under too much load?
- Examine Latency Charts. Is there a long tail? Are P99 latencies spiking?
- Review Error Rates. High error rates will invalidate performance metrics.
- Check logs for specific query failures.
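When examining the latency charts for a long tail, it can help to put numbers on it. The sketch below assumes you have the raw latency samples available as a list of milliseconds (this guide does not say how to export them); `tail_summary` is a hypothetical helper built on the standard library's `statistics.quantiles`.

```python
import statistics


def tail_summary(latencies_ms):
    """Return P50, P99, and the P99/P50 ratio for a list of latencies."""
    if len(latencies_ms) < 2:
        raise ValueError("need at least two samples")
    # n=100 yields 99 cut points; index 49 is P50 and index 98 is P99.
    cuts = statistics.quantiles(latencies_ms, n=100)
    p50, p99 = cuts[49], cuts[98]
    return {"p50": p50, "p99": p99, "tail_ratio": p99 / p50}


if __name__ == "__main__":
    # Hypothetical samples: 2% of candidate queries take 200 ms instead of 10 ms.
    baseline = [10.0] * 1000
    candidate = [10.0] * 980 + [200.0] * 20
    print("baseline ", tail_summary(baseline))
    print("candidate", tail_summary(candidate))
```

A tail ratio that is much larger for the candidate than for the baseline points to a small share of slow queries, which is the kind of difference that can fail the statistical tests even when average latency looks similar.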
Tips:
- Hover over charts to see exact values.
- Use the "Baseline" values as your ground truth.