Validation Report Interpretation Guide

This guide explains how to interpret the validation_report.html generated by SQLTraceBench after a benchmark run.

1. Report Overview

The report provides a comprehensive view of the benchmark results, comparing the performance of the candidate database (or configuration) against a baseline.

Key Sections:

  1. Status Card: Immediate visual feedback on whether the validation passed or failed.
  2. Performance Metrics: Charts visualizing QPS (Queries Per Second) and Latency distributions.
  3. Statistical Validation: Detailed table of statistical tests performed (e.g., KS Test, Chi-Square) and their results.

2. Key Metrics & Interpretation

2.1 QPS Deviation

Definition: (Actual QPS - Baseline QPS) / Baseline QPS * 100%, where Actual QPS is the throughput measured for the candidate.

Interpretation:

  • Green (|Deviation| < 5%): Excellent match. The candidate performs similarly to the baseline.
  • Yellow (5% ≤ |Deviation| < 15%): Acceptable variance. Minor tuning may be required.
  • Red (|Deviation| ≥ 15%): Significant deviation. Requires investigation.

Common Scenarios:

  • Negative Deviation (e.g., -20%): Candidate is slower. Check resource utilization (CPU, IO), index usage, or locking issues.
  • Positive Deviation (e.g., +20%): Candidate is faster. This is usually welcome, but if the goal is to replicate the baseline's behavior, it may indicate that the candidate is skipping work or caching more aggressively.
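
The deviation formula and color bands above can be sketched in a few lines of Python. This is a minimal illustration, not SQLTraceBench's own code: the function names `qps_deviation` and `classify_deviation` are hypothetical, and the thresholds are copied from the interpretation list in this section.

```python
def qps_deviation(actual_qps: float, baseline_qps: float) -> float:
    """Return the QPS deviation in percent, as defined in section 2.1."""
    if baseline_qps <= 0:
        raise ValueError("baseline_qps must be positive")
    return (actual_qps - baseline_qps) / baseline_qps * 100.0


def classify_deviation(deviation_pct: float) -> str:
    """Map a deviation (in percent) to the report's color band."""
    magnitude = abs(deviation_pct)
    if magnitude < 5.0:
        return "green"   # excellent match
    if magnitude < 15.0:
        return "yellow"  # acceptable variance
    return "red"         # significant deviation


# Hypothetical numbers: the baseline ran at 1000 QPS.
for actual in (1030.0, 1100.0, 800.0):
    dev = qps_deviation(actual, 1000.0)
    print(f"{actual:.0f} QPS -> {dev:+.1f}% ({classify_deviation(dev)})")
```

Because the bands use the absolute value, a candidate that is 20% faster lands in the same red band as one that is 20% slower; the sign tells you which of the scenarios above applies.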

2.2 Statistical Tests

KS Test (Kolmogorov-Smirnov)

Purpose: Checks if the latency distribution of the candidate matches the baseline.

p-value:

  • p > 0.05: PASS. The test found no statistically significant difference between the distributions at the 5% level.
  • p ≤ 0.05: FAIL. The distributions differ significantly.
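
To make the pass/fail rule concrete, here is a rough pure-Python sketch of a two-sample KS test on latency samples. It is an assumption about how such a check could be done, not a description of SQLTraceBench internals: the function names are hypothetical, and the p-value uses the standard asymptotic approximation, which is only reliable for reasonably large samples. In practice a statistics library (for example SciPy's `ks_2samp`) would be the more robust choice.

```python
import math
from bisect import bisect_right


def ks_statistic(baseline, candidate):
    """Largest gap between the two empirical CDFs (the KS statistic D)."""
    if not baseline or not candidate:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(baseline), sorted(candidate)
    n, m = len(a), len(b)
    # The gap can only change at observed values, so check each of them.
    return max(abs(bisect_right(a, x) / n - bisect_right(b, x) / m)
               for x in a + b)


def ks_p_value(d, n, m):
    """Asymptotic p-value for statistic d with sample sizes n and m."""
    en = math.sqrt(n * m / (n + m))
    lam = (en + 0.12 + 0.11 / en) * d
    if lam < 0.2:
        return 1.0  # the series below converges slowly here; p is ~1
    total = 0.0
    for j in range(1, 101):
        total += (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
    return min(1.0, max(0.0, 2.0 * total))


def ks_verdict(baseline, candidate, alpha=0.05):
    """Apply the report's rule: PASS if p > alpha, otherwise FAIL."""
    d = ks_statistic(baseline, candidate)
    p = ks_p_value(d, len(baseline), len(candidate))
    return ("PASS" if p > alpha else "FAIL"), d, p


# Hypothetical latencies in milliseconds.
same = [float(x) for x in range(1, 51)]
shifted = [float(x) for x in range(101, 151)]
print(ks_verdict(same, same)[0])     # identical samples
print(ks_verdict(same, shifted)[0])  # non-overlapping samples
```

Note that a PASS only means the test did not detect a difference; with very small samples the test has little power to detect one.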

Chi-Square Test

Purpose: Checks goodness of fit (including uniformity) for categorical data or binned distributions.
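
As an illustration of a goodness-of-fit check on categorical data, the sketch below computes the chi-square statistic for candidate counts against baseline counts. The guide does not say which categories SQLTraceBench compares, so the query-type counts and the function name `chi_square_statistic` are hypothetical. The statistic is compared against the well-known 5% critical value for 2 degrees of freedom (5.991) rather than converted to a p-value, which would need a statistics library.

```python
def chi_square_statistic(candidate_counts, baseline_counts):
    """Chi-square goodness-of-fit statistic of candidate vs. baseline counts.

    Baseline counts are rescaled to the candidate's total so that the
    expected and observed counts sum to the same number.
    """
    if len(candidate_counts) != len(baseline_counts):
        raise ValueError("both inputs need the same number of categories")
    if any(b <= 0 for b in baseline_counts):
        raise ValueError("every baseline category needs a positive count")
    scale = sum(candidate_counts) / sum(baseline_counts)
    return sum((obs - base * scale) ** 2 / (base * scale)
               for obs, base in zip(candidate_counts, baseline_counts))


# Hypothetical counts per query type: SELECT, UPDATE, INSERT.
baseline = [500, 300, 200]
candidate = [70, 20, 10]

stat = chi_square_statistic(candidate, baseline)
CRITICAL_5PCT_DF2 = 5.991  # 3 categories -> 2 degrees of freedom
print(f"chi-square = {stat:.3f}",
      "FAIL" if stat > CRITICAL_5PCT_DF2 else "PASS")
```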

3. Troubleshooting

If Status is FAIL:

  1. Check QPS Deviation. Is the system under too much load?
  2. Examine Latency Charts. Is there a long tail? Are P99 latencies spiking?
  3. Review Error Rates. High error rates will invalidate performance metrics.
  4. Check logs for specific query failures.
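
For step 2, a long tail can also be checked numerically from raw latency samples. The sketch below uses the nearest-rank percentile method and reports the P99/P50 ratio; both the helper names and the choice of that ratio as a tail indicator are assumptions for illustration, not something the report defines.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def tail_ratio(samples):
    """P99 divided by P50; a large value suggests a long latency tail."""
    return percentile(samples, 99) / percentile(samples, 50)


# Hypothetical latencies in milliseconds: mostly 10 ms with two 1000 ms outliers.
latencies = [10.0] * 98 + [1000.0] * 2
print("P50:", percentile(latencies, 50))
print("P99:", percentile(latencies, 99))
print("P99/P50:", tail_ratio(latencies))
```

Comparing this ratio between the baseline and candidate runs can show whether a FAIL is driven by a few slow queries rather than by a uniform slowdown.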

Tips:

  • Hover over charts to see exact values.
  • Use the "Baseline" values as your ground truth.