Summary
The RESAMPLE_PERSIST path (lines 6133–6148) honors a persisted sample rate via RESAMPLE unconditionally — there is no minimum sample size floor check. If the persisted sample rate is very low (e.g., 0.01% on a 10B-row table), sp_StatUpdate will RESAMPLE using a rate that produces fewer than 200 usable histogram steps without any warning or override.
The values needed to compute the floor (@current_row_count and @current_persisted_sample_percent) are both in scope at the decision point, but the product row_count × (percent / 100.0) (absolute sampled rows) is never computed, and no floor parameter exists.
Impact
- Silent histogram degradation. A RESAMPLE_PERSIST call with a low persisted sample rate on a large table can produce a histogram with fewer than 200 meaningful steps, and CE range estimates can be off by 10× or more.
- Invisible in query plans. When histogram steps are built from fewer than 50–100 sampled rows per step, CE range estimate errors reach 10×+, which directly corrupts row goal scale factors. The resulting plan failure (Nested Loops + Seek processing far more rows than expected) is invisible in `EstimatedRowsWithoutRowGoal` because the histogram estimate itself is wrong: the plan looks correct but uses a corrupted cardinality baseline.
- RESAMPLE honors a rate that may have been calibrated at a different table scale. A persisted rate of 1% calibrated at 100M rows (1M sampled) is still 1% at 10B rows (now 100M sampled, a far larger scan than intended) and still 1% at 1M rows (10K sampled, well below any useful histogram quality floor). The percent persists, but the absolute sample size it produces does not.
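The scale dependence above can be made concrete with a few lines of T-SQL (illustrative only; these variables are not part of sp_StatUpdate):

```sql
-- The same persisted percent yields very different absolute sample sizes
DECLARE @persisted_percent DECIMAL(9,4) = 1.0;

SELECT t.row_count,
       CAST(t.row_count * (@persisted_percent / 100.0) AS BIGINT) AS absolute_sampled_rows
FROM (VALUES (CAST(1000000 AS BIGINT)),     -- 1M rows   -> 10,000 sampled
             (CAST(100000000 AS BIGINT)),   -- 100M rows -> 1,000,000 sampled
             (CAST(10000000000 AS BIGINT))  -- 10B rows  -> 100,000,000 sampled
     ) AS t(row_count);
```

Only the 1M-row case falls below a 1,000,000-row floor; the persisted percent alone cannot express that distinction.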
Root Cause
```sql
-- CURRENT (v2.3) — lines 6133–6148
-- @current_row_count is populated: ✅
-- @current_persisted_sample_percent is populated: ✅
-- absolute_sampled_rows = row_count × (percent/100.0): ❌ NEVER COMPUTED
-- @PersistSampleMinRows parameter: ❌ DOES NOT EXIST — floor check never written
ELSE IF @effective_sample_percent IS NULL
    AND @current_persisted_sample_percent IS NOT NULL
    AND @is_long_running_stat = 0
BEGIN
    SELECT
        @with_clause = N'RESAMPLE',
        @has_with_option = 1,
        @sample_source = N'RESAMPLE_PERSIST';
    -- No floor check. RESAMPLE fires unconditionally regardless of resulting sample size.
END
```

SQL Server's own auto-sample algorithm converges to approximately 1,000,000 absolute sampled rows for large tables; this is the closest available engineering proxy for "adequate for 200-step histogram construction." A 500K floor is the right order of magnitude, but the v2.4 default should be 1,000,000 to align with SQL Server's own threshold.
Fix
1. Compute `@absolute_sampled_rows` before the floor check

Insert the computation before the `IF @resample_persist = 1` block:
```sql
-- Compute absolute sampled rows for histogram quality floor check
DECLARE @absolute_sampled_rows BIGINT = NULL;

IF @current_row_count IS NOT NULL
    AND @current_persisted_sample_percent IS NOT NULL
    AND @current_persisted_sample_percent > 0
BEGIN
    SET @absolute_sampled_rows = CAST(
        @current_row_count * (@current_persisted_sample_percent / 100.0)
        AS BIGINT
    );
END
```

2. Use it in the existing floor comparison (no structural change needed)
The `IF @resample_persist = 1 AND ... AND @absolute_sampled_rows >= @PersistSampleMinRows` guard then works as intended.
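Putting the pieces together, the v2.3 branch quoted under Root Cause could be guarded as sketched below (a sketch only; the exact v2.4 branch structure may differ):

```sql
ELSE IF @effective_sample_percent IS NULL
    AND @current_persisted_sample_percent IS NOT NULL
    AND @is_long_running_stat = 0
    -- New: NULL-safe floor. A NULL @absolute_sampled_rows skips the floor
    -- check (RESAMPLE still fires) rather than erroring out.
    AND (@absolute_sampled_rows IS NULL
         OR @absolute_sampled_rows >= @PersistSampleMinRows)
BEGIN
    SELECT
        @with_clause = N'RESAMPLE',
        @has_with_option = 1,
        @sample_source = N'RESAMPLE_PERSIST';
END
```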
3. Add `@PersistSampleMinRows` parameter (new)

```sql
@PersistSampleMinRows BIGINT = 1000000 -- minimum absolute sampled rows for RESAMPLE_PERSIST path
```

The default of 1,000,000 (1M rows) aligns with SQL Server's own auto-sample algorithm convergence threshold, the internal engineering reference for "enough rows to produce a reliable 200-step histogram." When `@absolute_sampled_rows < @PersistSampleMinRows`, fall back to an explicit sample rate rather than RESAMPLE.
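The fallback can be sketched as follows. The `PERSIST_FLOOR_FALLBACK` label and the exact fallback policy are assumptions for illustration; this issue does not specify them:

```sql
-- When the persisted rate fails the floor, substitute an explicit rate that
-- would reach @PersistSampleMinRows, capped at 100 percent.
IF @absolute_sampled_rows IS NOT NULL
    AND @absolute_sampled_rows < @PersistSampleMinRows
BEGIN
    SET @effective_sample_percent = CAST(
        CASE WHEN @current_row_count <= @PersistSampleMinRows THEN 100.0
             ELSE @PersistSampleMinRows * 100.0 / @current_row_count
        END AS DECIMAL(9,4));
    SET @sample_source = N'PERSIST_FLOOR_FALLBACK'; -- hypothetical label
END
```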
4. Log to CommandLog ExtendedInfo
Add persisted_sampled_rows as an absolute figure to the CommandLog ExtendedInfo XML column. The current persisted_sample_percent alone is uninterpretable without the denominator — a DBA reviewing CommandLog cannot evaluate histogram quality without this.
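A hedged sketch of emitting the new element (variable names and literal values are assumptions; the real values come from the procedure's state):

```sql
DECLARE @extended_info XML,
        @current_persisted_sample_percent DECIMAL(9,4) = 0.10,
        @absolute_sampled_rows BIGINT = 10000000;

-- Build the ExtendedInfo fragment with both the percent and the absolute figure
SELECT @extended_info = (
    SELECT @current_persisted_sample_percent AS PersistedSamplePercent,
           @absolute_sampled_rows AS PersistedSampledRows
    FOR XML PATH(''), ROOT('ExtendedInfo'), TYPE
);
```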
```xml
<ExtendedInfo>
  ...
  <PersistedSamplePercent>0.10</PersistedSamplePercent>
  <PersistedSampledRows>10000000</PersistedSampledRows> <!-- NEW -->
</ExtendedInfo>
```

Testing Requirements
- Validate `@absolute_sampled_rows` is computed before the RESAMPLE_PERSIST floor check
- Validate the floor check fires correctly for tables below 1M sampled rows
- Validate the floor check passes for tables at or above 1M sampled rows
- Validate NULL-safe handling (both inputs NULL → skip floor check, not error)
- Test with `@Execute = 'N'` (dry run) to verify the variable is set even without execution
- Multi-version matrix: no version gate required (pure T-SQL arithmetic)
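The NULL-safe case can be exercised as a standalone batch, a minimal sketch assuming the computation shape from Fix step 1:

```sql
-- Simulate "both inputs NULL" and assert the computation stays NULL (no error)
DECLARE @current_row_count BIGINT = NULL,
        @current_persisted_sample_percent DECIMAL(9,4) = NULL,
        @absolute_sampled_rows BIGINT = NULL;

IF @current_row_count IS NOT NULL
    AND @current_persisted_sample_percent IS NOT NULL
    AND @current_persisted_sample_percent > 0
BEGIN
    SET @absolute_sampled_rows = CAST(
        @current_row_count * (@current_persisted_sample_percent / 100.0) AS BIGINT);
END

IF @absolute_sampled_rows IS NULL
    PRINT 'PASS: floor check skipped, no error';
ELSE
    RAISERROR('FAIL: expected NULL', 16, 1);
```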
References
- Internal analysis: PERSIST_SAMPLE_PERCENT adequacy floor
- Claim-194: Code gap — absolute_sampled_rows never computed (❌ Bug Fix)
- Claim-193: SQL Server auto-sample convergence ~1M rows (engineering reference)
- Claim-196: CE downstream impact — row goal backfire invisible in `EstimatedRowsWithoutRowGoal`
- Related: `issue-p2b-qs-forced-plan-always-on.md` (plan stability interaction)