Merged
Conversation
…ac_ctas flag - CTAS from PU/derived_pu tables propagates PAC_KEY and PROTECTED metadata to the new table (marked as derived_pu). Handles column renames and skips aggregate-consumed protected columns. - INSERT with aggregates into derived_pu tables triggers PAC noise injection. - New pac_ctas setting (default true) to control CTAS propagation. - New pac_check setting guard to allow CTAS to project protected columns. - DML wrapper detection: drills through INSERT/CTAS and CTE-wrapped DML to reach the SELECT plan for PAC rewriting. - Replace std::cerr debug traces with PAC_DEBUG_PRINT. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Write path (working): - ConvertDerivedPuToCounters: converts pac_noised_* to pac_* counter variants for CTAS targeting derived_pu tables - Updates CTAS column types from scalar to LIST<FLOAT> - derived_pu flag now only set when CTAS has aggregates Read path (WIP - crashes on type mismatch): - pac_finalize scalar function: LIST<FLOAT> -> FLOAT, reuses PacNoisySampleFrom64Counters with stable world picking - PACDerivedReadRule post-optimizer: injects pac_finalize into projections reading from derived_pu tables - PACDerivedTypePatcher ClientContextState: patches result->types after physical planning (not yet reached due to crash) - Known issue: physical planner crashes because result->types (set at plan time) says FLOAT[] but optimized plan says FLOAT New files: - src/aggregates/pac_finalize.cpp - src/query_processing/pac_derived_rewriter.cpp - src/include/query_processing/pac_derived_rewriter.hpp - test/sql/pac_derived.test Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…port - CanRequestRebind() override enables OnFinalizePrepare type patching - Generic FixAllStaleRefs: walks entire plan after pac_finalize injection, builds binding→type map from child operators, updates all stale column refs - Handles ORDER BY (orders field), WHERE (filters), subqueries - Full test suite: 50 assertions covering write path, read path, noise, ORDER BY, WHERE, subqueries, non-aggregate CTAS, manual pac_finalize Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Remove unused PacFinalizeBindData (pac_finalize reads settings from context)
- Fix RNG seeding: use entry.offset instead of row index for stability across batches
- Remove debug Printer::Print calls, use PAC_DEBUG_PRINT behind PAC_DEBUG flag
- Fix stale comment ("placeholder") on PACDerivedReadRule
- Early-exit in post-optimizer when no derived_pu GETs found (avoid unnecessary plan traversal)
- Add HasDerivedPuCounterGets() helper to avoid double traversal
- Restore in-line binding propagation in WrapCounterRefsWithFinalize (needed for ORDER BY)
- Add test: multiple aggregate columns (SUM + COUNT)
- Keep FixAllStaleRefs as safety net after ResolveOperatorTypes
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Replace typeof checks with value-based checks (length(), actual values) - Fix stale "placeholder" comment on PACDerivedReadRule - Document CanRequestRebind overhead tradeoff - Remove tests assuming zero-variance counters (counters vary by subsample) - No mi > 0 value checks (noise is non-deterministic) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
CTAS propagation now always sets derived_pu = true when the source is PU or derived_pu, so IVM delta tables automatically get PAC metadata. PACDerivedReadFunction skips pac_finalize injection for DML plans (INSERT/UPDATE/DELETE, including CTE-wrapped), ensuring upsert merges operate on raw counter data. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…mns, add stress tests - Fix INSERT into derived_pu: call ResolveOperatorTypes for all DML (not just CTAS) and strip redundant FLOAT[]→FLOAT[] casts after counter conversion - Fix WHERE on counter columns: make WrapExpr recursive so pac_finalize is injected into nested comparisons/filters, not just top-level projections - Register implicit casts LIST<FLOAT>→FLOAT and LIST<FLOAT>→DOUBLE with a real pac_finalize implementation so the binder can resolve >, <, >=, BETWEEN, IN, SUM(), AVG() etc. on counter columns even when DuckDB pushes filters into scans - Patch DuckDB comparison binder to check extension-registered implicit casts (one-hop LIST→child resolution) for both equality and inequality comparisons - Fix CTAS from derived_pu: skip PAC compilation for non-aggregate reads (gate on PlanHasAggregate), skip pac_finalize in post-optimizer for CREATE_TABLE so counter lists are preserved in the copy - Add 20 stress tests (Tests 6-25): BETWEEN, DISTINCT, GROUP BY, CASE WHEN, SUM/AVG over counters, JOIN two derived_pu tables, subqueries, UNION ALL, DELETE with counter filter, INSERT into plain table, IN, !=, NULLIF, ORDER BY counter, multiple INSERTs, multi-aggregate INSERT, CTAS chain Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…adata, implicit casts Major overhaul of the post-optimizer pac_finalize injection: - Use op->types (resolved output) instead of get.returned_types (stale after optimizer column reordering) to seed counter_bindings - Propagate counter bindings through PROJECTION and AGGREGATE output - Use LogicalOperatorVisitor::EnumerateExpressions to visit ALL operator-specific expression fields (DISTINCT targets, ORDER BY, TOP_N, etc.) - Suppress pac_finalize wrapping inside AGGREGATE subtrees (first() in scalar subqueries needs raw FLOAT[] input) - Fix CTE_REF chunk_types after pac_finalize changes CTE definition output types - Skip already-wrapped pac_finalize expressions to prevent double-wrapping - Immediately update PROJECTION types after wrapping so parent operators see correct types - Add counter_columns field to PACTableMetadata + serialization - Register implicit casts LIST<FLOAT>→FLOAT and LIST<FLOAT>→DOUBLE with real pac_finalize implementation - Patch DuckDB comparison binder to check extension-registered implicit casts (one-hop LIST→child resolution) - Gate derived_pu compilation on PlanHasAggregate; skip pac_finalize in post-optimizer for CTAS - Add 20 stress tests (Tests 6-25) Known limitation: scalar subquery equality on counter columns fails (CTE type propagation) Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Aggregates (e.g. first() in scalar subqueries) need raw FLOAT[] counter input. Suppress pac_finalize injection in AGGREGATE subtrees while still propagating counter bindings through the AGGREGATE output so parent PROJECTIONs can apply pac_finalize correctly. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…fter pac_finalize After pac_finalize is injected inside a CASE expression (e.g. CASE WHEN ... ELSE pac_finalize(x) END), the CASE's return_type and any parent BoundSubqueryExpression return_type remain stale (FLOAT[] instead of FLOAT). Fix by detecting pac_finalize in projection expression subtrees and updating expression + operator types. All 42 tests pass (3684 assertions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…res, consistent APIs - Extract ExpressionContainsPacFinalize as named static function (was inline lambda) - Extract CounterListType() to avoid recomputing LIST(PacFloatLogicalType()) everywhere - Simplify DerivedPuGetInfo struct to just unordered_set<idx_t> (only table_index was used) - Use EnumerateExpressions consistently (FixAllStaleRefs, StripRedundantCasts) - Remove all debugging PAC_DEBUG_PRINT from read path - Remove stale/duplicate comments, consolidate skip_wrapping logic - Simplify FixCTERefTypes parameters 513 → 432 lines. All 42 tests pass (3684 assertions). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…compiler style - Add detailed file header explaining read/write paths, key challenges - Add PAC_DEBUG_PRINT at key decision points (GET seeding, wrapping, entry) - Short-circuit ExpressionContainsPacFinalize when found - Remove unused logical_order.hpp include - Add section markers (--- Helpers ---, --- Public API ---) - Improve inline comments explaining WHY not WHAT Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…90 lines) All post-wrapping type fixups are handled inline by WrapCounterRefsWithFinalize (immediate projection type updates + ExpressionContainsPacFinalize) and ResolveOperatorTypes. The separate FixAllStaleRefs, FixCTERefTypes, and subquery return_type patching were never triggered by any of the 42 test cases. 513 → 380 lines. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- ExpressionContainsPacFinalize: take non-const Expression& (EnumerateChildren requires mutable reference; the function doesn't modify the expression) - pac_optimizer.cpp: add GetMutableTableMetadata() to PACMetadataManager instead of const_casting the result of GetTableMetadata() Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
…ting Replace pac_finalize-based comparisons on derived_pu counter columns with pac_filter_<cmp>(scalar, counters) calls that evaluate all 64 subsamples via majority voting. This fixes incorrect lexicographic LIST comparison for scan-level filters and gives more robust results for single-PU groups. - Comparisons (>, >=, <, <=, =, !=) rewritten to pac_filter_<cmp> in WrapExpr - BETWEEN decomposed to pac_filter_lte + pac_filter_gte AND conjunction - Scan-level table_filters rewritten via RewriteCounterFiltersFromGets - DELETE WHERE on counter columns now correctly uses pac_filter_<cmp> - Non-comparison contexts (SELECT, ORDER BY) still use pac_finalize - Removed dead counter_bindings tracking, extracted shared rewrite helpers - Removed unused DecomposeBetweenOnCounterColumns declaration Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
Author
|
@copilot resolve the merge conflicts in this pull request |
Agent-Logs-Url: https://github.com/cwida/pac/sessions/2b9f6861-a7d1-487f-8a6a-ed1bb0f8bb5f Co-authored-by: ila <29451823+ila@users.noreply.github.com>
Contributor
Resolved and pushed in commit 114cc3d. I merged latest |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
No description provided.