
[CI] Testinh #91

Closed
jackluo923 wants to merge 12 commits into y-scope:release-0.293-clp-connector from
jackluo923:release-0.293-clp-connector-snapshot

Conversation

jackluo923 (Member) commented Nov 10, 2025

Description

Checklist

  • The PR satisfies the contribution guidelines.
  • This is a breaking change and it has been indicated in the PR title, OR this isn't a
    breaking change.
  • Necessary docs have been updated, OR no docs need to be updated.

Validation performed

Summary by CodeRabbit

  • New Features

    • Added support for YAML-based metadata configuration as an alternative to database-driven metadata management.
    • Introduced Pinot metadata provider support for split management and filtering, including Uber Pinot variants.
    • Enabled Top-N query optimization to improve performance for limit-based queries.
    • Extended configuration enhancements to support additional provider types for metadata and split operations.
  • Chores

    • Updated workflow configurations to support snapshot-based branch deployments and runner-group assignments.
    • Added dependency version management for YAML processing libraries.

jackluo923 and others added 11 commits October 28, 2025 12:00
…ilds

Change Docker image tagging from static :dev tags to dynamic branch-based tags,
and update all workflow conditions to accept any branch with the
release-0.293-clp-connector-snapshot prefix.

Changes:
- Docker image tags now use actual branch name via type=ref,event=branch
  * Before: ghcr.io/repo/prestissimo-worker:dev
  * After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29
- Push conditions changed from exact match to prefix match using startsWith()
- Cancel-in-progress logic updated to exclude all snapshot prefix branches
- PR title checks now accept release-0.293-clp-connector-snapshot* pattern

Example: Pushing to branch "release-0.293-clp-connector-snapshot-nov29" will:
- Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29
- Run all CI jobs to completion without cancellation
- Enable PR title validation for PRs targeting this branch

This allows multiple snapshot branches to coexist with unique image tags.
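The prefix-match and cancellation changes described above can be sketched as a workflow fragment. This is a hedged illustration, not the actual workflow files (those are listed in the CodeRabbit summary); it only shows the shape of the `startsWith()` conditions using standard GitHub Actions expression syntax.

```yaml
# Sketch only: trigger on any branch with the snapshot prefix, and never
# cancel in-progress runs on such branches.
on:
  push:
    branches:
      - 'release-0.293-clp-connector-snapshot*'

concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  cancel-in-progress: ${{ !startsWith(github.ref_name, 'release-0.293-clp-connector-snapshot') }}
```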
  Add alternative provider implementations for metadata and split discovery,
  enabling flexible deployment configurations without MySQL dependency.

  New providers:
  - ClpYamlMetadataProvider: Reads table/column metadata from YAML config files
    instead of querying MySQL database. Supports nested schemas and polymorphic
    types through hierarchical YAML structure.

  - ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths
    via HTTP/JSON API instead of MySQL. Determines split types based on file
    extensions (.clp.zst for IR, otherwise ARCHIVE).

  Configuration:
  - Add clp.metadata-provider-type=YAML option for YAML-based metadata
  - Add clp.split-provider-type=PINOT option for Pinot-based split discovery
  - Add clp.metadata-yaml-path config for YAML metadata file location

  The YAML metadata provider parses a two-level structure:
  1. Main metadata file: maps connector → schema → table → schema_file_path
  2. Individual table schema files: define column names and types

  This provides deployment flexibility - users can now choose between:
  - Database-driven (MySQL for both metadata and splits)
  - Hybrid (YAML metadata + Pinot splits)
  - Or any combination based on their infrastructure

  Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1
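The extension-based split-type rule above (.clp.zst for IR, otherwise ARCHIVE) is simple enough to sketch directly. The actual implementation is Java (a `determineSplitType()` helper is mentioned in a later commit); this Python version just illustrates the rule.

```python
def determine_split_type(archive_path: str) -> str:
    """Mirror of the extension rule described above: .clp.zst files are
    IR streams; everything else is treated as a CLP archive."""
    return "IR" if archive_path.endswith(".clp.zst") else "ARCHIVE"
```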
Ensure all timestamp literals are converted to nanoseconds before generating
KQL and SQL filters for pushdown, regardless of their original precision
(milliseconds, microseconds, etc.).

Problem:
Presto supports multiple timestamp precisions (TIMESTAMP with millisecond
precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects
timestamps in a consistent nanosecond format. Previously, timestamp literals
were passed through without precision normalization, causing incorrect
results when different timestamp types were used in queries.

Solution:
Add timestamp normalization logic in ClpFilterToKqlConverter:
- tryEnsureNanosecondTimestamp(): Detects TIMESTAMP and TIMESTAMP_MICROSECONDS types
- ensureNanosecondTimestamp(): Converts timestamp values to nanoseconds using:
  1. Extract epoch seconds via TimestampType.getEpochSecond()
  2. Extract nanosecond fraction via TimestampType.getNanos()
  3. Convert to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction

Applied to:
- BETWEEN operator: Both lower and upper bounds
- Comparison operators: All timestamp literal values (=, !=, <, >, <=, >=)

This ensures consistent timestamp representation in both KQL filters (sent to
CLP engine) and SQL filters (used for metadata filtering), regardless of the
timestamp precision used in the original Presto query.

Example:
Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02'
Before: Literal values passed as milliseconds or microseconds
After: Both bounds normalized to nanosecond representation
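The normalization arithmetic in steps 1-3 can be sketched in Python (the real code is Java, in ClpFilterToKqlConverter); a millisecond-precision literal is split into epoch seconds and a nanosecond fraction, then recombined as total nanoseconds.

```python
from datetime import datetime, timezone

NANOS_PER_SECOND = 1_000_000_000

def to_nanoseconds(epoch_seconds: int, nanos_fraction: int) -> int:
    # Step 3 above: SECONDS.toNanos(seconds) + nanosFraction
    return epoch_seconds * NANOS_PER_SECOND + nanos_fraction

# e.g. a millisecond-precision TIMESTAMP literal for 2025-01-01 00:00:00.123 UTC
ts = datetime(2025, 1, 1, tzinfo=timezone.utc)
millis_value = int(ts.timestamp() * 1000) + 123              # what the engine stores
nanos_value = to_nanoseconds(millis_value // 1000,           # step 1: epoch seconds
                             (millis_value % 1000) * 1_000_000)  # step 2: nanos fraction
```

Regardless of the input precision, the recombined value is the same instant expressed in nanoseconds (here, exactly `millis_value * 1_000_000`).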
   Implement a plan optimizer that pushes down LIMIT + ORDER BY operations
   to reduce split scanning when querying on timestamp metadata columns.

   The optimizer detects TopN patterns in query plans and uses archive
   metadata (timestamp bounds, message counts) to select only the minimum
   set of archives needed to guarantee correct results.

   Key features:
   - Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns
   - Pushes TopN specifications through ClpTableLayoutHandle
   - Handles overlapping archives by merging into non-overlapping groups
   - Supports both ASC (earliest-first) and DESC (latest-first) ordering
   - Uses worst-case analysis to ensure correctness when archive ranges overlap

   Implementation:
   - ClpTopNSpec: Data structure for TopN specifications (limit + orderings)
   - ClpComputePushDown: Enhanced plan optimizer with TopN detection
   - ClpMySqlSplitProvider: Archive selection logic using metadata queries
   - Comprehensive test coverage for various TopN scenarios

   Performance: For queries like "SELECT * FROM logs ORDER BY timestamp DESC
   LIMIT 100", this eliminates scanning of archives that cannot contain
   the top results, significantly reducing I/O.
Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize
archive scanning when executing LIMIT + ORDER BY queries on timestamp
metadata columns.

The optimization uses archive metadata (timestamp bounds and message counts)
to intelligently select only the minimum set of archives needed to guarantee
correct TopN results, significantly reducing I/O for time-range queries.

Implementation details:

Archive selection algorithm:
- Fetches archive metadata (file path, creation time, modification time, message count)
- Merges overlapping archives into non-overlapping groups by timestamp ranges
- For DESC ordering: includes newest group + older groups until limit is covered
- For ASC ordering: includes oldest group + newer groups until limit is covered
- Uses worst-case analysis to ensure correctness when archives overlap

Code structure:
- ClpPinotSplitProvider.listSplits(): Detects TopN specs and routes to optimized path
- selectTopNArchives(): Implements the archive selection algorithm
- toArchiveGroups(): Merges overlapping archives into logical components
- ArchiveMeta: Represents archive metadata with validation
- ArchiveGroup: Represents merged archive groups for overlap handling

ClpTopNSpec enhancements:
- Made fields private with proper encapsulation
- Added @JsonProperty annotations to getters for correct serialization
- Maintains immutability for thread-safety

Code quality improvements:
- Proper exception handling with context-aware error messages
- Input validation in constructors (bounds checking, null checks)
- Extracted determineSplitType() helper to eliminate duplication
- Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields)
- Improved logging with table names and SQL queries for debugging
- Better encapsulation: private fields, getters-only access

Performance impact:
For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100",
this eliminates scanning of archives outside the time range of interest,
providing substantial I/O reduction for large archive sets.
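The archive-selection algorithm described in both TopN commits can be sketched as follows. This is a simplified Python illustration of the Java logic (ArchiveMeta, selectTopNArchives, and the group-merging step are named in the commit message; the field names here are assumptions): overlapping timestamp ranges are merged into non-overlapping groups, then groups are consumed from the newest (DESC) or oldest (ASC) end until their cumulative message count covers the limit.

```python
from typing import NamedTuple

class ArchiveMeta(NamedTuple):
    path: str
    begin_ts: int      # earliest message timestamp in the archive
    end_ts: int        # latest message timestamp in the archive
    num_messages: int

def merge_into_groups(archives):
    """Merge archives whose timestamp ranges overlap into non-overlapping
    groups: [begin, end, total_messages, [paths]]."""
    groups = []
    for a in sorted(archives, key=lambda a: a.begin_ts):
        if groups and a.begin_ts <= groups[-1][1]:
            g = groups[-1]
            g[1] = max(g[1], a.end_ts)
            g[2] += a.num_messages
            g[3].append(a.path)
        else:
            groups.append([a.begin_ts, a.end_ts, a.num_messages, [a.path]])
    return groups

def select_top_n_archives(archives, limit, descending=True):
    """Worst-case selection: a whole group is included if any of its archives
    might contribute, and groups are added until the limit is covered."""
    groups = merge_into_groups(archives)
    if descending:
        groups = list(reversed(groups))  # newest group first for ORDER BY ... DESC
    selected, covered = [], 0
    for _begin, _end, count, paths in groups:
        selected.extend(paths)
        covered += count
        if covered >= limit:
            break
    return selected
```

For `ORDER BY timestamp DESC LIMIT 100`, if the newest group alone holds at least 100 messages, only its archives are scanned; overlap merging is what makes this safe when archive time ranges interleave.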
Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector
to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table).

Key changes:
- Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle
- Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching
- Update ClpMetadata to delegate schema listing to metadata providers
- Add comprehensive error handling with graceful fallback to default schema
- Optimize performance with double-checked locking and ObjectMapper reuse
- Add 15+ unit tests covering all edge cases and error scenarios
- Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider

The implementation is backward compatible - existing single-schema setups continue to work
without configuration changes. Multi-schema support is activated automatically when the
YAML metadata file contains multiple schema entries.

Example YAML structure:
```yaml
clp:
  default:
    logs: /path/to/default/logs.yaml
  dev:
    test_logs: /path/to/dev/logs.yaml
  prod:
    production_logs: /path/to/prod/logs.yaml
```

Performance optimizations:
- Thread-safe caching prevents repeated YAML parsing
- Double-checked locking pattern for lazy initialization
- Reused ObjectMapper instances reduce object creation overhead
- Synchronized table schema map updates ensure thread safety
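The caching pattern above can be sketched in Python (the real implementation is Java, in ClpYamlMetadataProvider; the class and method names here are illustrative): an unlocked fast-path read, then a second check under the lock so the loader runs at most once.

```python
import threading

class CachedSchemas:
    """Lazy, thread-safe cache in the spirit of the double-checked
    locking described above."""
    def __init__(self, loader):
        self._loader = loader            # e.g. a function that parses the YAML file
        self._lock = threading.Lock()
        self._schemas = None

    def get(self):
        schemas = self._schemas          # first check, no lock (fast path)
        if schemas is None:
            with self._lock:
                if self._schemas is None:    # second check, under the lock
                    self._schemas = self._loader()
                schemas = self._schemas
        return schemas
```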

Testing:
- All 11 tests in TestClpYamlMetadataProvider pass
- All 4 tests in TestClpMultiSchema pass
- Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully
coderabbitai bot commented Nov 10, 2025

Walkthrough

This PR introduces YAML-based metadata discovery for CLP, adds Pinot split provider support with Uber-specific variants, implements Top-N optimization pushdown with metadata-only query support, and updates GitHub workflows to use branch prefix-based cancellation and group-based runners.

Changes

Cohort / File(s) Summary
GitHub Workflows Configuration
.github/workflows/docs.yml, .github/workflows/maven-checks.yml, .github/workflows/pr-title-checks.yaml, .github/workflows/prestissimo-worker-images-build.yml, .github/workflows/prestocpp-format-and-header-check.yml, .github/workflows/prestocpp-linux-build-and-unit-test.yml, .github/workflows/tests.yml
Updated cancel-in-progress logic to use startsWith checks for branch prefix release-0.293-clp-connector-snapshot instead of exact branch matches. Replaced fixed ubuntu runner specifications with group-based runners (group: Default). Updated Docker metadata tag generation from type=raw,value=dev to type=ref,event=branch. Adjusted push conditions for image builds to use prefix-based branch matching.
Maven Dependencies
presto-clp/pom.xml, presto-native-execution/pom.xml
Added Jackson YAML dataformat dependency and snakeyaml version 2.1 constraint to dependencyManagement for YAML processing support.
CLP Core Configuration
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
Extended ClpConfig with new metadataYamlPath configuration field and corresponding getter/setter. Extended MetadataProviderType, SplitFilterProviderType, and SplitProviderType enums to include YAML, PINOT, and PINOT_UBER options. Added CLP_UNSUPPORTED_TABLE_SCHEMA_YAML error code.
CLP Metadata Provider Abstraction
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
Added default listSchemaNames() method to interface returning single default schema. Modified ClpMetadata.listSchemaNames() to delegate to clpMetadataProvider instead of returning fixed default.
YAML Metadata Provider Implementation
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java
New class implementing YAML-based schema discovery with thread-safe caching. Parses YAML file from config path to enumerate schemas and tables, with fallback to default schema on missing/malformed files. Implements recursive column type extraction via collectTypes.
Dependency Injection & Module Configuration
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
Introduced centralized provider mappings via static ImmutableMap for split filter, metadata, and split providers. Replaced explicit if-else binding logic with dynamic map-based resolution, supporting new YAML, PINOT, and PINOT_UBER provider types.
CLP Split & Table Layout Handles
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java
Added equals() and hashCode() to ClpSplit. Extended ClpTableLayoutHandle with metadataQueryOnly boolean and topN Optional fields, updated constructors, accessors, and equality/hash/toString methods.
Top-N Optimization Specification
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java
New immutable data class defining Top-N specifications with limit and list of orderings (column name and order direction). Includes Jackson serialization support and validation.
Expression & Filter-to-KQL Conversion
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
Moved ClpExpression to optimization package and added pushDownVariables field with propagation through constructors and accessors. Enhanced ClpFilterToKqlConverter with nanosecond timestamp handling for TIMESTAMP and TIMESTAMP_MICROSECONDS types, and push-down variable set tracking across all expression translations.
Compute Pushdown Optimization
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
Extensive TopN pushdown handling via visitTopN: supports metadata-only TopN, validates ORDER BY expressions, remaps columns via buildOrderColumnName, merges with existing TopN specs, and propagates through table scan layouts. Enhanced processFilter with metadata-only query flagging via isMetadataQueryOnly. Added helper methods for ordering comparison, identifier validation, and dereference chain resolution.
MySQL Split Provider Enhancement
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
Added Top-N archive selection logic via selectTopNArchives, archive grouping/overlap merging, and archive metadata fetching. Introduced ARCHIVES_TABLE_NUM_MESSAGES metadata column and updated SQL generation to select all columns and use message counts.
Pinot Split Providers
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java
New ClpPinotSplitProvider class executing Pinot SQL queries to fetch splits with Top-N support and archive metadata merging. ClpUberPinotSplitProvider extends base provider to use Neutrino globalStatements endpoint and infers metadata table names as rta.logging.<table>.
Split Filter Provider Enhancements
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java
Abstract method remapColumnName() added to base class to support scope-based column name remapping. Updated checkContainsRequiredFilters to accept Set<String> instead of single expression. New ClpPinotSplitFilterProvider delegates to MySQL implementation. New ClpUberPinotSplitFilterProvider adds TEXT_MATCH transformation for equality predicates using merged text index.
UDF Rewriter Optimization
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
Refactored ClpUdfRewriter to use visitor pattern for plan traversal; added visitTableScan to augment outputs with UDF-produced variable mappings. Removed rewritePlanSubtree helper. Reformatted ClpPlanOptimizerProvider parameter list.
Test Infrastructure & Setup
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
Extended ArchivesTableRow with messageCount field and updated table DDL/DML. Updated test imports and checkContainsRequiredFilters invocations to use set-based parameters.
YAML Metadata Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java
New test classes validating YAML metadata provider discovery of single/multiple schemas, handling of missing/invalid paths, table enumeration, and integration with CLP catalog.
Top-N Optimization Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java
New comprehensive test class for Top-N optimization flow including plan verification via PlanAssert and split set validation for various SQL scenarios. Includes plan ID freshening utility and custom ClpTableScanMatcher.
Filter-to-KQL Converter Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java
New unit test class validating metadata SQL generation and push-down expressions for string/numeric literals and special character escaping.
UDF Rewriter Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
Updated existing test to include fare column in ORDER BY and extended expected plan columns.
Pinot Split Provider Tests
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java
New test classes validating Uber Pinot endpoint URL construction, metadata table name inference, TEXT_MATCH transformations, range mapping inheritance, and configuration loading.
Test Resource Files
presto-clp/src/test/resources/test-topn-split-filter.json, presto-clp/src/test/resources/test-pinot-split-filter.json
New JSON resource files defining split filter configurations for Top-N and Pinot testing with column mappings, range bounds, and required field markers.
Submodule Update
presto-native-execution/velox
Updated velox submodule reference to new commit.

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Query as Query Optimizer
    participant TopN as TopN Visitor
    participant TableScan as TableScan
    participant Layout as ClpTableLayoutHandle
    participant SplitProvider as CLP Split Provider

    Query->>TopN: visitTopN(plan)
    TopN->>TopN: Extract ORDER BY cols
    TopN->>TopN: Build ClpTopNSpec
    TopN->>Layout: Create with metadataQueryOnly,<br/>topN fields
    TopN->>TableScan: Create with new layout
    TableScan->>SplitProvider: listSplits(layout)
    alt Top-N Specified
        SplitProvider->>SplitProvider: Fetch archive metadata
        SplitProvider->>SplitProvider: Merge overlaps
        SplitProvider->>SplitProvider: selectTopNArchives()
        SplitProvider-->>Query: Return minimal archive set
    else No Top-N
        SplitProvider->>SplitProvider: Fetch all splits
        SplitProvider-->>Query: Return full split set
    end
```
```mermaid
sequenceDiagram
    participant Config as ClpConfig
    participant Module as ClpModule (DI)
    participant ProviderMap as Static Provider Maps
    participant MetadataImpl as Metadata Provider

    Config->>Config: Set metadataYamlPath
    Module->>ProviderMap: Lookup METADATA_PROVIDER_MAPPINGS
    ProviderMap-->>Module: Return provider class for type
    Module->>MetadataImpl: Create provider instance
    alt MetadataProviderType.YAML
        MetadataImpl->>MetadataImpl: Parse YAML from path
        MetadataImpl->>MetadataImpl: Extract schemas/tables
        MetadataImpl-->>Module: Cached schema list
    else MetadataProviderType.MYSQL
        MetadataImpl->>MetadataImpl: Query database
        MetadataImpl-->>Module: Schema list from DB
    end
    Module-->>Config: Bind provider
```
```mermaid
sequenceDiagram
    participant Filter as Filter Pushdown
    participant Converter as ClpFilterToKqlConverter
    participant Expression as ClpExpression
    participant Layout as ClpTableLayoutHandle

    Filter->>Converter: Convert predicate to KQL
    Converter->>Converter: Translate column refs
    Converter->>Converter: Track pushDownVariables set
    Converter->>Expression: Create with pushDownVariables
    Expression-->>Converter: Return KQL + metadata SQL +<br/>pushDownVariables
    Converter->>Layout: Create with isMetadataQueryOnly<br/>flag based on variables
    Layout-->>Filter: Return updated handle
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~45 minutes

Areas requiring extra attention:

  • ClpComputePushDown.java: Dense TopN pushdown logic with multiple helper methods (buildOrderColumnName, areIdents, isDereferenceChainOverVariable, sameOrdering), complex ORDER BY remapping, and conditional metadata-only query flagging. Verify column name remapping through dereference chains and overlap with existing pushdown logic.
  • ClpYamlMetadataProvider.java: Thread-safe caching via volatile fields and double-checked locking, recursive YAML traversal via collectTypes, error-resilient defaults. Validate YAML parsing robustness and cache coherency under concurrent access.
  • ClpPinotSplitProvider.java: Top-N archive selection logic with overlap merging, archive grouping, and DESC/ASC ordering variations. Verify correctness of archive selection algorithm and HTTP communication with Pinot endpoint.
  • ClpUberPinotSplitFilterProvider.java: Regex-based TEXT_MATCH transformation on equality predicates. Verify regex correctness and handling of edge cases (quoted column names, special characters).
  • ClpFilterToKqlConverter.java: Extensive push-down variable propagation across all expression types (AND/OR/LIKE/BETWEEN/IN). Verify variable set accumulation is complete and consistent across all branches.
  • Module and Provider Bindings: New centralized provider mappings and dynamic resolution. Verify all new provider types are correctly registered and that null handling is comprehensive.
  • Test Coverage: New test classes TestClpTopN, TestClpYamlMetadata*, TestClpPinot* are comprehensive but complex. Verify test setup, plan matching logic, and resource cleanup.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues

Pre-merge checks and finishing touches

❌ Failed checks (2 warnings, 1 inconclusive)
Check name Status Explanation Resolution
Description check ⚠️ Warning The PR description is entirely a template with no substantive content filled in; all sections are empty placeholder text. Fill in the Description section with details on what changes are being made (YAML metadata provider, Pinot support, TopN optimization, CI workflow updates). Complete the Checklist and Validation performed sections with actual information about testing and validation performed.
Docstring Coverage ⚠️ Warning Docstring coverage is 45.18% which is insufficient. The required threshold is 80.00%. You can run @coderabbitai generate docstrings to improve docstring coverage.
Title check ❓ Inconclusive The title '[CI] Testinh' is vague and appears to contain a typo; it does not clearly describe the substantial changes made to the pull request. Revise the title to be more descriptive and accurate. The changes include significant updates to CI workflows, addition of YAML metadata provider support, Pinot split provider implementations, and TopN query optimization. Consider a title like 'feat: Add YAML metadata provider and Pinot split provider support with TopN optimization'.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)

88-88: Fix typographical error in JavaDoc.

"correh hsponding" should be "corresponding".

Apply this diff:

-     * @return a map from column handle to its correh hsponding scan-level variable
+     * @return a map from column handle to its corresponding scan-level variable
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and 16d742e.

📒 Files selected for processing (39)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/maven-checks.yml (3 hunks)
  • .github/workflows/pr-title-checks.yaml (1 hunks)
  • .github/workflows/prestissimo-worker-images-build.yml (5 hunks)
  • .github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
  • .github/workflows/prestocpp-linux-build-and-unit-test.yml (4 hunks)
  • .github/workflows/tests.yml (1 hunks)
  • presto-clp/pom.xml (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (4 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
  • presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
  • presto-native-execution/pom.xml (1 hunks)
  • presto-native-execution/velox (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.

Applied to files:

  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-06-04T14:50:43.554Z
Learnt from: anlowee
Repo: y-scope/presto PR: 7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

Applied to files:

  • .github/workflows/prestocpp-format-and-header-check.yml
  • .github/workflows/prestocpp-linux-build-and-unit-test.yml
  • .github/workflows/prestissimo-worker-images-build.yml
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
🧬 Code graph analysis (17)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
presto-native-execution/presto_cpp/presto_protocol/connector/hive/presto_protocol_hive.h (1)
  • Order (106-109)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (5)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1)
  • ClpYamlMetadataProvider (44-266)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1)
  • ClpPinotSplitProvider (49-369)
presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java (1)
  • ColumnMetadata (29-205)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (1)
  • ClpExpression (31-141)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
  • ClpTopNSpec (28-148)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
  • ClpTopNSpec (28-148)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
  • ClpSplit (33-111)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java (1)
  • PrestoException (18-56)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (32-70)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (6)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (1)
  • ClpComputePushDown (63-484)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
  • ClpTopNSpec (28-148)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (34-180)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (50-187)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
  • ClpMetadataDbSetUp (46-276)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
  • ClpSplit (33-111)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (2)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java (1)
  • TestClpQueryBase (65-144)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (1)
  • ClpFilterToKqlConverter (103-986)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
  • ClpTableLayoutHandle (26-126)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1)
  • ClpSplitFilterProvider (50-187)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java (1)
  • ClpConnectorFactory (32-70)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
  • ClpTopNSpec (28-148)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1)
  • ClpYamlMetadataProvider (44-266)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1)
  • ClpPinotSplitProvider (49-369)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (4)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1)
  • ClpYamlMetadataProvider (44-266)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1)
  • Test (35-345)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-220)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
  • ClpMetadata (57-206)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (39-300)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (1)
presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h (1)
  • RowExpression (394-416)
🪛 actionlint (1.7.8)
.github/workflows/prestocpp-format-and-header-check.yml

23-23: label "y-scope/Default" is unknown. available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". if it is a custom label for self-hosted runner, set list of labels in actionlint.yaml config file

(runner-label)

.github/workflows/prestocpp-linux-build-and-unit-test.yml

23-23: label "y-scope/Default" is unknown (same runner-label warning as above; see the available-labels list in the first occurrence, and configure custom self-hosted labels in the actionlint.yaml config file).

(runner-label)


100-100: label "y-scope/Default" is unknown (same runner-label warning; available-labels list as above).

(runner-label)


175-175: label "y-scope/Default" is unknown (same runner-label warning; available-labels list as above).

(runner-label)


251-251: label "y-scope/Default" is unknown (same runner-label warning; available-labels list as above).

(runner-label)

.github/workflows/prestissimo-worker-images-build.yml

10-10: label "y-scope/Default" is unknown (same runner-label warning; available-labels list as above).

(runner-label)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (17.0.13, :presto-main)
  • GitHub Check: test (17.0.13, :presto-main-base)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-main-base)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (46)
presto-clp/pom.xml (1)

78-81: Dependency added with version resolution deferred to parent POM.

The jackson-dataformat-yaml dependency is correctly positioned with other Jackson format/core dependencies and omits an explicit version, relying on the parent's dependencyManagement to define it. Verify that the parent POM (presto-root) declares a compatible version for this artifact.

presto-native-execution/pom.xml (1)

270-278: Verify transitive SnakeYAML dependency and clarify version constraint intent.

This module adds the same SnakeYAML constraint as presto-clp. Ensure this module actually depends on SnakeYAML transitively (either directly or through presto-clp runtime dependency on line 259–260). Additionally, address the "upper bound" comment semantics noted in the parallel presto-clp/pom.xml change.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)

35-38: Formatting change approved, but inconsistent with PR summary.

The constructor parameter formatting improvement enhances readability. However, the PR-level AI summary describes significant feature additions (YAML-based metadata discovery, Pinot split provider support, Top-N query optimisation, GitHub Actions workflow updates), none of which are reflected in this formatting-only change.

presto-native-execution/velox (1)

1-1: Velox submodule bump acknowledged.

Thanks for advancing the submodule pointer to 52bb2fedafd6b7ceb12bb60ef959d2a290e82f17; looks good from my side.

.github/workflows/pr-title-checks.yaml (1)

11-11: Branch trigger pattern update looks good.

The wildcard pattern correctly expands PR title validation to all snapshot-prefixed branches, aligning with the broader branch policy changes across workflows.

.github/workflows/tests.yml (1)

19-21: Prefix-based concurrency policy update is correct.

The new policy preserves in-progress jobs for snapshot branches while cancelling them elsewhere, enabling complete CI validation for each snapshot commit. The logic and comment are clear.

.github/workflows/maven-checks.yml (3)

17-19: Concurrency policy change is consistent.

The cancel-in-progress logic correctly aligns with the pattern used across other workflows in this PR.


101-101: Verify Docker image tagging behaviour change.

Changing from static type=raw,value=dev to dynamic type=ref,event=branch alters image tag generation. This will now produce branch-specific tags (e.g., based on branch name) instead of a fixed "dev" tag.

Please confirm:

  • Downstream systems don't depend on the "dev" tag
  • The dynamic branch-based tagging is the intended behaviour
  • Image versioning/lifecycle will work correctly with the new tagging scheme

112-113: Verify Docker push policy restriction.

The push condition now restricts image pushes to branches starting with release-0.293-clp-connector-snapshot. This is a narrower policy than before.

Please confirm:

  • Image pushes should only occur on snapshot branches (not the target branch release-0.293-clp-connector)
  • This change doesn't inadvertently block releases or production deployments
  • The prefix-based check is the intended long-term branch policy
.github/workflows/docs.yml (1)

23-25: Concurrency policy update is consistent.

The cancel-in-progress logic aligns with changes across other workflows, using the prefix-based exclusion pattern for snapshot branches.

.github/workflows/prestocpp-format-and-header-check.yml (2)

17-19: Concurrency policy update is consistent.

The cancel-in-progress logic aligns with the prefix-based exclusion pattern used across other workflows.


23-23: Custom runner label requires verification and configuration.

The runs-on value y-scope/Default is a custom self-hosted runner label. Static analysis tools flag it as unknown, indicating:

  1. The label may not be registered or properly configured in actionlint.yaml
  2. The runner must be available in your GitHub Actions environment
  3. The runner must have the necessary dependencies (container image: public.ecr.aws/oss-presto/velox-dev:check)

Please verify:

  • The y-scope/Default runner is registered in your GitHub Actions self-hosted runners
  • actionlint.yaml (if present) includes this label in its custom runner configuration
  • The runner has sufficient capacity and resources for this workflow

Note: This same label is used across 5 workflow files in this PR (also in prestocpp-linux-build-and-unit-test.yml and prestissimo-worker-images-build.yml).

.github/workflows/prestocpp-linux-build-and-unit-test.yml (2)

17-19: Concurrency policy update is consistent.

The cancel-in-progress logic aligns with the prefix-based exclusion pattern across all workflows in this PR.


23-23: Custom runner label used across all jobs—verify registration and configuration.

All four jobs in this workflow (lines 23, 100, 175, 251) are updated to use the custom runner label y-scope/Default. Static analysis flags this as an unknown runner label in each case.

Please verify:

  • The y-scope/Default runner is properly registered in your GitHub Actions environment and has capacity for concurrent job execution (4 jobs may run in parallel across matrix configurations)
  • actionlint.yaml is updated to recognize this custom label in its self-hosted runner configuration
  • Each runner instance has the necessary container runtime and dependencies for the containerized workflow steps
  • Resource constraints (CPU, memory) are sufficient, especially for the prestocpp-linux-build-for-test job which performs intensive compilation

Note: This is part of a coordinated change across 5 workflow files in this PR.

Also applies to: 100-100, 175-175, 251-251

.github/workflows/prestissimo-worker-images-build.yml (3)

10-10: Custom runner label requires verification.

The runs-on value y-scope/Default is a custom self-hosted runner label flagged as unknown by static analysis. Please verify runner registration and actionlint configuration as noted in review comments for other workflow files.


28-28: Docker image tagging behaviour changes—verify versioning scheme.

Both dependency and runtime image builds now use type=ref,event=branch tagging instead of the previous static approach. This will generate branch-based tags (e.g., based on the branch name) rather than fixed tags like "dev".

Confirm that:

  • Downstream image consumers can work with dynamically named images
  • The new tagging scheme aligns with your image versioning and lifecycle policies
  • Image references in deployment configurations or documentation are updated accordingly

Also applies to: 46-46


36-37: Verify Docker push policy restriction applies to both images.

Both dependency and runtime images are now restricted to push only when the branch starts with release-0.293-clp-connector-snapshot:

push: >-
  ${{github.event_name != 'pull_request'
  && startsWith(github.ref, 'refs/heads/release-0.293-clp-connector-snapshot')}}

This is a narrower policy. Please confirm:

  • Images should not be pushed to the target branch release-0.293-clp-connector in non-PR events
  • Image releases and deployments will not be affected by this restriction
  • The behaviour matches your CI/CD release strategy

Also applies to: 66-68

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)

277-277: LGTM! Query and expected columns correctly aligned.

The addition of ORDER BY fare to the query and the corresponding inclusion of the fare column in the expected scan columns (line 297) are properly synchronized. When ordering by a column, it must be included in the table scan, and the test correctly verifies this behavior.

Also applies to: 297-297


269-300: The review comment is incorrect; the code organization is appropriate.

The codebase correctly separates test concerns. ClpTableLayoutHandle provides an overloaded 3-parameter constructor that defaults metadataQueryOnly to false and topN to Optional.empty(). The testClpGetJsonString test appropriately:

  • Uses the 3-parameter constructor at line 295 because it focuses on JSON_STRING functionality, not TopN optimization
  • Has a name matching its scope ("testClpGetJsonString")
  • Works alongside a dedicated test class (TestClpTopN.java) that comprehensively verifies TopN pushdown with the full 5-parameter constructor

Test name, constructor usage, and plan verification patterns are all correct for their intended purpose. No changes needed.

Likely an incorrect or invalid review comment.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (2)

121-132: LGTM! Proper visitor pattern delegation.

The refactoring to use accept(this, context) instead of a custom helper properly delegates subtree traversal to the visitor framework.


135-140: LGTM! Clean visitor implementation.

The refactoring properly delegates source traversal while rewriting the predicate expression.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)

16-16: LGTM!

The import update correctly reflects the package reorganization of ClpExpression to the optimization package.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)

31-31: LGTM!

The new error code is appropriately named and follows the existing pattern for external error types. This aligns with the YAML metadata provider functionality introduced in this PR.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)

97-110: LGTM!

The constructor validation ensures that limit is positive and orderings is non-empty, which are sensible constraints for a Top-N specification.
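For reference, the validation pattern can be sketched with plain-Java names (illustrative only, not the actual ClpTopNSpec code):

```java
import java.util.List;

// Hypothetical sketch of fail-fast constructor validation for a Top-N spec;
// class and field names are illustrative, not the real ClpTopNSpec.
public class TopNSpecSketch
{
    private final long limit;
    private final List<String> orderings;

    public TopNSpecSketch(long limit, List<String> orderings)
    {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive: " + limit);
        }
        if (orderings == null || orderings.isEmpty()) {
            throw new IllegalArgumentException("orderings must be non-empty");
        }
        this.limit = limit;
        // Defensive copy so later mutation of the caller's list cannot leak in.
        this.orderings = List.copyOf(orderings);
    }

    public long getLimit()
    {
        return limit;
    }
}
```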

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (2)

953-971: Consider error handling for invalid timestamp literals.

The ensureNanosecondTimestamp method calls Long.parseLong(literalString) (Line 966) which can throw NumberFormatException if the literal is not a valid long. While Presto should guarantee valid constants, adding error handling or documentation would improve robustness.

Consider adding a try-catch or documenting the assumption:

     private static String ensureNanosecondTimestamp(TimestampType type, String literalString)
     {
-        long literalNumber = Long.parseLong(literalString);
+        long literalNumber;
+        try {
+            literalNumber = Long.parseLong(literalString);
+        } catch (NumberFormatException e) {
+            throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+                "Invalid timestamp literal: " + literalString, e);
+        }
         long seconds = type.getEpochSecond(literalNumber);
         long nanosecondFraction = type.getNanos(literalNumber);
         long nanoseconds = SECONDS.toNanos(seconds) + nanosecondFraction;
         return Long.toString(nanoseconds);
     }

698-738: LGTM!

The handleAnd method correctly accumulates pushDownVariables from all child expressions using ImmutableSet.Builder, ensuring proper variable tracking through AND operations.
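The accumulation can be sketched with stdlib collections (the real code uses Guava's ImmutableSet.Builder; names here are illustrative):

```java
import java.util.Collections;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class AndAccumulator
{
    // Sketch: union the push-down variables of all child expressions,
    // mirroring what an ImmutableSet.Builder accumulation achieves for AND.
    public static Set<String> unionPushDownVariables(List<Set<String>> children)
    {
        Set<String> accumulated = new LinkedHashSet<>();
        for (Set<String> child : children) {
            accumulated.addAll(child);
        }
        return Collections.unmodifiableSet(accumulated);
    }
}
```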

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (2)

46-56: LGTM!

The primary constructor properly ensures immutability by using ImmutableSet.copyOf() for the pushDownVariables parameter, which is a best practice for defensive copying.
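The defensive-copy idiom looks roughly like this, using the stdlib `Set.copyOf` as an analogue of Guava's `ImmutableSet.copyOf` (names illustrative):

```java
import java.util.Set;

public class DefensiveCopy
{
    private final Set<String> pushDownVariables;

    // Set.copyOf produces an immutable snapshot, so later mutation of the
    // caller's set cannot affect this object's state.
    public DefensiveCopy(Set<String> pushDownVariables)
    {
        this.pushDownVariables = Set.copyOf(pushDownVariables);
    }

    public Set<String> getPushDownVariables()
    {
        return pushDownVariables;
    }
}
```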


137-140: LGTM!

The accessor returns the immutable set directly, which is safe since ImmutableSet cannot be modified.

presto-clp/src/test/resources/test-topn-split-filter.json (1)

1-14: LGTM!

The test resource defines a valid split filter configuration with timestamp range mapping, which supports the Top-N functionality introduced in this PR.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)

82-85: LGTM!

Delegating to the metadata provider enables dynamic schema name discovery, which is essential for supporting YAML-based metadata configurations introduced in this PR.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)

81-98: LGTM! Well-implemented equals and hashCode methods.

The implementation correctly follows best practices: same-reference check, null and class checks, and field-by-field comparison. All three fields (path, type, kqlQuery) are consistently used in both methods, ensuring proper contract adherence.
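The contract described above can be sketched as follows; field names mirror the review (path, type, kqlQuery), but this is not the actual ClpSplit source:

```java
import java.util.Objects;

public class SplitSketch
{
    private final String path;
    private final String type;
    private final String kqlQuery;

    public SplitSketch(String path, String type, String kqlQuery)
    {
        this.path = path;
        this.type = type;
        this.kqlQuery = kqlQuery;
    }

    @Override
    public boolean equals(Object o)
    {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        SplitSketch that = (SplitSketch) o;
        // Same fields in equals and hashCode, preserving the contract.
        return Objects.equals(path, that.path)
                && Objects.equals(type, that.type)
                && Objects.equals(kqlQuery, that.kqlQuery);
    }

    @Override
    public int hashCode()
    {
        return Objects.hash(path, type, kqlQuery);
    }
}
```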

presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1)

60-61: LGTM! Test correctly exercises the updated API.

The test correctly adapts to the new checkContainsRequiredFilters signature, passing Set<String> of column names rather than filter expressions. The test validates that "level" is not a required filter (throws exception) whilst "msg.timestamp" is required (passes).

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)

31-43: LGTM! Well-documented default method provides backward compatibility.

The default implementation returns a single-element list containing the default schema, which maintains backward compatibility for existing implementations whilst enabling new implementations (like ClpYamlMetadataProvider) to override for multi-schema support.
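The backward-compatibility pattern can be sketched like so (interface and method names are illustrative, not the actual ClpMetadataProvider API):

```java
import java.util.Collections;
import java.util.List;

public interface MetadataProviderSketch
{
    String getDefaultSchemaName();

    // Existing implementations inherit this single-schema behaviour;
    // multi-schema providers (e.g. a YAML-backed one) override it.
    default List<String> listSchemaNames()
    {
        return Collections.singletonList(getDefaultSchemaName());
    }
}

class DbProviderSketch implements MetadataProviderSketch
{
    @Override
    public String getDefaultSchemaName()
    {
        return "default";
    }
}
```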

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)

89-111: LGTM! Correctly implements scope-based column remapping.

The implementation properly aggregates mappings from the scope hierarchy (catalog → schema → table) and returns either the mapped bounds (lowerBound, upperBound) or the original column name. The logic is consistent with the existing remapSplitFilterPushDownExpression method.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)

19-19: LGTM! Properly wired new providers.

The additions correctly integrate the YAML metadata provider and Pinot split provider into the dependency injection framework, following the established pattern for provider registration with appropriate singleton scoping.


61-63: LGTM! YAML metadata provider binding correctly configured.

The binding follows the established pattern and properly registers ClpYamlMetadataProvider as the implementation when the YAML provider type is selected.


71-73: LGTM! Pinot split provider binding correctly configured.

The binding follows the established pattern and properly registers ClpPinotSplitProvider as the implementation when the PINOT provider type is selected.

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (5)

35-49: LGTM! Proper test lifecycle management.

The test class correctly uses singleThreaded = true and implements proper cleanup of temporary files in the @AfterClass method.


54-104: LGTM! Comprehensive schema discovery test coverage.

The tests thoroughly validate schema name discovery for both single and multiple schema scenarios, properly exercising the YAML metadata provider's core functionality.


109-160: LGTM! Excellent error handling test coverage.

The tests comprehensively validate graceful degradation for error cases: missing YAML path, nonexistent files, and malformed YAML. All correctly verify fallback to the default schema.


188-249: LGTM! Thorough table handle validation.

The tests effectively validate table discovery for both single and multiple schema configurations, ensuring correct table-to-schema association.


254-328: LGTM! Edge case coverage is excellent.

The tests validate consistency across multiple calls, empty schemas, and special characters in schema names, ensuring robust behaviour across various scenarios.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)

156-166: LGTM! Configuration field properly implemented.

The new metadataYamlPath configuration field follows the established pattern with appropriate getter/setter and @Config annotation. The YAML metadata provider handles missing or invalid paths gracefully with fallback to default schema, so absence of validation here is acceptable.


206-218: LGTM! Enum extensions support new provider types.

The additions of YAML to MetadataProviderType and PINOT to SplitProviderType properly extend the configuration to support the new providers introduced in this PR.

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)

94-108: LGTM! Well-documented abstract method for column remapping.

The new remapColumnName method provides a clean abstraction for scope-based column name remapping, with comprehensive Javadoc explaining the scope hierarchy and expected behaviour.


117-134: LGTM! Improved signature for required filter checking.

The signature change from String splitFilterPushDownExpression to Set<String> pushDownVariables is clearer and more direct, checking for column name presence rather than parsing expressions. The implementation correctly validates required columns against the provided set.

Comment on lines +163 to +171
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>2.1</version> <!-- upper bound -->
</dependency>
</dependencies>
</dependencyManagement>

🧹 Nitpick | 🔵 Trivial

Clarify SnakeYAML version constraint intent.

The comment references "upper bound," but Maven's dependencyManagement with a static version (e.g., 2.1) sets an exact version, not a range constraint. If an upper bound is truly intended, use version-range syntax (e.g., [2.0,2.1]). If 2.1 is the enforced/pinned version, update the comment for clarity.

-            <version>2.1</version> <!-- upper bound -->
+            <version>2.1</version> <!-- pinned version for YAML processing -->
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>2.1</version> <!-- upper bound -->
</dependency>
</dependencies>
</dependencyManagement>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.yaml</groupId>
<artifactId>snakeyaml</artifactId>
<version>2.1</version> <!-- pinned version for YAML processing -->
</dependency>
</dependencies>
</dependencyManagement>
🤖 Prompt for AI Agents
In presto-clp/pom.xml around lines 163 to 171, the comment "upper bound" is
misleading because declaring <version>2.1</version> in dependencyManagement pins
an exact version rather than expressing a range; either change the version
element to a Maven version range (e.g., [2.0,2.1]) if you intend an upper bound,
or update the comment to say "pinned/enforced version 2.1" to reflect that this
is an exact version. Ensure the pom comment matches the chosen intent.
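For reference, if an upper bound really is the intent, Maven's version-range syntax in `dependencyManagement` would look like the following sketch (the lower bound `2.0` is illustrative):

```xml
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.yaml</groupId>
      <artifactId>snakeyaml</artifactId>
      <!-- Hard range: any version from 2.0 through 2.1 inclusive;
           Maven selects the highest matching version available. -->
      <version>[2.0,2.1]</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```

Note that ranges make builds less reproducible than a pinned version, which is why most Presto modules pin exact versions.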

Comment on lines +311 to +316
ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
clpTableHandle,
kqlQuery,
metadataSqlQuery,
metadataColumnNames.equals(clpExpression.getPushDownVariables()),
Optional.empty());

⚠️ Potential issue | 🔴 Critical

Block TopN pushdown when KQL is present

Line 312 currently sets metadataQueryOnly purely by comparing metadata column sets. If the query still has a KQL predicate (e.g., WHERE msg.timestamp BETWEEN … AND city = 'Calgary'), we still get metadataQueryOnly = true. The TopN rewriter then pushes the spec into the layout, and the split provider trims archives using only metadata ranges. Because that trimming ignores the row-level KQL filter, archives containing the actual top-N rows can be dropped, and the query returns fewer rows than the LIMIT demands. Please ensure metadata-only is true only when there is no KQL predicate and the pushed-down variables are a subset of the metadata columns.

-            splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), clpExpression.getPushDownVariables());
+            Set<String> pushDownVariables = clpExpression.getPushDownVariables();
+            splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), pushDownVariables);
             boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty();
             if (hasMetadataFilter) {
                 metadataSqlQuery = Optional.of(splitFilterProvider.remapSplitFilterPushDownExpression(tableScope, metadataSqlQuery.get()));
                 log.debug("Metadata SQL query: %s", metadataSqlQuery.get());
             }
 
             if (kqlQuery.isPresent() || hasMetadataFilter) {
                 if (kqlQuery.isPresent()) {
                     log.debug("KQL query: %s", kqlQuery.get());
                 }
 
-                ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
+                boolean metadataQueryOnly = !kqlQuery.isPresent() && metadataColumnNames.containsAll(pushDownVariables);
+                ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
                         clpTableHandle,
                         kqlQuery,
                         metadataSqlQuery,
-                        metadataColumnNames.equals(clpExpression.getPushDownVariables()),
+                        metadataQueryOnly,
                         Optional.empty());
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
around lines 311-316, the metadataQueryOnly flag is currently derived only by
comparing metadataColumnNames to the pushed-down variables, which incorrectly
allows metadata-only when a KQL predicate exists; change the condition so
metadataQueryOnly is true only when there is no KQL predicate (kqlQuery is null
or empty) AND the pushed-down variables are a subset of the metadata columns
(use metadataColumnNames.containsAll(clpExpression.getPushDownVariables())).
Construct the ClpTableLayoutHandle with that combined boolean instead of the
current equality check.
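The difference between the equality check and the proposed condition can be sketched with plain `java.util` sets (column names and the KQL string below are illustrative):

```java
import java.util.Optional;
import java.util.Set;

public class MetadataOnlyCheck {
    public static void main(String[] args) {
        Set<String> metadataColumnNames = Set.of("begin_ts", "end_ts");
        // Pushed-down variables happen to match the metadata columns exactly,
        // but the query also carries a row-level KQL predicate.
        Set<String> pushDownVariables = Set.of("begin_ts", "end_ts");
        Optional<String> kqlQuery = Optional.of("city: \"Calgary\"");

        // Current check: set equality alone, ignoring the KQL predicate.
        boolean buggy = metadataColumnNames.equals(pushDownVariables);

        // Proposed check: metadata-only iff there is no KQL predicate AND
        // every pushed-down variable is covered by a metadata column.
        boolean fixed = !kqlQuery.isPresent()
                && metadataColumnNames.containsAll(pushDownVariables);

        System.out.println(buggy); // true  -> archives may be over-trimmed
        System.out.println(fixed); // false -> row-level KQL filtering preserved
    }
}
```

The `containsAll` form also correctly allows the metadata-only path when the pushed-down variables are a strict subset of the metadata columns, which the equality check would reject.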

Comment on lines 483 to 489
else if (LOGICAL_BINARY_OPS_FILTER.contains(operator) && !(literalType instanceof VarcharType)) {
    if (metadataFilterColumns.contains(variableName)) {
-        metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString);
+        metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString, ImmutableSet.of(variableName));
    }
-    return new ClpExpression(format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery);
+    return new ClpExpression(
+            format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery, ImmutableSet.of(variableName));
}

⚠️ Potential issue | 🔴 Critical

Fix format string argument mismatch.

Line 485 has a format string with 3 placeholders ("\"%s\" %s %s") but 4 arguments are provided, including an extra ImmutableSet.of(variableName) that should not be in the format call.

Apply this diff:

         else if (LOGICAL_BINARY_OPS_FILTER.contains(operator) && !(literalType instanceof VarcharType)) {
             if (metadataFilterColumns.contains(variableName)) {
-                metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString, ImmutableSet.of(variableName));
+                metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString);
             }
             return new ClpExpression(
                     format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery, ImmutableSet.of(variableName));
         }
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 483 to 489, the format call used to build metadataSqlQuery supplies
four arguments while the format string only has three placeholders; remove the
extraneous ImmutableSet.of(variableName) from the format(...) call so format
receives exactly the three string arguments (variableName,
operator.getOperator(), literalString), and ensure the
ImmutableSet.of(variableName) remains passed only to the ClpExpression
constructor as the metadata set.
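Worth noting why this compiled and ran at all: `java.util.Formatter` silently discards surplus arguments rather than throwing, so the extra set was simply dropped. A stdlib-only illustration:

```java
public class FormatExtraArgs {
    public static void main(String[] args) {
        // Three placeholders, four arguments: the fourth is silently ignored
        // (per the java.util.Formatter spec, extra arguments are discarded).
        String s = String.format("\"%s\" %s %s", "col", ">=", "42", java.util.Set.of("col"));
        System.out.println(s); // "col" >= 42
    }
}
```

This is why such mismatches survive compilation and tests unless the output is asserted exactly.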

Comment on lines +47 to +54
@JsonCreator
public Ordering(
@JsonProperty("columns") List<String> columns,
@JsonProperty("order") Order order)
{
this.columns = requireNonNull(columns, "column is null");
this.order = requireNonNull(order, "order is null");
}

⚠️ Potential issue | 🟡 Minor

Fix typo in error message.

The error message on Line 52 says "column is null" but should be "columns is null" (plural) since the parameter is columns.

Apply this diff:

         @JsonCreator
         public Ordering(
                 @JsonProperty("columns") List<String> columns,
                 @JsonProperty("order") Order order)
         {
-            this.columns = requireNonNull(columns, "column is null");
+            this.columns = requireNonNull(columns, "columns is null");
             this.order = requireNonNull(order, "order is null");
         }
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java
around lines 47 to 54, the requireNonNull call uses the message "column is null"
which is a typo; change the message to "columns is null" to match the parameter
name and plurality.

Comment on lines +142 to 164
@Override
public PlanNode visitTableScan(TableScanNode node, RewriteContext<Void> context)
{
Set<VariableReferenceExpression> outputVars = new LinkedHashSet<>(node.getOutputVariables());
Map<VariableReferenceExpression, ColumnHandle> newAssignments = new HashMap<>(node.getAssignments());

// Add any missing variables for known handles
globalColumnVarMap.forEach((handle, var) -> {
outputVars.add(var);
newAssignments.put(var, handle);
});

return new TableScanNode(
node.getSourceLocation(),
idAllocator.getNextId(),
node.getTable(),
new ArrayList<>(outputVars),
newAssignments,
node.getTableConstraints(),
node.getCurrentConstraint(),
node.getEnforcedConstraint(),
node.getCteMaterializationInfo());
}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify correctness for join queries with multiple tables.

The visitTableScan method unconditionally injects ALL variables from globalColumnVarMap into every TableScanNode. In join scenarios with multiple tables, this could incorrectly add column handles from one table into another table's scan node, as ClpColumnHandle doesn't appear to have table-scoping information.

Based on learnings

Run the following script to check if there are multi-table join scenarios that could be affected:


🏁 Script executed:

#!/bin/bash
# Search for join-related test cases or usages that might involve multiple TableScanNodes

# Look for join-related tests in the CLP plugin
fd -e java -e sql --type f . presto-clp | xargs rg -nP -C5 '\b(join|JOIN)\b' -g '!**/target/**'

# Search for existing multi-table query handling in CLP optimizer/rewriter
ast-grep --pattern 'class $_ {
  $$$
  visitTableScan($_, $_) {
    $$$
  }
  $$$
}'

Length of output: 3074


I need to examine the ClpUdfRewriter implementation more closely to verify the concern about table-level filtering in the visitTableScan method.


🏁 Script executed:

#!/bin/bash
# Get the full ClpUdfRewriter class to understand globalColumnVarMap and visitTableScan context

cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java

# Also search for where globalColumnVarMap is declared and initialized
rg -nP 'globalColumnVarMap' presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java -B2 -A2

# Search for test files with JOIN and TableScan
fd -e java --type f . presto-clp | xargs rg -l 'JOIN.*TableScan|TableScan.*JOIN' -i

# Look for ClpColumnHandle to see if it has table info
ast-grep --pattern 'class ClpColumnHandle { $$$ }'

Length of output: 14856


I need to verify if ClpColumnHandle has any table-scoping information in its equality comparison.


🏁 Script executed:

#!/bin/bash
# Find and examine ClpColumnHandle class
fd -e java --type f . presto-clp | xargs rg -l 'class ClpColumnHandle'

# Get the full ClpColumnHandle class definition
fd -e java --type f . presto-clp | xargs cat | grep -A 50 'class ClpColumnHandle'

# Look for equals and hashCode methods in ClpColumnHandle
fd -e java --type f . presto-clp | xargs rg -nP 'class ClpColumnHandle|equals\(|hashCode\(' -A 3 -B 1

Length of output: 264994


The web search didn't provide direct information about ClpColumnHandle. Let me search more specifically for the file within the Presto repository context.


🏁 Script executed:

#!/bin/bash
# Read the ClpColumnHandle.java file directly
cat git/presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java

Length of output: 112



🏁 Script executed:

#!/bin/bash
# Find ClpColumnHandle.java file
fd -name "ClpColumnHandle.java" --type f

# Also check if presto-clp directory exists
ls -la presto-clp/src/main/java/com/facebook/presto/plugin/clp/ 2>/dev/null | head -20

Length of output: 1502



🏁 Script executed:

#!/bin/bash
# Read the ClpColumnHandle.java file
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java

Length of output: 2966


Apply table-level filtering in visitTableScan to avoid cross-table injection.

The visitTableScan method unconditionally adds ALL column handles from globalColumnVarMap to every TableScanNode. Since ClpColumnHandle carries no table identifier (equality is based solely on column name and type), JOIN queries over multiple tables will incorrectly inject column handles from one table into another table's scan node.

As noted in the learning for ClpPlanOptimizer, each TableScanNode must determine its scope from its immediate source. Filter the map so that only handles belonging to the current table are added.

Comment on lines +141 to +153
private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
{
List<ArchiveMeta> list = new ArrayList<>();
try (Connection connection = getConnection();
PreparedStatement stmt = connection.prepareStatement(query);
ResultSet rs = stmt.executeQuery()) {
while (rs.next()) {
list.add(new ArchiveMeta(
rs.getString(ARCHIVES_TABLE_COLUMN_ID),
rs.getLong(ordering.getColumns().get(0)),
rs.getLong(ordering.getColumns().get(1)),
rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));
}

⚠️ Potential issue | 🔴 Critical

Guard against single-column top-N metadata

Top-N pushdown routinely produces an ordering with exactly one element (the column named in ORDER BY). In that case the new code hits ordering.getColumns().get(1) before even talking to MySQL, throwing IndexOutOfBoundsException and killing the query instead of falling back to a full scan. We need to validate the ordering shape (or otherwise derive the range columns) before dereferencing to keep production queries alive.

@@
-    private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
-    {
-        List<ArchiveMeta> list = new ArrayList<>();
+    private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
+    {
+        List<String> orderingColumns = ordering.getColumns();
+        if (orderingColumns.size() < 2) {
+            throw new IllegalArgumentException("Top-N ordering must expose both lower and upper bound metadata columns: " + orderingColumns);
+        }
+        String lowerBoundColumn = orderingColumns.get(0);
+        String upperBoundColumn = orderingColumns.get(1);
+
+        List<ArchiveMeta> list = new ArrayList<>();
@@
-                list.add(new ArchiveMeta(
-                        rs.getString(ARCHIVES_TABLE_COLUMN_ID),
-                        rs.getLong(ordering.getColumns().get(0)),
-                        rs.getLong(ordering.getColumns().get(1)),
-                        rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));
+                list.add(new ArchiveMeta(
+                        rs.getString(ARCHIVES_TABLE_COLUMN_ID),
+                        rs.getLong(lowerBoundColumn),
+                        rs.getLong(upperBoundColumn),
+                        rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));

Comment on lines +58 to +62
@Override
protected URL buildPinotSqlQueryEndpointUrl(ClpConfig config) throws MalformedURLException
{
return new URL(config.getMetadataDbUrl() + "/v1/globalStatements");
}

🧹 Nitpick | 🔵 Trivial

Consider safer URL construction.

Direct string concatenation may produce malformed URLs if getMetadataDbUrl() ends with a trailing slash (e.g., http://example.com//v1/globalStatements).

Consider using URL constructor chaining or normalizing the base URL:

     protected URL buildPinotSqlQueryEndpointUrl(ClpConfig config) throws MalformedURLException
     {
-        return new URL(config.getMetadataDbUrl() + "/v1/globalStatements");
+        String baseUrl = config.getMetadataDbUrl();
+        if (baseUrl.endsWith("/")) {
+            baseUrl = baseUrl.substring(0, baseUrl.length() - 1);
+        }
+        return new URL(baseUrl + "/v1/globalStatements");
     }
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java
around lines 58 to 62, building the endpoint by concatenating strings can
produce malformed URLs when config.getMetadataDbUrl() has a trailing slash;
normalize or resolve the path instead: parse the base with new URL(base) or new
URI(base) and then resolve or construct the endpoint using URL/URI resolution
(e.g., new URL(baseUrl, "/v1/globalStatements") or
baseUri.resolve("/v1/globalStatements").toURL()), ensuring you handle or
propagate MalformedURLException appropriately.
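One way to make this robust without manual string trimming is `java.net.URI` resolution, which normalizes the path regardless of a trailing slash (the broker base URLs below are illustrative):

```java
import java.net.URI;

public class EndpointUrl {
    public static void main(String[] args) {
        // Both bases resolve to the same endpoint: an absolute path passed to
        // resolve() replaces the base path but keeps scheme, host, and port.
        URI withSlash = URI.create("http://broker:8099/").resolve("/v1/globalStatements");
        URI withoutSlash = URI.create("http://broker:8099").resolve("/v1/globalStatements");
        System.out.println(withSlash);    // http://broker:8099/v1/globalStatements
        System.out.println(withoutSlash); // http://broker:8099/v1/globalStatements
        // Call .toURL() on the result where the caller needs a java.net.URL.
    }
}
```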

Comment on lines +39 to +128
private static final String PINOT_BROKER_URL = "http://localhost:8099";
private static final String METADATA_YAML_PATH = "/home/xiaochong-dev/presto-e2e/pinot/tables-schema.yaml";
private static final String TABLE_NAME = "cockroachdb";
private ClpMetadata metadata;
private ClpSplitProvider clpSplitProvider;

@BeforeTest
public void setUp()
{
ClpConfig config = new ClpConfig()
.setPolymorphicTypeEnabled(true)
.setMetadataDbUrl(PINOT_BROKER_URL)
.setMetadataProviderType(YAML)
.setMetadataYamlPath(METADATA_YAML_PATH);
ClpMetadataProvider metadataProvider = new ClpYamlMetadataProvider(config);
metadata = new ClpMetadata(config, metadataProvider);
clpSplitProvider = new ClpPinotSplitProvider(config);
}

@Test
public void testListSchemaNames()
{
assertEquals(metadata.listSchemaNames(SESSION), ImmutableList.of(DEFAULT_SCHEMA_NAME));
}

@Test
public void testListTables()
{
ImmutableSet.Builder<SchemaTableName> builder = ImmutableSet.builder();
builder.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), builder.build());
}

@Test
public void testListSplits()
{
ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME), ""),
Optional.empty(),
Optional.empty());
List<ClpSplit> result = clpSplitProvider.listSplits(layoutHandle);
System.out.println("Hello world");
}

@Test
public void testGetTableMetadata()
{
ClpTableHandle clpTableHandle = (ClpTableHandle) metadata.getTableHandle(SESSION, new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(SESSION, clpTableHandle);
// ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder()
// .add(ColumnMetadata.builder()
// .setName("a_bigint")
// .setType(BIGINT)
// .setNullable(true)
// .build())
// .add(ColumnMetadata.builder()
// .setName("a_varchar")
// .setType(VARCHAR)
// .setNullable(true)
// .build())
// .add(ColumnMetadata.builder()
// .setName("b_double")
// .setType(DOUBLE)
// .setNullable(true)
// .build())
// .add(ColumnMetadata.builder()
// .setName("b_varchar")
// .setType(VARCHAR)
// .setNullable(true)
// .build())
// .add(ColumnMetadata.builder()
// .setName("c")
// .setType(RowType.from(ImmutableList.of(
// RowType.field("d", BOOLEAN),
// RowType.field("e", VARCHAR))))
// .setNullable(true)
// .build())
// .add(ColumnMetadata.builder()
// .setName("f")
// .setType(RowType.from(ImmutableList.of(
// RowType.field("g",
// RowType.from(ImmutableList.of(
// RowType.field("h", new ArrayType(VARCHAR))))))))
// .setNullable(true)
// .build())
// .build();
// assertEquals(columnMetadata, ImmutableSet.copyOf(tableMetadata.getColumns()));
ImmutableSet<ColumnMetadata> actual = ImmutableSet.copyOf(tableMetadata.getColumns());
System.out.println("Hello world");
}

⚠️ Potential issue | 🔴 Critical

Make the YAML metadata test hermetic

This test can’t pass on CI: it hard-codes a developer-specific path (/home/xiaochong-dev/...) and constructs a real ClpPinotSplitProvider, which immediately tries to POST to a Pinot broker on localhost:8099. Neither resource exists in automation, so the suite now fails before asserting anything. Please rewrite it to follow the pattern we use in TestClpMultiSchema—generate temporary YAML fixtures and exercise metadata using in-memory or stubbed providers—or quarantine it behind an integration-only guard. As written it blocks every clean build while providing no coverage.
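The hermetic pattern being asked for, sketched with stdlib only (in the real test, the temp path would be passed to ClpConfig.setMetadataYamlPath and the assertions made against the provider; the YAML content here is illustrative):

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class TempYamlFixture {
    public static void main(String[] args) throws Exception {
        // Write the fixture to a temp file instead of a developer-specific path.
        Path yaml = Files.createTempFile("clp-metadata-", ".yaml");
        Files.writeString(yaml, "default:\n  - cockroachdb\n");

        // The test would construct ClpYamlMetadataProvider from yaml.toString()
        // here; this sketch just verifies the fixture round-trips.
        String contents = Files.readString(yaml);
        System.out.println(contents.contains("cockroachdb")); // true

        Files.deleteIfExists(yaml); // cleanup keeps the test hermetic
    }
}
```

Combined with a stubbed split provider (no live Pinot broker), this removes both CI blockers while keeping the coverage.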

@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector-snapshot branch from ba1a39a to 68549dc Compare November 10, 2025 06:21
@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector branch 2 times, most recently from e5bfe33 to ddfca65 Compare November 10, 2025 13:38
@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector-snapshot branch from 8f3858a to 5adb10a Compare November 10, 2025 14:05
@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68549dc and 5adb10a.

📒 Files selected for processing (2)
  • .github/workflows/maven-checks.yml (4 hunks)
  • .github/workflows/tests.yml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-04T14:50:43.554Z
Learnt from: anlowee
Repo: y-scope/presto PR: 7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.

Applied to files:

  • .github/workflows/tests.yml
🪛 YAMLlint (1.37.1)
.github/workflows/maven-checks.yml

[error] 26-26: too many spaces inside brackets

(brackets)


[error] 26-26: too many spaces inside brackets

(brackets)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (5)
.github/workflows/tests.yml (1)

19-21: Prefix-based concurrency exclusion is logically sound.

The startsWith logic correctly excludes branches with the release-0.293-clp-connector-snapshot prefix from in-progress job cancellation, ensuring each commit to these branches is fully validated.

.github/workflows/maven-checks.yml (4)

17-19: Prefix-based concurrency exclusion is logically sound.

The startsWith logic correctly excludes branches with the release-0.293-clp-connector-snapshot prefix from in-progress job cancellation, consistent with the pattern applied across related workflows.


114-115: Push condition update aligns with snapshot branch strategy.

The condition changed from exact ref match to prefix-based startsWith for release snapshot branches, aligning with the broader concurrency and branch exclusion strategy across workflows.


27-28: Formatting is consistent with repository patterns; runner group configuration requires manual verification.

The group: Default format used in maven-checks.yml at lines 27–28 and 72–73 aligns with the established multi-line runs-on: pattern across the repository (prestocpp-linux-build-and-unit-test.yml, prestocpp-format-and-header-check.yml, prestissimo-worker-images-build.yml, and tests.yml). The YAML indentation and formatting are consistent throughout.

However, verify that the Default runner group is actually configured in your GitHub Actions environment settings. This cannot be confirmed from the codebase alone.


103-103: Docker image metadata tagging format verified as correct.

The type=ref,event=branch format is valid for docker/metadata-action and generates tags from Git references when triggered by branch events. For branch events, the action sanitizes branch references into tags (e.g., refs/heads/my/branch → my-branch). The implementation matches the documented behavior for branch-based Docker builds.

strategy:
fail-fast: false
matrix:
java: [ 8.0.442, 17.0.13 ]

⚠️ Potential issue | 🟡 Minor

Fix YAML bracket spacing to align with linting standards.

YAMLlint flags excessive whitespace inside brackets. Reduce spacing to meet linting standards.

-        java: [ 8.0.442, 17.0.13 ]
+        java: [8.0.442, 17.0.13]
🧰 Tools
🪛 YAMLlint (1.37.1)

[error] 26-26: too many spaces inside brackets

(brackets)



🤖 Prompt for AI Agents
.github/workflows/maven-checks.yml around line 26: the array spacing "java: [
8.0.442, 17.0.13 ]" violates YAML lint rules; remove the extra spaces inside the
brackets so the array uses standard spacing like "java: [8.0.442, 17.0.13]" to
satisfy the linter.


test:
runs-on: ubuntu-latest
runs-on: ${{ github.repository_owner == 'y-scope' && fromJSON('{"group":"Default"}') || 'ubuntu-latest' }}

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

Verify grouped runner syntax for conditional assignment.

The runs-on expression uses fromJSON('{"group":"Default"}') to conditionally select a runner group when the repository owner is y-scope. Verify that this syntax is the correct and documented approach for GitHub Actions grouped runners in your runner configuration.


🌐 Web query:

GitHub Actions grouped runners conditional assignment syntax fromJSON

💡 Result:

Use fromJSON to turn a job output (JSON string) into a matrix or to index into an object for runs-on. Common patterns:

  • Dynamic matrix from a JSON array output:

    jobs:
      dynamic:
        needs: generate
        strategy:
          matrix:
            runner: ${{ fromJSON(needs.generate.outputs.runners) }}
        runs-on: ${{ matrix.runner }}

  • Pick runs-on per matrix entry from a JSON object output:

    jobs:
      test:
        needs: setup
        strategy:
          matrix:
            run: ['app1', 'app2', 'app3']
        runs-on: ${{ fromJSON(needs.setup.outputs.runners)[matrix.run] }}

  • Guard a job only when the JSON array is non-empty:
    if: ${{ fromJSON(needs.setup.outputs.runners) != '[]' }}

References: GitHub Actions docs (matrix/fromJSON examples) and community examples. [1][2]

Sources
[1] GitHub Docs — Running variations of jobs in a workflow (matrix/fromJSON examples).
[2] Stack Overflow — using fromJSON to select runs-on from prior job outputs.


Correct the runs-on grouped runner syntax: use the proper YAML mapping form instead of fromJSON.

The documented syntax for GitHub Actions grouped runners uses YAML mapping notation:

    runs-on:
      group: ubuntu-runners

or combined with labels:

    runs-on:
      group: ubuntu-runners
      labels: ubuntu-20.04-16core

The current expression using fromJSON('{"group":"Default"}') does not match this pattern. The runs-on value expects a runner label or the name of a runner group, not a parsed JSON object.

For conditional assignment, use a string expression instead:

runs-on: ${{ github.repository_owner == 'y-scope' && 'Default' || 'ubuntu-latest' }}

Or, if targeting a named runner group that requires the group: syntax, structure it as a conditional job or use a different approach.

🤖 Prompt for AI Agents
.github/workflows/tests.yml around line 44: the runs-on uses
fromJSON('{"group":"Default"}') which is invalid for runner groups; replace the
expression with a proper string conditional or the YAML group form — e.g. change
the runs-on to a string conditional like ${{ github.repository_owner ==
'y-scope' && 'Default' || 'ubuntu-latest' }} or, if you must target a runner
group object, restructure the job definition to use the group: key under runs-on
(conditionalize the whole job or use separate jobs) so runs-on receives either a
label string or a proper group: mapping.

@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector branch from ddfca65 to bfa18ff Compare November 10, 2025 15:07
@jackluo923 jackluo923 force-pushed the release-0.293-clp-connector-snapshot branch from dd5c838 to 3105485 Compare November 10, 2025 16:12
@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 2

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between dd5c838 and 519a803.

📒 Files selected for processing (10)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.

Applied to files:

  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java
🧬 Code graph analysis (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1)
  • Test (35-244)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1)
  • ClpYamlMetadataProvider (44-266)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
  • ClpMySqlSplitProvider (39-300)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1)
  • ClpPinotSplitProvider (51-432)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1)
  • ClpUberPinotSplitProvider (37-123)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
  • ClpMySqlSplitFilterProvider (34-180)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1)
  • ClpPinotSplitFilterProvider (25-33)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1)
  • ClpUberPinotSplitFilterProvider (42-122)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1)
  • Test (31-155)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
  • ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
  • ClpTopNSpec (28-148)
presto-native-execution/presto_cpp/presto_protocol/connector/clp/presto_protocol_clp.h (1)
  • SplitType (52-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
  • GitHub Check: test (8.0.442, :presto-main)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
  • GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
  • GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: prestissimo-worker-images-build
  • GitHub Check: prestocpp-linux-build-for-test
  • GitHub Check: maven-checks (8.0.442)
  • GitHub Check: maven-checks (17.0.13)
  • GitHub Check: prestissimo-worker-images-build

Comment on lines +340 to +392
private static List<JsonNode> getQueryResult(URL url, String sql)
{
try {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Accept", "application/json");
conn.setDoOutput(true);
conn.setConnectTimeout((int) SECONDS.toMillis(5));
conn.setReadTimeout((int) SECONDS.toMillis(30));

log.info("Executing Pinot query: %s", sql);
ObjectMapper mapper = new ObjectMapper();
String body = format("{\"sql\": %s }", mapper.writeValueAsString(sql));
try (OutputStream os = conn.getOutputStream()) {
os.write(body.getBytes(StandardCharsets.UTF_8));
}

int code = conn.getResponseCode();
InputStream is = (code >= 200 && code < 300) ? conn.getInputStream() : conn.getErrorStream();
if (is == null) {
throw new IOException("Pinot HTTP " + code + " with empty body");
}

JsonNode root;
try (InputStream in = is) {
root = mapper.readTree(in);
}
JsonNode resultTable = root.get("resultTable");
if (resultTable == null) {
throw new IllegalStateException("Pinot query response missing 'resultTable' field");
}
JsonNode rows = resultTable.get("rows");
if (rows == null) {
throw new IllegalStateException("Pinot query response missing 'rows' field in resultTable");
}
ImmutableList.Builder<JsonNode> resultBuilder = ImmutableList.builder();
for (Iterator<JsonNode> it = rows.elements(); it.hasNext(); ) {
JsonNode row = it.next();
resultBuilder.add(row);
}
List<JsonNode> results = resultBuilder.build();
log.debug("Number of results: %s", results.size());
return results;
}
catch (IOException e) {
log.error(e, "IO error executing Pinot query: %s", sql);
return Collections.emptyList();
}
catch (Exception e) {
log.error(e, "Unexpected error executing Pinot query: %s", sql);
return Collections.emptyList();
}

⚠️ Potential issue | 🔴 Critical

Do not drop Pinot query failures. Returning an empty list when the HTTP call or JSON parsing fails makes listSplits believe there are zero splits, so the coordinator will quietly run the query against no data instead of surfacing the underlying failure. That is silent data loss. Please log and rethrow so the query fails fast instead of proceeding.

Apply this diff to rethrow:

         catch (IOException e) {
             log.error(e, "IO error executing Pinot query: %s", sql);
-            return Collections.emptyList();
+            throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
         }
         catch (Exception e) {
             log.error(e, "Unexpected error executing Pinot query: %s", sql);
-            return Collections.emptyList();
+            throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
         }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
private static List<JsonNode> getQueryResult(URL url, String sql)
{
try {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Accept", "application/json");
conn.setDoOutput(true);
conn.setConnectTimeout((int) SECONDS.toMillis(5));
conn.setReadTimeout((int) SECONDS.toMillis(30));
log.info("Executing Pinot query: %s", sql);
ObjectMapper mapper = new ObjectMapper();
String body = format("{\"sql\": %s }", mapper.writeValueAsString(sql));
try (OutputStream os = conn.getOutputStream()) {
os.write(body.getBytes(StandardCharsets.UTF_8));
}
int code = conn.getResponseCode();
InputStream is = (code >= 200 && code < 300) ? conn.getInputStream() : conn.getErrorStream();
if (is == null) {
throw new IOException("Pinot HTTP " + code + " with empty body");
}
JsonNode root;
try (InputStream in = is) {
root = mapper.readTree(in);
}
JsonNode resultTable = root.get("resultTable");
if (resultTable == null) {
throw new IllegalStateException("Pinot query response missing 'resultTable' field");
}
JsonNode rows = resultTable.get("rows");
if (rows == null) {
throw new IllegalStateException("Pinot query response missing 'rows' field in resultTable");
}
ImmutableList.Builder<JsonNode> resultBuilder = ImmutableList.builder();
for (Iterator<JsonNode> it = rows.elements(); it.hasNext(); ) {
JsonNode row = it.next();
resultBuilder.add(row);
}
List<JsonNode> results = resultBuilder.build();
log.debug("Number of results: %s", results.size());
return results;
}
catch (IOException e) {
log.error(e, "IO error executing Pinot query: %s", sql);
return Collections.emptyList();
}
catch (Exception e) {
log.error(e, "Unexpected error executing Pinot query: %s", sql);
return Collections.emptyList();
}
private static List<JsonNode> getQueryResult(URL url, String sql)
{
try {
HttpURLConnection conn = (HttpURLConnection) url.openConnection();
conn.setRequestMethod("POST");
conn.setRequestProperty("Content-Type", "application/json");
conn.setRequestProperty("Accept", "application/json");
conn.setDoOutput(true);
conn.setConnectTimeout((int) SECONDS.toMillis(5));
conn.setReadTimeout((int) SECONDS.toMillis(30));
log.info("Executing Pinot query: %s", sql);
ObjectMapper mapper = new ObjectMapper();
String body = format("{\"sql\": %s }", mapper.writeValueAsString(sql));
try (OutputStream os = conn.getOutputStream()) {
os.write(body.getBytes(StandardCharsets.UTF_8));
}
int code = conn.getResponseCode();
InputStream is = (code >= 200 && code < 300) ? conn.getInputStream() : conn.getErrorStream();
if (is == null) {
throw new IOException("Pinot HTTP " + code + " with empty body");
}
JsonNode root;
try (InputStream in = is) {
root = mapper.readTree(in);
}
JsonNode resultTable = root.get("resultTable");
if (resultTable == null) {
throw new IllegalStateException("Pinot query response missing 'resultTable' field");
}
JsonNode rows = resultTable.get("rows");
if (rows == null) {
throw new IllegalStateException("Pinot query response missing 'rows' field in resultTable");
}
ImmutableList.Builder<JsonNode> resultBuilder = ImmutableList.builder();
for (Iterator<JsonNode> it = rows.elements(); it.hasNext(); ) {
JsonNode row = it.next();
resultBuilder.add(row);
}
List<JsonNode> results = resultBuilder.build();
log.debug("Number of results: %s", results.size());
return results;
}
catch (IOException e) {
log.error(e, "IO error executing Pinot query: %s", sql);
throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
}
catch (Exception e) {
log.error(e, "Unexpected error executing Pinot query: %s", sql);
throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
around lines 340 to 392, the method currently catches IO/other exceptions, logs
them and returns Collections.emptyList(), which silences Pinot failures and
causes silent data loss; change the catch blocks so after logging you rethrow
the original exception (or wrap it) instead of returning an empty list — e.g.,
remove the return Collections.emptyList() and add "throw e;" (or "throw new
IOException(\"Error executing Pinot query: \" + sql, e);" for checked-to-checked
wrapping) in the IOException catch and rethrow a RuntimeException wrapping the
exception in the generic catch; preserve logging and stack trace.

Comment on lines +79 to +88
// Test with trailing slash
config.setMetadataDbUrl("https://neutrino.uber.com/");
URL result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");

// Test without protocol (should work as URL constructor handles it)
config.setMetadataDbUrl("http://neutrino-dev.uber.com");
result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "http://neutrino-dev.uber.com/v1/globalStatements");


⚠️ Potential issue | 🟠 Major

Normalise the Neutrino endpoint URL

When the metadata base URL already ends with “/”, this concatenation yields //v1/globalStatements. That path relies on server-specific behaviour and can produce 404s. Trim trailing slashes before appending the Neutrino endpoint and update the expectation accordingly.

Apply this diff to the test:

-        assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");
+        assertEquals(result.toString(), "https://neutrino.uber.com/v1/globalStatements");

Then update ClpUberPinotSplitProvider.buildPinotSqlQueryEndpointUrl (and the base ClpPinotSplitProvider implementation) to strip trailing slashes before adding /v1/globalStatements.
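
To make the fix concrete, here is a minimal sketch of the trailing-slash normalisation; the class and method names are illustrative, not the connector's actual API:

```java
// Illustrative sketch only: strips trailing slashes from a configured base URL
// before appending the Neutrino endpoint path, so "https://host/" and
// "https://host" both yield ".../v1/globalStatements" with exactly one slash.
public final class EndpointUrlSketch
{
    private EndpointUrlSketch() {}

    public static String buildGlobalStatementsUrl(String baseUrl)
    {
        String trimmed = baseUrl;
        // Remove any number of trailing '/' characters.
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        return trimmed + "/v1/globalStatements";
    }
}
```

The same trimming loop, applied in buildPinotSqlQueryEndpointUrl before the URL constructor is invoked, would make both test cases above expect a single "/v1/globalStatements" path.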

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Test with trailing slash
config.setMetadataDbUrl("https://neutrino.uber.com/");
URL result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");
// Test without protocol (should work as URL constructor handles it)
config.setMetadataDbUrl("http://neutrino-dev.uber.com");
result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "http://neutrino-dev.uber.com/v1/globalStatements");
// Test with trailing slash
config.setMetadataDbUrl("https://neutrino.uber.com/");
URL result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "https://neutrino.uber.com/v1/globalStatements");
// Test without protocol (should work as URL constructor handles it)
config.setMetadataDbUrl("http://neutrino-dev.uber.com");
result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "http://neutrino-dev.uber.com/v1/globalStatements");
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java
around lines 79-88, the test currently expects a double-slash when metadataDbUrl
ends with “/”; change the test to expect a single “/v1/globalStatements” and
update the production code: in
ClpUberPinotSplitProvider.buildPinotSqlQueryEndpointUrl (and the shared base
ClpPinotSplitProvider implementation) trim any trailing slashes from
metadataDbUrl before concatenating the path, then append "/v1/globalStatements"
(ensuring exactly one slash between host and path); keep existing behavior when
the URL already lacks a trailing slash and preserve protocol handling.

@jackluo923 jackluo923 closed this Nov 29, 2025