[CI] Testing #91
Conversation
Change Docker image tagging from static `:dev` tags to dynamic branch-based tags, and update all workflow conditions to accept any branch with the `release-0.293-clp-connector-snapshot` prefix.

Changes:
- Docker image tags now use the actual branch name via `type=ref,event=branch`
  - Before: ghcr.io/repo/prestissimo-worker:dev
  - After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29
- Push conditions changed from exact match to prefix match using startsWith()
- Cancel-in-progress logic updated to exclude all snapshot-prefix branches
- PR title checks now accept the release-0.293-clp-connector-snapshot* pattern

Example: pushing to branch "release-0.293-clp-connector-snapshot-nov29" will:
- Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29
- Run all CI jobs to completion without cancellation
- Enable PR title validation for PRs targeting this branch

This allows multiple snapshot branches to coexist with unique image tags.
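A sketch of the relevant workflow pieces (step layout and action versions are assumptions; the `type=ref,event=branch` tag rule and the `startsWith()` prefix condition come from the change description above):

```yaml
# Tag images with the branch name instead of a static :dev tag.
- uses: docker/metadata-action@v5
  with:
    images: ghcr.io/${{ github.repository }}/prestissimo-worker
    tags: |
      type=ref,event=branch

# Push only for branches matching the snapshot prefix.
- if: startsWith(github.ref_name, 'release-0.293-clp-connector-snapshot')
  uses: docker/build-push-action@v6
  with:
    push: true
```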
Add alternative provider implementations for metadata and split discovery,
enabling flexible deployment configurations without MySQL dependency.
New providers:
- ClpYamlMetadataProvider: Reads table/column metadata from YAML config files
instead of querying MySQL database. Supports nested schemas and polymorphic
types through hierarchical YAML structure.
- ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths
via HTTP/JSON API instead of MySQL. Determines split types based on file
extensions (.clp.zst for IR, otherwise ARCHIVE).
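The extension-based split-type dispatch can be sketched as follows (the enum and helper names are illustrative; only the `.clp.zst` → IR rule comes from the description above):

```java
public final class SplitTypes
{
    public enum SplitType { IR, ARCHIVE }

    private SplitTypes() {}

    // Files ending in .clp.zst are IR streams; everything else is a CLP archive.
    public static SplitType determineSplitType(String archivePath)
    {
        return archivePath.endsWith(".clp.zst") ? SplitType.IR : SplitType.ARCHIVE;
    }
}
```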
Configuration:
- Add clp.metadata-provider-type=YAML option for YAML-based metadata
- Add clp.split-provider-type=PINOT option for Pinot-based split discovery
- Add clp.metadata-yaml-path config for YAML metadata file location
The YAML metadata provider parses a two-level structure:
1. Main metadata file: maps connector → schema → table → schema_file_path
2. Individual table schema files: define column names and types
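For illustration, an individual table schema file might look like the following (the `columns`/`name`/`type` keys are hypothetical; the commit only states that these files define column names and types):

```yaml
# Hypothetical table schema file referenced from the main metadata file.
columns:
  - name: timestamp
    type: bigint
  - name: message
    type: varchar
```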
This provides deployment flexibility - users can now choose between:
- Database-driven (MySQL for both metadata and splits)
- Hybrid (YAML metadata + Pinot splits)
- Or any combination based on their infrastructure
Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1
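Using the options above, a catalog properties file for the hybrid deployment might look like this (a minimal sketch; `connector.name=clp` and the example path are assumptions, the three `clp.*` keys are as named above):

```properties
# Hybrid deployment: YAML metadata + Pinot split discovery.
connector.name=clp
clp.metadata-provider-type=YAML
clp.metadata-yaml-path=/etc/presto/clp-metadata.yaml
clp.split-provider-type=PINOT
```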
Ensure all timestamp literals are converted to nanoseconds before generating KQL and SQL filters for pushdown, regardless of their original precision (milliseconds, microseconds, etc.).

Problem: Presto supports multiple timestamp precisions (TIMESTAMP with millisecond precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects timestamps in a consistent nanosecond format. Previously, timestamp literals were passed through without precision normalization, causing incorrect results when different timestamp types were used in queries.

Solution: add timestamp normalization logic in ClpFilterToKqlConverter:
- tryEnsureNanosecondTimestamp(): detects TIMESTAMP and TIMESTAMP_MICROSECONDS types
- ensureNanosecondTimestamp(): converts timestamp values to nanoseconds by:
  1. Extracting epoch seconds via TimestampType.getEpochSecond()
  2. Extracting the nanosecond fraction via TimestampType.getNanos()
  3. Converting to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction

Applied to:
- BETWEEN operator: both lower and upper bounds
- Comparison operators: all timestamp literal values (=, !=, <, >, <=, >=)

This ensures consistent timestamp representation in both KQL filters (sent to the CLP engine) and SQL filters (used for metadata filtering), regardless of the timestamp precision used in the original Presto query.

Example:
- Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02'
- Before: literal values passed as milliseconds or microseconds
- After: both bounds normalized to a nanosecond representation
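The conversion arithmetic in step 3 can be sketched as a standalone helper (the class and method names are assumptions; in the connector this logic lives inside ClpFilterToKqlConverter):

```java
import java.util.concurrent.TimeUnit;

public final class TimestampNanos
{
    private TimestampNanos() {}

    /**
     * Combines epoch seconds and a nanosecond fraction into total nanoseconds,
     * mirroring: SECONDS.toNanos(seconds) + nanosFraction.
     */
    public static long toEpochNanos(long epochSeconds, long nanosFraction)
    {
        return TimeUnit.SECONDS.toNanos(epochSeconds) + nanosFraction;
    }

    public static void main(String[] args)
    {
        // 2025-01-01T00:00:00Z is epoch second 1735689600; fraction 0
        System.out.println(toEpochNanos(1735689600L, 0L)); // 1735689600000000000
    }
}
```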
Implement a plan optimizer that pushes down LIMIT + ORDER BY operations to reduce split scanning when querying on timestamp metadata columns. The optimizer detects TopN patterns in query plans and uses archive metadata (timestamp bounds, message counts) to select only the minimum set of archives needed to guarantee correct results.

Key features:
- Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns
- Pushes TopN specifications through ClpTableLayoutHandle
- Handles overlapping archives by merging them into non-overlapping groups
- Supports both ASC (earliest-first) and DESC (latest-first) ordering
- Uses worst-case analysis to ensure correctness when archive ranges overlap

Implementation:
- ClpTopNSpec: data structure for TopN specifications (limit + orderings)
- ClpComputePushDown: enhanced plan optimizer with TopN detection
- ClpMySqlSplitProvider: archive selection logic using metadata queries
- Comprehensive test coverage for various TopN scenarios

Performance: for queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives that cannot contain the top results, significantly reducing I/O.
Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize archive scanning when executing LIMIT + ORDER BY queries on timestamp metadata columns. The optimization uses archive metadata (timestamp bounds and message counts) to select only the minimum set of archives needed to guarantee correct TopN results, significantly reducing I/O for time-range queries.

Archive selection algorithm:
- Fetches archive metadata (file path, creation time, modification time, message count)
- Merges overlapping archives into non-overlapping groups by timestamp ranges
- For DESC ordering: includes the newest group, then older groups, until the limit is covered
- For ASC ordering: includes the oldest group, then newer groups, until the limit is covered
- Uses worst-case analysis to ensure correctness when archives overlap

Code structure:
- ClpPinotSplitProvider.listSplits(): detects TopN specs and routes to the optimized path
- selectTopNArchives(): implements the archive selection algorithm
- toArchiveGroups(): merges overlapping archives into logical components
- ArchiveMeta: represents archive metadata with validation
- ArchiveGroup: represents merged archive groups for overlap handling

ClpTopNSpec enhancements:
- Made fields private with proper encapsulation
- Added @JsonProperty annotations to getters for correct serialization
- Maintains immutability for thread safety

Code quality improvements:
- Proper exception handling with context-aware error messages
- Input validation in constructors (bounds checking, null checks)
- Extracted a determineSplitType() helper to eliminate duplication
- Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields)
- Improved logging with table names and SQL queries for debugging
- Better encapsulation: private fields, getter-only access

Performance impact: for queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives outside the time range of interest, providing substantial I/O reduction for large archive sets.
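The DESC branch of the selection loop can be sketched as follows (a simplified model under stated assumptions: ArchiveGroup carries only merged timestamp bounds and a message count, and the real provider additionally performs worst-case overlap analysis):

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public final class TopNArchiveSelection
{
    // Hypothetical, simplified view of a merged (non-overlapping) archive group.
    public record ArchiveGroup(long beginTs, long endTs, long messageCount) {}

    private TopNArchiveSelection() {}

    // DESC ordering: take groups from newest to oldest until the accumulated
    // message count is guaranteed to cover the limit.
    public static List<ArchiveGroup> selectForDesc(List<ArchiveGroup> groups, long limit)
    {
        List<ArchiveGroup> sorted = new ArrayList<>(groups);
        sorted.sort(Comparator.comparingLong(ArchiveGroup::endTs).reversed());
        List<ArchiveGroup> selected = new ArrayList<>();
        long covered = 0;
        for (ArchiveGroup group : sorted) {
            selected.add(group);
            covered += group.messageCount();
            if (covered >= limit) {
                break;
            }
        }
        return selected;
    }
}
```

With non-overlapping groups this is safe because every message in a newer group sorts before every message in an older group, so once `covered >= limit` the remaining groups cannot contribute to the top results.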
Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector
to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table).
Key changes:
- Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle
- Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching
- Update ClpMetadata to delegate schema listing to metadata providers
- Add comprehensive error handling with graceful fallback to default schema
- Optimize performance with double-checked locking and ObjectMapper reuse
- Add 15+ unit tests covering all edge cases and error scenarios
- Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider
The implementation is backward compatible - existing single-schema setups continue to work
without configuration changes. Multi-schema support is activated automatically when the
YAML metadata file contains multiple schema entries.
Example YAML structure:
```yaml
clp:
  default:
    logs: /path/to/default/logs.yaml
  dev:
    test_logs: /path/to/dev/logs.yaml
  prod:
    production_logs: /path/to/prod/logs.yaml
```
Performance optimizations:
- Thread-safe caching prevents repeated YAML parsing
- Double-checked locking pattern for lazy initialization
- Reused ObjectMapper instances reduce object creation overhead
- Synchronized table schema map updates ensure thread safety
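The caching pattern described above can be sketched in isolation (the class name and the stand-in loader are assumptions; in the connector the loaded values come from parsing the YAML metadata file):

```java
import java.util.List;

public final class SchemaCache
{
    // volatile so the fast path can read the published list without locking.
    private volatile List<String> schemaNames;

    // Hypothetical loader standing in for parsing the YAML metadata file.
    private List<String> loadSchemaNames()
    {
        return List.of("default", "dev", "prod");
    }

    // Double-checked locking: synchronize only on first use, then serve the
    // cached immutable list on every subsequent call.
    public List<String> getSchemaNames()
    {
        List<String> names = schemaNames;
        if (names == null) {
            synchronized (this) {
                names = schemaNames;
                if (names == null) {
                    names = loadSchemaNames();
                    schemaNames = names;
                }
            }
        }
        return names;
    }
}
```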
Testing:
- All 11 tests in TestClpYamlMetadataProvider pass
- All 4 tests in TestClpMultiSchema pass
- Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully
Walkthrough

This PR introduces YAML-based metadata discovery for CLP, adds Pinot split provider support with Uber-specific variants, implements Top-N optimization pushdown with metadata-only query support, and updates GitHub workflows to use branch prefix-based cancellation and group-based runners.

Changes
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant Query as Query Optimizer
    participant TopN as TopN Visitor
    participant TableScan as TableScan
    participant Layout as ClpTableLayoutHandle
    participant SplitProvider as CLP Split Provider
    Query->>TopN: visitTopN(plan)
    TopN->>TopN: Extract ORDER BY cols
    TopN->>TopN: Build ClpTopNSpec
    TopN->>Layout: Create with metadataQueryOnly,<br/>topN fields
    TopN->>TableScan: Create with new layout
    TableScan->>SplitProvider: listSplits(layout)
    alt Top-N Specified
        SplitProvider->>SplitProvider: Fetch archive metadata
        SplitProvider->>SplitProvider: Merge overlaps
        SplitProvider->>SplitProvider: selectTopNArchives()
        SplitProvider-->>Query: Return minimal archive set
    else No Top-N
        SplitProvider->>SplitProvider: Fetch all splits
        SplitProvider-->>Query: Return full split set
    end
```

```mermaid
sequenceDiagram
    participant Config as ClpConfig
    participant Module as ClpModule (DI)
    participant ProviderMap as Static Provider Maps
    participant MetadataImpl as Metadata Provider
    Config->>Config: Set metadataYamlPath
    Module->>ProviderMap: Lookup METADATA_PROVIDER_MAPPINGS
    ProviderMap-->>Module: Return provider class for type
    Module->>MetadataImpl: Create provider instance
    alt MetadataProviderType.YAML
        MetadataImpl->>MetadataImpl: Parse YAML from path
        MetadataImpl->>MetadataImpl: Extract schemas/tables
        MetadataImpl-->>Module: Cached schema list
    else MetadataProviderType.MYSQL
        MetadataImpl->>MetadataImpl: Query database
        MetadataImpl-->>Module: Schema list from DB
    end
    Module-->>Config: Bind provider
```

```mermaid
sequenceDiagram
    participant Filter as Filter Pushdown
    participant Converter as ClpFilterToKqlConverter
    participant Expression as ClpExpression
    participant Layout as ClpTableLayoutHandle
    Filter->>Converter: Convert predicate to KQL
    Converter->>Converter: Translate column refs
    Converter->>Converter: Track pushDownVariables set
    Converter->>Expression: Create with pushDownVariables
    Expression-->>Converter: Return KQL + metadata SQL +<br/>pushDownVariables
    Converter->>Layout: Create with isMetadataQueryOnly<br/>flag based on variables
    Layout-->>Filter: Return updated handle
```
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks: ❌ Failed checks (2 warnings, 1 inconclusive)
Actionable comments posted: 8
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1)
88-88: Fix typographical error in JavaDoc: "correh hsponding" should be "corresponding".
Apply this diff:
```diff
- * @return a map from column handle to its correh hsponding scan-level variable
+ * @return a map from column handle to its corresponding scan-level variable
```
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (39)
- .github/workflows/docs.yml (1 hunks)
- .github/workflows/maven-checks.yml (3 hunks)
- .github/workflows/pr-title-checks.yaml (1 hunks)
- .github/workflows/prestissimo-worker-images-build.yml (5 hunks)
- .github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
- .github/workflows/prestocpp-linux-build-and-unit-test.yml (4 hunks)
- .github/workflows/tests.yml (1 hunks)
- presto-clp/pom.xml (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (4 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java (1 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
- presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
- presto-native-execution/pom.xml (1 hunks)
- presto-native-execution/velox (1 hunks)
🧰 Additional context used
🧠 Learnings (6)
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java
📚 Learning: 2025-07-30T15:27:04.862Z
Learnt from: anlowee
Repo: y-scope/presto PR: 46
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpPlanOptimizer.java:105-105
Timestamp: 2025-07-30T15:27:04.862Z
Learning: In ClpPlanOptimizer, the local construction of tableScope in visitFilter() method is necessary and correct because join queries can have multiple TableScanNodes for different tables. Each FilterNode needs to determine its scope from its immediate source TableScanNode, not from a shared instance field that gets overwritten.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
📚 Learning: 2025-11-08T02:05:14.212Z
Learnt from: wraymo
Repo: y-scope/presto PR: 89
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:835-884
Timestamp: 2025-11-08T02:05:14.212Z
Learning: In ClpFilterToKqlConverter's handleOr method, when an OR expression mixes metadata-only predicates with KQL-only predicates, dropping the metadata predicate is expected behavior. Metadata columns are used for archive selection (split filtering stage) and cannot be evaluated at the row level, so in mixed OR scenarios, only the KQL predicates are pushed down and all archives are scanned. This is a design trade-off, not a correctness issue.
Applied to files:
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
📚 Learning: 2025-09-12T14:47:20.734Z
Learnt from: wraymo
Repo: y-scope/presto PR: 64
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java:419-483
Timestamp: 2025-09-12T14:47:20.734Z
Learning: In Presto, SQL validation (including type compatibility checks) happens before reaching the connector optimizer, so invalid SQL with type mismatches will be rejected upstream rather than needing validation in the connector-specific optimization code.
Applied to files:
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
📚 Learning: 2025-06-04T14:50:43.554Z
Learnt from: anlowee
Repo: y-scope/presto PR: 7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Applied to files:
- .github/workflows/prestocpp-format-and-header-check.yml
- .github/workflows/prestocpp-linux-build-and-unit-test.yml
- .github/workflows/prestissimo-worker-images-build.yml
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
🧬 Code graph analysis (17)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
- presto-native-execution/presto_cpp/presto_protocol/connector/hive/presto_protocol_hive.h: Order (106-109)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (5)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java: ClpYamlMetadataProvider (44-266)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java: ClpPinotSplitProvider (49-369)
- presto-spi/src/main/java/com/facebook/presto/spi/ColumnMetadata.java: ColumnMetadata (29-205)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java: ClpExpression (31-141)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java: ClpTopNSpec (28-148)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java: ClpTopNSpec (28-148)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java: ClpSplit (33-111)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (4)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)
- presto-spi/src/main/java/com/facebook/presto/spi/PrestoException.java: PrestoException (18-56)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java: ClpConnectorFactory (32-70)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (6)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java: ClpComputePushDown (63-484)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java: ClpTopNSpec (28-148)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java: ClpMySqlSplitFilterProvider (34-180)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java: ClpSplitFilterProvider (50-187)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java: ClpMetadataDbSetUp (46-276)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java: ClpSplit (33-111)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (2)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpQueryBase.java: TestClpQueryBase (65-144)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java: ClpFilterToKqlConverter (103-986)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (4)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java: ClpTableLayoutHandle (26-126)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java: ClpSplitFilterProvider (50-187)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConnectorFactory.java: ClpConnectorFactory (32-70)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java: ClpTopNSpec (28-148)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (2)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java: ClpYamlMetadataProvider (44-266)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java: ClpPinotSplitProvider (49-369)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (4)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java: ClpYamlMetadataProvider (44-266)
- presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java: Test (35-345)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java: ClpConfig (21-220)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java: ClpMetadata (57-206)

presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (1)
- presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java: ClpMySqlSplitProvider (39-300)

presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (1)
- presto-native-execution/presto_cpp/presto_protocol/core/presto_protocol_core.h: RowExpression (394-416)
🪛 actionlint (1.7.8)

.github/workflows/prestocpp-format-and-header-check.yml
- 23-23: label "y-scope/Default" is unknown. Available labels are "windows-latest", "windows-latest-8-cores", "windows-2025", "windows-2022", "windows-11-arm", "ubuntu-latest", "ubuntu-latest-4-cores", "ubuntu-latest-8-cores", "ubuntu-latest-16-cores", "ubuntu-24.04", "ubuntu-24.04-arm", "ubuntu-22.04", "ubuntu-22.04-arm", "macos-latest", "macos-latest-xl", "macos-latest-xlarge", "macos-latest-large", "macos-26-xlarge", "macos-26", "macos-15-intel", "macos-15-xlarge", "macos-15-large", "macos-15", "macos-14-xl", "macos-14-xlarge", "macos-14-large", "macos-14", "macos-13-xl", "macos-13-xlarge", "macos-13-large", "macos-13", "self-hosted", "x64", "arm", "arm64", "linux", "macos", "windows". If it is a custom label for a self-hosted runner, set the list of labels in the actionlint.yaml config file. (runner-label)

.github/workflows/prestocpp-linux-build-and-unit-test.yml
- 23-23, 100-100, 175-175, 251-251: label "y-scope/Default" is unknown (runner-label; same message as above)

.github/workflows/prestissimo-worker-images-build.yml
- 10-10: label "y-scope/Default" is unknown (runner-label; same message as above)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (17)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (17.0.13, :presto-main)
- GitHub Check: test (17.0.13, :presto-main-base)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-main-base)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
🔇 Additional comments (46)
presto-clp/pom.xml (1)
78-81: Dependency added with version resolution deferred to parent POM. The jackson-dataformat-yaml dependency is correctly positioned with other Jackson format/core dependencies and omits an explicit version, relying on the parent's dependencyManagement to define it. Verify that the parent POM (presto-root) declares a compatible version for this artifact.
presto-native-execution/pom.xml (1)
270-278: Verify transitive SnakeYAML dependency and clarify version constraint intent. This module adds the same SnakeYAML constraint as presto-clp. Ensure this module actually depends on SnakeYAML transitively (either directly or through the presto-clp runtime dependency on lines 259-260). Additionally, address the "upper bound" comment semantics noted in the parallel presto-clp/pom.xml change.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1)
35-38: Formatting change approved, but inconsistent with PR summary. The constructor parameter formatting improvement enhances readability. However, the PR-level AI summary describes significant feature additions (YAML-based metadata discovery, Pinot split provider support, Top-N query optimisation, GitHub Actions workflow updates), none of which are reflected in this formatting-only change.
presto-native-execution/velox (1)
1-1: Velox submodule bump acknowledged. Thanks for advancing the submodule pointer to 52bb2fedafd6b7ceb12bb60ef959d2a290e82f17; looks good from my side.
.github/workflows/pr-title-checks.yaml (1)
11-11: Branch trigger pattern update looks good. The wildcard pattern correctly expands PR title validation to all snapshot-prefixed branches, aligning with the broader branch policy changes across workflows.
.github/workflows/tests.yml (1)
19-21: Prefix-based concurrency policy update is correct. The new policy preserves in-progress jobs for snapshot branches while cancelling them elsewhere, enabling complete CI validation for each snapshot commit. The logic and comment are clear.
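The shared snapshot-prefix behaviour can be sketched as a workflow concurrency block. This is an illustrative reconstruction, not the exact contents of any workflow in the PR; the real group expressions may differ:

```yaml
concurrency:
  group: ${{ github.workflow }}-${{ github.ref }}
  # Keep snapshot-branch runs alive so each commit gets full CI;
  # cancel superseded runs on every other ref.
  cancel-in-progress: >-
    ${{ !startsWith(github.ref, 'refs/heads/release-0.293-clp-connector-snapshot') }}
```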
.github/workflows/maven-checks.yml (3)
17-19: Concurrency policy change is consistent. The cancel-in-progress logic correctly aligns with the pattern used across other workflows in this PR.
101-101: Verify Docker image tagging behaviour change. Changing from static type=raw,value=dev to dynamic type=ref,event=branch alters image tag generation. This will now produce branch-specific tags (e.g., based on the branch name) instead of a fixed "dev" tag. Please confirm:
- Downstream systems don't depend on the "dev" tag
- The dynamic branch-based tagging is the intended behaviour
- Image versioning/lifecycle will work correctly with the new tagging scheme
112-113: Verify Docker push policy restriction. The push condition now restricts image pushes to branches starting with release-0.293-clp-connector-snapshot. This is a narrower policy than before. Please confirm:
- Image pushes should only occur on snapshot branches (not the target branch release-0.293-clp-connector)
- This change doesn't inadvertently block releases or production deployments
- The prefix-based check is the intended long-term branch policy
.github/workflows/docs.yml (1)
23-25: Concurrency policy update is consistent. The cancel-in-progress logic aligns with changes across other workflows, using the prefix-based exclusion pattern for snapshot branches.
.github/workflows/prestocpp-format-and-header-check.yml (2)
17-19: Concurrency policy update is consistent. The cancel-in-progress logic aligns with the prefix-based exclusion pattern used across other workflows.
23-23: Custom runner label requires verification and configuration. The runs-on value y-scope/Default is a custom self-hosted runner label. Static analysis tools flag it as unknown, indicating:
- The label may not be registered or properly configured in actionlint.yaml
- The runner must be available in your GitHub Actions environment
- The runner must have the necessary dependencies (container image: public.ecr.aws/oss-presto/velox-dev:check)
Please verify:
- The y-scope/Default runner is registered in your GitHub Actions self-hosted runners
- actionlint.yaml (if present) includes this label in its custom runner configuration
- The runner has sufficient capacity and resources for this workflow
Note: This same label is used across 5 workflow files in this PR (also in prestocpp-linux-build-and-unit-test.yml and prestissimo-worker-images-build.yml).
.github/workflows/prestocpp-linux-build-and-unit-test.yml (2)
17-19: Concurrency policy update is consistent. The cancel-in-progress logic aligns with the prefix-based exclusion pattern across all workflows in this PR.
23-23: Custom runner label used across all jobs—verify registration and configuration. All four jobs in this workflow (lines 23, 100, 175, 251) are updated to use the custom runner label y-scope/Default. Static analysis flags this as an unknown runner label in each case. Please verify:
- The y-scope/Default runner is properly registered in your GitHub Actions environment and has capacity for concurrent job execution (4 jobs may run in parallel across matrix configurations)
- actionlint.yaml is updated to recognize this custom label in its self-hosted runner configuration
- Each runner instance has the necessary container runtime and dependencies for the containerized workflow steps
- Resource constraints (CPU, memory) are sufficient, especially for the prestocpp-linux-build-for-test job which performs intensive compilation
Note: This is part of a coordinated change across 5 workflow files in this PR.
Also applies to: 100-100, 175-175, 251-251
.github/workflows/prestissimo-worker-images-build.yml (3)
10-10: Custom runner label requires verification. The runs-on value y-scope/Default is a custom self-hosted runner label flagged as unknown by static analysis. Please verify runner registration and actionlint configuration as noted in review comments for other workflow files.
28-28: Docker image tagging behaviour changes—verify versioning scheme. Both dependency and runtime image builds now use type=ref,event=branch tagging instead of the previous static approach. This will generate branch-based tags (e.g., based on the branch name) rather than fixed tags like "dev". Confirm that:
- Downstream image consumers can work with dynamically named images
- The new tagging scheme aligns with your image versioning and lifecycle policies
- Image references in deployment configurations or documentation are updated accordingly
Also applies to: 46-46
36-37: Verify Docker push policy restriction applies to both images. Both dependency and runtime images are now restricted to push only when the branch starts with release-0.293-clp-connector-snapshot:

push: >-
  ${{ github.event_name != 'pull_request' && startsWith(github.ref, 'refs/heads/release-0.293-clp-connector-snapshot') }}

This is a narrower policy. Please confirm:
- Images should not be pushed to the target branch release-0.293-clp-connector in non-PR events
- Image releases and deployments will not be affected by this restriction
- The behaviour matches your CI/CD release strategy
Also applies to: 66-68
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2)
277-277: LGTM! Query and expected columns correctly aligned. The addition of ORDER BY fare to the query and the corresponding inclusion of the fare column in the expected scan columns (line 297) are properly synchronized. When ordering by a column, it must be included in the table scan, and the test correctly verifies this behavior.
Also applies to: 297-297
269-300: The review comment is incorrect; the code organization is appropriate. The codebase correctly separates test concerns. ClpTableLayoutHandle provides an overloaded 3-parameter constructor that defaults metadataQueryOnly to false and topN to Optional.empty(). The testClpGetJsonString test appropriately:
- Uses the 3-parameter constructor at line 295 because it focuses on JSON_STRING functionality, not TopN optimization
- Has a name matching its scope ("testClpGetJsonString")
- Works alongside a dedicated test class (TestClpTopN.java) that comprehensively verifies TopN pushdown with the full 5-parameter constructor
Test name, constructor usage, and plan verification patterns are all correct for their intended purpose. No changes needed.
Likely an incorrect or invalid review comment.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (2)
121-132: LGTM! Proper visitor pattern delegation. The refactoring to use accept(this, context) instead of a custom helper properly delegates subtree traversal to the visitor framework.
135-140: LGTM! Clean visitor implementation. The refactoring properly delegates source traversal while rewriting the predicate expression.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1)
16-16: LGTM! The import update correctly reflects the package reorganization of ClpExpression to the optimization package.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1)
31-31: LGTM! The new error code is appropriately named and follows the existing pattern for external error types. This aligns with the YAML metadata provider functionality introduced in this PR.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
97-110: LGTM! The constructor validation ensures that limit is positive and orderings is non-empty, which are sensible constraints for a Top-N specification.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (2)
953-971: Consider error handling for invalid timestamp literals. The ensureNanosecondTimestamp method calls Long.parseLong(literalString) (line 966), which can throw NumberFormatException if the literal is not a valid long. While Presto should guarantee valid constants, adding error handling or documentation would improve robustness. Consider adding a try-catch or documenting the assumption:
 private static String ensureNanosecondTimestamp(TimestampType type, String literalString)
 {
-    long literalNumber = Long.parseLong(literalString);
+    long literalNumber;
+    try {
+        literalNumber = Long.parseLong(literalString);
+    }
+    catch (NumberFormatException e) {
+        throw new PrestoException(CLP_PUSHDOWN_UNSUPPORTED_EXPRESSION,
+                "Invalid timestamp literal: " + literalString, e);
+    }
     long seconds = type.getEpochSecond(literalNumber);
     long nanosecondFraction = type.getNanos(literalNumber);
     long nanoseconds = SECONDS.toNanos(seconds) + nanosecondFraction;
     return Long.toString(nanoseconds);
 }
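To make the conversion concrete, here is a minimal, self-contained sketch of the nanosecond normalization discussed above. TimestampType is replaced with a hypothetical assumption of microsecond-precision literals (epoch-seconds = value / 1,000,000; the sub-second remainder converts at 1,000 ns per microsecond); this is not Presto's actual type API.

```java
import java.util.concurrent.TimeUnit;

public class TimestampNanos
{
    static String ensureNanosecondTimestamp(String literalString)
    {
        long micros;
        try {
            micros = Long.parseLong(literalString);
        }
        catch (NumberFormatException e) {
            // Fail with a clear message instead of an unchecked parse error.
            throw new IllegalArgumentException("Invalid timestamp literal: " + literalString, e);
        }
        long seconds = micros / 1_000_000;
        long nanosecondFraction = (micros % 1_000_000) * 1_000;
        long nanoseconds = TimeUnit.SECONDS.toNanos(seconds) + nanosecondFraction;
        return Long.toString(nanoseconds);
    }

    public static void main(String[] args)
    {
        // 1_500_000 microseconds = 1.5 s = 1_500_000_000 ns
        System.out.println(ensureNanosecondTimestamp("1500000"));
    }
}
```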
698-738: LGTM! The handleAnd method correctly accumulates pushDownVariables from all child expressions using ImmutableSet.Builder, ensuring proper variable tracking through AND operations.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (2)
46-56: LGTM! The primary constructor properly ensures immutability by using ImmutableSet.copyOf() for the pushDownVariables parameter, which is a best practice for defensive copying.
137-140: LGTM! The accessor returns the immutable set directly, which is safe since ImmutableSet cannot be modified.
1-14: LGTM! The test resource defines a valid split filter configuration with timestamp range mapping, which supports the Top-N functionality introduced in this PR.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1)
82-85: LGTM! Delegating to the metadata provider enables dynamic schema name discovery, which is essential for supporting YAML-based metadata configurations introduced in this PR.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (1)
81-98: LGTM! Well-implemented equals and hashCode methods. The implementation correctly follows best practices: same-reference check, null and class checks, and field-by-field comparison. All three fields (path, type, kqlQuery) are consistently used in both methods, ensuring proper contract adherence.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1)
60-61: LGTM! Test correctly exercises the updated API. The test correctly adapts to the new checkContainsRequiredFilters signature, passing a Set<String> of column names rather than filter expressions. The test validates that "level" is not a required filter (throws exception) whilst "msg.timestamp" is required (passes).
31-43: LGTM! Well-documented default method provides backward compatibility. The default implementation returns a single-element list containing the default schema, which maintains backward compatibility for existing implementations whilst enabling new implementations (like ClpYamlMetadataProvider) to override for multi-schema support.
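The backward-compatibility pattern described here can be sketched as follows. The interface, method, and schema names are simplified stand-ins mirroring the review, not the connector's actual API:

```java
import java.util.Collections;
import java.util.List;

// A new default method on the provider interface returns the single default
// schema, so existing implementations need no changes while new ones
// (e.g., a YAML-backed provider) override it for multi-schema support.
interface MetadataProvider
{
    String DEFAULT_SCHEMA = "default";

    default List<String> listSchemaNames()
    {
        return Collections.singletonList(DEFAULT_SCHEMA);
    }
}

// Legacy implementation: inherits the default behaviour unchanged.
class LegacyMySqlProvider implements MetadataProvider {}

// New implementation: overrides to expose multiple schemas.
class YamlProvider implements MetadataProvider
{
    @Override
    public List<String> listSchemaNames()
    {
        return List.of("logs", "metrics");
    }
}

public class DefaultMethodDemo
{
    public static void main(String[] args)
    {
        System.out.println(new LegacyMySqlProvider().listSchemaNames());
        System.out.println(new YamlProvider().listSchemaNames());
    }
}
```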
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
89-111: LGTM! Correctly implements scope-based column remapping. The implementation properly aggregates mappings from the scope hierarchy (catalog → schema → table) and returns either the mapped bounds (lowerBound, upperBound) or the original column name. The logic is consistent with the existing remapSplitFilterPushDownExpression method.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3)
19-19: LGTM! Properly wired new providers. The additions correctly integrate the YAML metadata provider and Pinot split provider into the dependency injection framework, following the established pattern for provider registration with appropriate singleton scoping.
61-63: LGTM! YAML metadata provider binding correctly configured. The binding follows the established pattern and properly registers ClpYamlMetadataProvider as the implementation when the YAML provider type is selected.
71-73: LGTM! Pinot split provider binding correctly configured. The binding follows the established pattern and properly registers ClpPinotSplitProvider as the implementation when the PINOT provider type is selected.
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (5)
35-49: LGTM! Proper test lifecycle management. The test class correctly uses singleThreaded = true and implements proper cleanup of temporary files in the @AfterClass method.
54-104: LGTM! Comprehensive schema discovery test coverage. The tests thoroughly validate schema name discovery for both single and multiple schema scenarios, properly exercising the YAML metadata provider's core functionality.
109-160: LGTM! Excellent error handling test coverage. The tests comprehensively validate graceful degradation for error cases: missing YAML path, nonexistent files, and malformed YAML. All correctly verify fallback to the default schema.
188-249: LGTM! Thorough table handle validation. The tests effectively validate table discovery for both single and multiple schema configurations, ensuring correct table-to-schema association.
254-328: LGTM! Edge case coverage is excellent. The tests validate consistency across multiple calls, empty schemas, and special characters in schema names, ensuring robust behaviour across various scenarios.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (2)
156-166: LGTM! Configuration field properly implemented. The new metadataYamlPath configuration field follows the established pattern with an appropriate getter/setter and @Config annotation. The YAML metadata provider handles missing or invalid paths gracefully with fallback to the default schema, so the absence of validation here is acceptable.
206-218: LGTM! Enum extensions support new provider types. The additions of YAML to MetadataProviderType and PINOT to SplitProviderType properly extend the configuration to support the new providers introduced in this PR.
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (2)
94-108: LGTM! Well-documented abstract method for column remapping. The new remapColumnName method provides a clean abstraction for scope-based column name remapping, with comprehensive Javadoc explaining the scope hierarchy and expected behaviour.
117-134: LGTM! Improved signature for required filter checking. The signature change from String splitFilterPushDownExpression to Set<String> pushDownVariables is clearer and more direct, checking for column name presence rather than parsing expressions. The implementation correctly validates required columns against the provided set.
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.yaml</groupId>
      <artifactId>snakeyaml</artifactId>
      <version>2.1</version> <!-- upper bound -->
    </dependency>
  </dependencies>
</dependencyManagement>
🧹 Nitpick | 🔵 Trivial
Clarify SnakeYAML version constraint intent.
The comment references "upper bound," but Maven's dependencyManagement with a static version (e.g., 2.1) sets an exact version, not a range constraint. If an upper bound is truly intended, use version-range syntax (e.g., [2.0,2.1]). If 2.1 is the enforced/pinned version, update the comment for clarity.
- <version>2.1</version> <!-- upper bound -->
+ <version>2.1</version> <!-- pinned version for YAML processing -->

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.yaml</groupId>
      <artifactId>snakeyaml</artifactId>
      <version>2.1</version> <!-- pinned version for YAML processing -->
    </dependency>
  </dependencies>
</dependencyManagement>
🤖 Prompt for AI Agents
In presto-clp/pom.xml around lines 163 to 171, the comment "upper bound" is
misleading because declaring <version>2.1</version> in dependencyManagement pins
an exact version rather than expressing a range; either change the version
element to a Maven version range (e.g., [2.0,2.1]) if you intend an upper bound,
or update the comment to say "pinned/enforced version 2.1" to reflect that this
is an exact version. Ensure the pom comment matches the chosen intent.
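If an upper bound is genuinely intended, Maven's range syntax can express it. A generic illustration of the range form (not a committed change for this POM):

```xml
<dependency>
  <groupId>org.yaml</groupId>
  <artifactId>snakeyaml</artifactId>
  <!-- Range: any version >= 2.0 and <= 2.1 satisfies the constraint. -->
  <version>[2.0,2.1]</version>
</dependency>
```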
ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
        clpTableHandle,
        kqlQuery,
        metadataSqlQuery,
        metadataColumnNames.equals(clpExpression.getPushDownVariables()),
        Optional.empty());
Block TopN pushdown when KQL is present
Line 312 currently sets metadataQueryOnly purely by comparing metadata column sets. If the query still has a KQL predicate (e.g., WHERE msg.timestamp BETWEEN … AND city = 'Calgary'), we still get metadataQueryOnly = true. The TopN rewriter then pushes the spec into the layout, and the split provider trims archives using only metadata ranges. Because that trimming ignores the row-level KQL filter, archives containing the actual top-N rows can be dropped, and the query returns fewer rows than the LIMIT demands. Please ensure metadata-only is true only when there is no KQL predicate and the pushed-down variables are a subset of the metadata columns.
- splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), clpExpression.getPushDownVariables());
+ Set<String> pushDownVariables = clpExpression.getPushDownVariables();
+ splitFilterProvider.checkContainsRequiredFilters(ImmutableSet.of(tableScope), pushDownVariables);
boolean hasMetadataFilter = metadataSqlQuery.isPresent() && !metadataSqlQuery.get().isEmpty();
if (hasMetadataFilter) {
metadataSqlQuery = Optional.of(splitFilterProvider.remapSplitFilterPushDownExpression(tableScope, metadataSqlQuery.get()));
log.debug("Metadata SQL query: %s", metadataSqlQuery.get());
}
if (kqlQuery.isPresent() || hasMetadataFilter) {
if (kqlQuery.isPresent()) {
log.debug("KQL query: %s", kqlQuery.get());
}
- ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
+ boolean metadataQueryOnly = !kqlQuery.isPresent() && metadataColumnNames.containsAll(pushDownVariables);
+ ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
clpTableHandle,
kqlQuery,
metadataSqlQuery,
- metadataColumnNames.equals(clpExpression.getPushDownVariables()),
+ metadataQueryOnly,
+ Optional.empty());

🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java
around lines 311-316, the metadataQueryOnly flag is currently derived only by
comparing metadataColumnNames to the pushed-down variables, which incorrectly
allows metadata-only when a KQL predicate exists; change the condition so
metadataQueryOnly is true only when there is no KQL predicate (kqlQuery is null
or empty) AND the pushed-down variables are a subset of the metadata columns
(use metadataColumnNames.containsAll(clpExpression.getPushDownVariables())).
Construct the ClpTableLayoutHandle with that combined boolean instead of the
current equality check.
 else if (LOGICAL_BINARY_OPS_FILTER.contains(operator) && !(literalType instanceof VarcharType)) {
     if (metadataFilterColumns.contains(variableName)) {
-        metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString);
+        metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString, ImmutableSet.of(variableName));
     }
-    return new ClpExpression(format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery);
+    return new ClpExpression(
+            format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery, ImmutableSet.of(variableName));
 }
Fix format string argument mismatch.
Line 485 has a format string with 3 placeholders ("\"%s\" %s %s") but 4 arguments are provided, including an extra ImmutableSet.of(variableName) that should not be in the format call.
Apply this diff:
else if (LOGICAL_BINARY_OPS_FILTER.contains(operator) && !(literalType instanceof VarcharType)) {
if (metadataFilterColumns.contains(variableName)) {
- metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString, ImmutableSet.of(variableName));
+ metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString);
}
return new ClpExpression(
format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery, ImmutableSet.of(variableName));
}

📝 Committable suggestion
else if (LOGICAL_BINARY_OPS_FILTER.contains(operator) && !(literalType instanceof VarcharType)) {
    if (metadataFilterColumns.contains(variableName)) {
        metadataSqlQuery = format("\"%s\" %s %s", variableName, operator.getOperator(), literalString);
    }
    return new ClpExpression(
            format("%s %s %s", variableName, operator.getOperator(), literalString), metadataSqlQuery, ImmutableSet.of(variableName));
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java
around lines 483 to 489, the format call used to build metadataSqlQuery supplies
four arguments while the format string only has three placeholders; remove the
extraneous ImmutableSet.of(variableName) from the format(...) call so format
receives exactly the three string arguments (variableName,
operator.getOperator(), literalString), and ensure the
ImmutableSet.of(variableName) remains passed only to the ClpExpression
constructor as the metadata set.
@JsonCreator
public Ordering(
        @JsonProperty("columns") List<String> columns,
        @JsonProperty("order") Order order)
{
    this.columns = requireNonNull(columns, "column is null");
    this.order = requireNonNull(order, "order is null");
}
Fix typo in error message.
The error message on Line 52 says "column is null" but should be "columns is null" (plural) since the parameter is columns.
Apply this diff:
@JsonCreator
public Ordering(
@JsonProperty("columns") List<String> columns,
@JsonProperty("order") Order order)
{
- this.columns = requireNonNull(columns, "column is null");
+ this.columns = requireNonNull(columns, "columns is null");
this.order = requireNonNull(order, "order is null");
}

📝 Committable suggestion
@JsonCreator
public Ordering(
        @JsonProperty("columns") List<String> columns,
        @JsonProperty("order") Order order)
{
    this.columns = requireNonNull(columns, "columns is null");
    this.order = requireNonNull(order, "order is null");
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java
around lines 47 to 54, the requireNonNull call uses the message "column is null"
which is a typo; change the message to "columns is null" to match the parameter
name and plurality.
@Override
public PlanNode visitTableScan(TableScanNode node, RewriteContext<Void> context)
{
    Set<VariableReferenceExpression> outputVars = new LinkedHashSet<>(node.getOutputVariables());
    Map<VariableReferenceExpression, ColumnHandle> newAssignments = new HashMap<>(node.getAssignments());

    // Add any missing variables for known handles
    globalColumnVarMap.forEach((handle, var) -> {
        outputVars.add(var);
        newAssignments.put(var, handle);
    });

    return new TableScanNode(
            node.getSourceLocation(),
            idAllocator.getNextId(),
            node.getTable(),
            new ArrayList<>(outputVars),
            newAssignments,
            node.getTableConstraints(),
            node.getCurrentConstraint(),
            node.getEnforcedConstraint(),
            node.getCteMaterializationInfo());
}
🧩 Analysis chain
Verify correctness for join queries with multiple tables.
The visitTableScan method unconditionally injects ALL variables from globalColumnVarMap into every TableScanNode. In join scenarios with multiple tables, this could incorrectly add column handles from one table into another table's scan node, as ClpColumnHandle doesn't appear to have table-scoping information.
Based on learnings
Run the following script to check if there are multi-table join scenarios that could be affected:
🏁 Script executed:
#!/bin/bash
# Search for join-related test cases or usages that might involve multiple TableScanNodes
# Look for join-related tests in the CLP plugin
fd -e java -e sql --type f . presto-clp | xargs rg -nP -C5 '\b(join|JOIN)\b' -g '!**/target/**'
# Search for existing multi-table query handling in CLP optimizer/rewriter
ast-grep --pattern 'class $_ {
$$$
visitTableScan($_, $_) {
$$$
}
$$$
}'

Length of output: 3074
I need to examine the ClpUdfRewriter implementation more closely to verify the concern about table-level filtering in the visitTableScan method.
🏁 Script executed:
#!/bin/bash
# Get the full ClpUdfRewriter class to understand globalColumnVarMap and visitTableScan context
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java
# Also search for where globalColumnVarMap is declared and initialized
rg -nP 'globalColumnVarMap' presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java -B2 -A2
# Search for test files with JOIN and TableScan
fd -e java --type f . presto-clp | xargs rg -l 'JOIN.*TableScan|TableScan.*JOIN' -i
# Look for ClpColumnHandle to see if it has table info
ast-grep --pattern 'class ClpColumnHandle { $$$ }'

Length of output: 14856
I need to verify if ClpColumnHandle has any table-scoping information in its equality comparison.
🏁 Script executed:
#!/bin/bash
# Find and examine ClpColumnHandle class
fd -e java --type f . presto-clp | xargs rg -l 'class ClpColumnHandle'
# Get the full ClpColumnHandle class definition
fd -e java --type f . presto-clp | xargs cat | grep -A 50 'class ClpColumnHandle'
# Look for equals and hashCode methods in ClpColumnHandle
fd -e java --type f . presto-clp | xargs rg -nP 'class ClpColumnHandle|equals\(|hashCode\(' -A 3 -B 1

Length of output: 264994
The web search didn't provide direct information about ClpColumnHandle. Let me search more specifically for the file within the Presto repository context.
🏁 Script executed:
#!/bin/bash
# Read the ClpColumnHandle.java file directly
cat git/presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java

Length of output: 112
🏁 Script executed:
#!/bin/bash
# Find ClpColumnHandle.java file
fd -name "ClpColumnHandle.java" --type f
# Also check if presto-clp directory exists
ls -la presto-clp/src/main/java/com/facebook/presto/plugin/clp/ 2>/dev/null | head -20

Length of output: 1502
🏁 Script executed:
#!/bin/bash
# Read the ClpColumnHandle.java file
cat presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpColumnHandle.java

Length of output: 2966
Apply table-level filtering in visitTableScan to prevent cross-table injection.
The visitTableScan method unconditionally adds ALL column handles from globalColumnVarMap to every TableScanNode. Since ClpColumnHandle carries no table identifier (equality is based solely on column name and type), JOIN queries over multiple tables will incorrectly inject one table's column handles into another table's scan node.
As the learning for ClpPlanOptimizer indicates, each TableScanNode must determine its scope from its immediate source. Apply filtering to ensure only handles belonging to the current table are added.
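The scoping guard described above can be sketched as follows. This is a hypothetical illustration, not the actual ClpPlanOptimizer code: the ColumnHandle stand-in, filterForScan, and all variable names are invented here to show the idea that only handles whose column names the current scan already assigns should be attached.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class TableScopedHandleFilter
{
    // Hypothetical stand-in for ClpColumnHandle: equality is name+type only,
    // which is exactly why cross-table injection is possible.
    public static final class ColumnHandle
    {
        public final String name;
        public final String type;

        public ColumnHandle(String name, String type)
        {
            this.name = name;
            this.type = type;
        }
    }

    // Keep only the handles whose column names are already assigned to the
    // current TableScanNode, instead of injecting every global handle.
    public static Map<String, ColumnHandle> filterForScan(
            Map<String, ColumnHandle> globalColumnVarMap,
            Set<String> scanAssignedColumns)
    {
        Map<String, ColumnHandle> scoped = new HashMap<>();
        for (Map.Entry<String, ColumnHandle> entry : globalColumnVarMap.entrySet()) {
            if (scanAssignedColumns.contains(entry.getValue().name)) {
                scoped.put(entry.getKey(), entry.getValue());
            }
        }
        return scoped;
    }

    public static void main(String[] args)
    {
        Map<String, ColumnHandle> global = new HashMap<>();
        global.put("v1", new ColumnHandle("a", "bigint"));   // belongs to table_x
        global.put("v2", new ColumnHandle("b", "varchar"));  // belongs to table_y

        Set<String> tableXColumns = new HashSet<>();
        tableXColumns.add("a");

        Map<String, ColumnHandle> scoped = filterForScan(global, tableXColumns);
        System.out.println(scoped.size()); // only table_x's handle survives
    }
}
```

The same check could live wherever the optimizer walks up from the TableScanNode, as long as the handle set is derived from that scan's own assignments rather than the global map.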
private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
{
    List<ArchiveMeta> list = new ArrayList<>();
    try (Connection connection = getConnection();
            PreparedStatement stmt = connection.prepareStatement(query);
            ResultSet rs = stmt.executeQuery()) {
        while (rs.next()) {
            list.add(new ArchiveMeta(
                    rs.getString(ARCHIVES_TABLE_COLUMN_ID),
                    rs.getLong(ordering.getColumns().get(0)),
                    rs.getLong(ordering.getColumns().get(1)),
                    rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));
        }
Guard against single-column top-N metadata
Top-N pushdown routinely produces an ordering with exactly one element (the column named in ORDER BY). In that case the new code hits ordering.getColumns().get(1) before even talking to MySQL, throwing IndexOutOfBoundsException and killing the query instead of falling back to a full scan. We need to validate the ordering shape (or otherwise derive the range columns) before dereferencing to keep production queries alive.
@@
- private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
- {
- List<ArchiveMeta> list = new ArrayList<>();
+ private List<ArchiveMeta> fetchArchiveMeta(String query, ClpTopNSpec.Ordering ordering)
+ {
+ List<String> orderingColumns = ordering.getColumns();
+ if (orderingColumns.size() < 2) {
+ throw new IllegalArgumentException("Top-N ordering must expose both lower and upper bound metadata columns: " + orderingColumns);
+ }
+ String lowerBoundColumn = orderingColumns.get(0);
+ String upperBoundColumn = orderingColumns.get(1);
+
+ List<ArchiveMeta> list = new ArrayList<>();
@@
- list.add(new ArchiveMeta(
- rs.getString(ARCHIVES_TABLE_COLUMN_ID),
- rs.getLong(ordering.getColumns().get(0)),
- rs.getLong(ordering.getColumns().get(1)),
- rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));
+ list.add(new ArchiveMeta(
+ rs.getString(ARCHIVES_TABLE_COLUMN_ID),
+ rs.getLong(lowerBoundColumn),
+ rs.getLong(upperBoundColumn),
+ rs.getLong(ARCHIVES_TABLE_NUM_MESSAGES)));
@Override
protected URL buildPinotSqlQueryEndpointUrl(ClpConfig config) throws MalformedURLException
{
    return new URL(config.getMetadataDbUrl() + "/v1/globalStatements");
}
🧹 Nitpick | 🔵 Trivial
Consider safer URL construction.
Direct string concatenation may produce malformed URLs if getMetadataDbUrl() ends with a trailing slash (e.g., http://example.com//v1/globalStatements).
Consider using URL constructor chaining or normalizing the base URL:
protected URL buildPinotSqlQueryEndpointUrl(ClpConfig config) throws MalformedURLException
{
- return new URL(config.getMetadataDbUrl() + "/v1/globalStatements");
+ String baseUrl = config.getMetadataDbUrl();
+ if (baseUrl.endsWith("/")) {
+ baseUrl = baseUrl.substring(0, baseUrl.length() - 1);
+ }
+ return new URL(baseUrl + "/v1/globalStatements");
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/UberClpPinotSplitProvider.java
around lines 58 to 62, building the endpoint by concatenating strings can
produce malformed URLs when config.getMetadataDbUrl() has a trailing slash;
normalize or resolve the path instead: parse the base with new URL(base) or new
URI(base) and then resolve or construct the endpoint using URL/URI resolution
(e.g., new URL(baseUrl, "/v1/globalStatements") or
baseUri.resolve("/v1/globalStatements").toURL()), ensuring you handle or
propagate MalformedURLException appropriately.
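As a quick illustration of the URI-resolution approach suggested above, the sketch below shows that java.net.URI.resolve with an absolute path yields the same endpoint whether or not the base URL carries a trailing slash. The endpoint helper and example base URLs here are illustrative, not the connector's real code.

```java
import java.net.URI;

public class EndpointUrlDemo
{
    // Resolve the endpoint path against the base; an absolute path like
    // "/v1/..." replaces the base's path, so a trailing slash cannot
    // produce a "//v1/..." result.
    public static String endpoint(String baseUrl)
    {
        return URI.create(baseUrl).resolve("/v1/globalStatements").toString();
    }

    public static void main(String[] args)
    {
        System.out.println(endpoint("http://example.com"));
        System.out.println(endpoint("http://example.com/"));
        // both print http://example.com/v1/globalStatements
    }
}
```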
private static final String PINOT_BROKER_URL = "http://localhost:8099";
private static final String METADATA_YAML_PATH = "/home/xiaochong-dev/presto-e2e/pinot/tables-schema.yaml";
private static final String TABLE_NAME = "cockroachdb";
private ClpMetadata metadata;
private ClpSplitProvider clpSplitProvider;

@BeforeTest
public void setUp()
{
    ClpConfig config = new ClpConfig()
            .setPolymorphicTypeEnabled(true)
            .setMetadataDbUrl(PINOT_BROKER_URL)
            .setMetadataProviderType(YAML)
            .setMetadataYamlPath(METADATA_YAML_PATH);
    ClpMetadataProvider metadataProvider = new ClpYamlMetadataProvider(config);
    metadata = new ClpMetadata(config, metadataProvider);
    clpSplitProvider = new ClpPinotSplitProvider(config);
}

@Test
public void testListSchemaNames()
{
    assertEquals(metadata.listSchemaNames(SESSION), ImmutableList.of(DEFAULT_SCHEMA_NAME));
}

@Test
public void testListTables()
{
    ImmutableSet.Builder<SchemaTableName> builder = ImmutableSet.builder();
    builder.add(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
    assertEquals(new HashSet<>(metadata.listTables(SESSION, Optional.empty())), builder.build());
}

@Test
public void testListSplits()
{
    ClpTableLayoutHandle layoutHandle = new ClpTableLayoutHandle(
            new ClpTableHandle(new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME), ""),
            Optional.empty(),
            Optional.empty());
    List<ClpSplit> result = clpSplitProvider.listSplits(layoutHandle);
    System.out.println("Hello world");
}

@Test
public void testGetTableMetadata()
{
    ClpTableHandle clpTableHandle = (ClpTableHandle) metadata.getTableHandle(SESSION, new SchemaTableName(DEFAULT_SCHEMA_NAME, TABLE_NAME));
    ConnectorTableMetadata tableMetadata = metadata.getTableMetadata(SESSION, clpTableHandle);
    // ImmutableSet<ColumnMetadata> columnMetadata = ImmutableSet.<ColumnMetadata>builder()
    //         .add(ColumnMetadata.builder()
    //                 .setName("a_bigint")
    //                 .setType(BIGINT)
    //                 .setNullable(true)
    //                 .build())
    //         .add(ColumnMetadata.builder()
    //                 .setName("a_varchar")
    //                 .setType(VARCHAR)
    //                 .setNullable(true)
    //                 .build())
    //         .add(ColumnMetadata.builder()
    //                 .setName("b_double")
    //                 .setType(DOUBLE)
    //                 .setNullable(true)
    //                 .build())
    //         .add(ColumnMetadata.builder()
    //                 .setName("b_varchar")
    //                 .setType(VARCHAR)
    //                 .setNullable(true)
    //                 .build())
    //         .add(ColumnMetadata.builder()
    //                 .setName("c")
    //                 .setType(RowType.from(ImmutableList.of(
    //                         RowType.field("d", BOOLEAN),
    //                         RowType.field("e", VARCHAR))))
    //                 .setNullable(true)
    //                 .build())
    //         .add(ColumnMetadata.builder()
    //                 .setName("f")
    //                 .setType(RowType.from(ImmutableList.of(
    //                         RowType.field("g",
    //                                 RowType.from(ImmutableList.of(
    //                                         RowType.field("h", new ArrayType(VARCHAR))))))))
    //                 .setNullable(true)
    //                 .build())
    //         .build();
    // assertEquals(columnMetadata, ImmutableSet.copyOf(tableMetadata.getColumns()));
    ImmutableSet<ColumnMetadata> actual = ImmutableSet.copyOf(tableMetadata.getColumns());
    System.out.println("Hello world");
}
Make the YAML metadata test hermetic
This test can’t pass on CI: it hard-codes a developer-specific path (/home/xiaochong-dev/...) and constructs a real ClpPinotSplitProvider, which immediately tries to POST to a Pinot broker on localhost:8099. Neither resource exists in automation, so the suite now fails before asserting anything. Please rewrite it to follow the pattern we use in TestClpMultiSchema—generate temporary YAML fixtures and exercise metadata using in-memory or stubbed providers—or quarantine it behind an integration-only guard. As written it blocks every clean build while providing no coverage.
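As a rough sketch of the temporary-fixture pattern suggested above (not the actual TestClpMultiSchema code; the fixture layout and the minimal parser below are invented purely for illustration), the test can create its YAML under the system temp directory instead of hard-coding a developer path:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

public class TempYamlFixture
{
    // Write a throwaway metadata fixture so the test depends on no
    // developer-specific path; the file is deleted on JVM exit.
    public static Path writeFixture() throws IOException
    {
        Path yaml = Files.createTempFile("clp-metadata", ".yaml");
        yaml.toFile().deleteOnExit();
        String content = "default:\n"
                + "  cockroachdb: schemas/cockroachdb.yaml\n";
        Files.write(yaml, content.getBytes(StandardCharsets.UTF_8));
        return yaml;
    }

    // Minimal two-space-indent parser, just enough for the fixture above;
    // a real test would go through the connector's YAML provider instead.
    public static Map<String, String> parseTables(Path yaml) throws IOException
    {
        Map<String, String> tables = new LinkedHashMap<>();
        for (String line : Files.readAllLines(yaml)) {
            if (line.startsWith("  ") && line.contains(":")) {
                String[] parts = line.trim().split(":", 2);
                tables.put(parts[0].trim(), parts[1].trim());
            }
        }
        return tables;
    }

    public static void main(String[] args) throws IOException
    {
        Path yaml = writeFixture();
        System.out.println(parseTables(yaml)); // {cockroachdb=schemas/cockroachdb.yaml}
    }
}
```

The split-provider half of the test would likewise need a stubbed HTTP endpoint (or an integration-only guard) rather than a live broker on localhost:8099.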
ba1a39a to 68549dc Compare
e5bfe33 to ddfca65 Compare
8f3858a to 5adb10a Compare
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (2)
.github/workflows/maven-checks.yml (4 hunks)
.github/workflows/tests.yml (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-06-04T14:50:43.554Z
Learnt from: anlowee
Repo: y-scope/presto PR: 7
File: .github/workflows/hive-tests.yml:33-35
Timestamp: 2025-06-04T14:50:43.554Z
Learning: In the y-scope/presto repository, GitHub workflows are extended from OSS Presto and should maintain consistent formatting with the upstream patterns, including YAML indentation styles for workflow steps.
Applied to files:
.github/workflows/tests.yml
🪛 YAMLlint (1.37.1)
.github/workflows/maven-checks.yml
[error] 26-26: too many spaces inside brackets
(brackets)
[error] 26-26: too many spaces inside brackets
(brackets)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-plan-determinism)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (17.0.13, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
🔇 Additional comments (5)
.github/workflows/tests.yml (1)
19-21: Prefix-based concurrency exclusion is logically sound. The startsWith logic correctly excludes branches with the release-0.293-clp-connector-snapshot prefix from in-progress job cancellation, ensuring each commit to these branches is fully validated.
.github/workflows/maven-checks.yml (4)
17-19: Prefix-based concurrency exclusion is logically sound. The startsWith logic correctly excludes branches with the release-0.293-clp-connector-snapshot prefix from in-progress job cancellation, consistent with the pattern applied across related workflows.
114-115: Push condition update aligns with snapshot branch strategy. The condition changed from an exact ref match to prefix-based startsWith for release snapshot branches, aligning with the broader concurrency and branch exclusion strategy across workflows.
27-28: Formatting is consistent with repository patterns; runner group configuration requires manual verification. The group: Default format used in maven-checks.yml at lines 27-28 and 72-73 aligns with the established multi-line runs-on: pattern across the repository (prestocpp-linux-build-and-unit-test.yml, prestocpp-format-and-header-check.yml, prestissimo-worker-images-build.yml, and tests.yml). The YAML indentation and formatting are consistent throughout. However, verify that the Default runner group is actually configured in your GitHub Actions environment settings; this cannot be confirmed from the codebase alone.
103-103: Docker image metadata tagging format verified as correct. The type=ref,event=branch format is valid for docker/metadata-action and generates tags from Git references when triggered by branch events. For branch events, the action sanitizes branch references into tags (e.g., refs/heads/my/branch becomes my-branch). The implementation matches the documented behavior for branch-based Docker builds.
strategy:
  fail-fast: false
  matrix:
    java: [ 8.0.442, 17.0.13 ]
Fix YAML bracket spacing to align with linting standards.
YAMLlint flags excessive whitespace inside brackets. Reduce spacing to meet linting standards.
- java: [ 8.0.442, 17.0.13 ]
+ java: [8.0.442, 17.0.13]
🧰 Tools
🪛 YAMLlint (1.37.1)
[error] 26-26: too many spaces inside brackets
(brackets)
[error] 26-26: too many spaces inside brackets
(brackets)
🤖 Prompt for AI Agents
.github/workflows/maven-checks.yml around line 26: the array spacing "java: [
8.0.442, 17.0.13 ]" violates YAML lint rules; remove the extra spaces inside the
brackets so the array uses standard spacing like "java: [8.0.442, 17.0.13]" to
satisfy the linter.
.github/workflows/tests.yml
Outdated
test:
-  runs-on: ubuntu-latest
+  runs-on: ${{ github.repository_owner == 'y-scope' && fromJSON('{"group":"Default"}') || 'ubuntu-latest' }}
🧩 Analysis chain
Verify grouped runner syntax for conditional assignment.
The runs-on expression uses fromJSON('{"group":"Default"}') to conditionally select a runner group when the repository owner is y-scope. Verify that this syntax is the correct and documented approach for GitHub Actions grouped runners in your runner configuration.
🌐 Web query:
GitHub Actions grouped runners conditional assignment syntax fromJSON
💡 Result:
Use fromJSON to turn a job output (a JSON string) into a matrix, or to index into an object for runs-on. Common patterns:

- Dynamic matrix from a JSON array output:

jobs:
  dynamic:
    needs: generate
    strategy:
      matrix:
        runner: ${{ fromJSON(needs.generate.outputs.runners) }}
    runs-on: ${{ matrix.runner }}

- Pick runs-on per matrix entry from a JSON object output:

jobs:
  test:
    needs: setup
    strategy:
      matrix:
        run: ['app1','app2','app3']
    runs-on: ${{ fromJSON(needs.setup.outputs.runners)[matrix.run] }}

- Guard a job only when the JSON array is non-empty:

if: ${{ fromJSON(needs.setup.outputs.runners) != '[]' }}

References: GitHub Actions docs (matrix/fromJSON examples) and community examples. [1][2]
Sources
[1] GitHub Docs: Running variations of jobs in a workflow (matrix/fromJSON examples).
[2] Stack Overflow: using fromJSON to select runs-on from prior job outputs.
Correct the runs-on grouped runner syntax: use proper YAML structure instead of fromJSON.
The documented syntax for GitHub Actions grouped runners uses YAML object notation:

runs-on:
  group: ubuntu-runners

or combined with labels:

runs-on:
  group: ubuntu-runners
  labels: ubuntu-20.04-16core

The current expression using fromJSON('{"group":"Default"}') does not match this pattern. The runs-on value expects a runner label or the name of a runner group, not a parsed JSON object.
For conditional assignment, use a string expression instead:

runs-on: ${{ github.repository_owner == 'y-scope' && 'Default' || 'ubuntu-latest' }}

Or, if targeting a named runner group that requires the group: syntax, structure it as a conditional job or use a different approach.
🤖 Prompt for AI Agents
.github/workflows/tests.yml around line 44: the runs-on uses
fromJSON('{"group":"Default"}') which is invalid for runner groups; replace the
expression with a proper string conditional or the YAML group form — e.g. change
the runs-on to a string conditional like ${{ github.repository_owner ==
'y-scope' && 'Default' || 'ubuntu-latest' }} or, if you must target a runner
group object, restructure the job definition to use the group: key under runs-on
(conditionalize the whole job or use separate jobs) so runs-on receives either a
label string or a proper group: mapping.
ddfca65 to bfa18ff Compare
dd5c838 to 3105485 Compare
Actionable comments posted: 2
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
📒 Files selected for processing (10)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunk)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunk)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunk)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunk)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunk)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunk)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunk)
presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunk)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-06-13T12:56:06.325Z
Learnt from: wraymo
Repo: y-scope/presto PR: 15
File: presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java:22-33
Timestamp: 2025-06-13T12:56:06.325Z
Learning: `ClpMetadataProvider` is instantiated only once and used solely by the Presto coordinator, so concurrency/thread-safety guarantees are unnecessary.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
📚 Learning: 2025-09-16T17:39:45.102Z
Learnt from: anlowee
Repo: y-scope/presto PR: 64
File: presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java:284-299
Timestamp: 2025-09-16T17:39:45.102Z
Learning: CLP wildcard UDFs (CLP_WILDCARD_STRING_COLUMN, CLP_WILDCARD_INT_COLUMN, etc.) cannot be used as split filters (formerly called metadata filters) because the split filter mechanism checks whether the filter contains field names that are defined in the split filter configuration file. Since wildcard functions don't correspond to specific column names, they don't pass this validation.
Applied to files:
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java
🧬 Code graph analysis (8)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1)
Test (35-244)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (7)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1)
ClpYamlMetadataProvider (44-266)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (1)
ClpMySqlSplitProvider (39-300)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1)
ClpPinotSplitProvider (51-432)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1)
ClpUberPinotSplitProvider (37-123)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (1)
ClpMySqlSplitFilterProvider (34-180)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1)
ClpPinotSplitFilterProvider (25-33)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1)
ClpUberPinotSplitFilterProvider (42-122)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (2)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1)
Test (31-155)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (3)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (1)
ClpConfig (21-223)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1)
ClpTopNSpec (28-148)
presto-native-execution/presto_cpp/presto_protocol/connector/clp/presto_protocol_clp.h (1)
SplitType (52-55)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (18)
- GitHub Check: test (8.0.442, :presto-main)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-general)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-local-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-resource-manager)
- GitHub Check: test (17.0.13, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-aggregation-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-non-hash-gen)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-tpch-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P ci-only-distributed-queries)
- GitHub Check: test (8.0.442, :presto-tests -P presto-tests-execution-memory)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: prestissimo-worker-images-build
- GitHub Check: prestocpp-linux-build-for-test
- GitHub Check: maven-checks (8.0.442)
- GitHub Check: maven-checks (17.0.13)
- GitHub Check: prestissimo-worker-images-build
private static List<JsonNode> getQueryResult(URL url, String sql)
{
    try {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setRequestProperty("Content-Type", "application/json");
        conn.setRequestProperty("Accept", "application/json");
        conn.setDoOutput(true);
        conn.setConnectTimeout((int) SECONDS.toMillis(5));
        conn.setReadTimeout((int) SECONDS.toMillis(30));

        log.info("Executing Pinot query: %s", sql);
        ObjectMapper mapper = new ObjectMapper();
        String body = format("{\"sql\": %s }", mapper.writeValueAsString(sql));
        try (OutputStream os = conn.getOutputStream()) {
            os.write(body.getBytes(StandardCharsets.UTF_8));
        }

        int code = conn.getResponseCode();
        InputStream is = (code >= 200 && code < 300) ? conn.getInputStream() : conn.getErrorStream();
        if (is == null) {
            throw new IOException("Pinot HTTP " + code + " with empty body");
        }

        JsonNode root;
        try (InputStream in = is) {
            root = mapper.readTree(in);
        }
        JsonNode resultTable = root.get("resultTable");
        if (resultTable == null) {
            throw new IllegalStateException("Pinot query response missing 'resultTable' field");
        }
        JsonNode rows = resultTable.get("rows");
        if (rows == null) {
            throw new IllegalStateException("Pinot query response missing 'rows' field in resultTable");
        }
        ImmutableList.Builder<JsonNode> resultBuilder = ImmutableList.builder();
        for (Iterator<JsonNode> it = rows.elements(); it.hasNext(); ) {
            JsonNode row = it.next();
            resultBuilder.add(row);
        }
        List<JsonNode> results = resultBuilder.build();
        log.debug("Number of results: %s", results.size());
        return results;
    }
    catch (IOException e) {
        log.error(e, "IO error executing Pinot query: %s", sql);
        return Collections.emptyList();
    }
    catch (Exception e) {
        log.error(e, "Unexpected error executing Pinot query: %s", sql);
        return Collections.emptyList();
    }
Do not drop Pinot query failures. Returning an empty list when the HTTP call or JSON parsing fails makes listSplits believe there are zero splits, so the coordinator will quietly run the query against no data instead of surfacing the underlying failure. That is silent data loss. Please log and rethrow so the query fails fast instead of proceeding.
Apply this diff to rethrow:
catch (IOException e) {
log.error(e, "IO error executing Pinot query: %s", sql);
- return Collections.emptyList();
+ throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
}
catch (Exception e) {
log.error(e, "Unexpected error executing Pinot query: %s", sql);
- return Collections.emptyList();
+ throw new RuntimeException(format("Failed to execute Pinot query: %s", sql), e);
}
🤖 Prompt for AI Agents
In
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java
around lines 340 to 392, the method currently catches IO/other exceptions, logs
them and returns Collections.emptyList(), which silences Pinot failures and
causes silent data loss; change the catch blocks so after logging you rethrow
the original exception (or wrap it) instead of returning an empty list — e.g.,
remove the return Collections.emptyList() and add "throw e;" (or "throw new
IOException(\"Error executing Pinot query: \" + sql, e);" for checked-to-checked
wrapping) in the IOException catch and rethrow a RuntimeException wrapping the
exception in the generic catch; preserve logging and stack trace.
```java
// Test with trailing slash
config.setMetadataDbUrl("https://neutrino.uber.com/");
URL result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");

// Test without protocol (should work as URL constructor handles it)
config.setMetadataDbUrl("http://neutrino-dev.uber.com");
result = (URL) method.invoke(splitProvider, config);
assertEquals(result.toString(), "http://neutrino-dev.uber.com/v1/globalStatements");
```
Normalise the Neutrino endpoint URL
When the metadata base URL already ends with “/”, this concatenation yields //v1/globalStatements. That path relies on server-specific behaviour and can produce 404s. Trim trailing slashes before appending the Neutrino endpoint and update the expectation accordingly.
Apply this diff to the test:

```diff
- assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");
+ assertEquals(result.toString(), "https://neutrino.uber.com/v1/globalStatements");
```

Then update ClpUberPinotSplitProvider.buildPinotSqlQueryEndpointUrl (and the base ClpPinotSplitProvider implementation) to strip trailing slashes before adding /v1/globalStatements.
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```diff
  // Test with trailing slash
  config.setMetadataDbUrl("https://neutrino.uber.com/");
  URL result = (URL) method.invoke(splitProvider, config);
- assertEquals(result.toString(), "https://neutrino.uber.com//v1/globalStatements");
+ assertEquals(result.toString(), "https://neutrino.uber.com/v1/globalStatements");
  // Test without protocol (should work as URL constructor handles it)
  config.setMetadataDbUrl("http://neutrino-dev.uber.com");
  result = (URL) method.invoke(splitProvider, config);
  assertEquals(result.toString(), "http://neutrino-dev.uber.com/v1/globalStatements");
```
🤖 Prompt for AI Agents
In
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java
around lines 79-88, the test currently expects a double-slash when metadataDbUrl
ends with “/”; change the test to expect a single “/v1/globalStatements” and
update the production code: in
ClpUberPinotSplitProvider.buildPinotSqlQueryEndpointUrl (and the shared base
ClpPinotSplitProvider implementation) trim any trailing slashes from
metadataDbUrl before concatenating the path, then append "/v1/globalStatements"
(ensuring exactly one slash between host and path); keep existing behavior when
the URL already lacks a trailing slash and preserve protocol handling.
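The slash normalisation the prompt asks for can be sketched as below. This is an assumption-laden sketch, not the connector's actual implementation: `buildEndpoint` is a hypothetical helper (the real method is `buildPinotSqlQueryEndpointUrl`, whose signature may differ), but it shows the core fix of trimming trailing slashes before appending the path so that exactly one slash separates host and path.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class EndpointUrl
{
    // Trim any trailing slashes from the configured base URL, then append
    // the Neutrino endpoint path with a single separating slash.
    static URL buildEndpoint(String baseUrl)
    {
        String trimmed = baseUrl;
        while (trimmed.endsWith("/")) {
            trimmed = trimmed.substring(0, trimmed.length() - 1);
        }
        try {
            return new URL(trimmed + "/v1/globalStatements");
        }
        catch (MalformedURLException e) {
            throw new IllegalArgumentException("Invalid metadata DB URL: " + baseUrl, e);
        }
    }

    public static void main(String[] args)
    {
        // Both forms normalise to the same endpoint, with no double slash:
        // https://neutrino.uber.com/v1/globalStatements
        System.out.println(buildEndpoint("https://neutrino.uber.com/"));
        System.out.println(buildEndpoint("https://neutrino.uber.com"));
    }
}
```

The `while` loop (rather than a single `endsWith` check) also handles inputs with several trailing slashes, and existing behaviour is unchanged when the URL already lacks one.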