ci: centralize config computation to fix env.* limitation by jackluo923 · Pull Request #113 · y-scope/presto

jackluo923 · 2025-11-29T22:23:14Z

Summary

Centralizes all configuration computation in a single config job inlined in ci.yml. This fixes the CI failure and eliminates duplicated version tag computation across workflows.

Problem

CI workflow fails immediately (0s) with error:

"Unrecognized named-value: 'env'. Located at position 1 within expression: env.IMAGE_VERSION_TYPE"

Root cause: GitHub Actions doesn't allow ${{ env.* }} in with: sections when calling reusable workflows.

Example of problematic code:

env:
  IMAGE_VERSION_TYPE: 'BETA'

jobs:
  presto:
    uses: ./.github/workflows/presto-build.yml
    with:
      image-version-type: ${{ env.IMAGE_VERSION_TYPE }}  # ❌ Fails: env.* not allowed here

Additional issue: Runtime version tag computation was duplicated in presto-build.yml and prestocpp-linux-build-and-unit-test.yml.

Solution

Problem	Solution
`env.*` not allowed in `with:` blocks	Use job outputs instead (`needs.config.outputs.*`)
Duplicated version tag computation	Centralize in `config` job
Separate `compute-builder-tag.yml` workflow	Merge into `config` job

Key Design Decisions

Centralized Config Job

A single config job computes all shared configuration upfront:

Output	Example	Purpose
`builder-image`	`ghcr.io/y-scope/presto/unified-builder:a1b2c3d4-...`	Container for build jobs
`runtime-version-tag`	`0.293-BETA-20250529140509-484b00e`	Immutable runtime image tag
`runtime-snapshot-tag`	`0.293-BETA-SNAPSHOT`	Mutable SNAPSHOT tag

Single Source of Truth

IMAGE_VERSION_TYPE is defined once in ci.yml's env: block. Changing the version stream (RELEASE/BETA/DEV) requires editing only one line.

Implementation Details

Job Dependency Graph

Before:

compute-builder-tag ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                             ├─► presto ───────┘
                                             └─► presto-tests

compute-builder-tag was a separate reusable workflow
presto and prestocpp each computed their own runtime version tags
env.IMAGE_VERSION_TYPE passed via with: blocks (broken)

After:

config ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                ├─► presto ───────┘
                                └─► presto-tests

config is an inline job in ci.yml
All version tags computed once and passed as job outputs
No env.* usage in with: blocks

Files Changed

File	Change
`ci.yml`	Add inline `config` job with 3 outputs, update all job references
`compute-builder-tag.yml`	Deleted (merged into ci.yml)
`create-builder-image.yml`	Remove `builder-tag` input, simplify concurrency group
`presto-build.yml`	Replace `image-version-type` with `runtime-version-tag` + `runtime-snapshot-tag`
`prestocpp-linux-build-and-unit-test.yml`	Same as presto-build.yml

Test Plan

CI workflow triggers successfully on this PR
All jobs pass (config, create-builder-image, presto, prestocpp, integration-tests)
Docker images are tagged correctly with runtime-version-tag and runtime-snapshot-tag

Contributor Checklist

Commit messages follow the project's conventions
All CI checks pass
Documentation updated where applicable

Summary by CodeRabbit

New Features
- Added YAML-based metadata provider for CLP plugin
- Introduced Pinot-compatible split providers with TopN optimization support
- Unified CI/CD pipeline architecture with containerized builders
Enhancements
- Improved filter optimization with column remapping capabilities
- Enhanced UDF rewriting for better query planning
- Strengthened integration test coverage with split filtering
Dependencies
- Updated Joda-Time to version 2.13.1
- Added Jackson YAML support
Chores
- Refactored spill path configuration in tests
- Reorganized GitHub Actions workflows for improved maintainability

_{✏️ Tip: You can customize this high-level summary in your review settings.}

…ilds Change Docker image tagging from static :dev tags to dynamic branch-based tags, and update all workflow conditions to accept any branch with the release-0.293-clp-connector-snapshot prefix. Changes: - Docker image tags now use actual branch name via type=ref,event=branch * Before: ghcr.io/repo/prestissimo-worker:dev * After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29 - Push conditions changed from exact match to prefix match using startsWith() - Cancel-in-progress logic updated to exclude all snapshot prefix branches - PR title checks now accept release-0.293-clp-connector-snapshot* pattern Example: Pushing to branch "release-0.293-clp-connector-snapshot-nov29" will: - Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29 - Run all CI jobs to completion without cancellation - Enable PR title validation for PRs targeting this branch This allows multiple snapshot branches to coexist with unique image tags.

Add alternative provider implementations for metadata and split discovery, enabling flexible deployment configurations without MySQL dependency. New providers: - ClpYamlMetadataProvider: Reads table/column metadata from YAML config files instead of querying MySQL database. Supports nested schemas and polymorphic types through hierarchical YAML structure. - ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths via HTTP/JSON API instead of MySQL. Determines split types based on file extensions (.clp.zst for IR, otherwise ARCHIVE). Configuration: - Add clp.metadata-provider-type=YAML option for YAML-based metadata - Add clp.split-provider-type=PINOT option for Pinot-based split discovery - Add clp.metadata-yaml-path config for YAML metadata file location The YAML metadata provider parses a two-level structure: 1. Main metadata file: maps connector → schema → table → schema_file_path 2. Individual table schema files: define column names and types This provides deployment flexibility - users can now choose between: - Database-driven (MySQL for both metadata and splits) - Hybrid (YAML metadata + Pinot splits) - Or any combination based on their infrastructure Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1

Ensure all timestamp literals are converted to nanoseconds before generating KQL and SQL filters for pushdown, regardless of their original precision (milliseconds, microseconds, etc.). Problem: Presto supports multiple timestamp precisions (TIMESTAMP with millisecond precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects timestamps in a consistent nanosecond format. Previously, timestamp literals were passed through without precision normalization, causing incorrect results when different timestamp types were used in queries. Solution: Add timestamp normalization logic in ClpFilterToKqlConverter: - tryEnsureNanosecondTimestamp(): Detects TIMESTAMP and TIMESTAMP_MICROSECONDS types - ensureNanosecondTimestamp(): Converts timestamp values to nanoseconds using: 1. Extract epoch seconds via TimestampType.getEpochSecond() 2. Extract nanosecond fraction via TimestampType.getNanos() 3. Convert to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction Applied to: - BETWEEN operator: Both lower and upper bounds - Comparison operators: All timestamp literal values (=, !=, <, >, <=, >=) This ensures consistent timestamp representation in both KQL filters (sent to CLP engine) and SQL filters (used for metadata filtering), regardless of the timestamp precision used in the original Presto query. Example: Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02' Before: Literal values passed as milliseconds or microseconds After: Both bounds normalized to nanosecond representation

Implement a plan optimizer that pushes down LIMIT + ORDER BY operations to reduce split scanning when querying on timestamp metadata columns. The optimizer detects TopN patterns in query plans and uses archive metadata (timestamp bounds, message counts) to select only the minimum set of archives needed to guarantee correct results. Key features: - Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns - Pushes TopN specifications through ClpTableLayoutHandle - Handles overlapping archives by merging into non-overlapping groups - Supports both ASC (earliest-first) and DESC (latest-first) ordering - Uses worst-case analysis to ensure correctness when archive ranges overlap Implementation: - ClpTopNSpec: Data structure for TopN specifications (limit + orderings) - ClpComputePushDown: Enhanced plan optimizer with TopN detection - ClpMySqlSplitProvider: Archive selection logic using metadata queries - Comprehensive test coverage for various TopN scenarios Performance: For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives that cannot contain the top results, significantly reducing I/O.

@JsonProperty

Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize archive scanning when executing LIMIT + ORDER BY queries on timestamp metadata columns. The optimization uses archive metadata (timestamp bounds and message counts) to intelligently select only the minimum set of archives needed to guarantee correct TopN results, significantly reducing I/O for time-range queries. Implementation details: Archive selection algorithm: - Fetches archive metadata (file path, creation time, modification time, message count) - Merges overlapping archives into non-overlapping groups by timestamp ranges - For DESC ordering: includes newest group + older groups until limit is covered - For ASC ordering: includes oldest group + newer groups until limit is covered - Uses worst-case analysis to ensure correctness when archives overlap Code structure: - ClpPinotSplitProvider.listSplits(): Detects TopN specs and routes to optimized path - selectTopNArchives(): Implements the archive selection algorithm - toArchiveGroups(): Merges overlapping archives into logical components - ArchiveMeta: Represents archive metadata with validation - ArchiveGroup: Represents merged archive groups for overlap handling ClpTopNSpec enhancements: - Made fields private with proper encapsulation - Added @JsonProperty annotations to getters for correct serialization - Maintains immutability for thread-safety Code quality improvements: - Proper exception handling with context-aware error messages - Input validation in constructors (bounds checking, null checks) - Extracted determineSplitType() helper to eliminate duplication - Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields) - Improved logging with table names and SQL queries for debugging - Better encapsulation: private fields, getters-only access Performance impact: For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100", this eliminates scanning of archives outside the time range of interest, providing substantial I/O reduction for large archive sets.

…ability.

Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table). Key changes: - Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle - Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching - Update ClpMetadata to delegate schema listing to metadata providers - Add comprehensive error handling with graceful fallback to default schema - Optimize performance with double-checked locking and ObjectMapper reuse - Add 15+ unit tests covering all edge cases and error scenarios - Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider The implementation is backward compatible - existing single-schema setups continue to work without configuration changes. Multi-schema support is activated automatically when the YAML metadata file contains multiple schema entries. Example YAML structure: ```yaml clp: default: logs: /path/to/default/logs.yaml dev: test_logs: /path/to/dev/logs.yaml prod: production_logs: /path/to/prod/logs.yaml ``` Performance optimizations: - Thread-safe caching prevents repeated YAML parsing - Double-checked locking pattern for lazy initialization - Reused ObjectMapper instances reduce object creation overhead - Synchronized table schema map updates ensure thread safety Testing: - All 11 tests in TestClpYamlMetadataProvider pass - All 4 tests in TestClpMultiSchema pass - Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully

…lConverter (#90)

…test failures (#103)

…restodb#26254) ## Description Fix flakytests that uses `experimental.spiller-spill-path` as `/tmp/presto/spills/` in query runner ## Motivation and Context prestodb#25890 ## Impact  ## Test Plan  ## Contributor checklist - [ ] Please make sure your submission complies with our [contributing guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md), in particular [code style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style) and [commit standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards). - [ ] PR description addresses the issue accurately and concisely. If the change is non-trivial, a GitHub Issue is referenced. - [ ] Documented new properties (with its default value), SQL syntax, functions, or other functionality. - [ ] If release notes are required, they follow the [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines). - [ ] Adequate tests were added if applicable. - [ ] CI passed. ## Release Notes Please follow [release notes guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines) and fill in the release notes below. ``` == NO RELEASE NOTE == ```

Upgrade joda-time to 2.13.1 to support new timezone data including America/Coyhaique. Update tests to use dates where timezone gaps still exist in the updated tzdata: - testLocallyUnrepresentableDateLiterals: 1970-01-01 -> 1932-04-01 - testLocallyUnrepresentableTimeLiterals: use 2017-04-02 or 2012-04-01 Cherry-picked from upstream: prestodb/presto@0a504d20e6

… runners Key changes: - Add comprehensive CI architecture documentation explaining: - Terminology (presto, prestocpp, prestissimo) - Unified builder image strategy for ephemeral runners - Job dependency graph - Comparison with upstream (prestodb/presto) CI - Performance benefits of pre-warmed ccache and Maven deps - Consolidate presto-java8/java17 jobs into single matrix-based `presto` job - ARTIFACT_JAVA_VERSION controls which version uploads artifacts/images (default: '8') - Add prestissimo image building to prestocpp workflow - Downloads artifacts from build job, packages into runtime image - Uses same tagging strategy as presto image (immutable + SNAPSHOT tags) - Centralize IMAGE_VERSION_TYPE configuration (set to 'BETA') - Applied to both presto and prestissimo images - Document artifacts (presto-server, presto-cli, presto-native-build) and Docker images (unified-builder, presto, prestissimo)

Disable the CLP integration test due to a timestamp filtering issue in the CLP native reader where the query returns empty results when filtering by timestamp. A separate PR will be raised to fix this unit test.

Restructure header comments to explain key design decisions: - Caching strategy: why bake caches into Docker image layers - Image tagging: dual tags (immutable + SNAPSHOT) and version streams - Builder image tag: auto-computed dependency hash Also update comparison table with split image tag rows (builder vs runtime).

Fixes CI workflow failure caused by using `${{ env.* }}` in `with:` blocks when calling reusable workflows (GitHub Actions limitation). Changes: - Add inline `config` job that computes builder-image and runtime tags - Delete compute-builder-tag.yml (merged into ci.yml) - Update presto-build.yml and prestocpp to use passed runtime tags - Update create-builder-image.yml to only require builder-image input

coderabbitai · 2025-11-29T22:24:18Z

Caution

Review failed

The pull request is closed.

Walkthrough

This PR refactors the CI/CD pipeline into modular reusable workflows, introduces a unified YScope Presto builder Docker image, adds YAML-based metadata support to the CLP plugin with Top-N query pushdown, implements Pinot and Pinot Uber split providers, updates dependencies, and standardizes test configurations with UUID-based spill paths.

Changes

Cohort / File(s)	Summary
GitHub Actions Workflow Refactoring `.github/workflows/ci.yml`, `.github/workflows/create-builder-image.yml`, `.github/workflows/presto-build.yml`, `.github/workflows/prestocpp-linux-build-and-unit-test.yml`, `.github/workflows/integration-tests.yml`, `.github/workflows/tests.yml`	New modular workflow orchestration with centralized config job computing image tags, distributed builder image creation, and parallel jobs for Java/C++ builds, tests, and integration tests. Replaces monolithic CI approach with workflow_call-driven composition.
Removed Workflows `.github/workflows/maven-checks.yml`, `.github/workflows/prestissimo-worker-images-build.yml`	Legacy workflow files removed as part of CI consolidation.
Workflow Updates `.github/workflows/docs.yml`, `.github/workflows/pr-title-checks.yaml`, `.github/workflows/prestocpp-format-and-header-check.yml`	Concurrency conditions updated to exclude branches matching `release-0.293-clp-connector-snapshot*` prefix.
Builder Image & Infrastructure `.github/bin/download_nodejs`, `.github/dockerfiles/yscope-presto-builder.dockerfile`, `presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile`	New unified builder Dockerfile with Maven cache, Node.js, CMake, ccache pre-warming, and artifact caching. Maven path unification via `${MAVEN_REPO}` environment variable. Runtime Dockerfile simplified to use prebuilt artifacts from builder.
CLP Metadata & Configuration `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java`	Added YAML metadata provider type and Pinot/Pinot Uber split provider support. Provider binding refactored via map-based lookups. Schema name listing delegated to metadata provider. New error code for unsupported YAML table schemas.
YAML Metadata Provider Implementation `presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java`	New YAML-driven metadata provider with schema caching, nested column extraction, and graceful error handling. Default method added to interface for schema listing.
CLP Optimization & Pushdown `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java`	Top-N pushdown implementation with layout updates, nanosecond timestamp normalization, push-down variable tracking throughout expressions and filter conversions. New Top-N specification class with order direction and column lists.
CLP Split Providers `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java`	MySQL provider enhanced with Top-N archive filtering. New Pinot provider with dynamic split discovery, archive metadata querying, and Top-N support. Uber-specific Pinot provider with custom table naming and query endpoints.
CLP Split Filters `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java`	Abstract method added for column name remapping. Pinot and Uber Pinot filter providers created. Uber provider implements TEXT_MATCH transformations for merged text index queries.
CLP Core Updates `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java`, `presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java`	ClpSplit adds hashCode/equals implementations. ClpTableLayoutHandle adds metadataQueryOnly boolean and optional TopN specification with updated constructors and equality methods. UdfRewriter explicit table scan handling. Minor formatting in optimizer provider.
CLP Tests - Core `presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java`	Import updates, test data setup modified to include archive message counts, database merge semantics updated.
CLP Tests - New `presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java`, `presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java`	Comprehensive test coverage for YAML metadata discovery, Top-N query planning, Pinot provider integration, Uber-specific transformations, and filter remapping.
CLP Test Resources `presto-clp/src/test/resources/test-*.yaml`, `presto-clp/src/test/resources/test-pinot-split-filter.json`, `presto-clp/src/test/resources/test-topn-split-filter.json`	New YAML schema files for CockroachDB, orders, and users tables across multiple schemas; JSON configuration files for split filter testing.
Dependencies `pom.xml`, `presto-clp/pom.xml`, `presto-native-execution/pom.xml`	Joda-Time updated from 2.12.7 to 2.13.1. Jackson YAML and SnakeYAML dependencies added with version 2.1 upper bound.
Native Execution Tests `presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java`, `presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeClpGeneralQueries.java`	Tests disabled with enabled=false annotation pending fixes.
Date/Time Tests `presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java`, `presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java`, `presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java`	Hard-coded date literals updated (1970-01-01 → 2012-04-01 or 1932-04-01) for timezone offset validation.
Spill Path Uniqueness `presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java`, `presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java`, `presto-tests/src/test/java/com/facebook/presto/tests/TestSpilled*.java`, `presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java`	UUID-based subdirectories appended to spill paths in test configurations to ensure isolation per run.
Submodule & Minor Updates `presto-native-execution/velox`, `presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java`	Velox submodule commit reference updated. Minor test cleanup.

Sequence Diagram(s)

sequenceDiagram
    actor PR as Pull Request Trigger
    participant Config as config job
    participant BuilderImg as create-builder-image job
    participant PrestoBuild as presto-build job
    participant PrestoTest as tests job
    participant PrestoCppBuild as prestocpp-linux-build-and-unit-test job
    participant Integration as integration-tests job
    participant GHCR as GHCR Registry

    PR->>Config: workflow triggered
    Config->>Config: compute builder-image hash<br/>compute runtime-version-tag<br/>compute runtime-snapshot-tag
    Config-->>BuilderImg: outputs: builder-image, tags

    BuilderImg->>GHCR: check image exists
    alt image not found
        BuilderImg->>GHCR: build & push builder image
    end
    BuilderImg-->>PrestoBuild: builder-image ready
    BuilderImg-->>PrestoCppBuild: builder-image ready
    BuilderImg-->>PrestoTest: builder-image ready
    BuilderImg-->>Integration: builder-image ready

    par Java & C++ Builds
        PrestoBuild->>PrestoBuild: checkout, setup Java<br/>Maven build (multi-version)
        PrestoBuild->>PrestoBuild: conditionally upload artifacts
        PrestoBuild->>GHCR: build Presto Docker image
        
        PrestoCppBuild->>PrestoCppBuild: checkout, setup builder image<br/>cmake/ninja build<br/>upload prestocpp artifacts
        PrestoCppBuild->>GHCR: build prestissimo runtime image
    end

    par Tests & Integration
        PrestoTest->>PrestoTest: matrix test execution<br/>presto-tests across Java versions
        Integration->>Integration: download Presto artifacts<br/>run integration-e2e-java<br/>run integration-storage-java<br/>run integration-sidecar-java
    end

    PrestoBuild-->>PrestoTest: artifact outputs
    PrestoCppBuild-->>Integration: artifact outputs

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50–75 minutes

Areas requiring extra attention:

CLP Top-N Pushdown Logic (ClpComputePushDown.java, ClpTopNSpec.java): Complex rewrite logic for plan transformation with multiple edge cases around ordering equality and column remapping that requires careful validation against test expectations.
Pinot Split Provider Implementation (ClpPinotSplitProvider.java, ClpUberPinotSplitProvider.java): HTTP query execution, archive grouping algorithm with overlapping logic, and Top-N archive selection merit detailed scrutiny.
YAML Metadata Provider (ClpYamlMetadataProvider.java): Schema caching with double-checked locking, nested column extraction recursion, and graceful fallback handling on parsing errors should be verified for thread safety and correctness.
CI/CD Workflow Orchestration (.github/workflows/*.yml): New modular approach with config job outputs feeding parallel jobs; verify tag propagation, concurrency controls, and conditional logic for image builds and artifact uploads.
Push-Down Variables Threading (ClpExpression.java, ClpFilterToKqlConverter.java, ClpSplitFilterProvider.java): Signature changes propagating variable sets through multiple layers; ensure all call sites properly maintain these sets and that null/empty handling is consistent.
Database Schema Updates (ClpMetadataDbSetUp.java): Archive table now includes num_messages column; verify all INSERT/MERGE statements and test fixtures correctly provide this field.

Possibly related PRs

feat: Add support for defining and using specific columns for metadata filtering. #19: Directly related CLP pushdown and metadata filtering refactoring; modifies overlapping ClpExpression, ClpFilterToKql, split filter provider, and layout handling code paths.

Suggested reviewers

kirkrodrigues

✨ Finishing touches

📝 Generate docstrings

🧪 Generate unit tests (beta)

Create PR with unit tests
Post copyable unit tests in a comment
Commit unit tests in branch ci-snapshot

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and fefa99e.

📒 Files selected for processing (70)

.github/bin/download_nodejs (2 hunks)
.github/dockerfiles/yscope-presto-builder.dockerfile (1 hunks)
.github/workflows/ci.yml (1 hunks)
.github/workflows/create-builder-image.yml (1 hunks)
.github/workflows/docs.yml (1 hunks)
.github/workflows/integration-tests.yml (1 hunks)
.github/workflows/maven-checks.yml (0 hunks)
.github/workflows/pr-title-checks.yaml (1 hunks)
.github/workflows/prestissimo-worker-images-build.yml (0 hunks)
.github/workflows/presto-build.yml (1 hunks)
.github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
.github/workflows/prestocpp-linux-build-and-unit-test.yml (2 hunks)
.github/workflows/tests.yml (2 hunks)
pom.xml (2 hunks)
presto-clp/pom.xml (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunks)
presto-clp/src/test/resources/test-cockroachdb-schema.yaml (1 hunks)
presto-clp/src/test/resources/test-orders-schema1.yaml (1 hunks)
presto-clp/src/test/resources/test-orders-schema2.yaml (1 hunks)
presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunks)
presto-clp/src/test/resources/test-tables-schema.yaml (1 hunks)
presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
presto-clp/src/test/resources/test-users-schema1.yaml (1 hunks)
presto-native-execution/pom.xml (1 hunks)
presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java (2 hunks)
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeClpGeneralQueries.java (1 hunks)
presto-native-execution/velox (1 hunks)
presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java (1 hunks)
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java (1 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithHighMemoryRevokingThreshold.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithLargeBlockSpillingEnabled.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithPreprocessingEnabled.java (2 hunks)
presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledTopNQueries.java (2 hunks)
presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java (2 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

jackluo923 and others added 20 commits October 28, 2025 12:00

feat: Upgrade velox submodule to y-scope/velox@52bb2f (#85)

5a861d1

fix: Rewrite CLP UDFs across full plan tree (#86)

c8aad93

refactor: Make Pinot query endpoint URL construction overridable (#88)

e3082e5

Refactor: Auto-correct code style violations for consistency and read…

a3deb6d

…ability.

fix: Add metadata SQL generation for string literals in ClpFilterToKq…

3105485

…lConverter (#90)

feat: Add Pinot split filter providers with TEXT_MATCH support (#101)

519a803

refactor: Improve ClpYamlMetadataProvider cache organization and fix …

dd4dda1

…test failures (#103)

[native] Disable TestPrestoNativeAsyncDataCacheCleanupAPI

221dfe5

test: temporarily disable TestPrestoNativeClpGeneralQueries.test

8c8abef

Disable the CLP integration test due to a timestamp filtering issue in the CLP native reader where the query returns empty results when filtering by timestamp. A separate PR will be raised to fix this unit test.

jackluo923 closed this Nov 29, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ci: centralize config computation to fix env.* limitation#113

ci: centralize config computation to fix env.* limitation#113
jackluo923 wants to merge 20 commits intorelease-0.293-clp-connectorfrom
ci-snapshot

jackluo923 commented Nov 29, 2025 •

edited by coderabbitai bot

Loading

Uh oh!

coderabbitai bot commented Nov 29, 2025 •

edited

Loading

Review failed

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

Conversation

jackluo923 commented Nov 29, 2025 • edited by coderabbitai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Problem

Solution

Key Design Decisions

Centralized Config Job

Single Source of Truth

Implementation Details

Job Dependency Graph

Files Changed

Test Plan

Contributor Checklist

Summary by CodeRabbit

Uh oh!

coderabbitai bot commented Nov 29, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review failed

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Possibly related PRs

Suggested reviewers

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

jackluo923 commented Nov 29, 2025 •

edited by coderabbitai bot

Loading

coderabbitai bot commented Nov 29, 2025 •

edited

Loading