Skip to content

ci: centralize config computation to fix env.* limitation#113

Closed
jackluo923 wants to merge 20 commits intorelease-0.293-clp-connectorfrom
ci-snapshot
Closed

ci: centralize config computation to fix env.* limitation#113
jackluo923 wants to merge 20 commits intorelease-0.293-clp-connectorfrom
ci-snapshot

Conversation

@jackluo923
Copy link
Member

@jackluo923 jackluo923 commented Nov 29, 2025

Summary

Centralizes all configuration computation in a single config job inlined in ci.yml. This fixes the CI failure and eliminates duplicated version tag computation across workflows.

Problem

CI workflow fails immediately (0s) with error:

"Unrecognized named-value: 'env'. Located at position 1 within expression: env.IMAGE_VERSION_TYPE"

Root cause: GitHub Actions doesn't allow ${{ env.* }} in with: sections when calling reusable workflows.

Example of problematic code:

env:
  IMAGE_VERSION_TYPE: 'BETA'

jobs:
  presto:
    uses: ./.github/workflows/presto-build.yml
    with:
      image-version-type: ${{ env.IMAGE_VERSION_TYPE }}  # ❌ Fails: env.* not allowed here

Additional issue: Runtime version tag computation was duplicated in presto-build.yml and prestocpp-linux-build-and-unit-test.yml.

Solution

Problem Solution
env.* not allowed in with: blocks Use job outputs instead (needs.config.outputs.*)
Duplicated version tag computation Centralize in config job
Separate compute-builder-tag.yml workflow Merge into config job

Key Design Decisions

Centralized Config Job

A single config job computes all shared configuration upfront:

Output Example Purpose
builder-image ghcr.io/y-scope/presto/unified-builder:a1b2c3d4-... Container for build jobs
runtime-version-tag 0.293-BETA-20250529140509-484b00e Immutable runtime image tag
runtime-snapshot-tag 0.293-BETA-SNAPSHOT Mutable SNAPSHOT tag

Single Source of Truth

IMAGE_VERSION_TYPE is defined once in ci.yml's env: block. Changing the version stream (RELEASE/BETA/DEV) requires editing only one line.

Implementation Details

Job Dependency Graph

Before:

compute-builder-tag ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                             ├─► presto ───────┘
                                             └─► presto-tests
  • compute-builder-tag was a separate reusable workflow
  • presto and prestocpp each computed their own runtime version tags
  • env.IMAGE_VERSION_TYPE passed via with: blocks (broken)

After:

config ─► create-builder-image ─┬─► prestocpp ────┬─► integration-tests
                                ├─► presto ───────┘
                                └─► presto-tests
  • config is an inline job in ci.yml
  • All version tags computed once and passed as job outputs
  • No env.* usage in with: blocks

Files Changed

File Change
ci.yml Add inline config job with 3 outputs, update all job references
compute-builder-tag.yml Deleted (merged into ci.yml)
create-builder-image.yml Remove builder-tag input, simplify concurrency group
presto-build.yml Replace image-version-type with runtime-version-tag + runtime-snapshot-tag
prestocpp-linux-build-and-unit-test.yml Same as presto-build.yml

Test Plan

  • CI workflow triggers successfully on this PR
  • All jobs pass (config, create-builder-image, presto, prestocpp, integration-tests)
  • Docker images are tagged correctly with runtime-version-tag and runtime-snapshot-tag

Contributor Checklist

  • Commit messages follow the project's conventions
  • All CI checks pass
  • Documentation updated where applicable

Summary by CodeRabbit

  • New Features

    • Added YAML-based metadata provider for CLP plugin
    • Introduced Pinot-compatible split providers with TopN optimization support
    • Unified CI/CD pipeline architecture with containerized builders
  • Enhancements

    • Improved filter optimization with column remapping capabilities
    • Enhanced UDF rewriting for better query planning
    • Strengthened integration test coverage with split filtering
  • Dependencies

    • Updated Joda-Time to version 2.13.1
    • Added Jackson YAML support
  • Chores

    • Refactored spill path configuration in tests
    • Reorganized GitHub Actions workflows for improved maintainability

✏️ Tip: You can customize this high-level summary in your review settings.

jackluo923 and others added 20 commits October 28, 2025 12:00
…ilds

Change Docker image tagging from static :dev tags to dynamic branch-based tags,
and update all workflow conditions to accept any branch with the
release-0.293-clp-connector-snapshot prefix.

Changes:
- Docker image tags now use actual branch name via type=ref,event=branch
  * Before: ghcr.io/repo/prestissimo-worker:dev
  * After: ghcr.io/repo/prestissimo-worker:release-0.293-clp-connector-snapshot-nov29
- Push conditions changed from exact match to prefix match using startsWith()
- Cancel-in-progress logic updated to exclude all snapshot prefix branches
- PR title checks now accept release-0.293-clp-connector-snapshot* pattern

Example: Pushing to branch "release-0.293-clp-connector-snapshot-nov29" will:
- Build and push images tagged with :release-0.293-clp-connector-snapshot-nov29
- Run all CI jobs to completion without cancellation
- Enable PR title validation for PRs targeting this branch

This allows multiple snapshot branches to coexist with unique image tags.
  Add alternative provider implementations for metadata and split discovery,
  enabling flexible deployment configurations without MySQL dependency.

  New providers:
  - ClpYamlMetadataProvider: Reads table/column metadata from YAML config files
    instead of querying MySQL database. Supports nested schemas and polymorphic
    types through hierarchical YAML structure.

  - ClpPinotSplitProvider: Queries Apache Pinot database for archive file paths
    via HTTP/JSON API instead of MySQL. Determines split types based on file
    extensions (.clp.zst for IR, otherwise ARCHIVE).

  Configuration:
  - Add clp.metadata-provider-type=YAML option for YAML-based metadata
  - Add clp.split-provider-type=PINOT option for Pinot-based split discovery
  - Add clp.metadata-yaml-path config for YAML metadata file location

  The YAML metadata provider parses a two-level structure:
  1. Main metadata file: maps connector → schema → table → schema_file_path
  2. Individual table schema files: define column names and types

  This provides deployment flexibility - users can now choose between:
  - Database-driven (MySQL for both metadata and splits)
  - Hybrid (YAML metadata + Pinot splits)
  - Or any combination based on their infrastructure

  Dependencies: Adds jackson-dataformat-yaml and snakeyaml 2.1
Ensure all timestamp literals are converted to nanoseconds before generating
KQL and SQL filters for pushdown, regardless of their original precision
(milliseconds, microseconds, etc.).

Problem:
Presto supports multiple timestamp precisions (TIMESTAMP with millisecond
precision, TIMESTAMP_MICROSECONDS, etc.), but CLP's query engine expects
timestamps in a consistent nanosecond format. Previously, timestamp literals
were passed through without precision normalization, causing incorrect
results when different timestamp types were used in queries.

Solution:
Add timestamp normalization logic in ClpFilterToKqlConverter:
- tryEnsureNanosecondTimestamp(): Detects TIMESTAMP and TIMESTAMP_MICROSECONDS types
- ensureNanosecondTimestamp(): Converts timestamp values to nanoseconds using:
  1. Extract epoch seconds via TimestampType.getEpochSecond()
  2. Extract nanosecond fraction via TimestampType.getNanos()
  3. Convert to total nanoseconds: SECONDS.toNanos(seconds) + nanosFraction

Applied to:
- BETWEEN operator: Both lower and upper bounds
- Comparison operators: All timestamp literal values (=, !=, <, >, <=, >=)

This ensures consistent timestamp representation in both KQL filters (sent to
CLP engine) and SQL filters (used for metadata filtering), regardless of the
timestamp precision used in the original Presto query.

Example:
Query: WHERE timestamp BETWEEN TIMESTAMP '2025-01-01' AND TIMESTAMP '2025-01-02'
Before: Literal values passed as milliseconds or microseconds
After: Both bounds normalized to nanosecond representation
   Implement a plan optimizer that pushes down LIMIT + ORDER BY operations
   to reduce split scanning when querying on timestamp metadata columns.

   The optimizer detects TopN patterns in query plans and uses archive
   metadata (timestamp bounds, message counts) to select only the minimum
   set of archives needed to guarantee correct results.

   Key features:
   - Detects and rewrites TopN → [Project] → [Filter] → TableScan patterns
   - Pushes TopN specifications through ClpTableLayoutHandle
   - Handles overlapping archives by merging into non-overlapping groups
   - Supports both ASC (earliest-first) and DESC (latest-first) ordering
   - Uses worst-case analysis to ensure correctness when archive ranges overlap

   Implementation:
   - ClpTopNSpec: Data structure for TopN specifications (limit + orderings)
   - ClpComputePushDown: Enhanced plan optimizer with TopN detection
   - ClpMySqlSplitProvider: Archive selection logic using metadata queries
   - Comprehensive test coverage for various TopN scenarios

   Performance: For queries like "SELECT * FROM logs ORDER BY timestamp DESC
   LIMIT 100", this eliminates scanning of archives that cannot contain
   the top results, significantly reducing I/O.
Implement TopN pushdown optimization in ClpPinotSplitProvider to minimize
archive scanning when executing LIMIT + ORDER BY queries on timestamp
metadata columns.

The optimization uses archive metadata (timestamp bounds and message counts)
to intelligently select only the minimum set of archives needed to guarantee
correct TopN results, significantly reducing I/O for time-range queries.

Implementation details:

Archive selection algorithm:
- Fetches archive metadata (file path, creation time, modification time, message count)
- Merges overlapping archives into non-overlapping groups by timestamp ranges
- For DESC ordering: includes newest group + older groups until limit is covered
- For ASC ordering: includes oldest group + newer groups until limit is covered
- Uses worst-case analysis to ensure correctness when archives overlap

Code structure:
- ClpPinotSplitProvider.listSplits(): Detects TopN specs and routes to optimized path
- selectTopNArchives(): Implements the archive selection algorithm
- toArchiveGroups(): Merges overlapping archives into logical components
- ArchiveMeta: Represents archive metadata with validation
- ArchiveGroup: Represents merged archive groups for overlap handling

ClpTopNSpec enhancements:
- Made fields private with proper encapsulation
- Added @JsonProperty annotations to getters for correct serialization
- Maintains immutability for thread-safety

Code quality improvements:
- Proper exception handling with context-aware error messages
- Input validation in constructors (bounds checking, null checks)
- Extracted determineSplitType() helper to eliminate duplication
- Made fields final where immutable (pinotDatabaseUrl, ArchiveMeta fields)
- Improved logging with table names and SQL queries for debugging
- Better encapsulation: private fields, getters-only access

Performance impact:
For queries like "SELECT * FROM logs ORDER BY timestamp DESC LIMIT 100",
this eliminates scanning of archives outside the time range of interest,
providing substantial I/O reduction for large archive sets.
Implements dynamic schema discovery from YAML metadata files, allowing the CLP connector
to support multiple schemas beyond just "default" (e.g., clp.dev.table, clp.prod.table).

Key changes:
- Add default listSchemaNames() method to ClpMetadataProvider interface for DRY principle
- Implement dynamic schema discovery in ClpYamlMetadataProvider with thread-safe caching
- Update ClpMetadata to delegate schema listing to metadata providers
- Add comprehensive error handling with graceful fallback to default schema
- Optimize performance with double-checked locking and ObjectMapper reuse
- Add 15+ unit tests covering all edge cases and error scenarios
- Fix checkstyle violations in UberClpPinotSplitProvider and ClpPinotSplitProvider

The implementation is backward compatible - existing single-schema setups continue to work
without configuration changes. Multi-schema support is activated automatically when the
YAML metadata file contains multiple schema entries.

Example YAML structure:
```yaml
clp:
  default:
    logs: /path/to/default/logs.yaml
  dev:
    test_logs: /path/to/dev/logs.yaml
  prod:
    production_logs: /path/to/prod/logs.yaml
```

Performance optimizations:
- Thread-safe caching prevents repeated YAML parsing
- Double-checked locking pattern for lazy initialization
- Reused ObjectMapper instances reduce object creation overhead
- Synchronized table schema map updates ensure thread safety

Testing:
- All 11 tests in TestClpYamlMetadataProvider pass
- All 4 tests in TestClpMultiSchema pass
- Fixed testListSchemaNamesNoCatalogField to handle missing catalog gracefully
…restodb#26254)

## Description
Fix flakytests that uses `experimental.spiller-spill-path` as
`/tmp/presto/spills/` in query runner

## Motivation and Context
prestodb#25890
## Impact
<!---Describe any public API or user-facing feature change or any
performance impact-->

## Test Plan
<!---Please fill in how you tested your change-->

## Contributor checklist

- [ ] Please make sure your submission complies with our [contributing
guide](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md),
in particular [code
style](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#code-style)
and [commit
standards](https://github.com/prestodb/presto/blob/master/CONTRIBUTING.md#commit-standards).
- [ ] PR description addresses the issue accurately and concisely. If
the change is non-trivial, a GitHub Issue is referenced.
- [ ] Documented new properties (with its default value), SQL syntax,
functions, or other functionality.
- [ ] If release notes are required, they follow the [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines).
- [ ] Adequate tests were added if applicable.
- [ ] CI passed.

## Release Notes
Please follow [release notes
guidelines](https://github.com/prestodb/presto/wiki/Release-Notes-Guidelines)
and fill in the release notes below.

```
== NO RELEASE NOTE ==
```
Upgrade joda-time to 2.13.1 to support new timezone data including America/Coyhaique.

Update tests to use dates where timezone gaps still exist in the updated tzdata:
- testLocallyUnrepresentableDateLiterals: 1970-01-01 -> 1932-04-01
- testLocallyUnrepresentableTimeLiterals: use 2017-04-02 or 2012-04-01

Cherry-picked from upstream: prestodb/presto@0a504d20e6
… runners

Key changes:
- Add comprehensive CI architecture documentation explaining:
  - Terminology (presto, prestocpp, prestissimo)
  - Unified builder image strategy for ephemeral runners
  - Job dependency graph
  - Comparison with upstream (prestodb/presto) CI
  - Performance benefits of pre-warmed ccache and Maven deps

- Consolidate presto-java8/java17 jobs into single matrix-based `presto` job
  - ARTIFACT_JAVA_VERSION controls which version uploads artifacts/images (default: '8')

- Add prestissimo image building to prestocpp workflow
  - Downloads artifacts from build job, packages into runtime image
  - Uses same tagging strategy as presto image (immutable + SNAPSHOT tags)

- Centralize IMAGE_VERSION_TYPE configuration (set to 'BETA')
  - Applied to both presto and prestissimo images

- Document artifacts (presto-server, presto-cli, presto-native-build)
  and Docker images (unified-builder, presto, prestissimo)
Disable the CLP integration test due to a timestamp filtering issue in
the CLP native reader where the query returns empty results when
filtering by timestamp.

A separate PR will be raised to fix this unit test.
Restructure header comments to explain key design decisions:
- Caching strategy: why bake caches into Docker image layers
- Image tagging: dual tags (immutable + SNAPSHOT) and version streams
- Builder image tag: auto-computed dependency hash

Also update comparison table with split image tag rows (builder vs runtime).
Fixes CI workflow failure caused by using `${{ env.* }}` in `with:`
blocks when calling reusable workflows (GitHub Actions limitation).

Changes:
- Add inline `config` job that computes builder-image and runtime tags
- Delete compute-builder-tag.yml (merged into ci.yml)
- Update presto-build.yml and prestocpp to use passed runtime tags
- Update create-builder-image.yml to only require builder-image input
@coderabbitai
Copy link

coderabbitai bot commented Nov 29, 2025

Caution

Review failed

The pull request is closed.

Walkthrough

This PR refactors the CI/CD pipeline into modular reusable workflows, introduces a unified YScope Presto builder Docker image, adds YAML-based metadata support to the CLP plugin with Top-N query pushdown, implements Pinot and Pinot Uber split providers, updates dependencies, and standardizes test configurations with UUID-based spill paths.

Changes

Cohort / File(s) Summary
GitHub Actions Workflow Refactoring
.github/workflows/ci.yml, .github/workflows/create-builder-image.yml, .github/workflows/presto-build.yml, .github/workflows/prestocpp-linux-build-and-unit-test.yml, .github/workflows/integration-tests.yml, .github/workflows/tests.yml
New modular workflow orchestration with centralized config job computing image tags, distributed builder image creation, and parallel jobs for Java/C++ builds, tests, and integration tests. Replaces monolithic CI approach with workflow_call-driven composition.
Removed Workflows
.github/workflows/maven-checks.yml, .github/workflows/prestissimo-worker-images-build.yml
Legacy workflow files removed as part of CI consolidation.
Workflow Updates
.github/workflows/docs.yml, .github/workflows/pr-title-checks.yaml, .github/workflows/prestocpp-format-and-header-check.yml
Concurrency conditions updated to exclude branches matching release-0.293-clp-connector-snapshot* prefix.
Builder Image & Infrastructure
.github/bin/download_nodejs, .github/dockerfiles/yscope-presto-builder.dockerfile, presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile
New unified builder Dockerfile with Maven cache, Node.js, CMake, ccache pre-warming, and artifact caching. Maven path unification via ${MAVEN_REPO} environment variable. Runtime Dockerfile simplified to use prebuilt artifacts from builder.
CLP Metadata & Configuration
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java
Added YAML metadata provider type and Pinot/Pinot Uber split provider support. Provider binding refactored via map-based lookups. Schema name listing delegated to metadata provider. New error code for unsupported YAML table schemas.
YAML Metadata Provider Implementation
presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java
New YAML-driven metadata provider with schema caching, nested column extraction, and graceful error handling. Default method added to interface for schema listing.
CLP Optimization & Pushdown
presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java
Top-N pushdown implementation with layout updates, nanosecond timestamp normalization, push-down variable tracking throughout expressions and filter conversions. New Top-N specification class with order direction and column lists.
CLP Split Providers
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java
MySQL provider enhanced with Top-N archive filtering. New Pinot provider with dynamic split discovery, archive metadata querying, and Top-N support. Uber-specific Pinot provider with custom table naming and query endpoints.
CLP Split Filters
presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java
Abstract method added for column name remapping. Pinot and Uber Pinot filter providers created. Uber provider implements TEXT_MATCH transformations for merged text index queries.
CLP Core Updates
presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java, presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java
ClpSplit adds hashCode/equals implementations. ClpTableLayoutHandle adds metadataQueryOnly boolean and optional TopN specification with updated constructors and equality methods. UdfRewriter explicit table scan handling. Minor formatting in optimizer provider.
CLP Tests - Core
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java
Import updates, test data setup modified to include archive message counts, database merge semantics updated.
CLP Tests - New
presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java, presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java
Comprehensive test coverage for YAML metadata discovery, Top-N query planning, Pinot provider integration, Uber-specific transformations, and filter remapping.
CLP Test Resources
presto-clp/src/test/resources/test-*.yaml, presto-clp/src/test/resources/test-pinot-split-filter.json, presto-clp/src/test/resources/test-topn-split-filter.json
New YAML schema files for CockroachDB, orders, and users tables across multiple schemas; JSON configuration files for split filter testing.
Dependencies
pom.xml, presto-clp/pom.xml, presto-native-execution/pom.xml
Joda-Time updated from 2.12.7 to 2.13.1. Jackson YAML and SnakeYAML dependencies added with version 2.1 upper bound.
Native Execution Tests
presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java, presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeClpGeneralQueries.java
Tests disabled with enabled=false annotation pending fixes.
Date/Time Tests
presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java, presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java, presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java
Hard-coded date literals updated (1970-01-01 → 2012-04-01 or 1932-04-01) for timezone offset validation.
Spill Path Uniqueness
presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java, presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java, presto-tests/src/test/java/com/facebook/presto/tests/TestSpilled*.java, presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java
UUID-based subdirectories appended to spill paths in test configurations to ensure isolation per run.
Submodule & Minor Updates
presto-native-execution/velox, presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java
Velox submodule commit reference updated. Minor test cleanup.

Sequence Diagram(s)

sequenceDiagram
    actor PR as Pull Request Trigger
    participant Config as config job
    participant BuilderImg as create-builder-image job
    participant PrestoBuild as presto-build job
    participant PrestoTest as tests job
    participant PrestoCppBuild as prestocpp-linux-build-and-unit-test job
    participant Integration as integration-tests job
    participant GHCR as GHCR Registry

    PR->>Config: workflow triggered
    Config->>Config: compute builder-image hash<br/>compute runtime-version-tag<br/>compute runtime-snapshot-tag
    Config-->>BuilderImg: outputs: builder-image, tags

    BuilderImg->>GHCR: check image exists
    alt image not found
        BuilderImg->>GHCR: build & push builder image
    end
    BuilderImg-->>PrestoBuild: builder-image ready
    BuilderImg-->>PrestoCppBuild: builder-image ready
    BuilderImg-->>PrestoTest: builder-image ready
    BuilderImg-->>Integration: builder-image ready

    par Java & C++ Builds
        PrestoBuild->>PrestoBuild: checkout, setup Java<br/>Maven build (multi-version)
        PrestoBuild->>PrestoBuild: conditionally upload artifacts
        PrestoBuild->>GHCR: build Presto Docker image
        
        PrestoCppBuild->>PrestoCppBuild: checkout, setup builder image<br/>cmake/ninja build<br/>upload prestocpp artifacts
        PrestoCppBuild->>GHCR: build prestissimo runtime image
    end

    par Tests & Integration
        PrestoTest->>PrestoTest: matrix test execution<br/>presto-tests across Java versions
        Integration->>Integration: download Presto artifacts<br/>run integration-e2e-java<br/>run integration-storage-java<br/>run integration-sidecar-java
    end

    PrestoBuild-->>PrestoTest: artifact outputs
    PrestoCppBuild-->>Integration: artifact outputs
Loading

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~50–75 minutes

Areas requiring extra attention:

  • CLP Top-N Pushdown Logic (ClpComputePushDown.java, ClpTopNSpec.java): Complex rewrite logic for plan transformation with multiple edge cases around ordering equality and column remapping that requires careful validation against test expectations.
  • Pinot Split Provider Implementation (ClpPinotSplitProvider.java, ClpUberPinotSplitProvider.java): HTTP query execution, archive grouping algorithm with overlapping logic, and Top-N archive selection merit detailed scrutiny.
  • YAML Metadata Provider (ClpYamlMetadataProvider.java): Schema caching with double-checked locking, nested column extraction recursion, and graceful fallback handling on parsing errors should be verified for thread safety and correctness.
  • CI/CD Workflow Orchestration (.github/workflows/*.yml): New modular approach with config job outputs feeding parallel jobs; verify tag propagation, concurrency controls, and conditional logic for image builds and artifact uploads.
  • Push-Down Variables Threading (ClpExpression.java, ClpFilterToKqlConverter.java, ClpSplitFilterProvider.java): Signature changes propagating variable sets through multiple layers; ensure all call sites properly maintain these sets and that null/empty handling is consistent.
  • Database Schema Updates (ClpMetadataDbSetUp.java): Archive table now includes num_messages column; verify all INSERT/MERGE statements and test fixtures correctly provide this field.

Possibly related PRs

Suggested reviewers

  • kirkrodrigues
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch ci-snapshot

📜 Recent review details

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 41aa302 and fefa99e.

📒 Files selected for processing (70)
  • .github/bin/download_nodejs (2 hunks)
  • .github/dockerfiles/yscope-presto-builder.dockerfile (1 hunks)
  • .github/workflows/ci.yml (1 hunks)
  • .github/workflows/create-builder-image.yml (1 hunks)
  • .github/workflows/docs.yml (1 hunks)
  • .github/workflows/integration-tests.yml (1 hunks)
  • .github/workflows/maven-checks.yml (0 hunks)
  • .github/workflows/pr-title-checks.yaml (1 hunks)
  • .github/workflows/prestissimo-worker-images-build.yml (0 hunks)
  • .github/workflows/presto-build.yml (1 hunks)
  • .github/workflows/prestocpp-format-and-header-check.yml (1 hunks)
  • .github/workflows/prestocpp-linux-build-and-unit-test.yml (2 hunks)
  • .github/workflows/tests.yml (2 hunks)
  • pom.xml (2 hunks)
  • presto-clp/pom.xml (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpConfig.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpErrorCode.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpMetadata.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpModule.java (3 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpSplit.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/ClpTableLayoutHandle.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/metadata/ClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpComputePushDown.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpExpression.java (6 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpFilterToKqlConverter.java (18 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpPlanOptimizerProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpTopNSpec.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/optimization/ClpUdfRewriter.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpMySqlSplitProvider.java (5 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/ClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpMySqlSplitFilterProvider.java (2 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpSplitFilterProvider.java (1 hunks)
  • presto-clp/src/main/java/com/facebook/presto/plugin/clp/split/filter/ClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/ClpMetadataDbSetUp.java (5 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpFilterToKql.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpMultiSchema.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpTopN.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpUdfRewriter.java (2 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/TestClpYamlMetadata.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/metadata/TestClpYamlMetadataProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/optimization/TestClpFilterToKqlConverter.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/TestClpUberPinotSplitProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpSplitFilterConfigCommon.java (1 hunks)
  • presto-clp/src/test/java/com/facebook/presto/plugin/clp/split/filter/TestClpUberPinotSplitFilterProvider.java (1 hunks)
  • presto-clp/src/test/resources/test-cockroachdb-schema.yaml (1 hunks)
  • presto-clp/src/test/resources/test-orders-schema1.yaml (1 hunks)
  • presto-clp/src/test/resources/test-orders-schema2.yaml (1 hunks)
  • presto-clp/src/test/resources/test-pinot-split-filter.json (1 hunks)
  • presto-clp/src/test/resources/test-tables-schema.yaml (1 hunks)
  • presto-clp/src/test/resources/test-topn-split-filter.json (1 hunks)
  • presto-clp/src/test/resources/test-users-schema1.yaml (1 hunks)
  • presto-native-execution/pom.xml (1 hunks)
  • presto-native-execution/scripts/dockerfiles/prestissimo-runtime.dockerfile (1 hunks)
  • presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeAsyncDataCacheCleanupAPI.java (2 hunks)
  • presto-native-execution/src/test/java/com/facebook/presto/nativeworker/TestPrestoNativeClpGeneralQueries.java (1 hunks)
  • presto-native-execution/velox (1 hunks)
  • presto-native-tests/src/test/java/com/facebook/presto/nativetests/TestDistributedEngineOnlyQueries.java (1 hunks)
  • presto-tests/src/main/java/com/facebook/presto/tests/AbstractTestEngineOnlyQueries.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedQueriesIndexed.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestDistributedSpilledQueries.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestH2QueryRunner.java (1 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithHighMemoryRevokingThreshold.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithLargeBlockSpillingEnabled.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledAggregationWithPreprocessingEnabled.java (2 hunks)
  • presto-tests/src/test/java/com/facebook/presto/tests/TestSpilledTopNQueries.java (2 hunks)
  • presto-thrift-connector/src/test/java/com/facebook/presto/connector/thrift/integration/TestThriftDistributedQueriesIndexed.java (2 hunks)

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@jackluo923 jackluo923 closed this Nov 29, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants