Fixes #27158: ingestion slowdown from tag_usage seq-scan on Postgres #27745

Open
sonika-shah wants to merge 1 commit into main from fix-27158-tag-usage-postgres-index

Conversation

@sonika-shah
Collaborator

@sonika-shah sonika-shah commented Apr 26, 2026

Fixes #27158

Summary

getTagsInternalByPrefix parallel seq-scans tag_usage on Postgres,
causing RDS CPU spikes during ingestion (#27158).

Cause: 1.11.0 perf migration (#23054) added four partial indexes on
tag_usage filtered WHERE state = 1. #24063 dropped the matching
AND tu.state = 1 from the query (Suggested rows are valid for both
classification and glossary derivation), leaving every partial index
inapplicable. MySQL was unaffected because its 1.11.0 indexes were never
partial (no partial-index syntax in MySQL).
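
The predicate coupling described above can be sketched in SQL. This is an illustration only, not the 1.11.0 definitions: the index name `idx_tag_usage_example_partial`, the operator class, and the selected columns are assumptions, and the INCLUDE columns of the real indexes are left out.

```sql
-- Illustration only: hypothetical index name and definition.
CREATE INDEX idx_tag_usage_example_partial
    ON tag_usage (targetfqnhash_lower text_pattern_ops)
    WHERE state = 1;

-- Query shape before #24063: the WHERE clause implies the index
-- predicate (state = 1), so the planner is allowed to use the index.
SELECT tu.source, tu.state
  FROM tag_usage tu
 WHERE tu.targetfqnhash_lower LIKE 'abc.%'
   AND tu.state = 1;

-- Query shape after #24063: nothing in the WHERE clause implies
-- state = 1, so the planner cannot prove the partial index contains
-- every matching row and has to ignore it.
SELECT tu.source, tu.state
  FROM tag_usage tu
 WHERE tu.targetfqnhash_lower LIKE 'abc.%';
```

Postgres only considers a partial index when the query's conditions imply the index predicate, which is why removing one `AND` from the query disabled all four indexes at once.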

Fix

bootstrap/sql/migrations/native/1.12.8/postgres/schemaChanges.sql:

  1. Add non-partial single-col btree on targetfqnhash_lower (mirrors
    MySQL's idx_targetfqnhash_lower) — serves prefix-LIKE queries with
    no source predicate.
  2. Rebuild the four 1.11.0 partials as non-partial — same shape, same
    INCLUDE columns; only WHERE state = 1 removed so predicate changes
    can't silently invalidate them again. No current tag_usage query
    filters state = 1, so this is purely additive in coverage.

All DDL runs CONCURRENTLY and is idempotent. The new single-col index is
created first so getTagsInternalByPrefix stays served during the composite
rebuilds. No Java/query change. No MySQL change (1.12.8/mysql holds
placeholders only; MySQL's 1.11.0 indexes were already non-partial).
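
A minimal sketch of the ordering described above, assuming hypothetical names and columns for the composite index (`idx_tag_usage_example_partial`, `idx_tag_usage_example_full`, and the key/INCLUDE lists are placeholders; the actual migration file is authoritative). Only the single-col index name and `text_pattern_ops` come from this PR.

```sql
-- CREATE/DROP INDEX CONCURRENTLY cannot run inside a transaction
-- block, so each statement has to be executed on its own.

-- Step 1: the new single-column index goes first, so prefix lookups
-- are served while the composite indexes are being rebuilt.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tag_usage_targetfqnhash_lower_pattern
    ON tag_usage (targetfqnhash_lower text_pattern_ops);

-- Step 2 (once per former partial index): build the non-partial
-- replacement with the same key and INCLUDE columns and no WHERE
-- clause, then drop the old partial index.
-- Names and column lists below are hypothetical.
CREATE INDEX CONCURRENTLY IF NOT EXISTS idx_tag_usage_example_full
    ON tag_usage (source, targetfqnhash_lower)
    INCLUDE (state);

DROP INDEX CONCURRENTLY IF EXISTS idx_tag_usage_example_partial;
```

`IF NOT EXISTS` and `IF EXISTS` are what make a re-run a no-op; building the replacement before dropping the old index is one way to avoid a window with no usable index.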

Verification

50k synthetic rows in local Postgres:

|  | Plan | Buffers |
| Before | Seq Scan (Rows Removed by Filter: 49010) | 2274 |
| After | Bitmap Index Scan on idx_tag_usage_targetfqnhash_lower_pattern | 1024 |
| Prepared / generic plan | Index still picked; LIKE LOWER($1) ~>=~/~<~ range | 24 (index only) |
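
One way to repeat this kind of check locally is sketched below. The query text is an assumption (a simplified stand-in for getTagsInternalByPrefix, with a made-up prefix and statement name), so the plans and buffer counts will not match the figures above exactly.

```sql
-- Inline literal: inspect which scan node the planner picks.
EXPLAIN (ANALYZE, BUFFERS)
SELECT tu.source, tu.state
  FROM tag_usage tu
 WHERE tu.targetfqnhash_lower LIKE LOWER('abc.%');

-- Prepared statement with a forced generic plan
-- (plan_cache_mode is available in PostgreSQL 12 and later).
PREPARE tag_prefix(text) AS
SELECT tu.source, tu.state
  FROM tag_usage tu
 WHERE tu.targetfqnhash_lower LIKE LOWER($1);

SET plan_cache_mode = force_generic_plan;
EXPLAIN (ANALYZE, BUFFERS) EXECUTE tag_prefix('abc.%');

-- Clean up the session state.
RESET plan_cache_mode;
DEALLOCATE tag_prefix;
```

Running both variants before and after applying the migration shows whether the scan node changes from a sequential scan to an index-based one.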

Test plan

  • Reproduced seq scan on a representative dataset
  • Verified bitmap index scan after fix (inline + prepared statement)
  • Verified rebuilt composite still serves source-filtered queries
  • Verified migration idempotency

Summary by Gitar

  • Database optimization:
    • Enabled pg_trgm extension to support GIN indexing for partial string matching.
    • Added gin_tag_usage_targetfqn_trgm GIN index on targetFQNHash to improve search performance.

This will update automatically on new commits.

Copilot AI review requested due to automatic review settings April 26, 2026 19:56
@github-actions github-actions Bot added the backend and safe to test labels Apr 26, 2026
@sonika-shah sonika-shah force-pushed the fix-27158-tag-usage-postgres-index branch from 6243f22 to 5e01f10 Compare April 26, 2026 19:57
Contributor

Copilot AI left a comment

Pull request overview

Restores efficient Postgres execution for tag_usage prefix-LIKE lookups by reintroducing a usable index for the current query shape and removing the brittle coupling between query predicates and partial index predicates.

Changes:

  • Add a non-partial btree index on tag_usage.targetfqnhash_lower using text_pattern_ops to serve prefix LIKE queries.
  • Rebuild the existing tag_usage partial indexes (previously WHERE state = 1) as non-partial indexes to avoid future predicate-coupling regressions.
  • Rebuild the existing gin_tag_usage_targetfqn_trgm index without the partial predicate.

Comment thread bootstrap/sql/migrations/native/2.0.1/postgres/schemaChanges.sql Outdated
The 1.11.0 perf migration (#23054) added four `WHERE state = 1` partial
indexes on tag_usage; #24063 dropped the matching `state = 1` predicate
from getTagsInternalByPrefix (Suggested-state rows are valid for both
classification and glossary derivation), leaving every partial index
inapplicable. Postgres fell back to a parallel seq scan; MySQL was
unaffected because its 1.11.0 indexes were never partial.

Adds a non-partial single-col btree on targetfqnhash_lower (mirrors
MySQL's idx_targetfqnhash_lower) and rebuilds the four partials as
non-partial -- same shape, same INCLUDE columns, predicate coupling
removed so future query changes can't silently invalidate them.

Verified end-to-end against a local Postgres with 50k rows: seq scan
reproduced before the fix (matches reporter's EXPLAIN), bitmap index
scan after, both for inline and prepared-statement paths.
@sonika-shah sonika-shah force-pushed the fix-27158-tag-usage-postgres-index branch from 9520cc4 to bc73c29 Compare April 26, 2026 20:08
Copilot AI review requested due to automatic review settings April 26, 2026 20:08
@sonika-shah sonika-shah changed the title from "Fixes #27158: restore tag_usage prefix-LIKE index on Postgres" to "Fixes #27158: ingestion slowdown from tag_usage seq-scan on Postgres" Apr 26, 2026
@gitar-bot

gitar-bot Bot commented Apr 26, 2026

Code Review ✅ Approved

Restores the prefix-LIKE index on the tag_usage table to resolve performance regressions. No issues found.

Contributor

Copilot AI left a comment

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

Comment on lines +1 to +4
-- Issue #27158: tag_usage seq-scan on Postgres. #24063 dropped the
-- `state = 1` predicate that 1.11.0's partial indexes required.
-- Fix: add a single-col index, and drop the `WHERE state = 1` filter
-- from the existing partials so query changes can't invalidate them.
Copilot AI Apr 26, 2026

PR description references applying the fix under native/2.0.1/postgres/schemaChanges.sql, but the actual change is introduced in native/1.12.8/postgres/schemaChanges.sql. Please align the PR description (and/or title) with the versioned migration directory that is actually being modified to avoid confusion during verification/rollout.

@github-actions
Contributor

🔴 Playwright Results — 1 failure(s), 18 flaky

✅ 3954 passed · ❌ 1 failed · 🟡 18 flaky · ⏭️ 86 skipped

Shard Passed Failed Flaky Skipped
🟡 Shard 1 296 0 3 4
🔴 Shard 2 753 1 5 8
🟡 Shard 3 729 0 3 7
🟡 Shard 4 756 0 3 18
🟡 Shard 5 686 0 1 41
🟡 Shard 6 734 0 3 8

Genuine Failures (failed on all attempts)

Features/DomainFilterQueryFilter.spec.ts › Domain filter should use exact match and prefix with dot to prevent false positives (shard 2)
Test timeout of 180000ms exceeded.
🟡 18 flaky test(s) (passed on retry)
  • Features/CustomizeDetailPage.spec.ts › Data Product - customization should work (shard 1, 1 retry)
  • Pages/AuditLogs.spec.ts › should apply both User and EntityType filters simultaneously (shard 1, 1 retry)
  • Pages/UserCreationWithPersona.spec.ts › Create user with persona and verify on profile (shard 1, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event is created when description is updated (shard 2, 1 retry)
  • Features/ActivityAPI.spec.ts › Activity event shows the actor who made the change (shard 2, 1 retry)
  • Features/DataQuality/DataQuality.spec.ts › Column test case (shard 2, 1 retry)
  • Features/DomainFilterQueryFilter.spec.ts › Domain filter should persist across page navigation (shard 2, 1 retry)
  • Features/ExploreQuickFilters.spec.ts › tier with assigned asset appears in dropdown, tier without asset does not (shard 2, 1 retry)
  • Features/RTL.spec.ts › Verify Following widget functionality (shard 3, 1 retry)
  • Features/Table.spec.ts › Tags term should be consistent for search (shard 3, 1 retry)
  • Features/UserProfileOnlineStatus.spec.ts › Should show "Active recently" for users active within last hour (shard 3, 1 retry)
  • Pages/DataContracts.spec.ts › Create Data Contract and validate for Database Schema (shard 4, 1 retry)
  • Pages/DataProducts.spec.ts › Create Data Product and Manage Assets (shard 4, 2 retries)
  • Pages/DataProducts.spec.ts › Search Data Products (shard 4, 2 retries)
  • Pages/EntityDataSteward.spec.ts › Tier Add, Update and Remove (shard 5, 1 retry)
  • Pages/Lineage/LineageFilters.spec.ts › Verify lineage schema filter selection (shard 6, 1 retry)
  • Pages/Lineage/LineageRightPanel.spec.ts › Verify custom properties tab IS visible for supported type: searchIndex (shard 6, 1 retry)
  • Pages/Users.spec.ts › Check permissions for Data Steward (shard 6, 1 retry)

📦 Download artifacts

How to debug locally
# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip    # view trace


Labels

backend, safe to test

Development

Successfully merging this pull request may close these issues.

Hive ingestion slowdown after upgrade to 1.12.3

2 participants