fix(profiler): N+1 / missing-index regression on /tables/.../columns?fields=profile (#3488)#27746
fix(profiler): N+1 / missing-index regression on /tables/.../columns?fields=profile (#3488)#27746sonika-shah wants to merge 1 commit intomainfrom
Conversation
…fields=profile (#3488) Root cause ---------- The 1.9.9 migration introduced two separate index regressions on `profiler_data_time_series`: 1. **PostgreSQL**: `schemaChanges.sql` explicitly dropped the unique constraint `profiler_data_time_series_unique_hash_extension_ts` (entityFQNHash, extension, operation, timestamp) to allow altering the generated `operation` column expression, but never recreated it. After the migration the table kept only the `(extension, timestamp)` index, which is useless for queries filtering by `entityFQNHash`. 2. **MySQL/both**: `postDataMigrationSQLScript.sql` created temporary indexes (idx_pdts_entityFQNHash, idx_pdts_composite, etc.) for its bulk UPDATE pass and then dropped **all** of them, including the only index covering `entityFQNHash`. The batch query issued by `getLatestExtensionsBatch()` when `fields=profile` is requested: SELECT entityFQNHash, MAX(timestamp) FROM profiler_data_time_series WHERE entityFQNHash IN (...N hashes...) AND extension = 'table.columnProfile' GROUP BY entityFQNHash required an `(entityFQNHash, extension, timestamp)` index. Without it the database performs a full table scan. On production deployments with millions of profiler rows this caused 100+ second response times (Grafana: 106 770 ms; 99 % in DB; 93 dbOps). Without `profile` in the fields param the same endpoint returned in ~150-220 ms. A secondary N+1 bug existed independently of the index: `customMetrics` in fields called `getCustomMetrics(table, column)` once per paginated column, issuing up to N identical queries against `entity_extension` and then filtering in Java. Fix --- * **migration 2.0.2** (MySQL + PostgreSQL): `CREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts ON profiler_data_time_series(entityFQNHash, extension, timestamp)`. The `IF NOT EXISTS` guard makes the migration safe to re-run and handles both upgrade and fresh-install paths. * **`getTableColumnsInternal`** — `customMetrics` block: fetch all column custom metrics for the table in one query, group by column name in Java, then distribute. Reduces N queries to 1. * **`getTableColumnsInternal`** — `profile` block: skip the duplicate `populateEntityFieldTags` call when `tags` was already fetched earlier in the same request, saving one prefix-scan on `tag_usage` per request. Related: PR #26855 (fixed N+1 tag queries on the list-tables path but left the profiler-index and customMetrics N+1 untouched on the columns sub-path).
Fix SummaryRoot cause confirmedThe 106-second latency is caused by a full table scan on PostgreSQL path (most severe) Both databases The net effect: SELECT entityFQNHash, MAX(timestamp) FROM profiler_data_time_series
WHERE entityFQNHash IN (...50 hashes...) AND extension = 'table.columnProfile'
GROUP BY entityFQNHashwith a full-table scan on every request that includes What this PR adds / changes
Expected timing improvement (from production symptom analysis)
Migration ID added
Generated by Claude Code |
| CREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts | ||
| ON profiler_data_time_series (entityFQNHash, extension, timestamp); |
There was a problem hiding this comment.
🚨 Bug: MySQL migration uses unsupported IF NOT EXISTS for CREATE INDEX
The MySQL migration uses CREATE INDEX IF NOT EXISTS, but this syntax is not supported in MySQL prior to 8.0.29. No other MySQL migration in the project uses this pattern — all 148 other CREATE INDEX statements in mysql/*.sql files use plain CREATE INDEX without IF NOT EXISTS. The project's docker-compose pins mysql:8.0, which today resolves to ≥8.0.29, but users running older 8.0.x versions (e.g., 8.0.28 or below) will get a syntax error and the migration will fail, blocking the upgrade.
PostgreSQL correctly supports CREATE INDEX IF NOT EXISTS in all supported versions, so the Postgres file is fine.
Suggested fix:
-- Use a conditional check via stored procedure,
-- or match existing project pattern with plain CREATE INDEX
-- wrapped in a procedure that tolerates 'duplicate key' error:
SET @index_exists = (SELECT COUNT(1) FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = DATABASE()
AND TABLE_NAME = 'profiler_data_time_series'
AND INDEX_NAME = 'idx_pdts_fqnhash_ext_ts');
SET @sql = IF(@index_exists = 0,
'CREATE INDEX idx_pdts_fqnhash_ext_ts ON profiler_data_time_series (entityFQNHash, extension, timestamp)',
'SELECT 1');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
| } | ||
| for (Column column : paginatedColumns) { | ||
| column.setCustomMetrics(getCustomMetrics(table, column.getName())); | ||
| column.setCustomMetrics(metricsByColumn.getOrDefault(column.getName(), new ArrayList<>())); |
There was a problem hiding this comment.
💡 Quality: Unnecessary ArrayList allocation per column without metrics
getOrDefault(column.getName(), new ArrayList<>()) allocates a new ArrayList for every column that has no custom metrics. Since the returned list is only set on the column and never mutated afterwards, an immutable empty list would avoid unnecessary allocations.
Suggested fix:
column.setCustomMetrics(
metricsByColumn.getOrDefault(column.getName(), List.of()));
Was this helpful? React with 👍 / 👎 | Reply gitar fix to apply this suggestion
Code Review 🚫 Blocked 0 resolved / 2 findingsAdds index and query optimizations for the profiler API but contains an unsupported MySQL migration syntax and redundant ArrayList allocations. 🚨 Bug: MySQL migration uses unsupported
|
| Compact |
|
Was this helpful? React with 👍 / 👎 | Gitar
There was a problem hiding this comment.
Pull request overview
Addresses a performance regression on the /tables/.../columns?fields=profile path by restoring an efficient composite index for profiler lookups and removing N+1 behavior when loading column customMetrics.
Changes:
- Adds a composite index on
profiler_data_time_series(entityFQNHash, extension, timestamp)for PostgreSQL and MySQL migrations. - Updates
TableRepository#getTableColumnsInternalto fetch all column custom-metrics in one query and avoid duplicate tag population whentagsis already requested. - Adds an integration regression test covering
fields=profileand combinations withtags/customMetrics/extension.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| openmetadata-service/src/main/java/org/openmetadata/service/jdbi3/TableRepository.java | Batches column custom-metrics retrieval and avoids duplicate tag population when profile is requested alongside tags. |
| openmetadata-integration-tests/src/test/java/org/openmetadata/it/tests/TableResourceIT.java | Adds regression coverage for columns API with fields=profile and related combinations. |
| bootstrap/sql/migrations/native/2.0.2/postgres/schemaChanges.sql | Adds composite index to prevent profiler full-table scans on PostgreSQL. |
| bootstrap/sql/migrations/native/2.0.2/mysql/schemaChanges.sql | Adds composite index intended to prevent profiler full-table scans on MySQL. |
| CREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts | ||
| ON profiler_data_time_series (entityFQNHash, extension, timestamp); |
There was a problem hiding this comment.
MySQL does not support CREATE INDEX IF NOT EXISTS syntax (unlike PostgreSQL). Existing MySQL migrations use a conditional information_schema.statistics check + prepared statement to keep index creation idempotent (e.g., bootstrap/sql/migrations/native/1.13.0/mysql/schemaChanges.sql:27-40). This statement will fail on MySQL during migration; please rewrite it using the established conditional pattern (or equivalent idempotent ALTER TABLE logic).
| CREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts | |
| ON profiler_data_time_series (entityFQNHash, extension, timestamp); | |
| SET @sql = ( | |
| SELECT IF( | |
| EXISTS( | |
| SELECT 1 | |
| FROM information_schema.statistics | |
| WHERE table_schema = DATABASE() | |
| AND table_name = 'profiler_data_time_series' | |
| AND index_name = 'idx_pdts_fqnhash_ext_ts' | |
| ), | |
| 'SELECT 1', | |
| 'CREATE INDEX idx_pdts_fqnhash_ext_ts ON profiler_data_time_series (entityFQNHash, extension, timestamp)' | |
| ) | |
| ); | |
| PREPARE statement FROM @sql; | |
| EXECUTE statement; | |
| DEALLOCATE PREPARE statement; |
| } | ||
| for (Column column : paginatedColumns) { | ||
| column.setCustomMetrics(getCustomMetrics(table, column.getName())); | ||
| column.setCustomMetrics(metricsByColumn.getOrDefault(column.getName(), new ArrayList<>())); |
There was a problem hiding this comment.
metricsByColumn.getOrDefault(column.getName(), new ArrayList<>()) allocates a new list for every column without metrics. Prefer Collections.emptyList() (or a shared constant empty list) to avoid unnecessary allocations and to signal the list is intentionally empty.
| column.setCustomMetrics(metricsByColumn.getOrDefault(column.getName(), new ArrayList<>())); | |
| column.setCustomMetrics( | |
| metricsByColumn.getOrDefault(column.getName(), Collections.emptyList())); |
| // Verify all four field combinations don't regress: | ||
| // (a) fields=profile — profile data returned, no full-table-scan on profiler_data_time_series | ||
| TableColumnList withProfile = | ||
| assertTimeout( | ||
| Duration.ofSeconds(30), | ||
| () -> client.tables().getColumns(table.getId(), "profile"), | ||
| "columns?fields=profile should complete within 30s"); |
There was a problem hiding this comment.
The test comment says it verifies “all four field combinations”, but this test exercises three calls: profile, tags,customMetrics,extension,profile, and tags,profile. Please correct the comment (or add the missing 4th combination) so the test documentation matches what’s actually asserted.
| // (c) fields=tags,profile — duplicate populateEntityFieldTags must not run twice | ||
| TableColumnList withTagsAndProfile = | ||
| assertTimeout( | ||
| Duration.ofSeconds(30), | ||
| () -> client.tables().getColumns(table.getId(), "tags,profile"), | ||
| "columns?fields=tags,profile should complete within 30s"); |
There was a problem hiding this comment.
The comment claims this call validates that duplicate populateEntityFieldTags “must not run twice”, but the assertions only check correctness of returned tags/profile (no assertion around number of calls/queries). Please reword the comment to describe what is actually verified, or add an assertion/metric that specifically detects the duplicate-tag population regression.
🟡 Playwright Results — all passed (14 flaky)✅ 3959 passed · ❌ 0 failed · 🟡 14 flaky · ⏭️ 86 skipped
🟡 14 flaky test(s) (passed on retry)
How to debug locally# Download playwright-test-results-<shard> artifact and unzip
npx playwright show-trace path/to/trace.zip # view trace |
Fixes collate#3488
Related to / extends #26855
Root Cause
The 1.9.9 migration introduced two separate index regressions on
profiler_data_time_seriesthat weren't addressed by #26855 (which fixed tag N+1 on the list-tables path but left the profiler index and customMetrics N+1 on the/tables/name/{fqn}/columnssub-path).1. PostgreSQL: unique constraint dropped, never recreated
bootstrap/sql/migrations/native/1.9.9/postgres/schemaChanges.sqlexplicitly droppedprofiler_data_time_series_unique_hash_extension_ts(entityFQNHash, extension, operation, timestamp) to allow changing the generatedoperationcolumn expression, but never recreated it. After the migration only the(extension, timestamp)index remained — useless for queries that filter byentityFQNHash.2. MySQL + PostgreSQL: all temporary indexes dropped at end of postDataMigration
bootstrap/sql/migrations/native/1.9.9/mysql(postgres)/postDataMigrationSQLScript.sqlcreatedidx_pdts_entityFQNHash,idx_pdts_composite, and related indexes for its bulk UPDATE pass, then dropped all of them at the end of the migration, including the only index coveringentityFQNHash.Impact
The batch query in
getLatestExtensionsBatch(), called wheneverfields=profileis requested on the columns endpoint:needs an
(entityFQNHash, extension, timestamp)index. Without it the database performs a full table scan. On production deployments with millions of profiler rows this caused 106 770 ms response time (99% DB; 93 dbOps). The same endpoint withoutprofilein fields returns in ~150-220 ms.Secondary:
customMetricsN+1getTableColumnsInternalcalledgetCustomMetrics(table, column.getName())once per paginated column (up to 50 calls), each fetching the same full set of custom metrics for the table and filtering in Java. This is N identical queries instead of 1.Secondary: duplicate
populateEntityFieldTagsWhen both
tagsandprofileare in the fields param,populateEntityFieldTagswas called twice — once in thetagsblock and again unconditionally inside theprofileblock.Changes
bootstrap/sql/migrations/native/2.0.2/mysql/schemaChanges.sqlCREATE INDEX IF NOT EXISTS idx_pdts_fqnhash_ext_ts ON profiler_data_time_series(entityFQNHash, extension, timestamp)bootstrap/sql/migrations/native/2.0.2/postgres/schemaChanges.sqlTableRepository.javacustomMetricsfetch: one query for all column metrics, grouped in JavaTableRepository.javapopulateEntityFieldTagscall whentagsalready fetchedTableResourceIT.javatest_getColumnsWithProfileField_correctnessAndNoBatchRegression— verifies profile data correctness forfields=profile,fields=tags,customMetrics,extension,profile, andfields=tags,profilewithin a 30s timeoutMigration Note
CREATE INDEX IF NOT EXISTSis idempotent — safe on all upgrade paths including fresh installs and systems that already ran 1.9.9.Test Plan
profile,tags,customMetrics,extension,profile,tags,profile)tagsandprofileare requestedmvn spotless:applypasses cleanIF NOT EXISTSguardGenerated by Claude Code