feat: native Elasticsearch vector search support by joaopamaral · Pull Request #1 · Automattic/OpenMetadata

joaopamaral · 2026-04-06T23:39:46Z

Summary

- ElasticSearchVectorService: new ES implementation of VectorIndexService, mirroring OpenSearchVectorService but using Rest5Client for generic HTTP requests
- ES native index mappings: vector_search_index_es_native.json for en/jp/ru/zh locales using dense_vector / dims / cosine (ES 8.x+ format)
- Query builder: VectorSearchQueryBuilder.buildNativeESQuery() emits the ES 8.x top-level knn query format (distinct from OpenSearch's query.knn)
- SemanticSearchQueryBuilder: added to the Elasticsearch package, mirroring the OpenSearch equivalent
- ElasticSearchIndexManager: extractMappingsJson() extracts the mappings sub-object before calling putMapping (ES rejects the full index JSON there)
- reformatVectorIndexWithDimension(): handles both "dims" (ES native) and "dimension" (OpenSearch) keys so dimension injection works for both backends
- Wiring: SearchRepository and ElasticSearchBulkSink now initialize ElasticSearchVectorService instead of OpenSearchVectorService
- Tests: VectorSearchQueryBuilderTest, ElasticSearchIndexManagerTest, and new ElasticSearchVectorServiceTest
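
For reviewers less familiar with the two request shapes, here is a rough sketch of the difference between the ES 8.x top-level knn body and OpenSearch's query.knn body. It is not the code in this PR: the class name, method names, and the "embedding" field name are hypothetical, and it builds plain maps rather than calling VectorSearchQueryBuilder.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; not part of the PR.
final class KnnQuerySketch {

  // Elasticsearch 8.x style: "knn" is a top-level key of the search body.
  static Map<String, Object> nativeEsBody(
      String field, List<Float> vector, int k, int numCandidates) {
    Map<String, Object> knn = new LinkedHashMap<>();
    knn.put("field", field);
    knn.put("query_vector", vector);
    knn.put("k", k);
    knn.put("num_candidates", numCandidates);

    Map<String, Object> body = new LinkedHashMap<>();
    body.put("knn", knn);
    return body;
  }

  // OpenSearch k-NN style: "knn" sits under "query" and is keyed by the field name.
  static Map<String, Object> openSearchBody(String field, List<Float> vector, int k) {
    Map<String, Object> perField = new LinkedHashMap<>();
    perField.put("vector", vector);
    perField.put("k", k);

    Map<String, Object> knn = new LinkedHashMap<>();
    knn.put(field, perField);

    Map<String, Object> query = new LinkedHashMap<>();
    query.put("knn", knn);

    Map<String, Object> body = new LinkedHashMap<>();
    body.put("size", k);
    body.put("query", query);
    return body;
  }

  public static void main(String[] args) {
    // "embedding" is an assumed field name for the example.
    List<Float> vector = List.of(0.1f, 0.2f, 0.3f);
    System.out.println(nativeEsBody("embedding", vector, 5, 50));
    System.out.println(openSearchBody("embedding", vector, 5));
  }
}
```

The point of the sketch is only the nesting: a body built for one backend has the knn clause in a place the other backend does not read, which is why a separate builder method is needed.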

Test plan

- Unit tests: mvn test -pl openmetadata-service -Dtest=VectorSearchQueryBuilderTest,ElasticSearchIndexManagerTest,ElasticSearchVectorServiceTest
- Integration: configure embeddingProvider in elasticSearchConfiguration, run the Search Index app against an ES 8.x cluster, and verify that the vector index is created and knn search returns results
- Confirm the OpenSearch path is unaffected (OpenSearchBulkSink / OpenSearchVectorService unchanged)

🤖 Generated with Claude Code

- Add ElasticSearchVectorService mirroring OpenSearchVectorService using Rest5Client
- Add vector_search_index_es_native.json with dense_vector/dims/cosine mappings for en/jp/ru/zh locales
- Add VectorSearchQueryBuilder.buildNativeESQuery() for ES 8.x/9.x top-level knn query format
- Add SemanticSearchQueryBuilder for Elasticsearch (mirrors OpenSearch equivalent)
- Fix ElasticSearchIndexManager.extractMappingsJson() to extract mappings sub-object for putMapping
- Fix reformatVectorIndexWithDimension() to handle both "dims" (ES) and "dimension" (OpenSearch) keys
- Wire ElasticSearchVectorService into SearchRepository and ElasticSearchBulkSink
- Extend VectorSearchQueryBuilderTest and ElasticSearchIndexManagerTest with new coverage

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
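
The two mapping fixes can be pictured with a small sketch, under stated assumptions: it works on already-parsed maps instead of JSON strings, the class and method names are hypothetical, and it overwrites every numeric "dims" or "dimension" value it finds, which may be broader than what reformatVectorIndexWithDimension() actually does.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; not the PR's implementation.
final class VectorMappingSketch {

  // Returns the "mappings" sub-object of a full index definition, or the input
  // itself when it has no "mappings" key (assumed to be mappings-only already).
  @SuppressWarnings("unchecked")
  static Map<String, Object> extractMappings(Map<String, Object> indexDefinition) {
    Object mappings = indexDefinition.get("mappings");
    return (mappings instanceof Map) ? (Map<String, Object>) mappings : indexDefinition;
  }

  // Recursively overwrites numeric "dims" (ES) or "dimension" (OpenSearch) values
  // and returns how many values were changed.
  @SuppressWarnings("unchecked")
  static int injectDimension(Map<String, Object> node, int dimension) {
    int updated = 0;
    for (Map.Entry<String, Object> entry : node.entrySet()) {
      Object value = entry.getValue();
      boolean isDimensionKey =
          entry.getKey().equals("dims") || entry.getKey().equals("dimension");
      if (isDimensionKey && value instanceof Number) {
        entry.setValue(dimension);
        updated++;
      } else if (value instanceof Map) {
        updated += injectDimension((Map<String, Object>) value, dimension);
      }
    }
    return updated;
  }

  // Builds a tiny index definition with one vector field named "embedding" (assumed name).
  static Map<String, Object> sampleIndex(String type, String dimensionKey) {
    Map<String, Object> embedding = new LinkedHashMap<>();
    embedding.put("type", type);
    embedding.put(dimensionKey, 0); // placeholder to be replaced
    Map<String, Object> properties = new LinkedHashMap<>();
    properties.put("embedding", embedding);
    Map<String, Object> mappings = new LinkedHashMap<>();
    mappings.put("properties", properties);
    Map<String, Object> index = new LinkedHashMap<>();
    index.put("settings", new LinkedHashMap<String, Object>());
    index.put("mappings", mappings);
    return index;
  }

  public static void main(String[] args) {
    Map<String, Object> esMappings = extractMappings(sampleIndex("dense_vector", "dims"));
    injectDimension(esMappings, 768);
    System.out.println(esMappings);

    Map<String, Object> osIndex = sampleIndex("knn_vector", "dimension");
    injectDimension(osIndex, 768);
    System.out.println(osIndex);
  }
}
```

In this sketch the ES path passes only the extracted mappings object onward, matching the note above that putMapping rejects the full index JSON.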

github-actions · 2026-04-06T23:40:16Z

Hi there 👋 Thanks for your contribution!

The OpenMetadata team will review the PR shortly! Once it has been labeled as safe to test, the CI workflows
will start executing and we'll be able to make sure everything is working as expected.

Let us know if you need any help!

…y By Service view (open-metadata#27258)

* fix(lineage): prevent pipeline annotation inheritance in service/domain/dataProduct lineage and add pipeline service edges

Bug #1: Service nodes (e.g., DatabaseService, MessagingService) were incorrectly appearing in entity-level lineage views. Root cause: getOrCreateLineageDetails() in addServiceLineage(), addDomainLineage(), and addDataProductsLineage() was copying the pipeline annotation from entity-level LineageDetails to service/domain/dataProduct-level LineageDetails. This caused service entities to have upstreamLineage.pipeline.fqnHash set in their Elasticsearch documents, making them match the PIPELINE_AS_EDGE_KEY query during BFS traversal and incorrectly appear alongside actual data assets. Fix: add .withPipeline(null) on each service/domain/dataProduct LineageDetails object to strip the pipeline annotation before persisting.

Bug #2: "By Service" view was empty when viewing lineage for pipeline entities that were stored as edge annotators (Case B: table → topic with pipeline=flink_pipeline in LineageDetails) rather than as actual nodes (Case A). Root cause: addServiceLineage() only created database_service → kafka_service edges but no edges involving flink_pipeline_service. Fix: add addPipelineServiceEdges() called from addServiceLineage() that creates fromService → pipelineService and pipelineService → toService edges when a pipeline annotation exists in the entity-level lineage details.

Also add unit tests covering both fixes to prevent regression.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): add migration to remove pipeline annotation from service/domain/dataProduct lineage edges

The previous fix (e6df7a6) prevented new lineage from inheriting pipeline annotations on service/domain/dataProduct-level edges. However, existing data in the entity_relationship table already has pipeline set on those edges from before the fix, and Elasticsearch reindex reads from the DB, so reindex alone does not fix stale data. This migration removes the pipeline field from all service-to-service, domain-to-domain, and dataProduct-to-dataProduct lineage edges (relation=13/UPSTREAM) in entity_relationship. After upgrading and running this migration, operators should trigger an Elasticsearch/OpenSearch reindex so that the corrected DB records are reflected in the search index, which is what the lineage graph BFS traversal reads from.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): move pipeline annotation migration from 1.12.0 to 1.13.0

Moves the data migration that removes the pipeline field from service/domain/dataProduct lineage edges in entity_relationship to the 1.13.0 migration scripts, which is the correct target version.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): move pipeline annotation migration from 1.13.0 to new 1.12.6

Creates a new 1.12.6 migration with the data fix that removes the pipeline field from service/domain/dataProduct lineage edges in entity_relationship, and removes it from 1.13.0 where it was previously placed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): add v1126 Java migration to create pipeline service edges for existing data

For installations upgrading to 1.12.6 with existing lineage data, service edges fromService→pipelineService and pipelineService→toService were never created (only added by the code fix for new lineage going forward). This migration reads service-level lineage edges that have a pipeline annotation, resolves the pipeline entity's service, and inserts the two missing service edges into entity_relationship (DB only). After the SQL migration strips pipeline from service edges and a reindex runs, the "By Service" lineage view for pipeline services correctly shows their upstream/downstream service connections.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): fix v1126 migration to read entity-level edges for pipeline service creation

The original migration read service-level edges (databaseService→messagingService) looking for pipeline annotations, but those had already been cleaned by the SQL migration before the Java migration could run in subsequent server restarts. Fix: read data-asset-level edges (table→topic etc.) which retain their pipeline annotation permanently. For each such edge, resolve fromEntity.service, toEntity.service, and pipeline.service, then create the two missing pipelineService edges in entity_relationship.

Verified: after running the migration manually via direct SQL + OpenSearch update, the By Service view for lineage_test_flink_svc correctly shows 3 nodes with upstream (db_svc→flink_svc) and downstream (flink_svc→kafka_svc) edges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): clean up pipeline service edges when entity lineage is deleted

When entity-level lineage (table→topic) is deleted, cleanUpExtendedLineage only cleaned up fromService→toService (db_svc→kafka_svc) but left the new pipeline service edges (db_svc→flink_svc, flink_svc→kafka_svc) as orphans in both entity_relationship and OpenSearch.

Fix:
- Pass lineageDetails (which contains the pipeline reference) into cleanUpExtendedLineage from both deleteLineage and deleteLineageByFQN
- Add cleanUpPipelineServiceEdges that mirrors addPipelineServiceEdges: uses getPipelineService(lineageDetails) to resolve the pipelineService, then calls processExtendedLineageCleanup for fromService→pipelineService and pipelineService→toService edges (decrement assetEdges or delete+remove from search if count reaches zero)
- Also fix deleteLineageByFQN which was missing cleanUpExtendedLineage call entirely (pre-existing gap for service edge cleanup)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(lineage): add unit tests for pipeline annotation stripping and pipeline service edge creation

- Add 4 new unit tests to LineageRepositoryTest covering:
  - Bug #1 (2 tests): service-level edges do not inherit pipeline annotation from entity lineage, both for new and existing edges
  - Bug #2 (2 tests): addPipelineServiceEdges creates fromService→pipelineService and pipelineService→toService edges when pipeline annotator is present, and skips them when no pipeline is set
- Fix MySQL migration: add metadataService to entity type list (was in Java migration's SERVICE_ENTITY_TYPES but missing from SQL) and replace JSON_EXTRACT IS NOT NULL with JSON_CONTAINS_PATH to correctly handle both present and explicit-null pipeline fields
- Fix PostgreSQL migration: add metadataService to entity type list

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(lineage): add integration tests for pipeline-as-annotator lineage scenario

Tests Bug #1 (service nodes absent from entity-level lineage) and Bug #2 (pipeline service connected in service-level lineage) using a table → topic edge annotated with a pipeline entity reference.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* test(e2e): add Playwright tests for pipeline-as-annotator lineage scenario

Tests Bug #1 (service nodes absent from entity-level lineage) and Bug #2 (pipeline service appears in service-level lineage) using API interception and direct request assertions via page.request.get().

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply spotless formatting to LineageRepositoryTest

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* style: apply prettier formatting to LineagePipelineAnnotator spec

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(lineage): guard against null LineageDetails in getPipelineService

When the json column in entity_relationship is NULL, JsonUtils.readValue returns null. getPipelineService now short-circuits on a null argument instead of throwing NullPointerException via entityLineageDetails.getPipeline(). Fixes NPE in deleteLineageByFQN and deleteLineage cleanup paths.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(e2e): use authenticated apiContext for service lineage assertions

page.request.get() sends browser cookies but OpenMetadata authenticates via JWT in localStorage, so those calls were unauthenticated (non-2xx). Replace with getToken + getAuthContext pattern used elsewhere.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(migration): add driveService to 1.12.6 pipeline annotation cleanup

Directory, File, Spreadsheet, and Worksheet entities map to driveService, so service-level lineage edges between driveService instances could also have incorrectly inherited the pipeline annotation. Include driveService in the 1.12.6 cleanup migration for both MySQL and PostgreSQL. Also drops the stray trailing-newline changes from the 1.12.0 migration files; those edits were unnecessary.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* new line remove

* fix(migration): add DRIVE_SERVICE to v1126 SERVICE_ENTITY_TYPES set

driveService-to-driveService edges must be skipped during the pipeline service edge migration scan, same as all other service-level edges.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* fix(migration): resolve merge conflict in v1126 MigrationUtil

The rebase left MigrationUtil with duplicate imports and a missing closing brace on insertEdgeIfMissing. Merged both method sets cleanly and ran spotless.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>

…adata#27625) (open-metadata#27627)

* fix(ingestion): release Engine resources on database switch (open-metadata#27625)

* fix(ingestion): release engine resources on database switch

Close fairies in _connection_map and clear _inspector_map before engine.dispose() in CommonDbSourceService.set_inspector/close. Dispose alone does not free Inspector.info_cache or release checked-out ConnectionFairies, leaving the old engine GC-pinned across DB switches and triggering _finalize_fairy RecursionError at interpreter shutdown.

Eagerly fetch multi-DB name queries (MultiDBSource._execute_database_query and SnowflakeSource.get_database_names_raw) so the cursor closes before the caller invokes set_inspector, which disposes the engine the cursor was bound to.

Also rebind scoped_session to the new engine so it doesn't keep the disposed one alive via sessionmaker.bind.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* py format

* fix(ingestion): address PR review feedback from gitar-bot and Copilot

- Set self.engine = None after dispose in _release_engine (gitar-bot): prevents close() from leaving a dangling disposed-engine reference that would produce a confusing pool error on accidental later access.
- _FakeSource now has close() and is wrapped in a fixture that cleans up its checked-out connection (Copilot #1): avoids resource warnings and an interfering fairy across test teardown.
- Rewrite test_generator_survives_engine_dispose_mid_iteration as test_generator_survives_connection_close_mid_iteration (Copilot #2): Engine.dispose() does not close checked-out connections, so the old test did not reproduce what _release_engine actually does. The real regression is the explicit conn.close() on the fairy in _connection_map before dispose. The new test closes the connection mid-iteration, which is what fetchall() needs to survive.
- Switch the query in _FakeSource.get_database_names_raw and the seeded INSERT assertions to the TEXT name column (Copilot #3): _execute_database_query is typed Iterable[str]; testing on integer ids obscured the actual contract.
- Update test_disposes_pool to assert surrogate.engine is None after release (follows from the new self.engine = None behavior) and verify the original pool's checkedout() is 0.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* fix(ingestion): keep connection_obj in sync with engine across DB switches

self.connection_obj is set once in __init__ to the initial engine and never updated. After set_inspector rebuilds self.engine, connection_obj still points at the disposed original engine, pinning its dialect and compiled_cache alive for the source's lifetime. Rebind connection_obj when creating the new engine in set_inspector, and clear it in _release_engine so close() leaves nothing dangling.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

joaopamaral had a problem deploying to test April 6, 2026 23:39 — with GitHub Actions Failure

joaopamaral mentioned this pull request Apr 10, 2026

feat: native Elasticsearch vector search support open-metadata/OpenMetadata#27111

Open

3 tasks

joaopamaral wants to merge 1 commit into stable from feat/es-elasticsearch-vector-search
