feat: native Elasticsearch vector search support #27111
joaopamaral wants to merge 18 commits into open-metadata:main from
Hi there 👋 Thanks for your contribution! The OpenMetadata team will review the PR shortly! Once it has been labeled as Let us know if you need any help!
Initial results look good, but I've only tested with ES 9.x on version 1.12.4 (not main). I also need to double-check whether OpenSearch is affected by this change, and to review some AI-resolved merge conflicts between 1.12.4 and main.
Thanks @joaopamaral, this is great! Can you mark it ready for review and also address the comments here: #27111 (comment)
Sure @harshach! I'll address the bot review first, then mark it ready for review. 👍
Hi @harshach, I’ve addressed the bot review, but I still need to re-review the code after rebasing/merging with main and rerun the tests against a real server. So far, I’ve tested this PR with version 1.12.4 and ES 9.3.1. I still need to validate that everything continues to work correctly with OpenSearch and ES 8.x. I won’t be able to run tests for the next couple of days, but feel free to proceed with any testing on your side in the meantime.
Pull request overview
This PR adds native Elasticsearch (8.x/9.x) vector search support to OpenMetadata, aiming to provide semantic/vector search capabilities on Elasticsearch deployments comparable to the existing OpenSearch implementation.
Changes:
- Added a new `ElasticSearchVectorService` plus wiring in `SearchRepository`/`ElasticSearchBulkSink` to initialize and use it when Elasticsearch is the configured backend.
- Introduced ES-native vector index mapping templates (`vector_search_index_es_native.json`) and extended query-building to emit Elasticsearch’s top-level `knn` query format.
- Added/updated tests around the ES-native query format and Elasticsearch vector service behavior.
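A minimal sketch of the two request shapes contrasted above, assuming a vector field named `embedding`. `KnnQueryShapes` and its methods are hypothetical illustrations, not the PR's `VectorSearchQueryBuilder` API; the JSON is built by hand only to keep the example dependency-free.

```java
import java.util.Locale;

/** Hypothetical sketch contrasting the two kNN request shapes; not the PR's query builder. */
final class KnnQueryShapes {
  private KnnQueryShapes() {}

  // Renders a float[] as a JSON array, e.g. [0.5,1.0].
  static String vectorJson(float[] vector) {
    StringBuilder sb = new StringBuilder("[");
    for (int i = 0; i < vector.length; i++) {
      if (i > 0) {
        sb.append(',');
      }
      sb.append(vector[i]);
    }
    return sb.append(']').toString();
  }

  // Elasticsearch 8.x/9.x: "knn" is a top-level key of the search body, not nested under "query".
  static String elasticsearchKnn(String field, float[] vector, int k, int numCandidates) {
    return String.format(
        Locale.ROOT,
        "{\"knn\":{\"field\":\"%s\",\"query_vector\":%s,\"k\":%d,\"num_candidates\":%d}}",
        field, vectorJson(vector), k, numCandidates);
  }

  // OpenSearch k-NN plugin: "knn" is a query clause under "query", keyed by the field name.
  static String openSearchKnn(String field, float[] vector, int k) {
    return String.format(
        Locale.ROOT,
        "{\"query\":{\"knn\":{\"%s\":{\"vector\":%s,\"k\":%d}}}}",
        field, vectorJson(vector), k);
  }
}
```

Elasticsearch also accepts a `filter` inside the `knn` object; it is omitted here to keep the contrast minimal.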
Reviewed changes
Copilot reviewed 20 out of 20 changed files in this pull request and generated 10 comments.
| File | Description |
|---|---|
| openmetadata-spec/src/main/resources/json/schema/search/searchRequest.json | Adds semanticSearch flag to the search request schema. |
| openmetadata-spec/src/main/resources/elasticsearch/en/vector_search_index_es_native.json | New ES-native vector index template (en). |
| openmetadata-spec/src/main/resources/elasticsearch/jp/vector_search_index_es_native.json | New ES-native vector index template (jp). |
| openmetadata-spec/src/main/resources/elasticsearch/ru/vector_search_index_es_native.json | New ES-native vector index template (ru). |
| openmetadata-spec/src/main/resources/elasticsearch/zh/vector_search_index_es_native.json | New ES-native vector index template (zh). |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilder.java | Adds buildNativeESQuery and refactors filter emission for vector search queries. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/vector/VectorSearchQueryBuilderTest.java | Adds coverage for ES-native top-level knn query structure and filter behavior. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/VectorIndexService.java | Extends vector service interface and adds an alias helper. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/OpenSearchVectorService.java | Adjusts to use the new interface default alias method and annotates overrides. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/vector/ElasticSearchVectorService.java | New Elasticsearch vector service implementation using Rest5Client for generic requests. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/vector/ElasticSearchVectorServiceTest.java | New tests for ES vector service result parsing, grouping, and dimension patching. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/SearchRepository.java | Initializes ES vector service when Elasticsearch backend is configured; mapping selection tweaks for ES-native template. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/RecreateWithEmbeddings.java | Attempts to include a vector “entity” key in recreate flow when vector search is enabled. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/SemanticSearchQueryBuilder.java | New builder for semantic/hybrid query composition on Elasticsearch. |
| openmetadata-service/src/main/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManager.java | Extracts mappings sub-object before calling putMapping. |
| openmetadata-service/src/test/java/org/openmetadata/service/search/elasticsearch/ElasticSearchIndexManagerTest.java | Adds a test asserting updateIndex handles full index JSON by extracting mappings. |
| openmetadata-service/src/main/java/org/openmetadata/service/resources/search/VectorSearchResource.java | Switches to repository-provided VectorIndexService and adds a fingerprint endpoint. |
| openmetadata-service/src/main/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSink.java | Adds async vector-embedding task execution + migration path for ES indexing jobs. |
| openmetadata-service/src/test/java/org/openmetadata/service/apps/bundles/searchIndex/ElasticSearchBulkSinkSimpleTest.java | Adds minimal coverage for vector-embedding helpers on the ES sink. |
| openmetadata-mcp/src/main/java/org/openmetadata/mcp/tools/SemanticSearchTool.java | Uses repository VectorIndexService rather than OpenSearch-only implementation. |
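The dimension patching mentioned in the `ElasticSearchVectorServiceTest` row has to handle two key spellings: Elasticsearch's `dense_vector` uses `dims`, while OpenSearch's `knn_vector` uses `dimension`. Below is a minimal, whitespace-tolerant sketch, assuming the mapping is available as a JSON string. `VectorDimensionPatcher` is a hypothetical name, not the PR's `reformatVectorIndexWithDimension()`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of dimension injection for both backends; not the PR's implementation. */
final class VectorDimensionPatcher {
  private VectorDimensionPatcher() {}

  // Matches "dims": <int> (ES dense_vector) or "dimension": <int> (OpenSearch knn_vector),
  // allowing any whitespace around the colon. Group 1 keeps the key and colon verbatim.
  private static final Pattern DIMENSION_KEY =
      Pattern.compile("(\"(?:dims|dimension)\"\\s*:\\s*)\\d+");

  static String withDimension(String mappingJson, int dimension) {
    return DIMENSION_KEY
        .matcher(mappingJson)
        // quoteReplacement guards against '$' or '\' being treated as replacement syntax.
        .replaceAll(match -> Matcher.quoteReplacement(match.group(1)) + dimension);
  }
}
```

A regex is still a string-level patch; parsing the mapping with a JSON library and setting the field directly would be sturdier if mapping layouts vary.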
I also need to review everything again after this refactor: #26000 😢
@joaopamaral, thanks for your work on this! Can you check the Copilot comments and resolve the merge conflict here, please?
The Java checkstyle failed. Please run You can install the pre-commit hooks with
The /vector/fingerprint diagnostic endpoint allowed any authenticated user to enumerate vector fingerprints for arbitrary entity UUIDs. Replace the subject-only extraction with authorizer.authorizeAdmin() to restrict access to admins.
Add VectorSearchResourceTest covering: admin gate enforcement, found/not-found fingerprints, bad UUID, missing parentId, and service-unavailable when vector search is disabled.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
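A minimal sketch of one gate-first ordering for an admin-only lookup: the admin check runs before any input validation or lookup, so a non-admin learns nothing about which UUIDs exist. `AdminAuthorizer` and `FingerprintLookup` are hypothetical stand-ins; the real code calls `authorizer.authorizeAdmin()` on OpenMetadata's authorizer, whose exact signature is not shown here.

```java
import java.util.Map;
import java.util.Optional;
import java.util.UUID;

/** Hypothetical stand-in for the real authorizer: throws for non-admin users. */
interface AdminAuthorizer {
  void authorizeAdmin(String userName);
}

/** Hypothetical sketch of an admin-only fingerprint lookup; not the PR's VectorSearchResource. */
final class FingerprintLookup {
  private final AdminAuthorizer authorizer;
  private final Map<UUID, String> fingerprints; // stands in for the vector index

  FingerprintLookup(AdminAuthorizer authorizer, Map<UUID, String> fingerprints) {
    this.authorizer = authorizer;
    this.fingerprints = fingerprints;
  }

  Optional<String> fingerprint(String userName, String parentId) {
    // Gate first: nothing below runs for non-admins.
    authorizer.authorizeAdmin(userName);
    if (parentId == null || parentId.isBlank()) {
      throw new IllegalArgumentException("parentId is required");
    }
    UUID id = UUID.fromString(parentId); // throws IllegalArgumentException on a malformed UUID
    return Optional.ofNullable(fingerprints.get(id)); // empty when no fingerprint is stored
  }
}
```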
Elasticsearch returns 4xx/5xx as regular HTTP responses that the low-level client does not throw on. Previously the response body was returned as-is, causing downstream JSON parsing failures with no context about the real error. Now checks response.getStatusCode() and throws IOException with the status and body when >= 400, mirroring the same pattern in OpenSearchVectorService.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
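A minimal sketch of the status check this commit describes. A hypothetical `RawResponse` record stands in for the low-level client's response (the commit reads the status via `response.getStatusCode()`), and `ResponseChecks.bodyOrThrow` is likewise an illustrative name, not the PR's method.

```java
import java.io.IOException;

/** Hypothetical stand-in for the low-level client's response: status code plus raw body. */
record RawResponse(int statusCode, String body) {}

final class ResponseChecks {
  private ResponseChecks() {}

  // The low-level client hands back 4xx/5xx as ordinary responses, so check explicitly
  // and surface the status and body instead of letting JSON parsing fail later.
  static String bodyOrThrow(String method, String endpoint, RawResponse response)
      throws IOException {
    int status = response.statusCode();
    if (status >= 400) {
      throw new IOException(
          method + " " + endpoint + " failed with HTTP " + status + ": " + response.body());
    }
    return response.body();
  }
}
```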
…arch/vector/ElasticSearchVectorServiceTest.java
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
```diff
@@ -601,11 +613,7 @@ private String getIndexMapping(IndexMapping indexMapping) {
   }
 
   public String readIndexMapping(IndexMapping indexMapping) {
-    String mapping = getIndexMapping(indexMapping);
-    if (isVectorEmbeddingEnabled() && embeddingClient != null && mapping != null) {
-      mapping = reformatVectorIndexWithDimension(mapping, embeddingClient.getDimension());
-    }
-    return mapping;
+    return getIndexMapping(indexMapping);
   }
```
readIndexMapping() no longer enriches mappings when vector embeddings are enabled. Index creation/update now enriches via EsUtils.enrichIndexMappingForElasticsearch, but index template creation (createOrUpdateIndexTemplate(s)) still uses readIndexMapping() directly, so templates for embedding-capable indices may miss the injected dense_vector embedding field and _meta when embeddings are enabled. Consider enriching the mapping content for the template path as well (e.g., apply the same EsUtils.enrichIndexMappingForElasticsearch before calling putIndexTemplate).
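One way to act on this suggestion, sketched with hypothetical names: route both the index-creation path and the template path through a single helper, so enrichment cannot be applied to one and forgotten in the other. The `UnaryOperator<String>` parameter stands in for `EsUtils.enrichIndexMappingForElasticsearch`; its real signature and inputs are assumptions here.

```java
import java.util.function.UnaryOperator;

/**
 * Hypothetical sketch: one helper decides the effective mapping for both the
 * create-index path and the put-index-template path.
 */
final class MappingSource {
  private final UnaryOperator<String> enricher; // stand-in for the real enrichment step
  private final boolean vectorEmbeddingEnabled;

  MappingSource(UnaryOperator<String> enricher, boolean vectorEmbeddingEnabled) {
    this.enricher = enricher;
    this.vectorEmbeddingEnabled = vectorEmbeddingEnabled;
  }

  // Single place where enrichment is applied or skipped.
  private String effectiveMapping(String rawMapping) {
    if (!vectorEmbeddingEnabled || rawMapping == null || rawMapping.isEmpty()) {
      return rawMapping;
    }
    return enricher.apply(rawMapping);
  }

  String forCreateIndex(String rawMapping) {
    return effectiveMapping(rawMapping);
  }

  String forPutIndexTemplate(String rawMapping) {
    return effectiveMapping(rawMapping);
  }
}
```

The design choice is simply that neither caller touches the enricher directly, so a future path (say, another template call site) inherits the same behavior by calling the shared helper.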
… coverage
The method was deleted in the inline-embedding refactor. Remove the test that invoked it via reflection (which would throw NoSuchMethodException). Replace with:
- SearchRepositoryBehaviorTest: readIndexMappingReturnsMappingForKnownIndex verifies readIndexMapping still loads the file-based mapping correctly
- EsUtilsTest: three tests for enrichIndexMappingForElasticsearch covering null/empty input, skip when fingerprint field absent, and dense_vector injection with _meta when fingerprint field present and vector enabled
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Code Review by Gitar 👍 Approved with suggestions (5 resolved / 6 findings)
Adds native Elasticsearch vector search support with comprehensive test coverage and fixes to pagination, initialization ordering, and interface safety. Consider adding a type guard for the `extractRestClient` cast to `Rest5ClientTransport` to prevent runtime errors.
💡 Edge Case: `extractRestClient` cast to `Rest5ClientTransport` has no guard
At line 61, Suggested fix
✅ 5 resolved
✅ Bug: Test calls build() with 4 args but method requires 6 — won't compile
✅ Edge Case: loadIndexMapping dimension replacement is brittle — exact string match
✅ Edge Case: init() assigns instance before registerVectorEmbeddingHandler completes
✅ Bug: ES search pagination is broken vs OpenSearch implementation
✅ Quality: Unsafe downcast defeats purpose of VectorIndexService interface
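The open finding asks for a type guard before the downcast. Below is a minimal sketch using Java 16+ pattern matching for `instanceof`. `Transport`, `RestTransport`, `OtherTransport`, and `TransportGuards` are hypothetical stand-ins, since the real `Rest5ClientTransport` accessor is not shown in this thread.

```java
/** Hypothetical transport abstraction standing in for the real client transport types. */
interface Transport {}

/** Hypothetical stand-in for Rest5ClientTransport; the String stands in for the REST client. */
final class RestTransport implements Transport {
  private final String restClient;

  RestTransport(String restClient) {
    this.restClient = restClient;
  }

  String restClient() {
    return restClient;
  }
}

/** Hypothetical alternative transport that must not be blindly cast. */
final class OtherTransport implements Transport {}

final class TransportGuards {
  private TransportGuards() {}

  // Check the runtime type before narrowing; fail with a descriptive message instead of
  // a bare ClassCastException. A null transport also falls through to the exception.
  static String extractRestClient(Transport transport) {
    if (transport instanceof RestTransport rest) {
      return rest.restClient();
    }
    String actual = transport == null ? "null" : transport.getClass().getSimpleName();
    throw new IllegalStateException("Expected RestTransport but got " + actual);
  }
}
```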
Summary
Adds native Elasticsearch 8.x/9.x vector search support, mirroring the existing OpenSearch implementation. OpenMetadata deployments backed by Elasticsearch can now use the same semantic/vector search features as OpenSearch deployments.
Changes
- `ElasticSearchVectorService` (new): ES implementation of `VectorIndexService`, using `Rest5Client` for generic HTTP requests. Mirrors the `OpenSearchVectorService` structure.
- `vector_search_index_es_native.json` (new, en/jp/ru/zh): ES-native index mappings using `dense_vector` / `dims` / `cosine` similarity (ES 8.x/9.x format, as opposed to OpenSearch's `knn_vector` / `dimension` / HNSW).
- `VectorSearchQueryBuilder.buildNativeESQuery()`: emits the ES 8.x/9.x top-level `knn` query format (distinct from OpenSearch's nested `query.knn`). Reference: https://www.elastic.co/docs/solutions/search/vector/knn
- `SemanticSearchQueryBuilder` for the Elasticsearch package: mirrors the OpenSearch equivalent.
- `ElasticSearchIndexManager.extractMappingsJson()`: extracts the `mappings` sub-object before calling `putMapping`, because ES rejects full index JSON (with `settings`/`aliases`) at the mappings API.
- `reformatVectorIndexWithDimension()`: handles both `"dims"` (ES native) and `"dimension"` (OpenSearch) keys so embedding dimension injection works for both backends.
- `SearchRepository` / `ElasticSearchBulkSink`: wired to initialize and use `ElasticSearchVectorService` when the ES backend is configured.
- Tests: `VectorSearchQueryBuilderTest`, `ElasticSearchIndexManagerTest`, and new `ElasticSearchVectorServiceTest`.

Compatibility
- `OpenSearchBulkSink` / `OpenSearchVectorService` untouched.

Test plan
- `mvn test -pl openmetadata-service -Dtest=VectorSearchQueryBuilderTest,ElasticSearchIndexManagerTest,ElasticSearchVectorServiceTest`
- Configure `embeddingProvider` in `elasticSearchConfiguration`, run the Search Index app against an ES 8.x/9.x cluster, and verify the vector index is created and knn search returns results
🤖 Generated with Claude Code
Summary by Gitar
- Added tests for `EsUtils.enrichIndexMappingForElasticsearch` covering null input, missing fingerprints, and successful vector dimension injection.
- Added `readIndexMappingReturnsMappingForKnownIndex` to `SearchRepositoryBehaviorTest` to verify correct index mapping retrieval.

This will update automatically on new commits.