feat(embedding): surface non-symmetric embedding config for VikingDB provider #1110

Open
mvanhorn wants to merge 1 commit into volcengine:main from mvanhorn:osc/655-nonsymmetric-embedding-config
@mvanhorn
Contributor

Problem Statement

Non-symmetric embedding uses different representations for queries vs documents, improving retrieval quality for models that support it. The OpenAI, Gemini, Jina, and Minimax embedders already support query_param/document_param in ov.conf. The VikingDB embedder accepts is_query but ignores it -- all calls use symmetric mode regardless of config.

Closes #655.

Changes

  • VikingDBDenseEmbedder: accept query_param/document_param, pass input_type to API data items
  • VikingDBHybridEmbedder: same treatment
  • VikingDBClientMixin._call_api(): accept optional input_type, add to request data items when set
  • Factory entries: wire query_param/document_param from config to VikingDB embedder constructors
  • Sparse embedder unchanged (sparse models are symmetric)
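
To illustrate the `_call_api()` change listed above, here is a minimal sketch of how an optional `input_type` could be attached to request data items. The function name `build_data_items` and the item shape (`data_type`, `text`) are assumptions made for illustration; they are not taken from the diff.

```python
from typing import Any, Dict, List, Optional


def build_data_items(
    texts: List[str], input_type: Optional[str] = None
) -> List[Dict[str, Any]]:
    """Build one request item per text; tag it only when input_type is set.

    Hypothetical helper: the real item schema may differ.
    """
    items: List[Dict[str, Any]] = []
    for text in texts:
        item: Dict[str, Any] = {"data_type": "text", "text": text}  # assumed shape
        if input_type is not None:
            # Non-symmetric mode: mark the item as query-side or document-side.
            item["input_type"] = input_type
        items.append(item)
    return items
```

Leaving the key out entirely when `input_type` is `None` (rather than sending a null value) is what keeps the symmetric request payload identical to the pre-change one.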

Config Example

[embedding.dense]
provider = "vikingdb"
model = "bge-m3"
query_param = "query"
document_param = "passage"

With the config above, retrieval calls embed(text, is_query=True), which passes input_type=query in the API request. Indexing calls embed(text, is_query=False), which passes input_type=passage.

When not configured, behavior is unchanged (symmetric mode, no input_type in request).
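
The resolution rule described above can be sketched as follows. `InputTypeResolver` is a hypothetical stand-in for the `query_param`/`document_param` handling on the embedders, and the fallback when only one of the two params is set is an assumption, not confirmed behavior.

```python
from typing import Optional


class InputTypeResolver:
    """Hypothetical stand-in for the embedder's query/document param handling."""

    def __init__(
        self,
        query_param: Optional[str] = None,
        document_param: Optional[str] = None,
    ) -> None:
        self.query_param = query_param
        self.document_param = document_param

    def resolve_input_type(self, is_query: bool) -> Optional[str]:
        # Symmetric mode: neither param is configured, so nothing is sent.
        if self.query_param is None and self.document_param is None:
            return None
        # Assumption: a side whose param is unset resolves to None.
        return self.query_param if is_query else self.document_param


configured = InputTypeResolver(query_param="query", document_param="passage")
symmetric = InputTypeResolver()
```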

Testing

4 unit tests:

  • _resolve_input_type returns None in symmetric mode
  • _resolve_input_type returns correct param for query vs document
  • Hybrid embedder resolves correctly
  • Backward compatibility without params

tests/unit/embedder/test_vikingdb_nonsymmetric.py ....  [100%]
4 passed

Implementation Notes

  • Follows the existing pattern in openai_embedders.py:213-216
  • _resolve_input_type() is a shared helper on Dense and Hybrid embedders
  • The HierarchicalRetriever already passes is_query=True for queries (line 132), so this works end-to-end once configured
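
As a rough end-to-end illustration of the notes above, the sketch below wires config values through a factory into an embedder and records what would be sent. `StubEmbedder` and `make_embedder` are hypothetical names, and the request dict is a placeholder rather than the real VikingDB payload.

```python
from typing import Any, Dict, List, Mapping, Optional


class StubEmbedder:
    """Hypothetical embedder that records requests instead of calling an API."""

    def __init__(
        self,
        model: str,
        query_param: Optional[str] = None,
        document_param: Optional[str] = None,
    ) -> None:
        self.model = model
        self.query_param = query_param
        self.document_param = document_param
        self.sent: List[Dict[str, Any]] = []

    def embed(self, text: str, is_query: bool = False) -> None:
        input_type = self.query_param if is_query else self.document_param
        request: Dict[str, Any] = {"model": self.model, "text": text}
        if input_type is not None:
            request["input_type"] = input_type
        self.sent.append(request)


def make_embedder(cfg: Mapping[str, Any]) -> StubEmbedder:
    # Missing keys fall back to None, which keeps symmetric behavior.
    return StubEmbedder(
        model=cfg["model"],
        query_param=cfg.get("query_param"),
        document_param=cfg.get("document_param"),
    )


embedder = make_embedder(
    {"model": "bge-m3", "query_param": "query", "document_param": "passage"}
)
embedder.embed("what is a vector index?", is_query=True)  # retrieval path
embedder.embed("A vector index stores embeddings.")       # indexing path

legacy = make_embedder({"model": "bge-m3"})  # no params configured
legacy.embed("text", is_query=True)
```

Using `cfg.get(...)` for the two optional keys means existing configs without them construct the embedder exactly as before.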

Feature Area

Retrieval/Search

This contribution was developed with AI assistance (Claude Code).

…provider

VikingDB embedders accepted is_query but ignored it. Now
VikingDBDenseEmbedder and VikingDBHybridEmbedder accept
query_param/document_param and pass input_type to the API
when non-symmetric mode is configured.

- Add query_param/document_param to VikingDB Dense and Hybrid constructors
- Add _resolve_input_type() to select query vs document param
- Pass input_type in _call_api data items when set
- Wire factory entries to pass config params through
- Sparse embedder unchanged (sparse models are symmetric)

Closes volcengine#655

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@CLAassistant

CLA assistant check
Thank you for your submission! We really appreciate it. Like many open source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution.


Matt Van Horn seems not to be a GitHub user. You need a GitHub account to be able to sign the CLA. If you already have a GitHub account, please add the email address used for this commit to your account.
You have signed the CLA already but the status is still pending? Let us recheck it.

@github-actions

Failed to generate code suggestions for PR

Projects

Status: Backlog

Development

Successfully merging this pull request may close these issues.

[Feature]: About non-symmetric embedding support

2 participants