fix: derive EMBEDDING_DIM from embeddings module (was hardcoded 384) by Whishp · Pull Request #200 · AxDSan/mnemosyne

Whishp · 2026-05-29T12:55:07Z

Problem

Three hardcoded references to EMBEDDING_DIM = 384 (bge-small-en-v1.5) silently break vector operations when using larger models like intfloat/multilingual-e5-large (1024-dim):

File	Line	What's hardcoded	Impact
`binary_vectors.py`	30	`EMBEDDING_DIM = 384`	`maximally_informative_binarization` truncates 1024→384, losing 62.5% info
`binary_vectors.py`	251-252	`48.0`, `count * 48`	Stats always assume 48 bytes per vector
`shmr.py`	36	`EMBEDDING_DIM = 384`	Self-harmonizing reasoning uses wrong dimension
`polyphonic_recall.py`	340	`sim = 1.0 - (raw_dist / 384.0)`	Bit-vector similarity score always normalised to 384

While embeddings.py and beam.py correctly resolve to the configured model's dimension (e.g. 1024 for e5-large), these three files had their own hardcoded constants.

Fix

Derive EMBEDDING_DIM from mnemosyne.core.embeddings at import time with a 384 fallback for standalone/test use:

binary_vectors.py — imports EMBEDDING_DIM from embeddings; BYTES_PER_VECTOR, compression_ratio, and theoretical_size_mb now use the resolved dimension instead of hardcoded constants
shmr.py — uses _embeddings.EMBEDDING_DIM instead of 384
polyphonic_recall.py — module-level cached _EMBEDDING_DIM_BITS for bit-vector normalisation (avoids per-row imports in hot path)
test_integration.py — tests use EMBEDDING_DIM/BYTES_PER_VECTOR instead of hardcoded 384/48

Verification

Unit tests pass with both models

Default (bge-small, 384-dim): 4 binary vector tests passed ✅
With multilingual-e5-large (1024-dim): 4 tests passed ✅
EmbeddingDimConfig (3 tests): all passed ✅

End-to-end on production DB (multilingual-e5-large)

Consolidated 38 working memories → episodic with full 1024-dim vectors
embeddings.EMBEDDING_DIM = 1024 ✅
vec_episodes DDL: int8[1024] matches model output ✅
All vectors stored at 1024 bytes (= int8[1024]) ✅
sqlite-vec semantic search returns correct results ✅

CI note

The CI failure (ModuleNotFoundError: No module named 'hermes_plugin' in test collection) is a pre-existing issue, not caused by this PR.

Files changed

CHANGELOG.md                        | 13 +++++++++++++
mnemosyne/core/binary_vectors.py    | 24 ++++++++++++++++--------
mnemosyne/core/polyphonic_recall.py | 13 ++++++++++---
mnemosyne/core/shmr.py              |  2 +-
tests/test_integration.py           |  8 ++++----
5 files changed, 44 insertions(+), 16 deletions(-)

binary_vectors.py hardcoded EMBEDDING_DIM=384 (bge-small-en-v1.5), which silently truncated larger embeddings to the first 384 components, losing up to 62.5% of vector information. Now imports from mnemosyne.core.embeddings with a 384 fallback. Also fixes BYTES_PER_VECTOR, compression_ratio, and theoretical_size_mb. Updated tests to use dynamic constants.

shmr.py had the same hardcoded EMBEDDING_DIM=384 as binary_vectors. polyphonic_recall.py hardcoded for bit-type vector normalization, silently breaking for non-384-dim models. Both now derive from embeddings.EMBEDDING_DIM at import time with a 384 fallback. polyphonic_recall uses a module-level cached float to avoid per-row imports in the hot path.

Missed one np.random.randn(384) on line 238 — now uses EMBEDDING_DIM like the other test methods already fixed.

AxDSan · 2026-05-30T16:30:51Z

Thank you @Whishp for this. Clean PR. Two files with hardcoded 384 dims is the kind of bug that lives for months because nobody runs anything but the default model. I appreciate that you tested against both 384 and 1024 models and verified sqlite-vec stores the correct vectors. Thats the right level of rigor.

I do have one contrarian take. This is the third time someone hardcoded 384 in a file. Nothing stops a fourth next month. The real fix is a centralized dimension getter that every module imports, with a linting rule that flags hardcoded numbers near embedding operations. Ill open a follow-up issue for that.

For now this ships. Low risk, high correctness, fixes silent data corruption for multilingual users. Thats a real problem, if you switch to e5-large and your vectors get truncated by 62%, you notice nothing until recall starts returning garbage.

Merging.

This release includes contributions from 6 contributors across 36 commits: Added: - sync_roles config for role-based autosave filtering (bitr8, #209) - sync_turn content limit env vars - Fact recall merged into standard recall path (MNEMOSYNE_FACT_RECALL_ENABLED) - Auto-default scope=global when extract=true - fact_recall() now searches consolidated_facts - MNEMOSYNE_EMBEDDING_API_URL independent of OPENROUTER_BASE_URL (mia-fourier, #206) Fixed: - remember() silently never stored embeddings - Hardcoded EMBEDDING_DIM=384 across 3 files + 1 test (Whishp, #200) - Plugin dir named 'mnemosyne' shadows pip package (#212) - Cross-session deletion of scope=global memories blocked (#204) - _vec_insert() ran inside deferred transaction (chinesewebman) - symlink rmtree crash on upgrade path - Windows directory junctions (no admin required) - Dead hermes_plugin tests breaking CI Changed: - Modular Hermes provider (2007-line split into 5 modules) - Consolidated extensions/ and hermes/ into integrations/ - Dropped Python 3.9 CI support Docs: benchmarks restored, plugin section revamped, standalone README, multilingual setup, UPDATING.md updated, install script link fixed (Joao Fernandes, #201)

Whishp added 3 commits May 29, 2026 12:53

fix: last hardcoded 384 in test_integration.py full pipeline

b332749

Missed one np.random.randn(384) on line 238 — now uses EMBEDDING_DIM like the other test methods already fixed.

AxDSan added bug Something isn't working core labels May 29, 2026

AxDSan self-assigned this May 29, 2026

AxDSan merged commit c61f9fd into AxDSan:main May 30, 2026
1 of 5 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

fix: derive EMBEDDING_DIM from embeddings module (was hardcoded 384)#200

fix: derive EMBEDDING_DIM from embeddings module (was hardcoded 384)#200
AxDSan merged 3 commits into
AxDSan:mainfrom
Whishp:fix/binary-vectors-embedding-dim

Whishp commented May 29, 2026 •

edited

Loading

Uh oh!

AxDSan commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Uh oh!

Conversation

Whishp commented May 29, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Problem

Fix

Verification

Unit tests pass with both models

End-to-end on production DB (multilingual-e5-large)

CI note

Files changed

Uh oh!

AxDSan commented May 30, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Whishp commented May 29, 2026 •

edited

Loading