Bug
Setting embedding_model: None on RuntimeConfig does not produce an embedding-free runtime:
let config = RuntimeConfig {
    db_path: None,
    embedding_model: None, // intent: no embedders, text-only recall
    ..RuntimeConfig::default()
};
RuntimeConfig::default() (crates/khive-runtime/src/config.rs, Default impl) separately seeds additional_embedding_models with ParaphraseMultilingualMiniLmL12V2 whenever KHIVE_ADDITIONAL_EMBEDDING_MODELS is unset. The runtime therefore still registers an embedder, and memory.remember / memory.recall lazily initialize it on first use.
On a machine with local model files this is invisible. On a machine without them (CI runners, fresh installs), the first memory.remember hard-fails:
runtime: embedding: model initialization failed: IO error: No such file or directory
The failure is hard, not degraded: create_note_inner's single-model vector path propagates the embed error with full compensation (the note row and FTS doc are rolled back), so the write is lost.
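To make "hard, not degraded" concrete, here is a minimal sketch of the compensation pattern described above. It is not the actual create_note_inner code: Store, embed, and create_note are hypothetical stand-ins, and the in-memory vectors only imitate the note table and FTS index.

```rust
// Hypothetical in-memory store standing in for the note table and FTS index.
#[derive(Default)]
struct Store {
    notes: Vec<String>,
    fts_docs: Vec<String>,
}

// Hypothetical stand-in for an embedder whose model files are missing.
fn embed(_text: &str) -> Result<Vec<f32>, String> {
    Err("model initialization failed: IO error: No such file or directory".to_string())
}

// Write the row and the FTS doc, then embed. If embedding fails, undo both
// writes and propagate the error, so nothing of the note survives.
fn create_note(store: &mut Store, text: &str) -> Result<(), String> {
    store.notes.push(text.to_string());
    store.fts_docs.push(text.to_string());
    if let Err(e) = embed(text) {
        store.fts_docs.pop();
        store.notes.pop();
        return Err(format!("embedding: {e}"));
    }
    Ok(())
}

fn main() {
    let mut store = Store::default();
    let result = create_note(&mut store, "hello");
    assert!(result.is_err());
    // Both writes were compensated: the caller sees an error and no note.
    assert!(store.notes.is_empty());
    assert!(store.fts_docs.is_empty());
}
```

A degraded mode would instead keep the row and FTS doc and skip only the vector, which is the behavior a text-only consumer expects.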
Why it matters
- The obvious spelling of "no embeddings" (embedding_model: None) silently means "no default model, but one additional model". Every consumer that wants a deterministic no-embed runtime must know to also write additional_embedding_models: vec![].
- The trap is environment-dependent: tests written this way pass on developer machines and fail only on model-less runners. This surfaced in downstream integration tests that had been green for their entire life because the repository they lived in had no CI.
- The runtime's own test code already uses the two-field form (runtime.rs uses embedding_model: None together with an explicitly cleared additional list), which shows the intent exists but is not encoded in the API.
Repro
On a machine without model files under the khive data dir (or with HOME pointed at an empty directory):
let config = RuntimeConfig { db_path: None, embedding_model: None, ..RuntimeConfig::default() };
let rt = KhiveRuntime::new(config).unwrap();
// dispatch memory.remember → Err(Runtime(Embedding(ModelInitialization("IO error: ..."))))
Adding additional_embedding_models: vec![] makes the same code pass, with recall degrading to the FTS text leg.
Suggested fix (either)
- Make embedding_model: None authoritative: when the primary model is explicitly None, do not seed additional_embedding_models from the env default (an explicit env var or field value still wins).
- Or add a constructor/builder method (RuntimeConfig::no_embed() or similar) that produces the genuinely embedding-free config, and document the two-field requirement on both fields.
Option 1 matches the principle of least surprise; option 2 is non-breaking.
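A rough sketch of what option 2 could look like, using reduced stand-in types instead of the real khive-runtime definitions. The three-field struct, the single-variant enum, the has_embedder helper, and the default primary model are assumptions made for illustration; only the field names and the seeded additional model come from this report.

```rust
use std::path::PathBuf;

// Hypothetical stand-in for the real model enum; only the variant named in
// this report is included.
#[derive(Debug, Clone, PartialEq)]
enum EmbeddingModel {
    ParaphraseMultilingualMiniLmL12V2,
}

// Hypothetical, reduced stand-in for RuntimeConfig (the real struct has more fields).
#[derive(Debug, Clone)]
struct RuntimeConfig {
    db_path: Option<PathBuf>,
    embedding_model: Option<EmbeddingModel>,
    additional_embedding_models: Vec<EmbeddingModel>,
}

impl Default for RuntimeConfig {
    fn default() -> Self {
        Self {
            db_path: None,
            // Placeholder: the real default primary model is not stated in this report.
            embedding_model: Some(EmbeddingModel::ParaphraseMultilingualMiniLmL12V2),
            // Mirrors the reported seeding when the env var is unset.
            additional_embedding_models: vec![EmbeddingModel::ParaphraseMultilingualMiniLmL12V2],
        }
    }
}

impl RuntimeConfig {
    /// Option 2: one constructor that clears both fields.
    fn no_embed() -> Self {
        Self {
            embedding_model: None,
            additional_embedding_models: Vec::new(),
            ..Self::default()
        }
    }

    /// Hypothetical helper: true if any embedder would be registered.
    fn has_embedder(&self) -> bool {
        self.embedding_model.is_some() || !self.additional_embedding_models.is_empty()
    }
}

fn main() {
    // The trap: clearing only the primary model leaves the seeded additional model.
    let trap = RuntimeConfig { db_path: None, embedding_model: None, ..RuntimeConfig::default() };
    assert!(trap.has_embedder());

    // The proposed constructor yields a config with no embedders at all.
    assert!(!RuntimeConfig::no_embed().has_embedder());
}
```

Callers can still layer other fields on top with struct update syntax, for example RuntimeConfig { db_path: None, ..RuntimeConfig::no_embed() }.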