
feat: phrase memory foundation + semantic search workspace shell (#7) #205

Open
nikazzio wants to merge 20 commits into main from feat/phrase-memory


nikazzio (Owner) commented Jun 2, 2026

What this PR introduces

Phrase Memory: a semantic memory system for translations. The app remembers phrases that have already been translated, indexes them with vector embeddings, and suggests them automatically during translation to keep terminology consistent across sessions and documents.

The PR collects 4 development plans that were implemented on dedicated branches and integrated here.


Plan 1: DB foundation + Workspace

  • sqlite-vec integration via rusqlite for local vector search
  • DB schema: workspaces, phrase_memory, phrase_memory_presets, source_phrase_embeddings, historical_techniques, technique_tags
  • Workspace CRUD service + Zustand store + active workspace
  • 4 built-in presets (Moderno, Medievale IT, Latino, Legale)
  • The workspace is the actual boundary for phrase memory and the semantic corpus

Plan 2: Embedding search + Shell gating

  • TS services: embedding generation, phrase splitting (regex/LLM/none), cosine-similarity semantic search, save-on-lock
  • Tauri commands for embedding, search, and saving phrase memory
  • Zustand store for matches and job status
  • Shell gating: workspace home → editor only when a real project is open
  • Default workspace created automatically at DB init (no wizard on first launch)
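
To make the cosine-based semantic search concrete, here is a minimal in-memory sketch. It is not the PR's implementation (the commits indicate the distance is computed inside SQLite through sqlite-vec's vec_distance_cosine); the `StoredPhrase` shape, the `searchByCosine` name, and the `threshold`/`maxResults` parameters are assumptions chosen to mirror the preset fields mentioned in this PR.

```typescript
// Hypothetical shapes: the PR does not show the real types.
interface StoredPhrase {
  sourcePhrase: string;
  targetPhrase: string;
  embedding: number[];
}

interface ScoredPhrase extends StoredPhrase {
  score: number; // cosine similarity, in [-1, 1]
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("embedding dimensions differ");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Rank stored phrases against a query embedding, keeping only those at or
// above the threshold, best first, capped at maxResults.
function searchByCosine(
  query: number[],
  corpus: StoredPhrase[],
  threshold: number,
  maxResults: number,
): ScoredPhrase[] {
  return corpus
    .map((p) => ({ ...p, score: cosineSimilarity(query, p.embedding) }))
    .filter((p) => p.score >= threshold)
    .sort((x, y) => y.score - x.score)
    .slice(0, maxResults);
}
```

A brute-force scan like this is only reasonable for small corpora; delegating the distance computation to the database, as the PR does, avoids loading every embedding into the UI process.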

Plan 3: Memory tab + Prompt injection

  • "Memoria" (Memory) tab in InsightsDrawer: per-chunk match list, toggle to enable individual matches
  • ExtractTermDialog: automatic suggestion of the term to add to the glossary
  • Phrase memory injection into the translation prompt (Map-based, race-safe)
  • Pre-launch warning when chunks have disabled matches
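
As an illustration of the prompt injection step, the sketch below builds a text block from the matches the user left enabled. The `MemoryMatch` shape, the `buildMemoryBlock` name, and the header wording are hypothetical; only the use of JSON.stringify for escaping phrases is taken from the commit notes in this PR.

```typescript
// Hypothetical match shape, reduced to what the block needs.
interface MemoryMatch {
  id: string;
  sourcePhrase: string;
  targetPhrase: string;
}

// Pure function: same inputs always produce the same block, which keeps it
// easy to test and safe to call from any stage of the pipeline.
function buildMemoryBlock(
  matches: MemoryMatch[],
  enabledIds: ReadonlySet<string>,
): string {
  const enabled = matches.filter((m) => enabledIds.has(m.id));
  if (enabled.length === 0) return ""; // nothing to inject

  // JSON.stringify escapes quotes and newlines inside the phrases, so a
  // phrase cannot break out of its line in the prompt.
  const lines = enabled.map(
    (m) => `- ${JSON.stringify(m.sourcePhrase)} => ${JSON.stringify(m.targetPhrase)}`,
  );
  return ["Previously approved translations (reuse when applicable):", ...lines].join("\n");
}
```

Returning an empty string when no match is enabled lets the caller skip the injection entirely instead of sending an empty section to the model.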

Plan 4: Preset Management UI + Pipeline Config

  • PhraseMemoryOverrides + 3 fields on PipelineConfig (usePhraseMemory, phraseMemoryPresetId, phraseMemoryOverrides)
  • 3 columns added to pipelines via idempotent ALTER TABLE
  • updateCustomPreset + clonePreset on phraseMemoryPresetService
  • PresetForm component: create/edit custom presets (splitter, threshold, maxResults, minPhraseLength)
  • PhraseMemoryPresetManager component: lists built-in presets (read-only, clonable) and custom presets (edit/delete)
  • PhraseMemoryConfig component: toggle + preset dropdown + collapsible Advanced section with per-pipeline overrides
  • "Phrase Memory" tab in SettingsModal for global management of the workspace's presets
  • "Phrase Memory" section at the bottom of the pipeline's Settings tab
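
The relationship between a preset and per-pipeline overrides can be sketched as a small merge function. The field names follow the PresetForm fields listed above, but the exact shape of `PhraseMemoryOverrides` is not shown in this PR, so the types and the `resolveSettings` name below are assumptions.

```typescript
// Assumed settings shape, based on the PresetForm fields.
type Splitter = "regex" | "llm" | "none";

interface PresetSettings {
  splitter: Splitter;
  threshold: number;
  maxResults: number;
  minPhraseLength: number;
}

// Overrides are a partial copy of the preset settings; null means "none".
type Overrides = Partial<PresetSettings>;

// Returns the effective settings for a pipeline. The preset object is never
// mutated: an override only shadows the corresponding preset value.
function resolveSettings(
  preset: PresetSettings,
  overrides: Overrides | null | undefined,
): PresetSettings {
  const o: Overrides = overrides ?? {};
  return {
    // ?? keeps legitimate falsy overrides such as threshold = 0.
    splitter: o.splitter ?? preset.splitter,
    threshold: o.threshold ?? preset.threshold,
    maxResults: o.maxResults ?? preset.maxResults,
    minPhraseLength: o.minPhraseLength ?? preset.minPhraseLength,
  };
}
```

Using `??` rather than `||` matters here: a threshold of 0 is a valid override and must not fall back to the preset value.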

Architectural notes

  • Presets are assets of the active workspace, not global across workspaces
  • Pipeline overrides take precedence over the selected preset without modifying it
  • The order of blocks in the system prompt (static → blob → stage-instructions) is unchanged, to preserve prefix caching
  • No new Zustand store: the preset selection travels inside PipelineConfig (already in pipelineStore)

Test plan

  • npm test: all tests green
  • npm run typecheck: no errors
  • cargo clippy: zero warnings
  • First run with an empty DB: default workspace created, editor accessible
  • "Memoria" tab in InsightsDrawer: matches visible, enable toggle working
  • Translation with phrase memory active: memory block injected into the prompt, prompt restored afterwards
  • ExtractTermDialog: term suggested correctly
  • Settings → "Phrase Memory" tab: built-in preset list, clone, create/edit/delete custom presets
  • Pipeline → Settings tab → Phrase Memory section: toggle, preset dropdown, Advanced section with "modified" badge
  • Save a pipeline with phrase memory active → reload → values persisted
  • Duplicate a pipeline → phrase memory fields copied

Out of scope for this PR

  • Multi-workspace UX: switcher, rename/delete, advanced list
  • Discovery / ingest / OCR / library-centric workflows
  • Phrase memory export/import in the workspace backup


Copilot AI left a comment


Pull request overview

This PR lays the groundwork for the “phrase memory” feature by introducing a new SQLite schema (workspaces + presets + embedding-related tables), wiring a first-run Workspace Wizard in the React app, and adding a Rust-side sqlite-vec auto-extension registration with a vec_ping Tauri command.

Changes:

  • Adds Phrase Memory–related TypeScript domain types and services (workspaces + phrase memory presets), plus initial Zustand workspace store.
  • Extends DB initialization to create the new tables, add projects.workspace_id, seed built-in presets, and store active_workspace_id in app_settings.
  • Adds a first-run WorkspaceWizard gate in App.tsx and introduces Rust sqlite-vec integration + vec_ping command.

Reviewed changes

Copilot reviewed 15 out of 15 changed files in this pull request and generated 25 comments.

| File | Description |
| --- | --- |
| src/types.ts | Adds Phrase Memory types (workspaces, presets, config enums). |
| src/stores/workspaceStore.ts | New Zustand store for workspace list/active workspace + loading state. |
| src/services/workspaceService.ts | New workspace CRUD + active-workspace setting helpers. |
| src/services/workspaceService.test.ts | Unit tests for workspaceService. |
| src/services/phraseMemoryPresetService.ts | Built-in preset seeding + preset CRUD/listing. |
| src/services/phraseMemoryPresetService.test.ts | Unit tests for preset service. |
| src/services/dbService.ts | Adds Phrase Memory schema creation, projects.workspace_id migration, active_workspace_id seeding, and preset seeding. |
| src/components/workspace/WorkspaceWizard.tsx | First-run wizard UI to create a workspace and select embedding model. |
| src/App.tsx | Adds workspace guard/wizard gating. |
| src-tauri/src/vector/mod.rs | Registers sqlite-vec auto-extension and adds vec_ping command. |
| src-tauri/src/lib.rs | Wires vector module init + exposes vec_ping command. |
| src-tauri/Cargo.toml | Adds rusqlite (bundled) and sqlite-vec dependencies. |


Comment thread src/stores/workspaceStore.ts
Comment thread src/services/workspaceService.ts Outdated
Comment thread src/services/workspaceService.ts
Comment thread src/services/workspaceService.ts
Comment thread src/services/workspaceService.ts
Comment thread src/types.ts Outdated
Comment thread src/types.ts Outdated
Comment thread src/services/dbService.ts
Comment thread src/services/dbService.ts
Comment thread src/App.tsx Outdated
nikazzio added a commit that referenced this pull request Jun 2, 2026
- vec_upsert_source_phrase: workspace_id → project_id (schema mismatch)
- vec_save_locked_phrases: add project_id, source_language, target_language;
  fix DELETE to use project_id; fix saved counter via rows-changed
- vec_search_phrase_memory: a CTE removes the duplicated vec_distance_cosine call;
  row errors are propagated instead of being dropped by filter_map(ok)
- split_phrases_llm: adds an HTTP status check, as in get_embeddings
- phraseMemoryService: pre-filters phrases by minPhraseLength before embedding;
  adds projectId, sourceLanguage, targetLanguage to SaveLockedPhrasesOptions
- DocumentView: passes projectId and languages to saveLockedPhrases
- workspaceStore: auto-selects the first workspace if active_id is not found
- App: handles promise rejection from loadWorkspaces
nikazzio changed the title from "feat: phrase memory — piano 1: DB foundation + workspace (#7)" to "feat: phrase memory foundation + semantic search workspace shell (#7)" on Jun 2, 2026
nikazzio and others added 2 commits June 3, 2026 09:56
)

- phraseMemoryStore: new PhraseMemoryMatch/ChunkPhraseMatches types,
  Map-based state, toggleMatchEnabled, setEnabledMatchIds, distance→score
  conversion
- uiStore: adds 'memory' to ChunkDrawerTab
- usePhraseMemoryMatches: hook for matches + per-chunk selection
- buildMemoryInjection: pure function for the stage-instructions block
- checkAllChunksHaveEnabledMatches: pipeline pre-launch check
- MemoryTab: match list with checkboxes, "Applica" (apply, via clipboard), "Rielabora" (rework)
- ExtractTermDialog: LLM term-suggestion dialog + glossary insertion
- InsightsDrawer: Memoria tab, match badge in IndexTab
- usePipeline: rerunChunkWithMemory (temporary stage prompt injection),
  pre-launch warning when all matches are disabled
- glossaryService: addGlossaryEntry wrapper
- llmService: extractTermFromPhrase stub (TODO: Plan 4 Tauri command)
- i18n: memory.*, document.insightsTabMemory, glossary.*, common.optional keys
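
The distance→score conversion and the enable toggle mentioned in this commit could look roughly like the sketch below. It assumes the stored distance is a cosine distance defined as 1 minus the cosine similarity, and it clamps the result to [0, 1] as a display choice; neither detail is confirmed by the PR, and the function bodies are illustrative rather than the store's actual code.

```typescript
// Assumption: distance = 1 - cosine similarity, so it ranges over [0, 2].
// The score is clamped to [0, 1] so it can be shown as a percentage.
function distanceToScore(distance: number): number {
  return Math.min(1, Math.max(0, 1 - distance));
}

// Toggle one match in the enabled set without mutating the input, which is
// the usual pattern for state held in a Zustand store.
function toggleMatchEnabled(
  enabled: ReadonlySet<string>,
  matchId: string,
): Set<string> {
  const next = new Set(enabled);
  if (next.has(matchId)) {
    next.delete(matchId);
  } else {
    next.add(matchId);
  }
  return next;
}
```

Returning a new Set on every toggle gives subscribers a fresh reference to compare against, which is why a Set copy is used instead of mutating in place.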

Co-authored-by: nikazzio <nikazzio@users.noreply.github.com>
…#228)

* fix(phrase-memory): address Piano 3 PR review feedback

- ExtractTermDialog: granular Zustand selectors instead of full config + fix generateId prefix 'gle'
- usePipeline: restore original prompts by stage ID after phrase memory injection (Map-based, race-safe)
- glossaryService: direct SQL INSERT/ON CONFLICT instead of upsertGlossaryEntries
- phraseMemoryInjection: JSON.stringify source/target phrases for correct escaping
- phraseMemoryStore: new Set(ids) instead of raw array in enabledMatchIds
- InsightsDrawer: remove unused matchesByChunk selector, use i18n key for match badge
- i18n: add memory.matchBadge key (en + it)
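
One way to read "restore original prompts by stage ID (Map-based, race-safe)" is sketched below: the original prompts are recorded in a Map keyed by stage ID, and restoration looks each stage up by ID instead of relying on array positions. The `StageConfig` shape and both function names are hypothetical; the actual usePipeline code is not shown in this PR.

```typescript
// Hypothetical stage shape: only the fields the sketch needs.
interface StageConfig {
  id: string;
  prompt: string;
}

// Append the memory block to every stage prompt and remember the originals.
function injectMemory(
  stages: StageConfig[],
  memoryBlock: string,
): { patched: StageConfig[]; originals: Map<string, string> } {
  const originals = new Map<string, string>();
  const patched = stages.map((stage) => {
    originals.set(stage.id, stage.prompt);
    return { ...stage, prompt: `${stage.prompt}\n\n${memoryBlock}` };
  });
  return { patched, originals };
}

// Restore prompts on whatever the stage list looks like now. Stages that
// were added in the meantime are left alone; reordered stages are still
// restored correctly because the lookup is by ID, not by index.
function restorePrompts(
  current: StageConfig[],
  originals: ReadonlyMap<string, string>,
): StageConfig[] {
  return current.map((stage) => {
    const original = originals.get(stage.id);
    return original === undefined ? stage : { ...stage, prompt: original };
  });
}
```

The point of keying by ID is that the stage list may change while the rerun is in flight; an index-based restore would then write a prompt back to the wrong stage.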

* feat(phrase-memory): Piano 4 — Preset Management UI + Pipeline Config

Types:
- Add PhraseMemoryOverrides to types.ts
- Extend PipelineConfig with usePhraseMemory, phraseMemoryPresetId, phraseMemoryOverrides

DB:
- ALTER TABLE pipelines: add use_phrase_memory, phrase_memory_preset_id, phrase_memory_overrides columns

Services:
- phraseMemoryPresetService: add updateCustomPreset + clonePreset
- pipelineService: DbPipeline + rowToPipelineConfig + savePipelineConfig + saveFullState + duplicatePipeline now persist phrase memory fields
- pipelineService.test: phrase memory persistence tests

Components:
- PresetForm: form to create/edit custom presets (splitter, threshold, maxResults, minPhraseLength)
- PhraseMemoryPresetManager: built-in/custom preset list with clone/edit/delete
- PhraseMemoryConfig: pipeline section with toggle, preset dropdown, collapsible advanced options

Integration:
- SettingsModal: "Phrase Memory" tab with PhraseMemoryPresetManager
- SettingsTabPanel: PhraseMemoryConfig at the bottom of the Settings tab
- PipelineConfig: passes phrase memory props to SettingsTabPanel

* fix(phrase-memory): address PR #228 review comments

- PhraseMemoryConfig: sync phraseMemoryPresetId to presets[0] when toggle enabled with null presetId
- SettingsModal: move hardcoded strings to i18n (phraseMemoryTab, phraseMemoryPresetsTitle, phraseMemoryPresetsHint)
- en.json/it.json: add matchBadge_one/matchBadge_other plural keys
- en.json/it.json: add settings.phraseMemory* i18n keys

---------

Co-authored-by: nikazzio <nikazzio@users.noreply.github.com>

Copilot AI left a comment


Pull request overview

Copilot reviewed 58 out of 59 changed files in this pull request and generated 8 comments.

Comment thread src/services/backupService.ts Outdated
Comment thread src/i18n/en.json Outdated
Comment thread src/i18n/it.json
Comment thread src/components/pipeline/PhraseMemoryConfig.tsx
Comment thread src-tauri/src/vector/embedding.rs
Comment on lines +276 to +290
let rows = conn
    .execute(
        "INSERT OR IGNORE INTO phrase_memory \
         (id, workspace_id, source_phrase, target_phrase, \
         source_language, target_language, embedding, created_at) \
         VALUES (lower(hex(randomblob(16))), ?1, ?2, ?3, ?4, ?5, ?6, datetime('now'))",
        rusqlite::params![
            workspace_id,
            pair.source_phrase,
            pair.target_phrase,
            source_language,
            target_language,
            floats_to_blob(&pair.source_embedding)
        ],
    )
Comment thread src/services/dbService.ts
Comment on lines +495 to +502
try {
  await conn.execute(
    `ALTER TABLE projects ADD COLUMN workspace_id TEXT REFERENCES workspaces(id)`
  );
} catch (err) {
  const msg = err instanceof Error ? err.message : String(err);
  if (!msg.includes('duplicate column') && !msg.includes('already exists')) throw err;
}
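
The try/catch pattern in the excerpt above can be factored into a reusable helper, which is convenient when several columns are added the same way. The sketch below is one possible shape: `SqlConnection`, `isDuplicateColumnError`, and `addColumnIfMissing` are hypothetical names, and the error-message matching simply mirrors the two substrings checked in the excerpt.

```typescript
// Minimal connection interface assumed for the sketch.
interface SqlConnection {
  execute(sql: string): Promise<unknown>;
}

// True when the error says the column is already there, which is the one
// failure an idempotent migration is allowed to swallow.
function isDuplicateColumnError(err: unknown): boolean {
  const msg = err instanceof Error ? err.message : String(err);
  return msg.includes("duplicate column") || msg.includes("already exists");
}

// Adds a column and reports whether it was actually created.
// `table` and `columnDef` are interpolated into SQL, so they must be
// trusted constants from the migration code, never user input.
async function addColumnIfMissing(
  conn: SqlConnection,
  table: string,
  columnDef: string,
): Promise<boolean> {
  try {
    await conn.execute(`ALTER TABLE ${table} ADD COLUMN ${columnDef}`);
    return true;
  } catch (err) {
    if (isDuplicateColumnError(err)) return false;
    throw err; // anything else is a real migration failure
  }
}
```

A call such as `addColumnIfMissing(conn, "pipelines", "use_phrase_memory INTEGER")` would cover one of the three pipeline columns; the column type in that call is an assumption, since the PR lists only the column names.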
Comment on lines +7 to +13
export function checkAllChunksHaveEnabledMatches(
  matchesByChunk: Map<string, ChunkPhraseMatches>,
): string[] {
  const blocked: string[] = [];
  for (const [chunkId, data] of matchesByChunk) {
    if (data.matches.length > 0 && data.enabledMatchIds.size === 0) {
      blocked.push(chunkId);
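
The review excerpt above is cut off by the comment range. For readers who want a complete, runnable version of the same check, here is a self-contained sketch; the reduced `ChunkPhraseMatches` type and the `findChunksWithAllMatchesDisabled` name are assumptions, and the body after the truncation point is a guess at the obvious continuation rather than the PR's actual code.

```typescript
// Reduced, hypothetical versions of the PR's types.
interface PhraseMatchRef {
  id: string;
}

interface ChunkPhraseMatches {
  matches: PhraseMatchRef[];
  enabledMatchIds: Set<string>;
}

// Returns the IDs of chunks that have at least one match but none enabled.
// Chunks with no matches at all are not reported: there is nothing the user
// could have enabled for them.
function findChunksWithAllMatchesDisabled(
  matchesByChunk: Map<string, ChunkPhraseMatches>,
): string[] {
  const blocked: string[] = [];
  matchesByChunk.forEach((data, chunkId) => {
    if (data.matches.length > 0 && data.enabledMatchIds.size === 0) {
      blocked.push(chunkId);
    }
  });
  return blocked;
}
```

A non-empty result is what would drive the pre-launch warning described in the PR description.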
nikazzio added 4 commits June 3, 2026 11:09
…pread, embedding prompt

- backupService: add use_phrase_memory/preset_id/overrides to pipelines whitelist
- en.json / it.json: remove duplicate phraseMemory* keys in settings section
- PhraseMemoryConfig: fix spread on nullable overrides (overrides ?? {})
- embedding.rs split_phrases_llm: align prompt with json_object response_format ({"phrases":[...]})
…ection, workspace scoping

Audit PR #205 fixes (issues 1-5):
- fix: runSingleChunk/executePipelineForChunk/runJudgeForChunk read config from
  store at invocation time — memory patch in rerunChunkWithMemory now reaches prompt
- fix: handleLockToggle guards use_phrase_memory flag; resolves splitter/minPhraseLength
  from active preset + overrides instead of hardcoded values
- fix: phrase_memory_presets workspace-scoped via ALTER TABLE migration;
  listPresets/createCustomPreset/deleteCustomPreset/updateCustomPreset/clonePreset
  all require workspaceId; PhraseMemoryPresetManager + PhraseMemoryConfig updated
- fix: split_phrases_llm parser simplified — removes unreachable as_array/sentences
  branches since response_format:json_object guarantees an object

Phrase memory flow wiring (issue 3):
- feat: searchPhraseMemoryBatch — one fetchEmbeddings API call for all chunks,
  then N local SQLite vec_search_phrase_memory queries; no N×API overhead
- feat: saveAllCompletedPhrases — bulk variant of saveLockedPhrases with progress cb
- feat: usePhraseMemoryAutoSearch — background hook triggered on project open and
  when use_phrase_memory toggled; populates matchesByChunk store automatically
- feat: useSaveToMemory — explicit bulk "save to memory" action with progress;
  filters translationLocked/completed chunks; resolves preset+overrides config
- feat: executePipelineForChunk accepts memoryBlock param; getChunkMemoryBlock reads
  enabled matches from store; runPipeline/runSingleChunk/runDryRun inject automatically
- feat: MemoryTab shows "Salva in memoria" button (cold start + footer); progress inline
- feat: handleLockToggle triggers runSearch after saveLockedPhrases for incremental refresh
- i18n: saveToMemoryButton/savedToMemory/saveToMemoryFailed keys (it + en)
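
The batching idea behind searchPhraseMemoryBatch (one embedding request for all chunks, then one local query per chunk) can be sketched as follows. The real function's signature is not shown in this PR, so `ChunkInput`, `EmbedFn`, `LocalSearchFn`, `pairEmbeddings`, and `searchBatch` are all hypothetical names; the embedding call and the local search are passed in as functions so the sketch stays self-contained.

```typescript
interface ChunkInput {
  id: string;
  text: string;
}

// Remote call: expected to return one embedding per input text, in order.
type EmbedFn = (texts: string[]) => Promise<number[][]>;
// Local call: e.g. a query against the on-disk vector index.
type LocalSearchFn<M> = (embedding: number[]) => Promise<M[]>;

// Pair each chunk with its embedding, failing loudly on a count mismatch
// so results are never attributed to the wrong chunk.
function pairEmbeddings(
  chunks: ChunkInput[],
  embeddings: number[][],
): Map<string, number[]> {
  if (embeddings.length !== chunks.length) {
    throw new Error("embedding count does not match chunk count");
  }
  const byChunk = new Map<string, number[]>();
  chunks.forEach((chunk, i) => byChunk.set(chunk.id, embeddings[i]));
  return byChunk;
}

async function searchBatch<M>(
  chunks: ChunkInput[],
  embed: EmbedFn,
  searchLocal: LocalSearchFn<M>,
): Promise<Map<string, M[]>> {
  const results = new Map<string, M[]>();
  if (chunks.length === 0) return results; // skip the API call entirely

  // One remote call for every chunk text...
  const embeddings = await embed(chunks.map((c) => c.text));
  const byChunk = pairEmbeddings(chunks, embeddings);

  // ...followed by N cheap local lookups.
  for (const chunk of chunks) {
    const embedding = byChunk.get(chunk.id);
    if (embedding !== undefined) {
      results.set(chunk.id, await searchLocal(embedding));
    }
  }
  return results;
}
```

The local lookups run sequentially here for simplicity; whether they can run concurrently depends on how the underlying connection handles parallel queries, which this PR does not specify.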
…e-memory pipeline fixes

Dashboard redesign:
- Replace flat area cards with unified tab panel (Traduzioni/Biblioteca/Trascrizioni)
- Active tab shares background with content below — visual continuity
- 2px accent bottom indicator on active tab; disabled tabs at 40% opacity
- WorkspaceSettingsModal: new modal with 3 tabs (Generale, Phrase Memory, Backup)
  mirroring SettingsModal pattern (EditorialModalShell + AnimatePresence + useFocusTrap)
- SettingsModal: rename "Impostazioni" tab → "Traduzioni", add disabled Library/
  Transcriptions tabs, remove Backup section (moved to WorkspaceSettingsModal)
- Apply UI polish: text-wrap balance/pretty, tabular-nums on metrics, concentric
  border-radius (28→20→16px), transition-colors duration-150 throughout

Phrase Memory pipeline:
- Fix search pipeline lifecycle and workspace scoping
- Fix memory injection into prompt context
- Fix embedding search + workspace shell gating
- Add usePhraseMemoryAutoSearch and useSaveToMemory hooks with tests
- Add phraseMemoryService and workspaceService tests

i18n: add workspace.settings.{eyebrow,generalTab,memoryTab,backupTab} (en + it)
docs: update ARCHITECTURE.md with workspace/phrase-memory store and component map

2 participants