feat: browser automation, workspace memory, semantic search, doc editing #28

Open
GlamgarOnDiscord wants to merge 13 commits into Paseru:main from GlamgarOnDiscord:feat/ai-tools-v1

GlamgarOnDiscord commented Jun 19, 2026

Summary

Four new AI agent capabilities, each independently useful, plus UI improvements and a Windows build fix:

1. CDP Browser Automation (feat(browser))

New sinew-browser crate + 23 agent tools backed by chromiumoxide (Chrome DevTools Protocol):

  • Navigation, screenshot, DOM inspection, click, type, eval, console/network capture, wait, scroll, select, hover, keys, back, find, PDF export, file upload, cookie management, iframe interaction
  • Isolated browser profile (%TEMP%/sinew-browser-profile) — no conflicts with running Chrome/Edge
  • GIF session recording
  • Selector hint system: when a click fails, lists all visible interactive elements with selectors
  • Stealth overrides to hide CDP automation fingerprint

2. Persistent Workspace Memory (feat(memory))

  • workspace_memory tool backed by .sinew/memory.md per workspace
  • Actions: read, append, update, clear
  • Auto-injected into the agent system prompt — knowledge persists across conversations
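
The four memory actions and the prompt injection can be sketched roughly as follows. This is an illustrative Python sketch, not the PR's Rust implementation: the function names, the meaning of `update` (assumed here to replace the whole file), and the injected section heading are all assumptions; only the `.sinew/memory.md` path and the action names come from the description.

```python
from pathlib import Path

# Path taken from the PR description; everything else below is hypothetical.
MEMORY_RELATIVE_PATH = Path(".sinew") / "memory.md"


def workspace_memory(workspace: Path, action: str, content: str = "") -> str:
    """Apply one action and return the resulting memory text."""
    path = workspace / MEMORY_RELATIVE_PATH
    # Create the directory and file on first use.
    path.parent.mkdir(parents=True, exist_ok=True)
    if not path.exists():
        path.write_text("", encoding="utf-8")

    current = path.read_text(encoding="utf-8")
    if action == "read":
        return current
    if action == "append":
        # Assumption: one entry per line, always newline-terminated.
        separator = "" if not current or current.endswith("\n") else "\n"
        updated = current + separator + content.rstrip("\n") + "\n"
    elif action == "update":
        # Assumption: "update" overwrites the whole file.
        updated = content
    elif action == "clear":
        updated = ""
    else:
        raise ValueError(f"unknown action: {action}")
    path.write_text(updated, encoding="utf-8")
    return updated


def inject_into_system_prompt(base_prompt: str, memory: str) -> str:
    """Assumed injection format: append a labeled section if memory is non-empty."""
    if not memory.strip():
        return base_prompt
    return f"{base_prompt}\n\n## Workspace memory\n{memory}"
```

Keeping the store as a plain Markdown file means a user can inspect or edit the memory by hand between conversations.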

3. Semantic Code Search (feat(search))

New sinew-search crate + index_workspace / semantic_search tools:

  • tree-sitter AST chunking per top-level declaration (Rust, TS, JS, Python, Go)
  • Local ONNX embeddings via fastembed v4 (AllMiniLML6V2, 384 dims — no API key)
  • SQLite hybrid index: FTS5 BM25 + cosine similarity, RRF fusion (k=60)
  • Incremental indexing (SHA256 per file), respects .gitignore
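
For readers unfamiliar with the hybrid ranking step, here is a minimal Python sketch of reciprocal rank fusion with k=60 alongside a plain cosine similarity. It illustrates the general technique only; the crate itself is Rust, and the function names and chunk identifiers below are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal rank fusion: each list contributes 1 / (k + rank), rank starting at 1."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, chunk_id in enumerate(ranking, start=1):
            scores[chunk_id] = scores.get(chunk_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id so the order is deterministic.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical result lists: one from BM25 keyword search, one from vector search.
bm25_hits = ["auth::login", "session::refresh", "db::connect"]
vector_hits = ["session::refresh", "token::verify", "auth::login"]
fused = rrf_fuse([bm25_hits, vector_hits])
```

Because RRF works on ranks rather than raw scores, the BM25 and cosine scores never need to be normalized onto a common scale.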

4. Surgical Document Editing (feat(docs))

  • doc_read + doc_edit tools via a stateless Python sidecar (sinew-sidecar/sinew_docs.py)
  • DOCX: find/replace preserving run formatting, paragraph insert/delete
  • PDF: PyMuPDF redaction API — surgical text replacement without recreating the file
  • XLSX: cell-level string replacement (openpyxl)
  • PPTX: run-level text replacement (python-pptx)
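
A stateless sidecar of this kind usually handles one JSON request per invocation. The sketch below shows that request/response shape in Python. It is not the contents of sinew_docs.py: the field names (`op`, `text`, `find`, `replace`) are hypothetical, and a plain-string replacement stands in for the format-specific editors.

```python
import json
from typing import TextIO


def find_replace_text(text: str, find: str, replace: str) -> tuple[str, int]:
    """Stand-in for a format-specific editor: returns new text and match count."""
    return text.replace(find, replace), text.count(find)


def handle_request(request: object) -> dict:
    """Dispatch one request. The request schema here is an assumption."""
    if not isinstance(request, dict):
        return {"ok": False, "error": "request must be a JSON object"}
    try:
        op = request["op"]
        if op == "find_replace":
            if not request["find"]:
                return {"ok": False, "error": "find must be non-empty"}
            new_text, count = find_replace_text(
                request["text"], request["find"], request["replace"]
            )
            return {"ok": True, "text": new_text, "replacements": count}
        return {"ok": False, "error": f"unsupported op: {op}"}
    except KeyError as missing:
        return {"ok": False, "error": f"missing field: {missing.args[0]}"}


def serve_once(stdin: TextIO, stdout: TextIO) -> None:
    """Read one JSON request, write one JSON response, keep no state between calls."""
    try:
        request = json.loads(stdin.read())
    except json.JSONDecodeError as exc:
        response = {"ok": False, "error": f"invalid JSON: {exc.msg}"}
    else:
        response = handle_request(request)
    stdout.write(json.dumps(response) + "\n")
```

In a real script, `serve_once(sys.stdin, sys.stdout)` would be the entry point. Reporting errors as JSON rather than a traceback lets the calling process surface them as ordinary tool failures.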

5. UI improvements (feat(ui))

  • Browser tool cards: icon, action label, contextual meta, screenshot thumbnail in collapsed card
  • Composer: ArrowUp/Down navigates message history (draft restored on exit)
  • TodoStrip: spinner only shows when agent is actively streaming (paused square otherwise)

6. Build fix (fix(build))

  • prepare-sidecars.mjs: use Expand-Archive (PowerShell) instead of tar for .zip on Windows — GNU tar interprets C:\ as a network hostname

Commits

  1. feat(browser): CDP browser automation via chromiumoxide
  2. feat(memory): persistent workspace memory tool
  3. feat(search): semantic code search with local embeddings
  4. feat(docs): surgical document editing via Python sidecar
  5. feat(agent): wire all new tools into the agent pipeline
  6. feat(ui): browser tool cards, composer history, streaming task indicator
  7. fix(build): use PowerShell Expand-Archive for zip extraction on Windows

Test plan

  • Browser: ask the agent to open a URL, take a screenshot, and click an element; verify the thumbnail appears in the collapsed ToolCard
  • Memory: run workspace_memory append, then start a new conversation; verify the content appears in the system prompt
  • Search: run index_workspace, then semantic_search "authentication" on any code project
  • Docs: run doc_read on a .docx, then doc_edit find_replace; verify the edit is surgical (requires pip install python-docx PyMuPDF openpyxl python-pptx)
  • Composer history: send 3 messages, then press ArrowUp in the empty composer; verify it navigates the history
  • Build: fresh clone on Windows; verify npm run prepare-sidecars completes without tar errors

Notes

New sinew-browser crate exposing 23 agent tools:
browser_open, browser_screenshot, browser_dom, browser_click,
browser_type, browser_eval, browser_console, browser_network,
browser_wait, browser_scroll, browser_select, browser_hover,
browser_close, browser_record_start, browser_record_stop,
browser_resize, browser_back, browser_keys, browser_find,
browser_pdf, browser_upload, browser_cookies, browser_iframe.

- BrowserSession launches Chrome/Edge via CDP (chromiumoxide)
- Isolated profile in %TEMP%/sinew-browser-profile to avoid
  conflicts with running browser instances
- Selector hint system: lists visible interactive elements on failure
- GIF session recording (recording.rs)
- Stealth overrides to hide CDP automation fingerprint

New workspace_memory tool backed by .sinew/memory.md:
- Actions: read, append, update, clear
- File created automatically on first use
- Auto-injected into the system prompt at session start
  so the agent retains workspace knowledge across conversations

New sinew-search crate + semantic_search tool:
- tree-sitter AST chunking per top-level declaration
  (Rust, TypeScript, JavaScript, Python, Go; line fallback)
- fastembed v4 local ONNX embeddings (AllMiniLML6V2, 384 dims)
- SQLite index: FTS5 BM25 + cosine similarity, RRF fusion (k=60)
- Incremental indexing via SHA256 per file
- Respects .gitignore, skips target/ and node_modules/
- Two tools: index_workspace + semantic_search
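
Incremental indexing by per-file hash can be pictured with the Python sketch below: hash every file, compare against the digests stored from the previous run, and re-embed only what changed. The helper names are hypothetical, and `.gitignore` handling is deliberately left out to keep the example short; only the SHA256-per-file idea and the skipped `target/` and `node_modules/` directories come from the notes above.

```python
import hashlib
from pathlib import Path

# Directories the notes say are skipped; .gitignore parsing is omitted here.
SKIPPED_DIRS = {"target", "node_modules"}


def file_digest(path: Path) -> str:
    """SHA-256 of a file's bytes, read in blocks to bound memory use."""
    hasher = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(65536), b""):
            hasher.update(block)
    return hasher.hexdigest()


def scan_workspace(root: Path) -> dict[str, str]:
    """Map each relative file path (POSIX style) to its current digest."""
    digests: dict[str, str] = {}
    for path in root.rglob("*"):
        relative = path.relative_to(root)
        if path.is_file() and not SKIPPED_DIRS.intersection(relative.parts):
            digests[relative.as_posix()] = file_digest(path)
    return digests


def plan_reindex(
    current: dict[str, str], stored: dict[str, str]
) -> tuple[list[str], list[str]]:
    """Return (paths to re-chunk and re-embed, paths to drop from the index)."""
    changed = sorted(p for p, digest in current.items() if stored.get(p) != digest)
    removed = sorted(p for p in stored if p not in current)
    return changed, removed
```

Since embedding is the expensive step, skipping unchanged files is where most of the savings on a re-index would come from.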

New doc_read + doc_edit tools backed by a stateless Python sidecar
(sinew-sidecar/sinew_docs.py) communicating over stdin/stdout JSON:

Supported formats:
- DOCX: python-docx — find/replace preserving run[0] formatting,
  paragraph insert/delete via lxml
- PDF: PyMuPDF redaction API — surgical text replacement without
  recreating the file
- XLSX: openpyxl — cell-level string replacement
- PPTX: python-pptx — run-level text replacement

Sidecar discovery: SINEW_DOCS_SCRIPT env var -> next to exe (prod)
-> sinew-sidecar/sinew_docs.py relative to CWD (dev)
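
The discovery order above can be sketched in Python as a simple first-match search. This is an illustration, not the Rust code in the PR: the function name is hypothetical, and two details are assumptions, namely that the production copy next to the executable is also named sinew_docs.py and that a `SINEW_DOCS_SCRIPT` value pointing at a missing file falls through to the next candidate.

```python
from pathlib import Path
from typing import Mapping, Optional

SCRIPT_NAME = "sinew_docs.py"


def discover_sidecar(
    env: Mapping[str, str], exe_path: Path, cwd: Path
) -> Optional[Path]:
    """Return the first existing candidate, in the order the notes describe."""
    candidates: list[Path] = []
    override = env.get("SINEW_DOCS_SCRIPT")
    if override:
        # 1. Explicit override via environment variable.
        candidates.append(Path(override))
    # 2. Production: script shipped next to the executable (file name assumed).
    candidates.append(exe_path.parent / SCRIPT_NAME)
    # 3. Development: repository layout relative to the working directory.
    candidates.append(cwd / "sinew-sidecar" / SCRIPT_NAME)
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

A caller would typically pass `os.environ`, the path of the running executable, and `Path.cwd()`; taking them as parameters keeps the lookup easy to test.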

- tool_names: constants for all new tools (BROWSER_*, WORKSPACE_MEMORY,
  INDEX_WORKSPACE, SEMANTIC_SEARCH, DOC_READ, DOC_EDIT)
- TurnContext: new Arc fields (browser, workspace_memory,
  semantic_search, doc_tool)
- tool_dispatch: dispatch branches for all new tools
- turn: descriptor registration + both run_tool call sites updated
- sinew-app/lib.rs: export new types
- store.rs: browser_enabled setting
- tool_run: err_with_images helper for browser failure screenshots
- Workspace crates: sinew-browser + sinew-search added to workspace
- Tauri schemas regenerated

ToolCard:
- BrowserGlyph icon for all browser_* tools
- Rich title with action label + contextual meta (URL, selector, text)
- Thumbnail preview of screenshot in collapsed card

ChatPane:
- ArrowUp/Down navigates through message history in composer
  (draft preserved, restored on ArrowDown past end)
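
The history navigation described here amounts to a small state machine. The Python sketch below models it outside the UI framework; the class and method names are hypothetical and the actual ChatPane component is not written this way, but the behavior (draft saved on the first ArrowUp, restored when ArrowDown moves past the newest message) follows the bullet above.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ComposerHistory:
    sent: list[str] = field(default_factory=list)  # oldest first
    index: Optional[int] = None  # None means the user is editing the draft
    draft: str = ""

    def arrow_up(self, current_text: str) -> str:
        """Move toward older messages; stash the draft on the first step."""
        if not self.sent:
            return current_text
        if self.index is None:
            self.draft = current_text
            self.index = len(self.sent) - 1
        elif self.index > 0:
            self.index -= 1
        return self.sent[self.index]

    def arrow_down(self, current_text: str) -> str:
        """Move toward newer messages; past the newest, restore the draft."""
        if self.index is None:
            return current_text
        if self.index < len(self.sent) - 1:
            self.index += 1
            return self.sent[self.index]
        self.index = None
        return self.draft
```

Each method returns the text the composer should display after the key press.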

TodoStrip:
- isStreaming prop gates the spinner on in_progress tasks
  (paused square when agent is not actively streaming)

App:
- Skip updater check in DEV mode (avoids boot delay during dev)
- tauri.conf.json: updater active=false for local builds

GNU tar on Windows interprets drive letters (C:\) as network hostnames,
causing zip extraction to fail with 'Cannot connect to C: resolve failed'.
Use PowerShell Expand-Archive for .zip files on win32.
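
The platform switch can be sketched as a function that builds the extraction command. The real logic lives in prepare-sidecars.mjs (JavaScript); this Python version only illustrates the decision, and the function names and exact PowerShell flags chosen here are assumptions rather than what the script actually passes.

```python
def ps_quote(value: str) -> str:
    """Quote a string for PowerShell: single quotes, with embedded quotes doubled."""
    return "'" + value.replace("'", "''") + "'"


def extraction_command(archive: str, dest: str, platform: str) -> list[str]:
    """Return the argv used to extract `archive` into `dest` on `platform`."""
    if platform == "win32" and archive.lower().endswith(".zip"):
        # Expand-Archive avoids tar's host:path reading of "C:\...".
        script = (
            f"Expand-Archive -LiteralPath {ps_quote(archive)} "
            f"-DestinationPath {ps_quote(dest)} -Force"
        )
        return ["powershell", "-NoProfile", "-NonInteractive", "-Command", script]
    # Everywhere else, and for non-zip archives, keep using tar.
    return ["tar", "-xf", archive, "-C", dest]
```

Returning an argv list rather than a shell string means only the PowerShell script itself needs quoting.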