Port Chrome extension to standalone Nim binary using local LLM runtime. Includes Ollama registry integration (qwen3.5-0.8b/2b/4b, gemma4-e2b), 3-phase AI bookmark organizer, SQLite storage, and cligen CLI with 10 subcommands. Downloads use curl --progress-bar with -C - for resumable partial downloads.
Replace self-managed llama-server + GGUF model downloads with ollama as the LLM backend. Ollama handles model management (pull/list) and serves an OpenAI-compatible API, eliminating ~150 lines of download-progress, tarball-extraction, and SHA256-verification code.
- runtime.nim: ollama serve/stop/health via shell + HTTP checks
- model.nim: ollama pull/list API, remove nimcrypto/sha256/curl download
- config.nim: default endpoint now 127.0.0.1:11434/v1, add modelName
- client.nim: send actual model name instead of hardcoded 'local'
- bootstrap.nim: simplified ensureReady (start ollama, pull model)
- models.json: stripped digest/sizeBytes, just name/ollamaModel/ollamaTag
- nimble: removed nimcrypto dependency (binary ~700KB, down from ~900KB)
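The pull/list integration boils down to querying Ollama's HTTP API. A minimal Python sketch (not the project's Nim code) of the model-presence check: GET /api/tags is the endpoint `ollama list` wraps, and it returns a JSON object with a "models" array; the model names below are illustrative.

```python
import json

def has_model(tags_json: str, wanted: str) -> bool:
    """Return True if `wanted` ('name' or 'name:tag') appears in an
    Ollama /api/tags response; a tag-less query matches any tag."""
    for m in json.loads(tags_json).get("models", []):
        name = m.get("name", "")
        if name == wanted or name.split(":", 1)[0] == wanted:
            return True
    return False

# Truncated example response in the documented /api/tags shape:
sample = json.dumps({"models": [{"name": "qwen3.5:2b"},
                                {"name": "gemma4:e2b"}]})
```

If the model is missing, the bootstrap step would POST to /api/pull (or shell out to `ollama pull`) before serving requests.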
…agement
- Fix {refStr} not interpolated in model pull messages
- Use string concat (&) instead of path join (/) for API URLs
- Remove spawn/poll/stop ollama lifecycle (treat as external service)
- Add requireRuntime with platform-specific start/install hints
- Remove unused pidFilePath/logFilePath/logsDir from config
- Increase retry backoff delay for slow model loading
- Add options.think=false to suppress thinking tags in qwen3.5
- Strip thinking tags and system-reminder tags before JSON extraction
- Increase max_tokens to 2048
- Include raw response in error messages for debugging
- Log model name and message previews in verbose mode
- Switch default from qwen3.5-0.8b to qwen3.5-2b for more reliable output
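The tag-stripping step above can be sketched as a small cleanup pass, assuming the reasoning model wraps its chain of thought in `<think>…</think>` and that `<system-reminder>` blocks may also appear; the tag names and helper are illustrative, not the project's actual code.

```python
import json
import re

# Remove <think>...</think> and <system-reminder>...</system-reminder>
# spans (including newlines) before looking for the JSON payload.
TAG_RE = re.compile(r"<(think|system-reminder)>.*?</\1>", re.DOTALL)

def extract_json(raw: str) -> dict:
    cleaned = TAG_RE.sub("", raw)
    # Take the outermost {...} span in whatever text remains.
    start, end = cleaned.find("{"), cleaned.rfind("}")
    if start == -1 or end == -1:
        # Surface the raw response for debugging, as the commit above does.
        raise ValueError(f"no JSON object in response: {raw!r}")
    return json.loads(cleaned[start:end + 1])
```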
…straints and few-shot examples
Use the parameter for schema enforcement at the inference layer (llama.cpp grammar-based constrained decoding) instead of the OpenAI-compatible /v1/chat/completions endpoint's post-hoc response_format hint. This fixes the small model (qwen3.5:0.8b) returning non-JSON responses.
- client.nim: native /api/chat with param for Ollama; fall back to OpenAI /v1/chat/completions when LLM_URL is set (runtimeManaged=false)
- config.nim: default URL changed to native base (no /v1 suffix), strip trailing slashes in ollamaApiUrl(), remove unused readTomlInt
- organizer.nim: use full schemas for all model sizes (constrained decoding eliminates the need for simplified small-model schemas)
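The endpoint routing can be sketched as two request builders. Assumptions, not this project's code: Ollama's native /api/chat accepts a JSON schema in its "format" field (which recent Ollama releases enforce via grammar-constrained decoding), while the OpenAI-compatible fallback only gets the weaker response_format hint; the schema and helper names are illustrative.

```python
# Illustrative schema for a classification response.
SCHEMA = {
    "type": "object",
    "properties": {"category": {"type": "string"}},
    "required": ["category"],
}

def build_request(messages, model, native):
    if native:
        # Runtime-managed Ollama: schema enforced at decode time.
        return "/api/chat", {
            "model": model,
            "messages": messages,
            "stream": False,
            "format": SCHEMA,
        }
    # External endpoint via LLM_URL: best-effort JSON mode only.
    return "/v1/chat/completions", {
        "model": model,
        "messages": messages,
        "response_format": {"type": "json_object"},
    }
```

The practical difference: with constrained decoding the sampler cannot emit tokens outside the schema, so even a 0.8b model yields parseable output; response_format merely asks the server to nudge the model toward JSON after the fact.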
Outputs bookmarks grouped by AI-assigned category, with unorganized bookmarks in an 'Unorganized' folder. Supports --output for file output and --category for filtering a single folder.
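A hedged Python sketch of that grouping step (the record fields and helper name are assumptions, not the project's actual Nim types): bookmarks with no AI-assigned category fall through to the 'Unorganized' folder.

```python
from collections import defaultdict

def group_bookmarks(bookmarks):
    """Group bookmark records into folders keyed by category;
    records with a missing/empty category go to 'Unorganized'."""
    folders = defaultdict(list)
    for b in bookmarks:
        folders[b.get("category") or "Unorganized"].append(b)
    return dict(folders)
```

A --category filter then reduces to selecting one key from the returned mapping before rendering.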
Standalone CLI
New features (CLI-only, non-LLM)
Duplicate detection (dedup)
Dead link detection (check-links)
Performance optimizations
Extension takeaways
Dedup and dead-link detection are pure HTTP/SQL — no LLM needed — and would port easily into the extension's service worker. The concurrency model could similarly speed up classification using Promise.allSettled with a sliding window.
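The sliding-window pattern above, sketched in Python rather than the extension's JavaScript (the worker and window size are illustrative): a semaphore caps in-flight work, and gather(return_exceptions=True) mirrors Promise.allSettled by keeping failures in the result list instead of aborting the batch.

```python
import asyncio

async def run_windowed(items, worker, window=8):
    """Run `worker` over items with at most `window` concurrent tasks,
    collecting results and exceptions alike (allSettled-style)."""
    sem = asyncio.Semaphore(window)

    async def bounded(item):
        async with sem:
            return await worker(item)

    return await asyncio.gather(*(bounded(i) for i in items),
                                return_exceptions=True)

async def demo_check(url):
    # Stand-in for an HTTP HEAD probe; item 3 simulates a dead link.
    await asyncio.sleep(0)
    if url == 3:
        raise ConnectionError("dead link")
    return url * 2

results = asyncio.run(run_windowed(range(5), demo_check, window=2))
```

Because results stay in input order, a dead-link report can zip them back against the bookmark list directly.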