diff --git a/README.md b/README.md index dc71a1f..635aee0 100644 --- a/README.md +++ b/README.md @@ -47,6 +47,8 @@ Ask in natural language — mindmark remembers what you saved. | `mindmark validate` | Check indexed bookmark URLs for stale links (HTTP 4xx/5xx or unreachable) and report them | | `mindmark drop-index` | Delete the local SQLite index database (with confirmation unless `--yes`) | +Human output is concise and TTY-aware: color is enabled in real terminals, disabled automatically for pipes/CI, and can always be turned off with `--no-color`. + > 🔌 **Works offline** after the first run. Embeddings run on-device via [fastembed](https://github.com/qdrant/fastembed) (ONNX Runtime, ~130 MB one-time model download). ### Supported Browsers @@ -155,12 +157,16 @@ mindmark sync --list-browsers Example output: -``` -Browser Profile Path -------- ------- ---- -Chrome Default ~/Library/.../Google/Chrome/Default/Bookmarks -Chrome Profile 3 ~/Library/.../Google/Chrome/Profile 3/Bookmarks -Edge Default ~/Library/.../Microsoft Edge/Default/Bookmarks +```text +Supported browsers + - Chrome + - Edge + - Brave + - Firefox + +Detected profiles + - Chrome (Default) → ~/Library/Application Support/Google/Chrome/Default/Bookmarks + - Edge (Default) → C:\Users\you\AppData\Local\Microsoft\Edge\User Data\Default\Bookmarks ``` @@ -238,7 +244,7 @@ mm open "docker setup" ### 4️⃣ JSON output for scripting -Pipe results into **fzf**, **jq**, **Alfred**, **Raycast**, **PowerToys Run**, or any tool that accepts JSON: +Pipe results into **fzf**, **jq**, **Alfred**, **Raycast**, **PowerToys Run**, or any tool that accepts JSON.
`find --json` returns the same result object shape as the CLI uses internally: ```bash # macOS / Linux @@ -248,10 +254,44 @@ mindmark find "istio service mesh" --json | jq '.[].url' mindmark find "istio service mesh" --json | ConvertFrom-Json | ForEach-Object { $_.url } ``` +```json +[ + { + "score": 0.842, + "title": "Istio / Service Mesh", + "url": "https://istio.io/latest/docs/", + "folder_path": "Work/Kubernetes", + "domain": "istio.io" + } +] +``` + +If you add `--excerpt`, results that have enriched page content also include `relevant_excerpt`. + --- ## 📖 Usage +### Output modes + +By default, mindmark prints professional human-readable output with status symbols, hints, and color when stdout is an interactive terminal: + +```text +→ Reading bookmarks from Chrome (Default), Firefox (default-release) +✓ Collected 812 bookmarks from 2 profile(s) +→ Syncing index at ~/.mindmark/index.db +✓ Sync complete: added=12, updated=3, removed=0, unchanged=797 +Hint: Run 'mindmark find "your query"' to search your bookmarks. +``` + +Use `--no-color` when you want plain text even in a TTY. `NO_COLOR=1` and `MINDMARK_NO_COLOR=1` are also respected. + +```bash +mindmark --no-color stats +``` + +Use `--json` for stable machine-readable output from `find`, `sync`, `stats`, `validate`, and `enrich`. + ### Syncing `mindmark sync` reads bookmarks directly from your browser data directories. It's **incremental** — only new or changed bookmarks are re-embedded, making re-syncs near-instant. @@ -261,12 +301,50 @@ mindmark sync # sync all detected browsers mindmark sync --browser chrome # sync only Chrome mindmark sync --browser firefox # sync only Firefox mindmark sync --list-browsers # list detected browsers and profiles +mindmark sync --json # emit sync summary as JSON ``` When you add new bookmarks in your browser, just run `mindmark sync` again — it will pick up only the changes.
> 💡 **Note:** If you change the embedding model with `--model`, all bookmarks will be re-embedded on the next sync. Browser names are case-insensitive (e.g., `--browser Chrome` and `--browser chrome` both work). +`sync --json` returns a top-level `summary`, synced `profiles`, any `warnings`, plus `db_path` and `model`. + +### Stats + +```bash +mindmark stats +mindmark stats --json +``` + +Example human output: + +```text +Bookmarks: 812 +Index: ~/.mindmark/index.db +Model: BAAI/bge-small-en-v1.5 + +Top domains + github.com: 42 + docs.python.org: 18 + +Top folders + Work/Kubernetes: 27 + Reading: 14 +``` + +`stats --json` returns: + +```json +{ + "db_path": "/home/you/.mindmark/index.db", + "model": "BAAI/bge-small-en-v1.5", + "top_domains": [{"count": 42, "domain": "github.com"}], + "top_folders": [{"count": 27, "folder": "Work/Kubernetes"}], + "total": 812 +} +``` + ### Filters and options Narrow down results without changing your query: @@ -293,7 +371,7 @@ Use `drop-index` to remove the local SQLite index database when you want a clean ```bash mindmark drop-index # asks for confirmation mindmark drop-index --yes # skip confirmation -mindmark drop-index --db /path/to/index.db +mindmark --db /path/to/index.db drop-index ``` ### Validate stale links @@ -304,9 +382,10 @@ Use `validate` to probe all indexed HTTP(S) bookmark URLs and identify stale one mindmark validate # identify all stale bookmarks mindmark validate --timeout 5 # per-request timeout in seconds (default 8) mindmark validate --workers 32 # parallel URL checks (default 16) +mindmark validate --json # emit validation summary as JSON ``` -Non-HTTP URLs (for example `file:` or browser-internal URLs) are skipped and not checked. +Non-HTTP URLs (for example `file:` or browser-internal URLs) are skipped and not checked. `validate --json` returns `total`, `checked`, `healthy`, `skipped`, `stale_count`, and a `stale` array with `title`, `url`, `folder_path`, `status_code`, `reason`, and `error`.
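As a rough sketch of how a script might consume the `validate --json` payload described above, the snippet below turns the `stale` array into one line per bookmark. The field names are taken from that description; the `summarize_stale` helper and every value in the sample payload are hypothetical, invented only for illustration.

```python
import json

# Hypothetical sample payload: the keys follow the `validate --json`
# description, the values are made up for this example.
SAMPLE = """
{
  "checked": 2,
  "healthy": 1,
  "skipped": 0,
  "stale": [
    {
      "error": "Not Found",
      "folder_path": "Work/Kubernetes",
      "reason": "HTTP 404",
      "status_code": 404,
      "title": "Old docs",
      "url": "https://example.com/old"
    }
  ],
  "stale_count": 1,
  "total": 2
}
"""


def summarize_stale(payload: dict) -> list[str]:
    """Return one human-readable line per entry in the `stale` array."""
    lines = []
    for item in payload.get("stale", []):
        # An empty or missing folder_path is displayed as "(root)".
        folder = item.get("folder_path") or "(root)"
        lines.append(f"{item['reason']}: {item['title']} [{folder}] {item['url']}")
    return lines


if __name__ == "__main__":
    for line in summarize_stale(json.loads(SAMPLE)):
        print(line)
```

In practice the payload would come from the CLI rather than a literal string, for example by redirecting `mindmark validate --json` to a file and reading it with `json.load`.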
### Swap the embedding model @@ -367,6 +446,7 @@ Without enrichment, searching for **"authentication strategies"** on a bookmark ```bash mindmark enrich --limit 100 --workers 4 +mindmark enrich --limit 100 --workers 4 --json ``` Options: @@ -408,22 +488,28 @@ The `⤵` symbol indicates content from the enriched page. Without enrichment, t ### Status and monitoring -Check enrichment status: +Get a machine-readable enrichment run summary: ```bash -python -c " -from mindmark.index import Index -idx = Index() -print(idx.enrichment_stats()) -idx.close() -" +mindmark enrich --json ``` Example output: -```python -{'pending': 1234, 'complete': 450, 'failed': 23} +```json +{ + "before": {"pending": 1234, "complete": 450, "failed": 23}, + "after": {"pending": 1134, "complete": 550, "failed": 25}, + "complete": 100, + "failed": 2, + "reset_failed": 0, + "skipped": 0, + "status": "complete", + "total": 102 +} ``` +> `mindmark enrich --json` still performs enrichment when work is pending. To inspect counts without fetching pages, use the Python API (`Index().enrichment_stats()`). + ### Notes - **100% local** — Page fetching happens on your machine; no cloud service is used.
@@ -433,9 +519,11 @@ Example output: --- +## 💾 Storage Layout + | What | macOS / Linux | Windows | Override | |---|---|---|---| -| Index database | `~/.mindmark/index.db` | `%LOCALAPPDATA%\mindmark\index.db` | `--db` flag or `MINDMARK_DB` env var | +| Index database | `~/.mindmark/index.db` | `%LOCALAPPDATA%\mindmark\index.db` | global `--db` flag (before the command) or `MINDMARK_DB` env var | | Home directory | `~/.mindmark/` | `%LOCALAPPDATA%\mindmark\` | `MINDMARK_HOME` env var | | Embedding model | `~/.cache/fastembed/` | `%LOCALAPPDATA%\fastembed\` | Managed by fastembed | diff --git a/pyproject.toml b/pyproject.toml index 4754f16..02c1a13 100644 --- a/pyproject.toml +++ b/pyproject.toml @@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta" [project] name = "mindmark" -version = "0.1.6" +version = "0.1.7" description = "Local semantic search over your browser bookmarks — on-device embeddings, no cloud." readme = "README.md" requires-python = ">=3.9" diff --git a/src/mindmark/__init__.py b/src/mindmark/__init__.py index 65e9c2c..6778f31 100644 --- a/src/mindmark/__init__.py +++ b/src/mindmark/__init__.py @@ -1,2 +1,2 @@ """mindmark — local semantic search over your browser bookmarks.""" -__version__ = "0.1.0" +__version__ = "0.1.6" diff --git a/src/mindmark/_console.py b/src/mindmark/_console.py new file mode 100644 index 0000000..80303f8 --- /dev/null +++ b/src/mindmark/_console.py @@ -0,0 +1,72 @@ +"""Small TTY-aware console formatting helpers.""" +from __future__ import annotations + +import os +import sys +from typing import TextIO + +_TRUTHY = {"1", "true", "yes", "on"} +_COLORS = { + "muted": "2", + "status": "36", + "success": "32", + "warning": "33", + "error": "31", + "accent": "35", + "bold": "1", +} + + +def _env_disables_color() -> bool: + return ( + "NO_COLOR" in os.environ + or os.environ.get("MINDMARK_NO_COLOR", "").lower() in _TRUTHY + or os.environ.get("TERM") == "dumb" + ) + + +class Console: + def __init__( + self, + *, + color:
bool | None = None, + stdout: TextIO | None = None, + stderr: TextIO | None = None, + ) -> None: + self.stdout = stdout or sys.stdout + self.stderr = stderr or sys.stderr + if color is None: + self.color = self.stdout.isatty() and not _env_disables_color() + else: + self.color = color and not _env_disables_color() + + def style(self, text: str, name: str) -> str: + code = _COLORS.get(name) + if not self.color or not code: + return text + return f"\033[{code}m{text}\033[0m" + + def out(self, message: str = "", *, style: str | None = None) -> None: + print(self.style(message, style) if style else message, file=self.stdout) + + def err(self, message: str = "", *, style: str | None = None) -> None: + print(self.style(message, style) if style else message, file=self.stderr) + + def status(self, message: str) -> None: + self.out(f"{self.style('→', 'status')} {message}") + + def success(self, message: str) -> None: + self.out(f"{self.style('✓', 'success')} {message}") + + def warning(self, message: str) -> None: + self.err(f"{self.style('!', 'warning')} {message}") + + def error(self, message: str) -> None: + self.err(f"{self.style('✖', 'error')} {message}") + + def hint(self, message: str, *, stderr: bool = False) -> None: + line = f"{self.style('Hint:', 'accent')} {message}" + if stderr: + self.err(line) + else: + self.out(line) diff --git a/src/mindmark/browsers/__init__.py b/src/mindmark/browsers/__init__.py index af265a8..5795292 100644 --- a/src/mindmark/browsers/__init__.py +++ b/src/mindmark/browsers/__init__.py @@ -6,13 +6,10 @@ """ from __future__ import annotations -from dataclasses import dataclass, field -from pathlib import Path import json import sqlite3 from ..parser import Bookmark -from ..index import SyncResult from .paths import detect_browsers, BrowserProfile, SUPPORTED_BROWSERS @@ -37,8 +34,9 @@ def collect_all_bookmarks( """ profiles = detect_browsers() if browser_filter: - filt = browser_filter.lower() - profiles = [p for p in profiles if
p.browser_name.lower() == filt] + filt = browser_filter.strip().lower() + if filt: + profiles = [p for p in profiles if p.browser_name.lower() == filt] results: list[tuple[BrowserProfile, list[Bookmark]]] = [] for profile in profiles: diff --git a/src/mindmark/cli.py b/src/mindmark/cli.py index de42bb0..b3b48c6 100644 --- a/src/mindmark/cli.py +++ b/src/mindmark/cli.py @@ -2,10 +2,10 @@ import argparse import concurrent.futures +import json import os import shutil import sqlite3 -import sys import webbrowser from pathlib import Path from urllib.error import HTTPError, URLError @@ -13,8 +13,26 @@ from urllib.request import Request, urlopen from . import __version__ -from .parser import parse_file -from .index import Index, SyncResult, default_db_path, DEFAULT_MODEL +from ._console import Console +from .defaults import DEFAULT_MODEL, default_db_path + +_SUPPORTED_BROWSER_NAMES = { + "chrome": "Chrome", + "edge": "Edge", + "brave": "Brave", + "firefox": "Firefox", +} + + +def _console(args: argparse.Namespace) -> Console: + existing = getattr(args, "console", None) + if isinstance(existing, Console): + return existing + return Console(color=False if getattr(args, "no_color", False) else None) + + +def _print_json(console: Console, payload: object, *, preserve_order: bool = False) -> None: + console.out(json.dumps(payload, indent=2, sort_keys=not preserve_order)) def _is_http_url(url: str) -> bool: @@ -33,12 +51,10 @@ def _check_url_status(url: str, timeout: float) -> tuple[str, int | None, str | with urlopen(req, timeout=timeout) as resp: return url, int(getattr(resp, "status", 0) or 0), None except HTTPError as e: - # HTTP errors still include a useful status code. return url, int(e.code), str(e.reason) if e.reason else "HTTP error" except Exception: pass - # Fallback to GET for servers that reject HEAD. 
try: req = Request(url, headers=headers, method="GET") with urlopen(req, timeout=timeout) as resp: @@ -51,19 +67,35 @@ def _check_url_status(url: str, timeout: float) -> tuple[str, int | None, str | return url, None, str(e) -def _cmd_validate(args): +def _cmd_validate(args: argparse.Namespace) -> int: + from .index import Index + + console = _console(args) idx = Index(db_path=args.db) try: bookmarks = idx.all_bookmarks() + total = len(bookmarks) if not bookmarks: - print("index is empty — run 'mindmark sync' first.") + payload = { + "checked": 0, + "healthy": 0, + "message": "Index is empty. Run 'mindmark sync' to import bookmarks.", + "skipped": 0, + "stale": [], + "stale_count": 0, + "total": 0, + } + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.error(payload["message"]) return 1 - total = len(bookmarks) - print(f"validating {total} indexed bookmarks...") + if not getattr(args, "json", False): + console.status(f"Validating {total} indexed bookmarks") url_to_bm = {b["url"]: b for b in bookmarks} - stale = [] + stale: list[tuple[dict, int | None, str | None]] = [] skipped = 0 with concurrent.futures.ThreadPoolExecutor(max_workers=args.workers) as ex: @@ -81,47 +113,71 @@ def _cmd_validate(args): checked = total - skipped healthy = checked - len(stale) + stale_items = [] + for bm, code, error in stale: + reason = f"HTTP {code}" if code is not None else (error or "unreachable") + stale_items.append( + { + "error": error, + "folder_path": bm["folder_path"], + "reason": reason, + "status_code": code, + "title": bm["title"], + "url": bm["url"], + } + ) + + payload = { + "checked": checked, + "healthy": healthy, + "skipped": skipped, + "stale": stale_items, + "stale_count": len(stale_items), + "total": total, + } + if getattr(args, "json", False): + _print_json(console, payload) + return 0 - print( - f"checked={checked} healthy={healthy} stale={len(stale)} skipped={skipped}" + summary = ( + f"Checked {checked} bookmarks:
healthy={healthy}, " + f"stale={len(stale)}, skipped={skipped}" ) - if not stale: - print("all checked bookmarks look valid.") + console.success(summary) return 0 - print("\nstale bookmarks found:") - for i, (bm, code, error) in enumerate(stale, 1): - reason = f"HTTP {code}" if code is not None else (error or "unreachable") - folder = bm["folder_path"] or "(root)" - print(f"\n{i}. {bm['title']}") - print(f" status: {reason}") - print(f" url: {bm['url']}") - print(f" path: {folder}") - + console.warning(summary) + console.out() + console.out(console.style("Stale bookmarks", "bold")) + for i, item in enumerate(stale_items, 1): + folder = item["folder_path"] or "(root)" + console.out(f"{i:2d}. {item['title']}") + console.out(f" status: {item['reason']}") + console.out(f" url: {item['url']}") + console.out(f" folder: {folder}") + console.hint("Review or remove stale bookmarks in your browser, then run 'mindmark sync'.") return 0 - except KeyboardInterrupt: - print("\n\nCancelled by user.") - return 1 finally: idx.close() -def _cmd_drop_index(args): +def _cmd_drop_index(args: argparse.Namespace) -> int: + console = _console(args) db_path = Path(args.db).expanduser() if args.db else default_db_path() if not db_path.exists(): - print(f"index not found: {db_path}") + console.success(f"Index not found: {db_path}") return 0 if not args.yes: try: ans = input(f"drop local index at '{db_path}'? [y/N] ").strip().lower() if ans != "y": - print("cancelled.") + console.warning("Cancelled.") return 0 except (EOFError, OSError): - print("cancelled.") + console.warning("Cancelled.") return 0 try: @@ -130,21 +186,19 @@ def _cmd_drop_index(args): elif db_path.is_dir(): shutil.rmtree(db_path) else: - print(f"index path is not a file or directory: {db_path}") + console.error(f"Index path is not a file or directory: {db_path}") return 1 except PermissionError as e: - # Windows can keep SQLite files locked by another process handle. 
- # If deletion fails, try clearing index data in-place as a fallback. if db_path.is_file() and _clear_index_contents(db_path): - print(f"index file is in use; cleared index contents instead: {db_path}") + console.warning(f"Index file is in use; cleared contents instead: {db_path}") return 0 - print(f"error: failed to remove index: {e}", file=sys.stderr) + console.error(f"Failed to remove index: {e}") return 1 except OSError as e: - print(f"error: failed to remove index: {e}", file=sys.stderr) + console.error(f"Failed to remove index: {e}") return 1 - print(f"dropped local index: {db_path}") + console.success(f"Dropped local index: {db_path}") return 0 @@ -164,168 +218,403 @@ def _clear_index_contents(db_path: Path) -> bool: return False -def _cmd_index(args): +def _cmd_index(args: argparse.Namespace) -> int: + from .index import Index + from .parser import parse_file + + console = _console(args) path = Path(args.path).expanduser() if not path.is_file(): - print(f"error: file not found: {path}", file=sys.stderr) + console.error(f"File not found: {path}") return 2 - print(f"[1/3] parsing {path}") + + console.status(f"Parsing bookmarks from {path}") bookmarks = parse_file(str(path)) - print(f" parsed {len(bookmarks)} unique bookmarks") - print(f"[2/3] loading embedding model ({args.model})") + console.success(f"Parsed {len(bookmarks)} unique bookmarks") + console.status(f"Loading embedding model: {args.model}") idx = Index(db_path=args.db, model_name=args.model) - print(f"[3/3] embedding + writing index to {idx.db_path}") - info = idx.rebuild(bookmarks, batch_size=args.batch_size) - print(f"done. 
indexed={info['indexed']} dim={info.get('dim','?')} model={info['model']}") + try: + console.status(f"Writing index to {idx.db_path}") + info = idx.rebuild(bookmarks, batch_size=args.batch_size) + finally: + idx.close() + console.success( + f"Indexed {info['indexed']} bookmarks " + f"(dim={info.get('dim', '?')}, model={info['model']})" + ) return 0 -def _auto_sync_hint(idx: Index) -> None: - """Print a hint when the index is empty.""" - if not idx.is_empty(): - return - print("index is empty — run 'mindmark sync' to import bookmarks from your browsers,") - print("or run 'mindmark index ' to import from an exported file.") - print() +def _format_score(score: object) -> str: + try: + return f"{float(score):.3f}" + except (TypeError, ValueError): + return "n/a" + +def _cmd_find(args: argparse.Namespace) -> int: + from .index import Index -def _cmd_find(args): + console = _console(args) idx = Index(db_path=args.db) - if not getattr(args, 'json', False): - _auto_sync_hint(idx) - include_excerpt = getattr(args, 'excerpt', False) - results = idx.search( - query=args.query, k=args.top, - domain=args.domain, folder=args.folder, - include_excerpt=include_excerpt, - ) + try: + include_excerpt = getattr(args, "excerpt", False) + results = idx.search( + query=args.query, + k=args.top, + domain=args.domain, + folder=args.folder, + include_excerpt=include_excerpt, + ) + finally: + idx.close() + if not results: - print("no results (is the index empty? run: mindmark sync)") + if getattr(args, "json", False): + _print_json(console, [], preserve_order=True) + else: + console.out("No matching bookmarks.
Run 'mindmark sync' to import bookmarks or broaden your query.") return 1 if args.open is not None: n = args.open - 1 if not 0 <= n < len(results): - print(f"error: --open {args.open} out of range (1..{len(results)})", file=sys.stderr) + console.error(f"--open {args.open} is out of range (1..{len(results)})") return 2 webbrowser.open(results[n]["url"]) - print(f"opened: {results[n]['title']}") + console.success(f"Opened {args.open}. {results[n]['title']}") + console.out(results[n]["url"]) return 0 - import json if getattr(args, "json", False): - print(json.dumps(results, indent=2)) - else: - for i, r in enumerate(results, 1): - domain = urlparse(r["url"]).netloc - folder = r["folder_path"] - path = f"{folder}/" if folder else "" - print(f"{i:2d}. {r['title']}") - print(f" {path}{domain}") - if include_excerpt and r.get("relevant_excerpt"): - excerpt = r["relevant_excerpt"] - print(f" ⤵ {excerpt}") + _print_json(console, results, preserve_order=True) + return 0 + for i, r in enumerate(results, 1): + folder = r.get("folder_path") or "(root)" + url = r["url"] + console.out(f"{i:2d}.
{console.style(r['title'], 'bold')}") + console.out( + " " + f"score={console.style(_format_score(r.get('score')), 'accent')} " + f"folder={folder}" + ) + console.out(f" url={url}") + if include_excerpt and r.get("relevant_excerpt"): + console.out(f" ⤵ {r['relevant_excerpt']}") + console.hint(f"Open a result with: mindmark find {args.query!r} --open N") return 0 -def _cmd_stats(args): +def _cmd_open(args: argparse.Namespace) -> int: + args.open = 1 + args.json = False + return _cmd_find(args) + + +def _stable_stats(stats: dict) -> dict: + return { + "db_path": stats["db_path"], + "model": stats["model"], + "top_domains": [ + {"count": count, "domain": domain} + for domain, count in stats.get("top_domains", []) + ], + "top_folders": [ + {"count": count, "folder": folder} + for folder, count in stats.get("top_folders", []) + ], + "total": stats["total"], + } + + +def _cmd_stats(args: argparse.Namespace) -> int: + from .index import Index + + console = _console(args) idx = Index(db_path=args.db) try: - stats = idx.stats() - print(f"bookmarks: {stats['total']}") - if stats['total'] > 0: - print(f"model: {stats['model']}") - if stats['top_domains']: - print(f"\ntop domains:") - for domain, count in stats['top_domains']: - print(f" {domain}: {count}") - if stats['top_folders']: - print(f"\ntop folders:") - for folder, count in stats['top_folders']: - print(f" {folder}: {count}") - return 0 + stats = _stable_stats(idx.stats()) finally: idx.close() + if getattr(args, "json", False): + _print_json(console, stats) + return 0 -def _cmd_enrich(args): + console.out(f"Bookmarks: {console.style(str(stats['total']), 'accent')}") + console.out(f"Index: {stats['db_path']}") + if stats["model"]: + console.out(f"Model: {stats['model']}") + if stats["total"] == 0: + console.hint("Run 'mindmark sync' to import bookmarks from your browsers.") + return 0 + + if stats["top_domains"]: + console.out() + console.out(console.style("Top domains", "bold")) + for item in stats["top_domains"]: +
console.out(f" {item['domain']}: {item['count']}") + if stats["top_folders"]: + console.out() + console.out(console.style("Top folders", "bold")) + for item in stats["top_folders"]: + console.out(f" {item['folder']}: {item['count']}") + return 0 + + +def _cmd_enrich(args: argparse.Namespace) -> int: from .enricher import enrich_pending + from .index import Index + console = _console(args) idx = Index(db_path=args.db) try: pending = idx.pending_enrichment_urls( limit=None if args.refresh_failed else args.limit ) + reset = 0 if args.refresh_failed: reset = idx.reset_failed_enrichment() - if reset: - print(f"reset {reset} failed enrichment rows to pending") - # re-query after reset, respecting --limit pending = idx.pending_enrichment_urls(limit=args.limit) - estats = idx.enrichment_stats() - total_pending = estats.get("pending", 0) - + before = idx.enrichment_stats() if not pending: - print("nothing to enrich — run 'mindmark sync' first, or use --refresh-failed") + payload = { + "before": before, + "complete": 0, + "failed": 0, + "pending": 0, + "reset_failed": reset, + "skipped": 0, + "status": "idle", + "total": 0, + } + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.out("Nothing to enrich. Run 'mindmark sync' first, or use --refresh-failed.") return 0 - to_process = len(pending) - print( - f"enriching {to_process} bookmarks " - f"(pending={total_pending} workers={args.workers} timeout={args.timeout}s)" - ) + if not getattr(args, "json", False): + console.status( + f"Enriching {len(pending)} bookmarks " + f"(pending={before.get('pending', 0)}, workers={args.workers}, timeout={args.timeout}s)" + ) result = enrich_pending( idx, limit=args.limit, workers=args.workers, timeout=args.timeout, - refresh_failed=False, # already handled above + refresh_failed=False, ) - print(f"done.
{result}") + after = idx.enrichment_stats() + payload = { + "after": after, + "before": before, + "complete": result.complete, + "failed": result.failed, + "reset_failed": reset, + "skipped": result.skipped, + "status": "complete", + "total": result.total, + } + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.success( + f"Enrichment complete: complete={result.complete}, " + f"failed={result.failed}, skipped={result.skipped}" + ) return 0 - except KeyboardInterrupt: - print("\n\nCancelled by user.") - return 1 finally: idx.close() -def _cmd_sync(args): - from .browsers import parse_browser_bookmarks, detect_browsers - - browsers = detect_browsers() - if not browsers: - print("error: no browsers detected", file=sys.stderr) - return 1 - - print(f"[1/2] collecting bookmarks from {', '.join(b.browser_name for b in browsers)}") - bookmarks = []; [bookmarks.extend(parse_browser_bookmarks(b)) for b in browsers] - if not bookmarks: - print("no bookmarks found.") +def _browser_profile_dict(profile: object) -> dict: + return { + "browser": getattr(profile, "browser_name"), + "path": str(getattr(profile, "bookmark_path")), + "profile": getattr(profile, "profile_name"), + "source_id": getattr(profile, "source_id"), + "type": getattr(profile, "browser_type"), + } + + +def _detect_profiles(browser: str | None) -> list[object]: + from .browsers.paths import detect_browsers + + profiles = detect_browsers() + if browser: + wanted = browser.lower() + profiles = [p for p in profiles if p.browser_name.lower() == wanted] + return profiles + + +def _list_browsers(args: argparse.Namespace) -> int: + console = _console(args) + profiles = _detect_profiles(getattr(args, "browser", None)) + payload = { + "detected": [_browser_profile_dict(p) for p in profiles], + "supported": list(_SUPPORTED_BROWSER_NAMES.values()), + } + if getattr(args, "json", False): + _print_json(console, payload) return 0 - print(f" found {len(bookmarks)} unique bookmarks") - -
print(f"[2/2] syncing to {args.db or default_db_path()}") + + console.out(console.style("Supported browsers", "bold")) + for name in payload["supported"]: + console.out(f" - {name}") + if profiles: + console.out() + console.out(console.style("Detected profiles", "bold")) + for profile in profiles: + console.out( + f" - {profile.browser_name} ({profile.profile_name}) " + f"→ {profile.bookmark_path}" + ) + else: + console.out() + console.out("Detected profiles: none") + return 0 + + +def _cmd_sync(args: argparse.Namespace) -> int: + from .browsers import parse_browser_bookmarks + + console = _console(args) + if args.list_browsers: + return _list_browsers(args) + + profiles = _detect_profiles(args.browser) + if not profiles: + target = _SUPPORTED_BROWSER_NAMES.get(args.browser or "", "supported browsers") + message = f"No bookmark files detected for {target}." + payload = { + "error": message, + "profiles": [], + "supported": list(_SUPPORTED_BROWSER_NAMES.values()), + } + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.error(message) + console.hint("Use 'mindmark sync --list-browsers' to see supported browsers.", stderr=True) + return 1 + + if not getattr(args, "json", False): + names = ", ".join(f"{p.browser_name} ({p.profile_name})" for p in profiles) + console.status(f"Reading bookmarks from {names}") + + parsed: list[tuple[object, list[object]]] = [] + warnings: list[dict] = [] + for profile in profiles: + try: + bookmarks = parse_browser_bookmarks(profile) + except (OSError, ValueError, KeyError, json.JSONDecodeError, sqlite3.Error) as exc: + warning = { + "browser": profile.browser_name, + "error": str(exc), + "profile": profile.profile_name, + } + warnings.append(warning) + if not getattr(args, "json", False): + console.warning( + f"Skipped {profile.browser_name} ({profile.profile_name}): {exc}" + ) + continue + parsed.append((profile, bookmarks)) + + if not parsed: + payload = { + "error": "No readable browser bookmark
profiles were found.", + "profiles": [], + "summary": {"added": 0, "removed": 0, "unchanged": 0, "updated": 0}, + "warnings": warnings, + } + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.error(payload["error"]) + console.hint("Close browsers that may be locking bookmark files, then retry.", stderr=True) + return 1 + + total_bookmarks = sum(len(bookmarks) for _profile, bookmarks in parsed) + if not getattr(args, "json", False): + console.success(f"Collected {total_bookmarks} bookmarks from {len(parsed)} profile(s)") + console.status(f"Syncing index at {args.db or default_db_path(create=False)}") + + from .index import Index + idx = Index(db_path=args.db, model_name=args.model) - res = idx.sync(bookmarks) - - print(f"done. added={res.added} updated={res.updated} removed={res.removed}") + try: + summary = {"added": 0, "removed": 0, "unchanged": 0, "updated": 0} + profile_results = [] + for profile, bookmarks in parsed: + res = idx.sync(bookmarks, source=profile.source_id) + item = _browser_profile_dict(profile) + item.update( + { + "bookmarks": len(bookmarks), + "added": res.added, + "removed": res.removed, + "unchanged": res.unchanged, + "updated": res.updated, + } + ) + profile_results.append(item) + summary["added"] += res.added + summary["removed"] += res.removed + summary["unchanged"] += res.unchanged + summary["updated"] += res.updated + payload = { + "db_path": str(idx.db_path), + "model": idx.model_name, + "profiles": profile_results, + "summary": summary, + "warnings": warnings, + } + finally: + idx.close() + + if getattr(args, "json", False): + _print_json(console, payload) + else: + console.success( + "Sync complete: " + f"added={summary['added']}, updated={summary['updated']}, " + f"removed={summary['removed']}, unchanged={summary['unchanged']}" + ) + if summary["added"] or summary["updated"]: + console.hint("Run 'mindmark find \"your query\"' to search your bookmarks.") return 0 -def build_parser(): +def 
_add_search_options(parser: argparse.ArgumentParser) -> None: + parser.add_argument("query") + parser.add_argument("-k", "--top", type=int, default=10) + parser.add_argument("--domain") + parser.add_argument("--folder") + parser.add_argument( + "--excerpt", + action="store_true", + help="include excerpt from enriched page content (requires mindmark enrich)", + ) + + +def build_parser() -> argparse.ArgumentParser: p = argparse.ArgumentParser( prog="mindmark", - description="mindmark — local semantic search over your browser bookmarks.", + description="mindmark - local semantic search over your browser bookmarks.", ) p.add_argument("--version", action="version", version=f"%(prog)s {__version__}") p.add_argument( - "--db", default=os.environ.get("MINDMARK_DB"), - help=f"SQLite index path (default: {default_db_path()})", + "--db", + default=os.environ.get("MINDMARK_DB"), + help=f"SQLite index path (default: {default_db_path(create=False)})", ) + p.add_argument("--no-color", action="store_true", help="disable ANSI color output") sub = p.add_subparsers(dest="cmd") @@ -336,89 +625,96 @@ def build_parser(): pi.set_defaults(func=_cmd_index) pf = sub.add_parser("find", help="search bookmarks by natural-language query") - pf.add_argument("query") - pf.add_argument("-k", "--top", type=int, default=10) - pf.add_argument("--domain") - pf.add_argument("--folder") + _add_search_options(pf) pf.add_argument("--json", action="store_true") pf.add_argument("--open", type=int, metavar="N") - pf.add_argument( - "--excerpt", action="store_true", - help="include excerpt from enriched page content (requires mindmark enrich)", - ) pf.set_defaults(func=_cmd_find) + po = sub.add_parser("open", help="open the top bookmark matching a query") + _add_search_options(po) + po.set_defaults(func=_cmd_open) + ps = sub.add_parser("stats", help="show index stats") + ps.add_argument("--json", action="store_true") ps.set_defaults(func=_cmd_stats) py = sub.add_parser("sync", help="automatically sync
bookmarks from local browsers") py.add_argument("--model", default=DEFAULT_MODEL) + py.add_argument( + "--browser", + choices=sorted(_SUPPORTED_BROWSER_NAMES), + type=str.lower, + help="sync only one browser: chrome, edge, brave, or firefox", + ) + py.add_argument("--list-browsers", action="store_true", help="list supported and detected browsers") + py.add_argument("--json", action="store_true") py.set_defaults(func=_cmd_sync) pv = sub.add_parser("validate", help="validate indexed bookmark URLs and report stale entries (read-only)") - pv.add_argument( - "--timeout", - type=float, - default=8.0, - help="per-request timeout in seconds (default: 8.0)", - ) - pv.add_argument( - "--workers", - type=int, - default=16, - help="parallel request workers (default: 16)", - ) + pv.add_argument("--timeout", type=float, default=8.0, help="per-request timeout in seconds (default: 8.0)") + pv.add_argument("--workers", type=int, default=16, help="parallel request workers (default: 16)") + pv.add_argument("--json", action="store_true") pv.set_defaults(func=_cmd_validate) pd = sub.add_parser("drop-index", help="drop (delete) the local index database") - pd.add_argument( - "--yes", - action="store_true", - help="auto-confirm index deletion", - ) + pd.add_argument("--yes", action="store_true", help="auto-confirm index deletion") pd.set_defaults(func=_cmd_drop_index) pe = sub.add_parser( "enrich", help="fetch page content for bookmarks and build summary embeddings (local, no cloud)", ) - pe.add_argument( - "--limit", type=int, default=None, - help="max bookmarks to process per run (default: all pending)", - ) - pe.add_argument( - "--workers", type=int, default=8, - help="parallel fetch workers (default: 8)", - ) - pe.add_argument( - "--timeout", type=float, default=10.0, - help="per-request fetch timeout in seconds (default: 10.0)", - ) - pe.add_argument( - "--refresh-failed", action="store_true", - help="retry previously failed enrichments", - ) + pe.add_argument("--limit", type=int, 
default=None, help="max bookmarks to process per run (default: all pending)") + pe.add_argument("--workers", type=int, default=8, help="parallel fetch workers (default: 8)") + pe.add_argument("--timeout", type=float, default=10.0, help="per-request fetch timeout in seconds (default: 10.0)") + pe.add_argument("--refresh-failed", action="store_true", help="retry previously failed enrichments") + pe.add_argument("--json", action="store_true") pe.set_defaults(func=_cmd_enrich) return p -def main(argv=None): - parser = build_parser() - args = parser.parse_args(argv) +def _validate_args(parser: argparse.ArgumentParser, args: argparse.Namespace) -> None: if args.cmd == "validate": if args.timeout <= 0: parser.error("--timeout must be > 0") if args.workers <= 0: parser.error("--workers must be > 0") - return args.func(args) - if args.cmd == "enrich": + elif args.cmd == "enrich": if args.workers <= 0: parser.error("--workers must be > 0") if args.timeout <= 0: parser.error("--timeout must be > 0") - return args.func(args) + if args.limit is not None and args.limit <= 0: + parser.error("--limit must be > 0") + elif args.cmd in {"find", "open"}: + if args.top <= 0: + parser.error("--top must be > 0") + if getattr(args, "open", None) is not None and args.open <= 0: + parser.error("--open must be > 0") + elif args.cmd == "index": + if args.batch_size <= 0: + parser.error("--batch-size must be > 0") + + +def main(argv: list[str] | None = None) -> int: + parser = build_parser() + args = parser.parse_args(argv) if args.cmd is None: parser.print_help() return 2 - return args.func(args) + + _validate_args(parser, args) + args.console = Console(color=False if args.no_color else None) + + try: + return int(args.func(args) or 0) + except KeyboardInterrupt: + args.console.error("Cancelled by user.") + return 130 + except BrokenPipeError: + return 1 + except (sqlite3.Error, OSError, RuntimeError, ImportError, ValueError) as exc: + args.console.error(str(exc) or exc.__class__.__name__) + 
args.console.hint("Re-run with a valid index path or retry after closing locked files.", stderr=True) + return 1 diff --git a/src/mindmark/defaults.py b/src/mindmark/defaults.py new file mode 100644 index 0000000..3bdea17 --- /dev/null +++ b/src/mindmark/defaults.py @@ -0,0 +1,21 @@ +"""Lightweight shared defaults for CLI and index modules.""" +from __future__ import annotations + +import os +from pathlib import Path + +DEFAULT_MODEL = "BAAI/bge-small-en-v1.5" + + +def default_db_path(create: bool = True) -> Path: + env = os.environ.get("MINDMARK_HOME") + if env: + base = Path(env) + elif os.name == "nt": + local = os.environ.get("LOCALAPPDATA") + base = Path(local) / "mindmark" if local else Path.home() / ".mindmark" + else: + base = Path.home() / ".mindmark" + if create: + base.mkdir(parents=True, exist_ok=True) + return base / "index.db" diff --git a/src/mindmark/index.py b/src/mindmark/index.py index b1621e9..7623aea 100644 --- a/src/mindmark/index.py +++ b/src/mindmark/index.py @@ -2,34 +2,18 @@ from __future__ import annotations import hashlib -import os import sqlite3 from dataclasses import dataclass from pathlib import Path import numpy as np +from .defaults import DEFAULT_MODEL, default_db_path from .parser import Bookmark -DEFAULT_MODEL = "BAAI/bge-small-en-v1.5" - _SCHEMA_VERSION = 3 -def default_db_path() -> Path: - env = os.environ.get("MINDMARK_HOME") - if env: - base = Path(env) - elif os.name == "nt": - # On Windows, use %LOCALAPPDATA%\mindmark (dotfolders are unusual) - local = os.environ.get("LOCALAPPDATA") - base = Path(local) / "mindmark" if local else Path.home() / ".mindmark" - else: - base = Path.home() / ".mindmark" - base.mkdir(parents=True, exist_ok=True) - return base / "index.db" - - _SCHEMA = """ CREATE TABLE IF NOT EXISTS meta ( key TEXT PRIMARY KEY, diff --git a/tests/test_browser_detection.py b/tests/test_browser_detection.py index f61a189..c9af685 100644 --- a/tests/test_browser_detection.py +++ b/tests/test_browser_detection.py @@ 
-1,9 +1,75 @@ """Tests for browser detection and path resolution.""" import sys from pathlib import Path -from unittest.mock import patch -from mindmark.browsers.paths import detect_browsers, BrowserProfile +import pytest + +from mindmark.browsers.paths import BrowserProfile, detect_browsers + + +SUPPORTED_PLATFORMS = ("win32", "darwin", "linux") +PROFILE_NAMES = { + "Chrome": "Default", + "Edge": "Profile 1", + "Brave": "Profile 2", + "Firefox": "abc12345.default-release", +} + + +def _configure_platform(monkeypatch, tmp_path: Path, platform: str) -> dict[str, Path]: + roots = { + "home": tmp_path / "home", + "local": tmp_path / "local", + "roaming": tmp_path / "roaming", + } + for root in roots.values(): + root.mkdir() + + monkeypatch.setattr(sys, "platform", platform) + monkeypatch.setattr(Path, "home", lambda: roots["home"]) + monkeypatch.setenv("LOCALAPPDATA", str(roots["local"])) + monkeypatch.setenv("APPDATA", str(roots["roaming"])) + return roots + + +def _browser_base(roots: dict[str, Path], platform: str, browser: str) -> Path: + paths = { + "win32": { + "Chrome": roots["local"] / "Google" / "Chrome" / "User Data", + "Edge": roots["local"] / "Microsoft" / "Edge" / "User Data", + "Brave": roots["local"] / "BraveSoftware" / "Brave-Browser" / "User Data", + "Firefox": roots["roaming"] / "Mozilla" / "Firefox" / "Profiles", + }, + "darwin": { + "Chrome": roots["home"] / "Library" / "Application Support" / "Google" / "Chrome", + "Edge": roots["home"] / "Library" / "Application Support" / "Microsoft Edge", + "Brave": ( + roots["home"] + / "Library" + / "Application Support" + / "BraveSoftware" + / "Brave-Browser" + ), + "Firefox": roots["home"] / "Library" / "Application Support" / "Firefox" / "Profiles", + }, + "linux": { + "Chrome": roots["home"] / ".config" / "google-chrome", + "Edge": roots["home"] / ".config" / "microsoft-edge", + "Brave": roots["home"] / ".config" / "BraveSoftware" / "Brave-Browser", + "Firefox": roots["home"] / ".mozilla" / "firefox", + 
}, + } + return paths[platform][browser] + + +def _create_fake_profile(base: Path, browser: str) -> tuple[str, str, Path]: + browser_type = "firefox" if browser == "Firefox" else "chromium" + profile_name = PROFILE_NAMES[browser] + profile_dir = base / profile_name + profile_dir.mkdir(parents=True) + bookmark_path = profile_dir / ("places.sqlite" if browser_type == "firefox" else "Bookmarks") + bookmark_path.write_text("fake bookmark storage") + return profile_name, browser_type, bookmark_path def test_browser_profile_source_id(): @@ -27,62 +93,34 @@ def test_browser_profile_custom_source_id(): assert p.source_id == "custom:id" -def test_detect_browsers_returns_list(tmp_path): +@pytest.mark.parametrize("platform", SUPPORTED_PLATFORMS) +def test_detect_browsers_returns_empty_list_without_profiles(tmp_path, monkeypatch, platform): """detect_browsers should return a list (possibly empty) on any platform.""" - # With a fake home, no browsers should be detected - with patch("mindmark.browsers.paths._home", return_value=tmp_path): - with patch("mindmark.browsers.paths._local_app_data", return_value=tmp_path / "Local"): - with patch("mindmark.browsers.paths._app_data", return_value=tmp_path / "Roaming"): - profiles = detect_browsers() - assert isinstance(profiles, list) - - -def test_detect_chromium_with_fake_profile(tmp_path): - """Simulate a Chrome installation with a Default profile.""" - if sys.platform == "darwin": - chrome_dir = tmp_path / "Library" / "Application Support" / "Google" / "Chrome" - elif sys.platform.startswith("linux"): - chrome_dir = tmp_path / ".config" / "google-chrome" - else: - chrome_dir = tmp_path / "Google" / "Chrome" / "User Data" - - default_profile = chrome_dir / "Default" - default_profile.mkdir(parents=True) - (default_profile / "Bookmarks").write_text('{"roots":{}}') - - with patch("mindmark.browsers.paths._home", return_value=tmp_path): - with patch("mindmark.browsers.paths._local_app_data", return_value=tmp_path): - profiles = 
detect_browsers() - - chrome_profiles = [p for p in profiles if p.browser_name == "Chrome"] - assert len(chrome_profiles) >= 1 - assert chrome_profiles[0].profile_name == "Default" - assert chrome_profiles[0].browser_type == "chromium" - - -def test_detect_firefox_with_fake_profile(tmp_path): - """Simulate a Firefox installation with a profile.""" - if sys.platform == "darwin": - ff_dir = tmp_path / "Library" / "Application Support" / "Firefox" / "Profiles" - elif sys.platform.startswith("linux"): - ff_dir = tmp_path / ".mozilla" / "firefox" - else: - ff_dir = tmp_path / "Roaming" / "Mozilla" / "Firefox" / "Profiles" - - profile_dir = ff_dir / "abc12345.default-release" - profile_dir.mkdir(parents=True) - # Create a minimal places.sqlite - import sqlite3 - db = profile_dir / "places.sqlite" - con = sqlite3.connect(db) - con.execute("CREATE TABLE moz_places (id INTEGER PRIMARY KEY, url TEXT)") - con.close() - - with patch("mindmark.browsers.paths._home", return_value=tmp_path): - with patch("mindmark.browsers.paths._app_data", return_value=tmp_path / "Roaming"): - profiles = detect_browsers() - - ff_profiles = [p for p in profiles if p.browser_name == "Firefox"] - assert len(ff_profiles) >= 1 - assert ff_profiles[0].browser_type == "firefox" - assert "default-release" in ff_profiles[0].profile_name + _configure_platform(monkeypatch, tmp_path, platform) + + profiles = detect_browsers() + + assert profiles == [] + + +@pytest.mark.parametrize("platform", SUPPORTED_PLATFORMS) +def test_detect_supported_browser_profiles_by_platform(tmp_path, monkeypatch, platform): + """Simulate all supported browsers on every supported platform.""" + roots = _configure_platform(monkeypatch, tmp_path, platform) + expected = {} + for browser in PROFILE_NAMES: + profile_name, browser_type, bookmark_path = _create_fake_profile( + _browser_base(roots, platform, browser), + browser, + ) + expected[(browser, profile_name)] = (browser_type, bookmark_path) + + profiles = detect_browsers() + + 
detected = {(p.browser_name, p.profile_name): p for p in profiles} + assert set(detected) == set(expected) + for key, (browser_type, bookmark_path) in expected.items(): + profile = detected[key] + assert profile.browser_type == browser_type + assert profile.bookmark_path == bookmark_path + assert profile.source_id == f"{key[0].lower()}:{key[1]}" diff --git a/tests/test_browsers_init.py b/tests/test_browsers_init.py index a04341b..293768c 100644 --- a/tests/test_browsers_init.py +++ b/tests/test_browsers_init.py @@ -1,8 +1,9 @@ """Tests for the browsers orchestration layer (__init__.py).""" import json -import tempfile from pathlib import Path -from unittest.mock import patch, MagicMock +from unittest.mock import patch + +import pytest from mindmark.browsers import ( parse_browser_bookmarks, @@ -11,7 +12,11 @@ from mindmark.browsers.paths import BrowserProfile -def _make_chromium_profile(tmp_path: Path) -> BrowserProfile: +def _make_chromium_profile( + tmp_path: Path, + browser_name: str = "Chrome", + profile_name: str = "Default", +) -> BrowserProfile: """Create a fake Chromium profile with a Bookmarks JSON file.""" bookmark_file = tmp_path / "Bookmarks" data = { @@ -30,9 +35,9 @@ def _make_chromium_profile(tmp_path: Path) -> BrowserProfile: } bookmark_file.write_text(json.dumps(data)) return BrowserProfile( - browser_name="Chrome", + browser_name=browser_name, browser_type="chromium", - profile_name="Default", + profile_name=profile_name, bookmark_path=bookmark_file, ) @@ -87,38 +92,54 @@ def test_parse_browser_bookmarks_unsupported(): profile_name="Default", bookmark_path=Path("/fake"), ) - try: + with pytest.raises(ValueError, match="Unsupported"): parse_browser_bookmarks(profile) - assert False, "Should have raised ValueError" - except ValueError as e: - assert "Unsupported" in str(e) -def test_collect_all_bookmarks_with_filter(tmp_path): - chrome_dir = tmp_path / "chrome" - chrome_dir.mkdir() - chrome_profile = _make_chromium_profile(chrome_dir) +def 
test_collect_all_bookmarks_with_case_insensitive_filter(tmp_path): + profiles = [] + for browser, profile_name in [ + ("Chrome", "Default"), + ("Edge", "Profile 1"), + ("Brave", "Profile 2"), + ]: + browser_dir = tmp_path / browser.lower() + browser_dir.mkdir() + profiles.append(_make_chromium_profile(browser_dir, browser, profile_name)) ff_dir = tmp_path / "firefox" ff_dir.mkdir() - ff_profile = _make_firefox_profile(ff_dir) - - fake_profiles = [chrome_profile, ff_profile] - - with patch("mindmark.browsers.detect_browsers", return_value=fake_profiles): - # Filter to Chrome only - results = collect_all_bookmarks(browser_filter="Chrome") - assert len(results) == 1 - assert results[0][0].browser_name == "Chrome" + profiles.append(_make_firefox_profile(ff_dir)) + + with patch("mindmark.browsers.detect_browsers", return_value=profiles): + for browser_filter, expected_browser in [ + ("chrome", "Chrome"), + ("EDGE", "Edge"), + (" Edge ", "Edge"), + ("bRaVe", "Brave"), + ("FIREFOX", "Firefox"), + ]: + results = collect_all_bookmarks(browser_filter=browser_filter) + assert [profile.browser_name for profile, _ in results] == [expected_browser] + assert results[0][1] + + assert collect_all_bookmarks(browser_filter="safari") == [] + assert collect_all_bookmarks(browser_filter="unknown-browser") == [] - # Filter to Firefox only - results = collect_all_bookmarks(browser_filter="firefox") - assert len(results) == 1 - assert results[0][0].browser_name == "Firefox" - - # No filter — gets all results = collect_all_bookmarks(browser_filter=None) - assert len(results) == 2 + assert [profile.browser_name for profile, _ in results] == [ + "Chrome", + "Edge", + "Brave", + "Firefox", + ] + whitespace_results = collect_all_bookmarks(browser_filter=" ") + assert [profile.browser_name for profile, _ in whitespace_results] == [ + "Chrome", + "Edge", + "Brave", + "Firefox", + ] def test_collect_all_bookmarks_no_browsers(): @@ -127,17 +148,25 @@ def test_collect_all_bookmarks_no_browsers(): 
assert results == [] -def test_collect_all_bookmarks_handles_parse_error(tmp_path, capsys): - """A broken profile should print a warning and not crash.""" +def test_collect_all_bookmarks_warns_and_continues_after_parse_error(tmp_path, capsys): + """A broken profile should print a warning and not block valid profiles.""" bad_profile = BrowserProfile( browser_name="Chrome", browser_type="chromium", profile_name="Corrupt", bookmark_path=tmp_path / "nonexistent", ) - with patch("mindmark.browsers.detect_browsers", return_value=[bad_profile]): + + good_dir = tmp_path / "good-edge" + good_dir.mkdir() + good_profile = _make_chromium_profile(good_dir, "Edge", "Default") + + with patch("mindmark.browsers.detect_browsers", return_value=[bad_profile, good_profile]): results = collect_all_bookmarks() - assert results == [] + assert len(results) == 1 + assert results[0][0].browser_name == "Edge" + assert len(results[0][1]) == 2 captured = capsys.readouterr() assert "warning" in captured.err assert "Chrome" in captured.err + assert "Corrupt" in captured.err diff --git a/tests/test_cli_ui.py b/tests/test_cli_ui.py new file mode 100644 index 0000000..cdf0e83 --- /dev/null +++ b/tests/test_cli_ui.py @@ -0,0 +1,259 @@ +from __future__ import annotations + +import io +import json +import sys +from pathlib import Path +from types import SimpleNamespace + +from mindmark import cli +from mindmark._console import Console +from mindmark.browsers.paths import BrowserProfile + + +class _FakeIndex: + search_results = [] + stats_payload = { + "db_path": "fake.db", + "model": "test-model", + "top_domains": [("example.com", 2)], + "top_folders": [("Work", 1)], + "total": 2, + } + bookmarks = [] + pending = [] + enrichment = {} + sync_calls = [] + + def __init__(self, db_path=None, model_name="test-model"): + self.db_path = Path(db_path or "fake.db") + self.model_name = model_name + + def close(self): + pass + + def search(self, **_kwargs): + return list(self.search_results) + + def stats(self): 
+ return dict(self.stats_payload) + + def all_bookmarks(self): + return list(self.bookmarks) + + def pending_enrichment_urls(self, limit=None): + return list(self.pending) + + def reset_failed_enrichment(self): + return 0 + + def enrichment_stats(self): + return dict(self.enrichment) + + def sync(self, bookmarks, source="html"): + self.sync_calls.append((source, len(bookmarks))) + return SimpleNamespace(added=len(bookmarks), updated=0, removed=0, unchanged=0) + + +def _install_fake_index(monkeypatch, fake_index=_FakeIndex): + monkeypatch.setitem(sys.modules, "mindmark.index", SimpleNamespace(Index=fake_index)) + + +def test_console_color_is_tty_aware_and_can_be_disabled(monkeypatch): + class TtyBuffer(io.StringIO): + def isatty(self): + return True + + out = TtyBuffer() + Console(color=True, stdout=out).success("Done") + assert "\033[" in out.getvalue() + + monkeypatch.setenv("NO_COLOR", "1") + out = TtyBuffer() + Console(color=True, stdout=out).success("Done") + assert "\033[" not in out.getvalue() + + +def test_find_human_output_includes_score_url_folder_excerpt_and_hint(monkeypatch, capsys): + _FakeIndex.search_results = [ + { + "score": 0.875, + "title": "Example", + "url": "https://example.com/docs", + "folder_path": "Work/Docs", + "domain": "example.com", + "relevant_excerpt": "Helpful excerpt.", + } + ] + _install_fake_index(monkeypatch) + + rc = cli.main(["find", "docs", "--excerpt"]) + + assert rc == 0 + captured = capsys.readouterr() + assert "1. Example" in captured.out + assert "score=0.875" in captured.out + assert "folder=Work/Docs" in captured.out + assert "url=https://example.com/docs" in captured.out + assert "⤵ Helpful excerpt." 
in captured.out + assert "Hint: Open a result with:" in captured.out + assert captured.err == "" + + +def test_find_json_preserves_result_list_and_has_no_color(monkeypatch, capsys): + _FakeIndex.search_results = [ + { + "score": 0.5, + "title": "Example", + "url": "https://example.com", + "folder_path": "", + "domain": "example.com", + } + ] + _install_fake_index(monkeypatch) + + rc = cli.main(["find", "example", "--json"]) + + assert rc == 0 + captured = capsys.readouterr() + assert "\033[" not in captured.out + assert json.loads(captured.out) == _FakeIndex.search_results + assert captured.err == "" + + +def test_find_no_results_is_single_actionable_message(monkeypatch, capsys): + _FakeIndex.search_results = [] + _install_fake_index(monkeypatch) + + rc = cli.main(["find", "missing"]) + + assert rc == 1 + captured = capsys.readouterr() + assert captured.out.count("No matching bookmarks.") == 1 + assert "mindmark sync" in captured.out + assert captured.err == "" + + +def test_open_alias_opens_top_result(monkeypatch, capsys): + opened = [] + _FakeIndex.search_results = [ + { + "score": 0.9, + "title": "Open Me", + "url": "https://open.example.com", + "folder_path": "", + "domain": "open.example.com", + } + ] + _install_fake_index(monkeypatch) + monkeypatch.setattr(cli.webbrowser, "open", opened.append) + + rc = cli.main(["open", "open me"]) + + assert rc == 0 + assert opened == ["https://open.example.com"] + assert "Opened 1. 
Open Me" in capsys.readouterr().out + + +def test_stats_json_uses_stable_dictionary(monkeypatch, capsys): + _install_fake_index(monkeypatch) + + rc = cli.main(["stats", "--json"]) + + assert rc == 0 + payload = json.loads(capsys.readouterr().out) + assert payload == { + "db_path": "fake.db", + "model": "test-model", + "top_domains": [{"count": 2, "domain": "example.com"}], + "top_folders": [{"count": 1, "folder": "Work"}], + "total": 2, + } + + +def test_validate_empty_index_json_is_actionable(monkeypatch, capsys): + _FakeIndex.bookmarks = [] + _install_fake_index(monkeypatch) + + rc = cli.main(["validate", "--json"]) + + assert rc == 1 + payload = json.loads(capsys.readouterr().out) + assert payload["total"] == 0 + assert "mindmark sync" in payload["message"] + + +def test_enrich_json_idle(monkeypatch, capsys): + _FakeIndex.pending = [] + _FakeIndex.enrichment = {"complete": 1} + _install_fake_index(monkeypatch) + monkeypatch.setitem( + sys.modules, + "mindmark.enricher", + SimpleNamespace(enrich_pending=lambda *_args, **_kwargs: None), + ) + + rc = cli.main(["enrich", "--json"]) + + assert rc == 0 + payload = json.loads(capsys.readouterr().out) + assert payload["status"] == "idle" + assert payload["before"] == {"complete": 1} + + +def test_sync_list_browsers_shows_supported(monkeypatch, capsys): + import mindmark.browsers.paths as paths + + monkeypatch.setattr(paths, "detect_browsers", lambda: []) + + rc = cli.main(["sync", "--list-browsers"]) + + assert rc == 0 + out = capsys.readouterr().out + for name in ["Chrome", "Edge", "Brave", "Firefox"]: + assert name in out + + +def test_sync_browser_filter_json(monkeypatch, capsys): + import mindmark.browsers as browsers + import mindmark.browsers.paths as paths + + chrome = BrowserProfile( + browser_name="Chrome", + browser_type="chromium", + profile_name="Default", + bookmark_path=Path("chrome-bookmarks"), + ) + firefox = BrowserProfile( + browser_name="Firefox", + browser_type="firefox", + 
profile_name="default-release", + bookmark_path=Path("places.sqlite"), + ) + _FakeIndex.sync_calls = [] + _install_fake_index(monkeypatch) + monkeypatch.setattr(paths, "detect_browsers", lambda: [chrome, firefox]) + monkeypatch.setattr(browsers, "parse_browser_bookmarks", lambda _profile: [object()]) + + rc = cli.main(["sync", "--browser", "Firefox", "--json"]) + + assert rc == 0 + payload = json.loads(capsys.readouterr().out) + assert [p["browser"] for p in payload["profiles"]] == ["Firefox"] + assert payload["summary"]["added"] == 1 + assert _FakeIndex.sync_calls == [("firefox:default-release", 1)] + + +def test_runtime_errors_are_concise(monkeypatch, capsys): + class BrokenIndex(_FakeIndex): + def __init__(self, *args, **kwargs): + raise RuntimeError("database is locked") + + _install_fake_index(monkeypatch, BrokenIndex) + + rc = cli.main(["stats"]) + + assert rc == 1 + captured = capsys.readouterr() + assert "database is locked" in captured.err + assert "Traceback" not in captured.err