
Unified model catalog across HF, LM Studio, Ollama, llama.cpp + content-based resolution #2012

Draft
AlexCheema wants to merge 2 commits into main from alexcheema/unified-model-sources


@AlexCheema
Contributor

Summary

  • Unified model catalog: pluggable ModelSource registry detects locally-installed models from HuggingFace cache, LM Studio, Ollama, llama.cpp, and exo's own dirs. Each entry surfaces in state.local_models tagged with source + format (safetensors / mlx / gguf). The Models page (renamed from Downloads) shows source badges, a source filter, and a "GGUF · n/a — not yet loadable" badge for entries exo can't run today. Inference path-resolution falls through to external sources so MLX/safetensors models in HF cache or LM Studio load without re-downloading.
  • Single-file safetensors models (e.g. Qwen/Qwen3-0.6B): drop the hard requirement on model.safetensors.index.json. fetch_safetensors_size falls back to huggingface_hub.model_info().safetensors.total when the index 404s; _scan_model_directory / is_model_directory_complete treat a directory holding a model.safetensors without a .partial suffix as complete.
  • Content-based model resolution: resolve_existing_model now does a convention-pass first (unchanged), then falls through to a content-pass that fingerprints architecture-defining keys in config.json and finds a matching dir regardless of folder name. Rename Qwen--Qwen3-0.6B/ to wrong-typo/ and exo still finds it — no redundant re-download. Fingerprint cost is sub-millisecond at realistic model counts (10 μs/dir scanned).
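
A minimal sketch of what a pluggable source registry of this shape could look like. Only the name ModelSource and the source/format tags come from the summary above; LocalModel, DirectorySource, scan_all, and every method name are hypothetical, and the format detection is deliberately crude (it does not tell MLX apart from plain safetensors).

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Protocol


@dataclass(frozen=True)
class LocalModel:
    # Hypothetical record; the real state.local_models entries may differ.
    name: str
    path: Path
    source: str  # which tool owns the files, e.g. "lmstudio"
    format: str  # "safetensors", "mlx", or "gguf"


class ModelSource(Protocol):
    # Assumed interface: each source reports availability and scans itself.
    name: str

    def is_available(self) -> bool: ...

    def scan(self) -> list[LocalModel]: ...


class DirectorySource:
    """Illustrative source: every child directory of `root` is one model."""

    def __init__(self, name: str, root: Path) -> None:
        self.name = name
        self.root = root

    def is_available(self) -> bool:
        return self.root.is_dir()

    def scan(self) -> list[LocalModel]:
        models = []
        for child in sorted(self.root.iterdir()):
            if not child.is_dir():
                continue
            # Crude guess: any *.gguf file marks the directory as GGUF.
            fmt = "gguf" if any(child.glob("*.gguf")) else "safetensors"
            models.append(LocalModel(child.name, child, self.name, fmt))
        return models


def scan_all(sources: list[ModelSource]) -> list[LocalModel]:
    """Collect models from every source whose directory exists."""
    found: list[LocalModel] = []
    for source in sources:
        if source.is_available():
            found.extend(source.scan())
    return found
```

Keeping availability separate from scanning is one way to let an endpoint such as GET /sources report a tool as not installed without walking its directories.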

End-to-end verified live on macOS against a real LM Studio install, a real HF cache, and a real Ollama install.

Test plan

  • uv run basedpyright — 0 errors
  • uv run ruff check — clean
  • nix fmt — applied
  • uv run pytest — 445 passed (+45 new: 19 source scanners + 4 scanner service + 8 fingerprint + 7 content resolution + 7 single-file)
  • npx svelte-check — no new dashboard errors (16 pre-existing, unchanged)
  • GET /sources returns the 5 sources with availability flags
  • GET /state.localModels populates after the worker's first scan tick
  • POST /models/add Qwen/Qwen3-0.6B returns 200 (single-file regression)
  • Chat completion against Qwen/Qwen3-0.6B returns tokens
  • Mispath: mv ~/.exo/models/Qwen--Qwen3-0.6B ~/.exo/models/wrong-typo, then chat — log shows Resolved … via content fingerprint (folder name 'wrong-typo' != convention 'Qwen--Qwen3-0.6B'), no re-download
  • Manual: Models dashboard at /downloads — verify source badges, filter chips, GGUF "not yet loadable" state, and that delete buttons are hidden on non-exo entries
  • Manual: confirm a multi-shard MLX model (e.g. mlx-community/Qwen3-30B-A3B-4bit) still downloads and runs — regression check
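
The mispath check above exercises the two-pass lookup. A rough sketch of that flow, under stated assumptions: the PR does not list which config.json keys are fingerprinted, so FINGERPRINT_KEYS is a guess, and fingerprint and resolve are hypothetical names rather than exo's resolve_existing_model. With a key set this small, two models sharing one architecture would collide, so a real implementation presumably needs more than this.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional

# Assumed architecture-defining keys; the actual set is not given in the PR.
FINGERPRINT_KEYS = (
    "architectures",
    "model_type",
    "hidden_size",
    "num_hidden_layers",
    "num_attention_heads",
    "vocab_size",
)


def fingerprint(config: dict) -> str:
    """Hash only the selected keys so unrelated config fields are ignored."""
    subset = {key: config.get(key) for key in FINGERPRINT_KEYS}
    blob = json.dumps(subset, sort_keys=True).encode()
    return hashlib.sha256(blob).hexdigest()


def resolve(models_dir: Path, model_id: str, expected_config: dict) -> Optional[Path]:
    # Convention pass: "org/name" maps to a folder called "org--name".
    conventional = models_dir / model_id.replace("/", "--")
    if (conventional / "config.json").is_file():
        return conventional
    if not models_dir.is_dir():
        return None
    # Content pass: compare fingerprints, whatever the folder is called.
    target = fingerprint(expected_config)
    for child in sorted(models_dir.iterdir()):
        config_path = child / "config.json"
        if not config_path.is_file():
            continue
        try:
            found = json.loads(config_path.read_text())
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or corrupt config: skip this folder
        if isinstance(found, dict) and fingerprint(found) == target:
            return child
    return None
```

Where expected_config comes from (for example, model card metadata) is also an assumption of this sketch.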

Out of scope (named so we don't slip)

  • A GGUF inference engine for Ollama / llama.cpp / LM Studio GGUF entries. Detect-and-display only; "GGUF · n/a" badge is honest about it.
  • Wiring state.local_models into the chat model picker — independent UI follow-up. Today the Models page surfaces them; the picker still searches HF + bundled cards.
  • Persisting a fingerprint index to disk. Sub-ms per resolve at realistic scales — premature optimisation.
  • Auto-renaming wrong-named folders to canonical. Content resolution makes this unnecessary.

🤖 Generated with Claude Code

AlexCheema and others added 2 commits May 1, 2026 02:17
… content-based resolution

Unifies exo's downloads view with locally-installed models from external tools so
users can see (and where format permits, load) what they already have on disk
without re-downloading. Drops the long-standing requirement that every model
ship a safetensors index — single-file models like Qwen/Qwen3-0.6B now work.
Switches model lookup from path-based to content-based fingerprinting so a
mistyped folder name no longer triggers a redundant re-download.
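
The relaxed completeness rule for single-file models might look roughly like the following. This is a sketch, not exo's is_model_directory_complete: is_complete is a hypothetical name, and treating any leftover .partial file as "still downloading" is an assumption. The weight_map layout is the standard shape of a Hugging Face safetensors index.

```python
import json
from pathlib import Path

INDEX_NAME = "model.safetensors.index.json"
SINGLE_FILE_NAME = "model.safetensors"


def is_complete(model_dir: Path) -> bool:
    """True when every expected weight file is fully present."""
    # Assumption: in-flight downloads carry a ".partial" suffix.
    if any(model_dir.glob("*.partial")):
        return False
    index = model_dir / INDEX_NAME
    if index.is_file():
        # Sharded model: the index maps tensor names to shard filenames.
        weight_map = json.loads(index.read_text()).get("weight_map", {})
        shards = set(weight_map.values())
        return bool(shards) and all((model_dir / s).is_file() for s in shards)
    # No index: accept a lone model.safetensors as a complete model.
    return (model_dir / SINGLE_FILE_NAME).is_file()
```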

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…he receive loop

A peer running an incompatible schema (newer or older) used to take down the entire
gossipsub receive loop with a single ValidationError. The error is now caught locally:
the bad message is logged and dropped, and the loop stays alive. Surfaced while testing
this branch against a peer on a different schema; the fix is independent of the schema work.
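
The log-and-drop pattern can be sketched as below. Everything here is a stand-in: the local ValidationError class substitutes for whatever the schema library raises, and Message, parse_message, and receive_loop are hypothetical names, not exo's gossipsub code.

```python
import json
import logging
from dataclasses import dataclass
from typing import Iterable

logger = logging.getLogger("receive_loop")


class ValidationError(ValueError):
    """Stand-in for the schema library's validation error."""


@dataclass
class Message:
    kind: str
    payload: str


def parse_message(raw: bytes) -> Message:
    """Decode one wire message, raising ValidationError on any mismatch."""
    try:
        data = json.loads(raw)
        return Message(kind=str(data["kind"]), payload=str(data["payload"]))
    except (ValueError, KeyError, TypeError) as exc:
        raise ValidationError(f"unrecognised message: {exc}") from exc


def receive_loop(raw_messages: Iterable[bytes]) -> list[Message]:
    """Process messages one by one; a bad message is dropped, not fatal."""
    accepted: list[Message] = []
    for raw in raw_messages:
        try:
            accepted.append(parse_message(raw))
        except ValidationError as exc:
            # Catch per message so one incompatible peer cannot end the loop.
            logger.warning("dropping undecodable message: %s", exc)
    return accepted
```

The point of the pattern is the placement of the try block: it sits inside the loop body, so the failure scope is a single message.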

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>