Skip to content

fix: classify fallback upstream errors (gated/not-found/unavailable) instead of collapsing to 404#29

Merged
narugo1992 merged 6 commits intofeat/client-side-previewfrom
fix/fallback-error-classification
Apr 24, 2026
Merged

fix: classify fallback upstream errors (gated/not-found/unavailable) instead of collapsing to 404#29
narugo1992 merged 6 commits intofeat/client-side-previewfrom
fix/fallback-error-classification

Conversation

@narugo1992
Copy link
Copy Markdown

Summary

Fix: fallback upstream errors are no longer collapsed to a misleading local 404.

Before: clicking Download or Preview on a file in a gated-source repo (e.g. animetimm/mobilenetv3_large_150d.dbv4-full) returned 404 RepoNotFound, even though the containing repo was clearly listable via the tree endpoint. The fallback code hit upstream HF, got a 401, called should_retry_source, saw "non-retryable", and returned None — erasing the status, the upstream message, and cutting short the source chain. with_repo_fallback then re-raised the original local 404.

After: every non-2xx/3xx probe is recorded as a per-source attempt, the loop continues to the next source (a mirror might still have the file), and if everything fails the response is an aggregated JSONResponse with the right HTTP status + the right X-Error-Code for huggingface_hub to raise the right exception subclass.

Builds on top of #28 (target branch: feat/client-side-preview) so the UI-side error copy and the backend classification ship together.

The contract

HTTP status priority across all attempts: 401 > 403 > 404 > 502 (5xx / timeout / network all collapse to 502 Bad Gateway).

X-Error-Code deliberately aligns with huggingface_hub.utils._http.hf_raise_for_status:

Aggregate outcome HTTP X-Error-Code hf_hub raises
Any source 401 401 GatedRepo GatedRepoError
All sources 404 404 EntryNotFound EntryNotFoundError
Any source 403 (no 401) 403 generic HfHubHTTPError
5xx / timeout / network 502 generic (retryable)

Inventing our own codes would downgrade gated-repo hf_hub_download calls to RepositoryNotFoundError — losing the exception type users actually catch. Alignment is the whole point.

Body keeps HF-style {error, detail} plus a sources[] array so curl / SDK users can see the full timeline:

{
  "error": "GatedRepo",
  "detail": "Upstream source requires authentication - likely a gated repository. ...",
  "sources": [
    {"name": "HuggingFace", "url": "https://huggingface.co", "status": 401, "category": "auth", "message": "..."},
    {"name": "Mirror", "url": "https://mirror.local", "status": 404, "category": "not-found", "message": "..."}
  ]
}

X-Error-Message header mirrors body.detail so a bare curl -I shows something readable.

Backend

  • src/kohakuhub/api/fallback/operations.py::try_fallback_resolve — rewrote the source loop. Non-success HEAD/GET responses + timeout + transport exceptions are all recorded as build_fallback_attempt(...) dicts and the loop keeps going. At the end, build_aggregate_failure_response(attempts) returns the aggregate, or None if no source was even tried (respects the empty-sources early exit).
  • src/kohakuhub/api/fallback/utils.py — two new helpers:
    • build_fallback_attempt(source, *, response=None, timeout=None, network=None) — normalizes one probe into the serializable dict that lands in body.sources.
    • build_aggregate_failure_response(attempts) — status priority + HF-aligned X-Error-Code + body shaping.
  • should_retry_source is no longer used for flow control (the loop no longer short-circuits). Left intact to avoid touching its other callers.

Other fallback operations (tree, info, paths_info, user_profile, avatars, repo aggregation) still return None on failure. They don't surface the gated-vs-missing confusion in practice (HF's tree endpoint is public for gated repos), so keeping them focused. Happy to extend in a follow-up if reviewers want the same shape everywhere.

Frontend

  • src/kohaku-hub-ui/src/utils/safetensors.jsSafetensorsFetchError.fromResponse(response) parses the aggregated body (defensive: non-JSON bodies just use X-Error-Message) and attaches errorCode / sources / detail to the thrown error.
  • src/kohaku-hub-ui/src/components/repo/preview/FilePreviewDialog.vue — four remediation-specific error states (gated, forbidden, not-found, upstream-unavailable) plus the existing CORS hint and the existing generic fallback. When sources[] is present it collapses into a <details> table for diagnostic. The gated state recommends "attach an access token ... in account settings" rather than anything CORS-flavored — the two remediations are different and must not cross-contaminate.

Tests

Backend (test/kohakuhub/api/fallback/):

  • test_operations.py — 6 new cases pinning the contract: single 401, single 403, 401 + second-source-success, mixed 401/404/503/timeout with priority resolution, all-404, all-5xx/timeout. Two pre-existing tests that pinned the old short-circuit behavior (stops_on_non_retryable_..., returns_none_after_generic_...) rewritten to match the new aggregating semantics.
  • test_hf_hub_interop.py — 3 new HF-compat tests. Feeds the aggregated response into real huggingface_hub.utils._http.hf_raise_for_status and asserts: 401 → GatedRepoError, all-404 → EntryNotFoundError, all-5xx → generic HfHubHTTPError (explicitly NOT any of the subclasses that would signal a wrong remediation). Pins the HF-alignment invariant against future drift.

Frontend (test/kohaku-hub-ui/):

  • utils/test_safetensors.test.js — 2 new cases for SafetensorsFetchError.fromResponse: JSON body populates errorCode / sources / detail; non-JSON falls back to X-Error-Message without crashing.
  • components/test_file_preview_dialog.test.js — 3 new cases for the gated / not-found / upstream-unavailable copy; asserts gated-copy doesn't bleed CORS copy.

Verified live

Against the real gated repo through the patched local backend:

  • HEAD /models/animetimm/mobilenetv3_large_150d.dbv4-full/resolve/main/model.safetensors returns 401 + X-Error-Code: GatedRepo + aggregated JSON body (was: 404 + RepoNotFound).
  • huggingface_hub.HfApi.parse_safetensors_file_metadata against the patched endpoint raises GatedRepoError with the upstream message echoed into the exception text — the HF-native behavior users already handle.

Test plan

  • pytest test/kohakuhub/api/fallback/84 / 84 pass (was 78 on main, +6 new cases, 2 rewritten)
  • pytest test/kohakuhub/api/test_files.py7 / 7 pass (no regression)
  • npm test180 / 180 pass (+3 new cases on top of the 177 in feat: pure-client safetensors/parquet metadata preview (#27) #28)
  • npm run build passes
  • Live repro resolved: gated-repo resolve now 401 + GatedRepo + structured body; huggingface_hub raises GatedRepoError.
  • CI green (waiting)

Not in scope

  • Extending the aggregate-error contract to tree / info / paths_info / avatar / user-repos fallback operations. Reasonable follow-up, but none of them currently surface the gated-vs-missing UX confusion (HF's /api/models/*/tree is public for gated repos), so the fix stays focused on resolve.
  • Globally-configured admin HF token for anonymous users. That's a deploy-policy decision the hub operator takes on their own (legal/licensing risk self-owned) — the default remains "each user brings their own token", which this PR's UI copy explicitly guides.

🤖 Generated with Claude Code

Before this change, whenever a fallback source returned a non-retryable
4xx (typically 401 for a gated HuggingFace repo), ``try_fallback_resolve``
called ``should_retry_source`` which correctly said "do not retry"
but then returned ``None`` — dropping the status code, the upstream
body, and any chance a later source might serve the same artifact.
``with_repo_fallback`` interpreted ``None`` as a fallback miss and
re-raised the original local 404 (``RepoNotFound``), so the browser
saw a misleading "repository not found" for a file whose repo it had
just successfully listed via the fallback tree endpoint.

Reproduced live against ``animetimm/mobilenetv3_large_150d.dbv4-full``
(a gated model) while developing PR#28: tree browsing works, but
download + client-side safetensors preview both died with an unhelpful
404.

### Backend

* ``try_fallback_resolve`` now records every non-success source probe
  (HEAD or GET) as a ``build_fallback_attempt`` dict and keeps iterating.
  A 401 on source 1 no longer short-circuits the chain; a downstream
  mirror that does not gate the same artifact (or that simply has a
  file the first source lacks) still gets a chance to serve the
  request. ``httpx.TimeoutException`` and other transport exceptions
  are recorded the same way instead of being swallowed in an
  ``except``.
* ``build_aggregate_failure_response`` (new, in ``fallback/utils.py``)
  combines per-source attempts into one ``JSONResponse``:
    - HTTP status priority: ``401 > 403 > 404 > 502``. The rationale
      is user-actionability — an auth failure is the most specific
      next step ("attach a token"), 403 is next, a real "not found"
      after that, and 5xx / timeout / network collapse to 502 Bad
      Gateway.
    - ``X-Error-Code`` aligns with ``huggingface_hub.utils._http.
      hf_raise_for_status``: 401 → ``GatedRepo``, all-404 →
      ``EntryNotFound``. 403 and 502 get no specific code (HF's
      generic path is correct there). This is deliberate: inventing
      new codes would downgrade ``hf_hub_download`` on a gated repo
      to ``RepositoryNotFoundError`` and lose the ``GatedRepoError``
      exception users actually catch.
    - Body keeps an HF-style ``{error, detail}`` shape plus a
      ``sources[]`` array listing every attempt with ``{name, url,
      status|null, category, message}``, so a curl / CLI user can see
      the full timeline at a glance.
* ``should_retry_source`` is no longer used for flow control (the
  loop now records every failure regardless). Kept as-is to avoid
  touching its other callers.

### Frontend

* ``SafetensorsFetchError`` carries ``errorCode`` / ``sources`` /
  ``detail`` parsed from the aggregated body + ``X-Error-*`` headers.
  ``fromResponse()`` does a defensive JSON parse so plain-text 4xx
  bodies (e.g. raw HF "Access restricted ...") still produce a
  reasonable message.
* ``FilePreviewDialog`` now renders four remediation-specific error
  states based on the classification: ``gated`` ("Attach an access
  token ..."), ``forbidden``, ``not-found``, ``upstream-unavailable``,
  and a fallback generic. The existing CORS hint still fires for the
  ``TypeError`` / "Failed to fetch" failure mode. When the error
  carries a ``sources[]`` list, it collapses into a ``<details>``
  table for diagnostic use.

### Tests

Backend (``test/kohakuhub/api/fallback/``):

* 6 new cases in ``test_operations.py`` pinning the new contract:
  single 401 → aggregate 401/``GatedRepo``; single 403 → 403; 401 +
  next-source success (continues past 401!); mixed 401/404/503/timeout
  across four sources (priority yields 401); all-404 → 404/
  ``EntryNotFound``; all-5xx/timeout → 502 with no code.
* Two pre-existing tests that pinned the old short-circuit behavior
  (``stops_on_non_retryable_status_and_handles_timeouts``,
  ``returns_none_after_generic_source_failures``) have been rewritten
  to match the new aggregating semantics.
* 3 new HF-interop tests in ``test_hf_hub_interop.py`` feed the
  aggregated response into real ``huggingface_hub.utils._http.
  hf_raise_for_status`` and assert: aggregate 401 raises
  ``GatedRepoError``, aggregate 404 raises ``EntryNotFoundError``,
  aggregate 502 raises generic ``HfHubHTTPError`` (and *not* any of
  the subclasses that would signal a wrong remediation). These pin
  the HF-alignment invariant — a future refactor that drops
  ``X-Error-Code`` will fail these tests immediately.

Frontend (``test/kohaku-hub-ui/``):

* 2 new cases in ``test_safetensors.test.js`` for
  ``SafetensorsFetchError.fromResponse``: aggregated JSON body
  populates ``errorCode`` / ``sources`` / ``detail`` from the body +
  ``X-Error-Code`` header; non-JSON bodies fall back to
  ``X-Error-Message`` header without crashing.
* 3 new cases in ``test_file_preview_dialog.test.js`` rendering the
  three new error states (``gated``, ``not-found``,
  ``upstream-unavailable``) and asserting that gated-copy specifically
  never cross-contaminates with the CORS copy.

### Verified live

Against the real gated repo through the patched local backend:

* ``HEAD /models/animetimm/mobilenetv3_large_150d.dbv4-full/resolve/
  main/model.safetensors`` now returns 401 + ``X-Error-Code:
  GatedRepo`` + aggregated JSON body (was: 404 + ``RepoNotFound``).
* ``huggingface_hub.HfApi.parse_safetensors_file_metadata`` against
  the patched endpoint raises ``GatedRepoError`` with the upstream
  message echoed into the exception text — the HF-native behavior
  users already handle.

### Not in scope

Other fallback operations (``try_fallback_tree``, ``try_fallback_info``,
``try_fallback_paths_info``, avatars, user-repos aggregation) still use
the old "return None on any failure" pattern. They rarely surface
gated-vs-missing confusion in practice (the tree endpoint is public
for HF gated repos, which is why browsing worked while downloading
broke), so this change stays focused on resolve. Extending the
aggregation contract to the other operations is a reasonable
follow-up.

Builds on top of PR#28 (pure-client safetensors / parquet preview).
Target that branch so the UI error copy + the backend classification
land together.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@codecov
Copy link
Copy Markdown

codecov Bot commented Apr 24, 2026

Codecov Report

❌ Patch coverage is 96.80851% with 6 lines in your changes missing coverage. Please review.
✅ Project coverage is 94.18%. Comparing base (b8f7a73) to head (007e855).
⚠️ Report is 7 commits behind head on feat/client-side-preview.

Files with missing lines Patch % Lines
.../src/components/repo/preview/FilePreviewDialog.vue 92.40% 5 Missing and 1 partial ⚠️
Additional details and impacted files
@@                     Coverage Diff                      @@
##           feat/client-side-preview      #29      +/-   ##
============================================================
+ Coverage                     94.12%   94.18%   +0.05%     
============================================================
  Files                           119      119              
  Lines                         15550    15717     +167     
  Branches                        902      932      +30     
============================================================
+ Hits                          14637    14803     +166     
  Misses                          906      906              
- Partials                          7        8       +1     
Flag Coverage Δ
backend 95.48% <100.00%> (+0.03%) ⬆️
frontend 91.50% <95.04%> (+0.13%) ⬆️
frontend-admin 99.30% <ø> (ø)
hf0.20.3 95.48% <100.00%> (+0.03%) ⬆️
hf0.30.2 95.48% <100.00%> (+0.03%) ⬆️
hf0.36.2 95.48% <100.00%> (+0.03%) ⬆️
hf1.0.1 95.48% <100.00%> (+0.03%) ⬆️
hf1.6.0 95.48% <100.00%> (+0.03%) ⬆️
hflatest 95.48% <100.00%> (+0.03%) ⬆️
py3.10 95.48% <100.00%> (+0.03%) ⬆️
py3.11 95.46% <100.00%> (+0.03%) ⬆️
py3.12 95.46% <100.00%> (+0.03%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

narugo1992 and others added 5 commits April 24, 2026 15:27
codecov/patch flagged 93.16% vs 94.12% target on the previous commit
— the ~1% miss was the defensive branches in `build_fallback_attempt`
and `build_aggregate_failure_response` that the integration tests in
`test_operations.py` don't naturally exercise (odd status codes, the
contract-violation fallback, the oversize-message cap, empty attempts
list).

Add 7 targeted unit tests in `test/kohakuhub/api/fallback/test_utils.py`:

- `_categorize_status` maps each of 401 / 403 / 404 / 410 / 503 to the
  right category.
- An unclassifiable status (418) lands in `CATEGORY_OTHER` so the
  aggregate shape stays consistent.
- Contract-violation call (no response / timeout / network supplied)
  returns a safe default rather than throwing.
- Oversized upstream bodies get truncated under the per-attempt cap.
- `timeout=...` and `network=...` paths record the right categories
  and surface the exception message.
- Empty-attempts aggregate is a well-formed 502 rather than a KeyError
  or a misleading 401.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
codecov/patch was still red at 93.16% after the previous unit-test
fill. The residual miss was a defensive four-line `return None` after
the source loop that cannot be reached by construction: the
`if not sources: return None` short-circuit at the top of the function
already handles the empty-sources case, and every branch of the loop
body either returns on success or `attempts.append(...)`s before
`continue`, so the attempts list is always non-empty once the loop
exits.

Replace the `if attempts:` guard + dead fallback with a direct call to
`build_aggregate_failure_response(attempts)` and a comment recording
the invariant, so a future change that adds an early `continue`
without recording an attempt will fail its unit tests (the aggregated
response will then claim "no fallback sources" with an empty sources
list) rather than silently returning None again.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The 3 new HF-compat tests (pattern_D_all_401/404/5xx_raises_...)
feed `httpx.Response` objects directly into
`huggingface_hub.utils._http.hf_raise_for_status`. That function
migrated from `requests` to `httpx` in hf_hub 1.0:

- hf_hub >= 1.0: catches `httpx.HTTPStatusError`, reads X-Error-Code,
  raises the right subclass (GatedRepoError / EntryNotFoundError / ...)
- hf_hub <  1.0: catches `requests.HTTPError` only; our
  httpx.HTTPStatusError escapes the classification path entirely and
  the test sees a bare `httpx.HTTPStatusError` instead of the HF
  subclass it is pinning.

The matrix cells that failed (python_version, hf_hub):
  (3.10, 0.20.3) / (3.10, 0.30.2) / (3.10, 0.36.2) /
  (3.11, 0.20.3) / (3.11, 0.30.2) / (3.11, 0.36.2) / ...

Guard the 3 tests with a version-based `pytest.skipif` matching the
existing pattern for `HAS_XET` higher up in the file (Xet was also a
1.0+ feature). The aggregate-response contract is still upheld on
every cell — it's just that the HF-classification assertion is only
observable on cells where hf_hub speaks httpx. The other 7 tests
in this file remain matrix-wide green, including the pre-existing
pattern A/B/C that use a different assertion surface that works
across the version range.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Supersedes the `skipif(not _HF_SUPPORTS_HTTPX_CLASSIFICATION)` that
the previous commit added. Skipping the three pattern-D tests on
older cells left the HF-compat invariant unverified on exactly the
matrix cells that speak a different Response type — precisely the
cells most likely to silently drift.

The classification logic in `hf_raise_for_status` (X-Error-Code →
GatedRepoError / EntryNotFoundError / generic HfHubHTTPError) is
identical across 0.20.x / 0.30.x / 0.36.x / 1.0.x / 1.6.x / latest.
Only the Response TYPE it accepts differs (requests before 1.0, httpx
after). Build the right type per installed version and assert the
classification everywhere.

Changes:

- New helper `_to_hf_response(resp, request_url=...)` which rehydrates
  the FastAPI aggregate response as either `httpx.Response` (hf_hub
  >= 1.0) or `requests.Response` (< 1.0). Only the three pattern-D
  tests use it; the existing A/B/C tests keep using `_to_httpx`
  because they only read `.status_code` / `.headers`, which exist on
  both shapes.

- New helper `_hf_error(name)` for looking up exception classes
  across the different public-module layouts:
    0.20.x        → huggingface_hub.utils.<Name>
    0.30.x+       → huggingface_hub.errors.<Name> (re-exported from utils)
  `huggingface_hub` (top-level) does NOT export the error classes in
  any version, which is why the original `from huggingface_hub.errors
  import ...` failed on 0.20.3 with `ModuleNotFoundError`.

- Swap `from huggingface_hub.utils._http import hf_raise_for_status`
  (private path, restructured in 1.0) to `from huggingface_hub.utils
  import hf_raise_for_status` (stable public re-export, present in
  every version).

Locally verified against:
  hf_hub 0.20.3 (oldest matrix cell, requests branch) — 10/10 pass
  hf_hub 1.11.0 (latest, httpx branch)                 — 10/10 pass

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two small coverage-quality cleanups that together bring `safetensors.js`
to 100% / 100% / 100% / 100% (statements / branches / functions /
lines), which in turn closes the 0.02% project-coverage regression
codecov was flagging.

- Remove the unreachable `!Number.isFinite(headerLen) || headerLen < 0`
  guard in `parseSafetensorsMetadata`. `DataView.getBigUint64` always
  returns a non-negative BigInt in `[0, 2^64 - 1]`, which `Number()`
  always maps to a finite non-negative float. No real input can hit
  that branch; the `MAX_HEADER_LENGTH` check below is the only upper
  bound that matters.

- Cover three previously untested branches in `SafetensorsFetchError.
  fromResponse` and the tensor-parse fallback:
    * tensor entry without `data_offsets` → parser defaults to [0, 0]
      instead of crashing.
    * aggregate error body with no `X-Error-Code` header → errorCode
      still populated from `body.error`.
    * aggregate error body whose `sources` is not an array → the
      parser refuses to attach a non-list and leaves `sources=null`.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@narugo1992 narugo1992 merged commit 5f9f17f into feat/client-side-preview Apr 24, 2026
22 checks passed
@narugo1992 narugo1992 deleted the fix/fallback-error-classification branch April 24, 2026 08:23
narugo1992 added a commit that referenced this pull request Apr 24, 2026
codecov/patch still flagged 92.62% vs 94.21% after the
probeUrlAndClassify / downloadToastFor extraction — the residual
gap was the tree-error and README-error catch paths in
RepoViewer.vue that land on the aggregated 401 / 404 / 502
responses from #29 + #30's backend. Those paths had no direct
tests; the existing test_repo_viewer_paths.test.js only drove
the happy-path + an empty-tree case.

Add two cases to the existing mount harness:

- Tree root fetch returns the aggregated-gated body (401 +
  X-Error-Code=GatedRepo + sources[]). Assertions: "Authentication
  required" copy renders via the shared ErrorState, the
  "Fallback sources tried (N)" disclosure shows the source count,
  and the misleading empty-tree "No files" copy is absent.
- README resolve returns the same aggregated-gated body. Assertion:
  the card tab shows ErrorState where the README would otherwise
  sit, and the misleading "No README.md found" placeholder is
  NOT shown (that placeholder implies the repo has no README when
  the reality is just that we couldn't read it).

Both tests use the existing MSW-driven `installBaseHandlers` +
`mountViewer` scaffold from the file; no new test infrastructure.

Frontend suite: 235 → 237 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
narugo1992 added a commit that referenced this pull request Apr 24, 2026
* feat: uniform classified-error UX across file-facing surfaces

Extends the aggregated-fallback contract that landed in #29 beyond
the safetensors/parquet preview dialog, so blob pages, edit pages,
RepoViewer (README + tree), and the blob-page download button all
surface the same HF-compatible classification (gated / not-found /
upstream-unavailable / cors / generic) — no more misleading
"File Not Found" when the real cause was an upstream gate.

## Shared plumbing (new)

- ``src/kohaku-hub-ui/src/utils/http-errors.js``. Central
  classification with two entry points:
    * ``classifyResponse(response)`` — reads a ``fetch`` Response's
      ``X-Error-Code`` / ``X-Error-Message`` headers plus the
      aggregate ``{error, detail, sources[]}`` body, returns
      ``{kind, status, errorCode, detail, sources}``.
    * ``classifyError(err)`` — does the same for an AxiosError /
      ``TypeError`` / SafetensorsFetchError / bare Error. Preserves
      the CORS heuristic (TypeError "Failed to fetch") previously
      inlined in ``FilePreviewDialog.isLikelyCorsError``.
    * Per-kind default copy via ``defaultCopyFor(kind)``, used by
      ErrorState below and the blob download toast.
- ``src/kohaku-hub-ui/src/components/common/ErrorState.vue``. Shared
  panel rendering the classified state — mode=``full-page`` for
  routes, ``inline-panel`` for in-place (README slot, tree slot).
  Renders the ``sources[]`` disclosure from the aggregate body when
  present. Optional ``retry`` callback + ``actions`` slot for
  caller-specific CTAs (e.g. "Open account settings" for gated).
- Axios response interceptor in ``utils/api.js`` populates
  ``err.classification`` so callers can ``catch (err) { render(err.classification) }``
  without re-parsing X-Error-Code every time.

## Consumers (re-wired)

- ``FilePreviewDialog.vue`` — error-state template replaced by
  ``<ErrorState>``; removed the inline ``errorKind`` / ``errorCorsLikely``
  / sources-table logic (300+ lines → 6). Preview-scoped title
  override keeps "File header not found" for per-file misses.
- ``pages/.../blob/[branch]/[...file].vue`` —
    * ``<ErrorState mode="full-page">`` replaces the generic
      "File Not Found" block.
    * A retrieved 404 with no ``X-Error-Code`` still retries (commit
      might be processing), but a 404 **with** a specific code
      short-circuits out of the retry loop.
    * ``downloadFile()`` now probes ``GET``-with-``Range: bytes=0-0``
      before handing the browser off via ``window.open``. On failure
      classifies the response and surfaces an ``ElMessage`` toast
      with a kind-specific CTA — no more raw JSON blob dumped into
      a new browser tab on a gated repo (the original symptom).
- ``pages/.../edit/[branch]/[...file].vue`` — same ``ErrorState``
  treatment; additionally disables the Commit dialog when the file
  never loaded, preventing accidental "overwrite with empty content".
- ``components/repo/RepoViewer.vue`` —
    * ``loadReadme`` now classifies non-2xx responses and shows
      ``<ErrorState mode="inline-panel">`` in the README slot instead
      of the misleading "No README.md found" placeholder.
    * ``loadFileTree`` no longer silently swallows axios failures
      into an empty grid; classification drives an inline ErrorState
      in the file-list slot.

## Backend: extend aggregate contract to tree / info / paths-info

Mirrors #29's ``try_fallback_resolve`` rewrite for the three
remaining fallback operations in
``src/kohakuhub/api/fallback/operations.py``. Every non-2xx source
probe becomes an attempt, the loop keeps going, and if nothing
succeeds the function returns ``build_aggregate_failure_response(attempts)``
instead of ``None``.

One nuance encoded into the helper: ``build_aggregate_failure_response``
now takes ``scope="repo"`` (for info / tree) or ``scope="file"`` (for
resolve / paths-info). Repo-scope all-404 maps to
``RepoNotFound`` → ``RepositoryNotFoundError`` on the hf_hub client;
file-scope all-404 stays ``EntryNotFound`` → ``EntryNotFoundError``.
That matches HF's own semantics — the whole repo missing vs. one
file in an existing repo missing.

``should_retry_source`` is no longer referenced from any of the
four fallback operations; it's kept intact for now to avoid
touching callers outside this diff.

## Tests

**Frontend** (``226 → 263`` total tests after this change):

- ``test/kohaku-hub-ui/utils/test_http_errors.test.js`` — 19 cases
  across ``classifyResponse`` / ``classifyError`` / ``defaultCopyFor``:
  null input, explicit ``X-Error-Code=GatedRepo``, bare 401 without
  code, 403, 404/410 plus EntryNotFound/RepoNotFound/RevisionNotFound,
  5xx, aggregated body, body.error fallback, non-array sources,
  non-JSON bodies, axios-shaped responses, AxiosError in interceptor,
  SafetensorsFetchError, TypeError/Failed-to-fetch as cors, AbortError
  swallowed, plain Error fallback, distinct copy per kind.
- ``test/kohaku-hub-ui/components/test_error_state.test.js`` — 18
  cases: per-kind copy, title/hint overrides, detail row visibility,
  default Retry button wiring, Retry absent when no callback,
  sources[] disclosure, disclosure omitted on empty/invalid,
  custom actions slot replaces default, full-page vs inline mode,
  per-kind icon class.
- Dialog test (``test_file_preview_dialog.test.js``) — updated to
  the new shared copy (e.g. "Request failed" ⇌ old "Preview failed",
  "attach a Hugging Face token" ⇌ old "Attach an access token"),
  all 18 original cases stay green.

**Backend** (``91 → 94`` fallback tests):

- ``test_operations.py`` — the pre-existing ``cached_and_failure_paths``
  test was updated to assert the new aggregated-response shape
  (``info`` + ``tree`` → repo-level codes; ``paths-info`` → per-file).
- ``test_hf_hub_interop.py`` — 3 new pattern-E tests driving the
  aggregate response through ``hf_raise_for_status``:
    * tree all-404 → ``RepositoryNotFoundError`` (not EntryNotFoundError)
    * info any-401 → ``GatedRepoError`` via ``scope="repo"``
    * paths-info all-404 → ``EntryNotFoundError``
  Runs across the full matrix (hf_hub 0.20.3 → latest) via the
  version-aware ``_to_hf_response`` helper from #29.

## Live verification

Against ``animetimm/mobilenetv3_large_150d.dbv4-full`` (gated HF
model reached via the fallback source):

- Blob page → Download button triggers
  ``"This repository is gated. Attach a Hugging Face token in
  Settings → Tokens, then retry."`` toast instead of opening a
  new tab with the raw aggregated-JSON blob.
- Preview dialog on the same file still works (unchanged UX,
  internally now uses the shared ErrorState).
- README on that repo happens to be public on HF — the card tab
  correctly renders it via fallback without triggering the error
  path. When README itself is gated (tested synthetically in unit
  tests), ErrorState displaces the "No README.md found" placeholder.

## Non-goals (not in this PR)

- No global 404 / 500 route page. Per-page inline errors carry
  more context and match HF's own UX.
- No auto-retry on 401/403; they are not transient.
- No new ``X-Error-Code`` emission on endpoints that don't already
  carry it — backend is already uniform across surfaces that matter.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: extract probeUrlAndClassify + downloadToastFor helpers and cover them

codecov/patch flagged 92.51% vs 94.21% target on the initial push of
PR #30 — the gap was the new download-button probe logic and the
partial-source-dict fallbacks in ErrorState, both of which lived
inline in their consumers with no direct tests.

Extract the two cleanly-isolable bits into `utils/http-errors.js`:

- `probeUrlAndClassify(url, fetchImpl?)` — the `GET Range: bytes=0-0`
  pre-flight the blob-page Download button used inline. Now a
  testable async helper that returns `{ok, classification}` and
  accepts an injectable `fetchImpl` for the tests to drive.
- `downloadToastFor(classification)` + `DOWNLOAD_TOAST_HINTS` — the
  per-kind toast copy the blob page inlined in a dict literal. Now
  a frozen enum-keyed map so adding a new `ERROR_KIND` forces a
  matching toast entry rather than silently falling through to the
  generic hint.

Blob page `downloadFile()` reduced from ~40 lines of inline probe +
switch to 8 lines calling the two helpers — identical behavior,
fully unit-tested.

Tests (7 new cases in `test_http_errors.test.js` + 1 in
`test_error_state.test.js`):

- `probeUrlAndClassify` ok-path, 401 → gated, transport error →
  cors, and `globalThis.fetch` default when no impl is injected.
- `downloadToastFor` returns a distinct non-empty message for every
  ERROR_KIND and falls back to GENERIC on null / unknown kind.
- `classifyError` detail-falls-to-message branch (err.detail
  missing, err.message populated).
- `ErrorState.vue` renders partial source dicts (missing name /
  status / url / category / message) via the `??` / `typeof`
  fallbacks — regression guard so a future backend that adds / drops
  source fields can't render "undefined" in the disclosure table.

Coverage after this commit:
  http-errors.js   100% lines / 100% funcs (89.71% branches; the
                   only gaps are defensive pre-existing branches
                   outside the new helpers)
  ErrorState.vue   100% / 100% / 100% / 100%

Frontend suite: 226 → 235 passing (+9 new cases this commit).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover RepoViewer tree + README error classification branches

codecov/patch still flagged 92.62% vs 94.21% after the
probeUrlAndClassify / downloadToastFor extraction — the residual
gap was the tree-error and README-error catch paths in
RepoViewer.vue that land on the aggregated 401 / 404 / 502
responses from #29 + #30's backend. Those paths had no direct
tests; the existing test_repo_viewer_paths.test.js only drove
the happy-path + an empty-tree case.

Add two cases to the existing mount harness:

- Tree root fetch returns the aggregated-gated body (401 +
  X-Error-Code=GatedRepo + sources[]). Assertions: "Authentication
  required" copy renders via the shared ErrorState, the
  "Fallback sources tried (N)" disclosure shows the source count,
  and the misleading empty-tree "No files" copy is absent.
- README resolve returns the same aggregated-gated body. Assertion:
  the card tab shows ErrorState where the README would otherwise
  sit, and the misleading "No README.md found" placeholder is
  NOT shown (that placeholder implies the repo has no README when
  the reality is just that we couldn't read it).

Both tests use the existing MSW-driven `installBaseHandlers` +
`mountViewer` scaffold from the file; no new test infrastructure.

Frontend suite: 235 → 237 passing.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Classify bare 401 (no GatedRepo code) as RepoNotFound, not gated

HuggingFace returns 401 WITHOUT X-Error-Code for non-existent repos
as anti-enumeration; huggingface_hub's hf_raise_for_status maps that
exact shape to RepositoryNotFoundError (see the "401 is misleading"
comment in utils/_http.py). The fallback aggregator and the SPA were
both treating bare 401 as gated, so a typo'd repo URL showed the
"log in with a token" affordance instead of "not found".

This aligns with hf_hub's own heuristic in two places:

- `_categorize_status(status, error_code, ...)` now inspects the
  upstream X-Error-Code header. 401 with `GatedRepo` stays `auth`;
  bare 401 becomes `not-found`.
- `build_aggregate_failure_response` escalates bare-401 attempts to
  `X-Error-Code: RepoNotFound` even on `scope="file"` ops, so the
  client raises `RepositoryNotFoundError` instead of the wrong
  `EntryNotFoundError`. `build_fallback_attempt` persists the
  upstream error code on each attempt for that decision.

Frontend `classifyStatus` follows suit: bare 401 → NOT_FOUND.

Test updates:
- Existing tests that queued bare 401 expecting the auth path now
  pass `X-Error-Code: GatedRepo` on the mock response to stay on
  the genuine-gated path.
- New unit tests cover _categorize_status's header branch,
  build_fallback_attempt's error_code persistence, and
  build_aggregate_failure_response's repo-miss escalation.
- New Pattern F hf_hub interop tests assert that bare 401 on
  resolve and info raises `RepositoryNotFoundError`, matching
  hf_hub's native behavior against a missing repo.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* test: cover should_retry_source default 3xx fallthrough

Adds a 302 case to the parametrize so the final `return False`
branch in `should_retry_source` is exercised. Brings
fallback/utils.py to 100% line coverage and nudges the
codecov/project ratio back above the previous baseline.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: narugo1992 <narugo1992@users.noreply.github.com>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant