mdnmdn · mdnmdn · Mar 31, 2026 · Mar 31, 2026 · Mar 31, 2026
diff --git a/_docs/error-handling-proposal.md b/_docs/error-handling-proposal.md
@@ -0,0 +1,379 @@
+# Error Handling Proposal
+
+## Overview
+
+This document proposes a unified, ergonomic error handling strategy for `bits`. The current state uses `fmt.Errorf` strings throughout, which loses semantic information, makes programmatic handling impossible, and leaks provider internals inconsistently.
+
+Two goals drive this proposal:
+1. **Callers can act on errors** — distinguish auth failures from rate limits from network blips without string parsing.
+2. **Provider specificity is preserved** — the raw provider code/message is always accessible, never silently dropped.
+
+---
+
+## Part 1 — Error Handling Structures
+
+### 1.1 Error Categories
+
+A single `ErrorKind` type covers all recoverable and unrecoverable conditions across every provider:
+
+```go
+// pkg/model/errors.go
+
+type ErrorKind int
+
+const (
+    ErrKindUnknown          ErrorKind = iota // catch-all
+    ErrKindAuth                              // 401 / 403 / invalid API key
+    ErrKindRateLimit                         // 429 / quota exceeded
+    ErrKindNotFound                          // 404 / symbol not found
+    ErrKindInvalidRequest                    // 400 / bad parameters
+    ErrKindServerError                       // 502/503/504 / provider-side transient failure
+    ErrKindNetwork                           // connection refused, DNS failure
+    ErrKindCanceled                          // context.Canceled or context.DeadlineExceeded
+    ErrKindParse                             // unexpected response shape
+    ErrKindUnsupportedMarket                 // already exists, kept for compat
+    ErrKindUnsupportedFeature                // already exists, kept for compat
+)
+
+// Retryable reports whether errors of this kind are worth retrying with backoff.
+// Callers are responsible for applying exponential backoff; this method only
+// signals intent.
+func (k ErrorKind) Retryable() bool {
+    return k == ErrKindRateLimit || k == ErrKindServerError || k == ErrKindNetwork
+}
+```
+
+**Mapping rules** (applied by each provider's HTTP client):
+
+| Condition | Kind |
+|---|---|
+| HTTP 401, 403 | `ErrKindAuth` |
+| HTTP 429 | `ErrKindRateLimit` |
+| HTTP 404 | `ErrKindNotFound` |
+| HTTP 400 | `ErrKindInvalidRequest` |
+| HTTP 502, 503, 504 | `ErrKindServerError` |
+| HTTP 500 (and other 5xx) | `ErrKindServerError` |
+| `net.Error` (refused, DNS) | `ErrKindNetwork` |
+| `context.Canceled` / `context.DeadlineExceeded` | `ErrKindCanceled` |
+| JSON unmarshal failure | `ErrKindParse` |
+| API-body code ≠ success | kind inferred from code; else `ErrKindUnknown` |
+
+> **Note on 5xx and retryability:** All 5xx are marked `ErrKindServerError` and `Retryable() == true`. In practice, callers should apply exponential backoff and a maximum attempt count. A persistent 500 that never clears is a provider bug; retry logic should give up eventually regardless of this flag.
+
+> **Note on context errors:** `ErrKindCanceled` is intentionally excluded from `Retryable()`. A canceled context means the caller no longer wants the result; retrying would be wrong.
+
+### 1.2 The `ProviderError` Type
+
+One concrete type is used everywhere an error crosses a provider boundary:
+
+```go
+// pkg/model/errors.go
+
+type ProviderError struct {
+    // Normalized — callers use these without knowing which provider.
+    Kind       ErrorKind
+    ProviderID string // "binance", "bitget", … empty for resolution-layer errors
+
+    // Raw — always preserved, never dropped.
+    ProviderCode    string // provider's native error code ("00000", "400", …)
+    ProviderMessage string // provider's native message text
+
+    // Optional HTTP layer information.
+    HTTPStatus int // 0 when not an HTTP error
+
+    // Underlying cause (network error, json error, …).
+    Cause error
+}
+
+func (e *ProviderError) Error() string {
+    if e.ProviderCode != "" {
+        return fmt.Sprintf("[%s] %s (code %s)", e.ProviderID, e.ProviderMessage, e.ProviderCode)
+    }
+    if e.HTTPStatus != 0 {
+        return fmt.Sprintf("[%s] HTTP %d: %s", e.ProviderID, e.HTTPStatus, e.ProviderMessage)
+    }
+    if e.ProviderID != "" {
+        return fmt.Sprintf("[%s] %s", e.ProviderID, e.ProviderMessage)
+    }
+    return e.ProviderMessage
+}
+
+func (e *ProviderError) Unwrap() error { return e.Cause }
+
+// Is implements errors.Is matching by Kind for sentinel comparisons.
+// This ensures errors.Is(err, model.ErrUnsupportedMarket) keeps working even
+// when the error is a freshly constructed *ProviderError rather than the exact
+// sentinel pointer.
+func (e *ProviderError) Is(target error) bool {
+    t, ok := target.(*ProviderError)
+    if !ok {
+        return false
+    }
+    // Match sentinels by Kind when the target has no ProviderID (i.e. is a sentinel).
+    if t.ProviderID == "" {
+        return e.Kind == t.Kind
+    }
+    return e == t
+}
+```
+
+**Usage at the call site:**
+
+```go
+res, err := client.GetPrice(ctx, "BTCUSDT", "binance")
+if err != nil {
+    var pe *model.ProviderError
+    if errors.As(err, &pe) {
+        switch pe.Kind {
+        case model.ErrKindRateLimit:
+            time.Sleep(backoff)
+            // retry
+        case model.ErrKindAuth:
+            // surface config problem to user
+        case model.ErrKindCanceled:
+            return // caller canceled, do not retry
+        }
+        // always available for logging / telemetry:
+        log.Printf("provider=%s code=%s http=%d", pe.ProviderID, pe.ProviderCode, pe.HTTPStatus)
+    }
+}
+```
+
+**Provider-specific code access** is explicit and zero-surprise: `pe.ProviderCode` is the raw string the exchange returned (`"00000"`, `"10001"`, `"429"`, …), and `pe.ProviderMessage` is its description verbatim.
+
+### 1.3 Sentinel Errors (backward-compat)
+
+The two existing sentinels are promoted into `*ProviderError` values. The `Is` method on §1.2 ensures `errors.Is(err, model.ErrUnsupportedMarket)` keeps working for any `*ProviderError` with `Kind == ErrKindUnsupportedMarket`, not just the exact pointer — so wrapped errors and freshly constructed errors both match.
+
+```go
+var (
+    ErrUnsupportedMarket  = &ProviderError{Kind: ErrKindUnsupportedMarket,  ProviderMessage: "unsupported market type"}
+    ErrUnsupportedFeature = &ProviderError{Kind: ErrKindUnsupportedFeature, ProviderMessage: "unsupported feature"}
+)
+```
+
+Existing `errors.Is` call sites require no change. `errors.As` call sites that then inspect `.Kind` also work without change.
+
+### 1.4 `ItemError` — partial failure in batch calls
+
+`ItemError` gains a typed `Err` so JSON serialisation is meaningful and callers can programmatically inspect per-symbol failures.
+
+> **Pre-existing issue fixed by this change:** `ItemError.Err` is currently typed `error` (an interface), which marshals to `{}` in JSON — effectively invisible. Changing to `*ProviderError` fixes serialisation as a side effect.
+
+```go
+// pkg/model/response.go
+
+type ItemError struct {
+    Symbol string         `json:"sym"`
+    Err    *ProviderError `json:"err"` // typed — always a *ProviderError
+}
+```
+
+This is the only breaking change in the public model. It allows:
+
+```go
+for _, ie := range res.Errors {
+    if ie.Err.Kind == model.ErrKindNotFound {
+        // skip missing symbols gracefully
+    }
+}
+```
+
+### 1.5 Provider constructor helpers
+
+Each provider package has a small unexported `errors.go` with construction helpers. The `httpStatusToKind` mapping lives here too — **not** in `pkg/model`, which must remain HTTP-agnostic.
+
+```go
+// pkg/provider/binance/errors.go  (pattern repeated per provider)
+
+func providerErr(kind model.ErrorKind, msg string, cause error) *model.ProviderError {
+    return &model.ProviderError{
+        Kind:            kind,
+        ProviderID:      ProviderID, // "binance"
+        ProviderMessage: msg,
+        Cause:           cause,
+    }
+}
+
+func httpErr(status int, body string) *model.ProviderError {
+    return &model.ProviderError{
+        Kind:            httpStatusToKind(status),
+        ProviderID:      ProviderID,
+        HTTPStatus:      status,
+        ProviderMessage: body,
+    }
+}
+
+func apiErr(code, msg string) *model.ProviderError {
+    return &model.ProviderError{
+        Kind:            apiCodeToKind(code), // provider-specific mapping
+        ProviderID:      ProviderID,
+        ProviderCode:    code,
+        ProviderMessage: msg,
+    }
+}
+
+// httpStatusToKind is local to each provider package (not in pkg/model).
+func httpStatusToKind(status int) model.ErrorKind {
+    switch {
+    case status == 401 || status == 403:
+        return model.ErrKindAuth
+    case status == 404:
+        return model.ErrKindNotFound
+    case status == 429:
+        return model.ErrKindRateLimit
+    case status == 400:
+        return model.ErrKindInvalidRequest
+    case status >= 500:
+        return model.ErrKindServerError
+    default:
+        return model.ErrKindUnknown
+    }
+}
+```
+
+> **CoinGecko note:** CoinGecko's `client.go` uses a `get(ctx, path, &result)` helper that decodes JSON internally and returns a plain `error`. It does not follow the two-step fetch-then-unmarshal pattern of the other five providers. Its `errors.go` helpers still apply, but the integration point is inside `get()` rather than a separate `doRequest` method. No structural change to the client is needed — only the error return sites change.
+
+### 1.6 Migration safety valve — `WrapError`
+
+During the transition, some construction sites (e.g. registry failures in the facade, third-party library errors from `go-binance/v2`) may not yet produce a `*ProviderError`. A single escape-hatch function prevents compile errors and leaves an audit trail:
+
+```go
+// pkg/model/errors.go
+
+// WrapError wraps an arbitrary error as ErrKindUnknown. Use only as a
+// temporary shim during migration; replace with a typed providerErr call.
+func WrapError(providerID string, err error) *ProviderError {
+    if err == nil {
+        return nil
+    }
+    // If already a ProviderError, preserve it.
+    var pe *ProviderError
+    if errors.As(err, &pe) {
+        return pe
+    }
+    return &ProviderError{
+        Kind:            ErrKindUnknown,
+        ProviderID:      providerID,
+        ProviderMessage: err.Error(),
+        Cause:           err,
+    }
+}
+```
+
+### 1.7 What is NOT in scope
+
+- Retry logic — callers use `Retryable()` to decide; `bits` does not retry internally.
+- Circuit breaking — out of scope.
+- Logging inside providers — providers return errors, they do not log them.
+- Wrapping third-party library errors (e.g. `go-binance/v2`) beyond the `Cause` field — the library error becomes `Cause`; `ProviderMessage` is set from its `.Error()` string.
+
+---
+
+## Part 2 — Transition
+
+The transition is additive first, then breaking. Each phase compiles and passes tests independently.
+
+### Phase 1 — Foundation (`pkg/model`)
+
+**Files:** `pkg/model/errors.go`, `pkg/model/response.go`
+
+1. Add `ErrorKind`, `ProviderError` (with `Is` and `Unwrap`), and `WrapError` to `errors.go`.
+2. Re-declare the two sentinel vars as `*ProviderError` values.
+3. **Do not** change `ItemError` yet — leave `Err` typed as `error` to avoid a cascade of compile errors.
+
+Result: new types exist, nothing broken, no providers use them yet.
+
+---
+
+### Phase 2 — Provider HTTP clients (parallel, one PR per provider)
+
+**Files:** `pkg/provider/*/client.go`, `pkg/provider/*/market.go`, `pkg/provider/*/exchange.go`
+
+> Phases 2a–2f are fully independent and can land in any order.
+
+For each provider:
+
+1. Add `pkg/provider/<name>/errors.go` with `providerErr`, `httpErr`, `apiErr`, `httpStatusToKind`, and `apiCodeToKind`.
+2. Update the HTTP execution path to check `resp.StatusCode` on every call and return `httpErr` on non-2xx.
+   - **Bug fix:** Bitget, WhiteBit, and Crypto.com currently do not check the HTTP status code at all — a 500 response body is silently passed to `json.Unmarshal`. This change fixes that behavior; add regression tests.
+3. Wrap `net.Error` and connection errors with `providerErr(ErrKindNetwork, …, err)`.
+4. Wrap `context.Canceled` / `context.DeadlineExceeded` with `providerErr(ErrKindCanceled, …, err)`.
+5. Replace API-envelope error checks (`code != "00000"`, `!resp.Success`, etc.) with `apiErr(code, msg)`.
+6. Use `WrapError` for any site not yet convertible (e.g. third-party library errors).
+
+Provider-specific code→kind mappings:
+
+| Provider | Success signal | Auth codes | Rate-limit codes |
+|---|---|---|---|
+| Bitget | `code == "00000"` | `40001`, `40003` | `429xx` series |
+| Crypto.com | `code == 0` | `40001` | `10006` |
+| Binance | HTTP 200 | HTTP 401/403 | HTTP 429 |
+| CoinGecko | HTTP 200 | HTTP 401/403 | HTTP 429 |
+| MEXC | HTTP 200 | HTTP 401/403 | HTTP 429 |
+| WhiteBit | `success == true` | HTTP 401/403 | HTTP 429 |
+
+Each provider's `errors.go` should include table-driven tests mapping HTTP statuses and API codes to expected `ErrorKind` values.
+
+---
+
+### Phase 3 — `ItemError` migration
+
+> **Gate:** Phase 2 must be complete for all six providers before this phase lands. If partial, use `WrapError` to bridge un-migrated paths before opening this PR.
+
+**Files:** `pkg/model/response.go`, all `ItemError` construction sites
+
+Change `ItemError.Err` from `error` to `*model.ProviderError`. Update every construction site to pass a `*ProviderError`; use `WrapError` for any remaining raw errors. This includes:
+
+- `pkg/resolve/fanout.go` — wraps per-symbol errors into `ItemError`
+- all provider `market.go` files that append to `Response.Errors`
+- `pkg/bits/client.go` — `ComparePrices` stores errors into `ItemError{Err: err}` where `err` comes from `GetPrice`; this must be `WrapError`-guarded until Phase 5
+
+---
+
+### Phase 4 — Resolver
+
+**Files:** `pkg/resolve/resolver.go`
+
+Replace `fmt.Errorf("no provider supports …")` and similar with `*model.ProviderError`. Use `ErrKindUnsupportedFeature` / `ErrKindUnsupportedMarket` with an empty `ProviderID` (the resolution layer has no single provider to blame).
+
+`fanout.go` requires no structural change after Phase 3.
+
+---
+
+### Phase 5 — Public `pkg/bits` facade
+
+**Files:** `pkg/bits/client.go`
+
+1. No signature changes — methods still return `(model.Response[T], error)`.
+2. The "no data returned" path wraps with `model.WrapError("", …)` or a direct `*ProviderError{Kind: ErrKindNotFound}`.
+3. `ComparePrices` / `ComparePricesWithResolution`: replace ad-hoc `ItemError{Err: err}` with `WrapError(pid, err)` to satisfy the typed field.
+4. Add godoc on all public methods stating: *returned errors are always `*model.ProviderError`; use `errors.As` to inspect them.*
+
+---
+
+### Phase 6 — CLI layer (optional)
+
+**Files:** `cmd/*.go`
+
+CLI commands can optionally inspect `*model.ProviderError.Kind` for friendlier messages (e.g. "authentication failed — check your API key in config") without breaking anything. Defer until after Phase 5 stabilises.
+
+---
+
+### Migration checklist
+
+```
+[ ] Phase 1:  pkg/model — ErrorKind, ProviderError (with Is/Unwrap), WrapError, sentinels
+[ ] Phase 2a: pkg/provider/binance   — errors.go + HTTP/API error wrapping + tests
+[ ] Phase 2b: pkg/provider/bitget    — errors.go + HTTP/API error wrapping + tests (bug fix: add HTTP status check)
+[ ] Phase 2c: pkg/provider/coingecko — errors.go + HTTP/API error wrapping + tests (note: get() pattern differs)
+[ ] Phase 2d: pkg/provider/cryptocom — errors.go + HTTP/API error wrapping + tests (bug fix: add HTTP status check)
+[ ] Phase 2e: pkg/provider/mexc      — errors.go + HTTP/API error wrapping + tests
+[ ] Phase 2f: pkg/provider/whitebit  — errors.go + HTTP/API error wrapping + tests (bug fix: add HTTP status check)
+[ ] Phase 3:  ItemError.Err → *ProviderError  (gate: all Phase 2 complete)
+[ ] Phase 4:  resolver — ProviderError for resolution failures
+[ ] Phase 5:  pkg/bits facade — WrapError at remaining sites + godoc
+[ ] Phase 6:  (optional) CLI human-readable error messages
+```
+
+Phases 2a–2f are fully parallel. All other phases are sequential.