Wrap raw TypeError/RuntimeError in GMDCommandError #5

Open
karstenm1987 wants to merge 1 commit into main from fix/input-validation-error-wrapping

@karstenm1987

Tracked upstream at Global-Macro-Database-Internal#414. Same filter as the recent Stata and R PRs: fix actual bugs, skip nice-to-haves.

Bugs

Both failure paths leaked internal exception types to users instead of the package's own GMDCommandError.

1. gmd(country=840) leaked TypeError: 'int' object is not iterable

Root cause: `_tokens()` ran `for item in value:` with no type guard. Any non-string scalar (int, float, etc.) failed inside Python's iteration protocol instead of surfacing through the package's own error type. The same leak applies to `variables=840`, `variables=1.5`, etc.

2. get_available_versions() leaked RuntimeError when both mirrors were unreachable

`gmd.py:333-344` caught the internal `RuntimeError` from `_fetch_from`, checked the on-disk cache and a local `.dta` fallback, and then re-raised the internal error to the user with a bare `raise`.
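Leak 1 is ordinary Python behavior rather than anything package-specific: a `for` loop calls `iter()` on its target, which fails for scalars, while a `str` is itself iterable character by character. That is why a guard has to special-case `str` first and then accept only `list`/`tuple`. A standalone illustration (`iterate_all` is a hypothetical helper, not package code):

```python
def iterate_all(value):
    # Mimics an unguarded `for item in value:` loop (illustration only).
    return [item for item in value]


print(iterate_all(["USA", "GBR"]))  # ['USA', 'GBR']
print(iterate_all("USA"))           # ['U', 'S', 'A']: strings iterate by character

for scalar in (840, 840.5):
    try:
        iterate_all(scalar)
    except TypeError as exc:
        # This raw TypeError is what previously reached the user.
        print(f"TypeError: {exc}")
# TypeError: 'int' object is not iterable
# TypeError: 'float' object is not iterable
```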

Fix

Two small changes:

  1. `_tokens()` now rejects non-str / non-list / non-tuple inputs with `GMDCommandError(code=198)` and a message naming the actual type.
  2. The terminal raise in `get_available_versions()` becomes `raise GMDCommandError(...) from exc`, with a message that tells the user what probably went wrong (no internet or a transient outage).

Verified (live)

    >>> gmd(country=840)
    GMDCommandError: Expected a string or sequence of strings, got int.
    >>> gmd(country=840.5)
    GMDCommandError: Expected a string or sequence of strings, got float.
    >>> gmd(variables=840)
    GMDCommandError: Expected a string or sequence of strings, got int.

    # Regressions:
    >>> gmd(country="USA", variables="rGDP")          # DataFrame
    >>> gmd(country=["USA","GBR"], variables="rGDP")  # DataFrame
    >>> gmd(country=None, variables="rGDP")           # DataFrame
    >>> gmd(country=["USA", 840], variables="rGDP")   # already caught; unchanged behavior

    # Offline path (both mirrors failing, no local cache):
    >>> get_available_versions()
    GMDCommandError: Unable to fetch the list of available GMD versions.
                     Check your internet connection or raise an issue.

    # Online sanity:
    >>> get_available_versions()                       # list of versions

Out of scope

Other items in #414 are UX/consistency rather than correctness bugs and kept for separate PRs:

  • raw/fast/iso truthy-string coercion (raw="no" still loads raw data) — same class as the Stata fast() UX wart, classified as "nice-to-have" in the Stata review.
  • Whole-file download / no retry-backoff / no checksum / unversioned cache — design-level improvements.
  • start_year / end_year — not implementing per maintainer preference.

Also note: the versions that are listed on S3 but return 404 (2025_01/03/05/06/08) are a release-pipeline issue and cannot be fixed here.

Two places leaked internal exceptions to users:

1. `_tokens()` used `for item in value:` with no type guard. Passing a
   scalar like `gmd(country=840)` produced
     TypeError: 'int' object is not iterable
   instead of a clean command error. Now `_tokens()` rejects non-str /
   non-list / non-tuple inputs with a `GMDCommandError(code=198)` and
   a message that names the actual type received.

2. `get_available_versions()` re-raised the internal `RuntimeError`
   from `_fetch_from` when both the S3 primary and GitHub fallback
   failed and there was no local cache. Wrap that terminal raise in
   `GMDCommandError` with a human-readable message pointing at the
   likely cause (no internet / transient outage).

Internal issue: KMueller-Lab/Global-Macro-Database-Internal#414

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>