Skip to content

fix: init SQLite schema per DB path, not via global flag (#42)#43

Merged
aborruso merged 2 commits into
mainfrom
fix/db-init-per-provider
Jun 10, 2026
Merged

fix: init SQLite schema per DB path, not via global flag (#42)#43
aborruso merged 2 commits into
mainfrom
fix/db-init-per-provider

Conversation

@aborruso

Copy link
Copy Markdown
Member

Fixes #42.

Problem

set_provider() switching (e.g. eurostatoecd) made the first
all_available() on the new provider crash with
sqlite3.OperationalError: no such table: invalid_datasets.

Root cause: _DB_INITIALIZED was a module-level bool. The first
provider created its schema and set the flag to True; the next provider
opens a different DB file (separate cache dir), but the flag stayed
True, so the schema was never created on the new file.

This is a library bug (reproduced via import opensdmx), not a CLI one.

Fix

Replace the global bool with _INITIALIZED_DBS: set[str] keyed on the DB
path, so every new DB file initializes its own schema. (Option 1 from the
issue.)

Verification

  • New regression test test_schema_init_per_db_path reproducing the
    two-provider scenario — proven to fail with the exact issue error on the
    old code, passes with the fix.
  • Updated the 3 existing tests that referenced the old flag.
  • Full suite: 208 passed, ruff clean.
  • End-to-end on the editable-installed library with a fresh cache dir:
    eurostat then oecd both return cleanly, no OperationalError.

🤖 Generated with Claude Code

The module-level _DB_INITIALIZED bool was set True after the first
provider's schema creation. Switching provider opens a new DB file in a
different cache dir, but the flag stayed True so the schema was never
created — raising "no such table: invalid_datasets".

Track initialized DB paths in a set (_INITIALIZED_DBS) so each new DB
file gets its own schema. Add a cross-provider regression test.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings June 10, 2026 20:16
@greptile-apps

greptile-apps Bot commented Jun 10, 2026

Copy link
Copy Markdown

Greptile Summary

This PR fixes a bug where switching providers (e.g. eurostat → oecd) caused the second provider's first all_available() call to crash with sqlite3.OperationalError: no such table: invalid_datasets. The root cause was a module-level boolean that prevented the schema from being initialized on any DB file opened after the first one.

  • Core fix: _DB_INITIALIZED: bool replaced with _INITIALIZED_DBS: set[str] keyed on the resolved absolute DB path, so each distinct cache file gets its own schema initialization. The CREATE TABLE IF NOT EXISTS approach keeps schema creation idempotent and the not db_existed guard correctly handles the edge case where a file is deleted mid-process.
  • Tests: A new regression test (test_schema_init_per_db_path) reproduces the exact two-provider failure scenario; existing test fixtures and HTTP test helpers are updated to reset the new set instead of the old boolean.

Confidence Score: 5/5

Safe to merge — the change is a targeted, well-tested fix with no regressions.

The fix replaces a single module-level boolean with a per-path set, the schema creation is idempotent (CREATE TABLE IF NOT EXISTS), and the new regression test directly exercises the two-provider failure path from issue #42. All 208 tests pass and no existing behaviour is altered.

No files require special attention.

Important Files Changed

Filename Overview
src/opensdmx/db_cache.py Replaces the module-level boolean _DB_INITIALIZED with a per-path set _INITIALIZED_DBS; schema creation is now keyed on the resolved absolute DB path. Logic is correct and idempotent (CREATE TABLE IF NOT EXISTS). The not db_existed branch correctly handles externally-deleted files in long-lived processes.
tests/test_db_cache.py Adds test_schema_init_per_db_path regression test covering the two-provider scenario from issue #42. Fixture updated to reset the new _INITIALIZED_DBS set. Test is clear and targeted.
tests/test_http.py Two setup helpers updated to monkeypatch _INITIALIZED_DBS instead of the old _DB_INITIALIZED boolean — straightforward rename, no logic change.

Reviews (2): Last reviewed commit: "fix: harden DB schema guard (resolve pat..." | Re-trigger Greptile

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Fixes a SQLite cache initialization bug where switching providers (and therefore DB files) could skip schema creation and crash on first access (no such table: invalid_datasets). The fix tracks schema initialization per DB path and adds a regression test for the two-provider scenario.

Changes:

  • Replace the module-level _DB_INITIALIZED boolean with _INITIALIZED_DBS: set[str] keyed by DB path.
  • Add regression test test_schema_init_per_db_path to ensure schema initializes when _get_db_path() changes.
  • Update existing tests to reset _INITIALIZED_DBS instead of the removed flag.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

File Description
src/opensdmx/db_cache.py Track schema initialization per DB path so provider switching initializes each DB’s schema.
tests/test_db_cache.py Add regression test covering schema init across two distinct DB paths.
tests/test_http.py Update fixtures to reset the new per-DB initialization tracking between tests.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread src/opensdmx/db_cache.py
Comment thread src/opensdmx/db_cache.py Outdated
Comment thread src/opensdmx/db_cache.py Outdated
Address Copilot review on #43:
- key _INITIALIZED_DBS on the resolved absolute path so a relative
  OPENSDMX_CACHE_DIR + CWD change can't alias two different DB files
- re-create the schema when the DB file did not exist before connecting,
  covering cache.db deleted during a long-lived process

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Copilot AI commented Jun 10, 2026

Copy link
Copy Markdown

@aborruso Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub.

You can ask me to try again later by mentioning me in a new comment.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: 2f0e6f35-0ac2-430e-847d-914402efb01e

Sorry for the inconvenience!

@aborruso aborruso merged commit 3adaeeb into main Jun 10, 2026
11 of 12 checks passed
@aborruso aborruso deleted the fix/db-init-per-provider branch June 10, 2026 20:27
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bug: _DB_INITIALIZED globale non resettato tra provider — tabella invalid_datasets mancante

3 participants