Decouple ignore_datasets_file cache location from fetch logic #549

@lewisjared

Problem

_get_default_ignore_datasets_file() in config.py hardcodes the cache directory via platformdirs.user_cache_path("climate_ref") and immediately tries to mkdir it. This fails with Errno 30 (read-only filesystem) in environments where the platformdirs cache path is not writable (e.g. containers, read-only mounts).

The function couples two concerns:

  1. Where the file is stored (cache directory)
  2. Fetching the file (downloading from GitHub)

Proposed Solution

Decouple the storage location from the fetch logic:

  • The default download location should still be platformdirs.user_cache_path("climate_ref"), but it should be overridable (e.g. via an environment variable or config field for the cache directory)
  • The fetch logic should handle a read-only or non-existent cache directory gracefully rather than failing during config loading
  • If the cache directory is not writable, fall back to a bundled/embedded copy or skip the download without crashing
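A rough sketch of what the split could look like, using only the standard library. The environment variable name REF_IGNORE_DATASETS_CACHE_DIR, the file name ignore_datasets.yaml, and all three function names are hypothetical and not part of climate_ref today. In the real code the default passed in would presumably be platformdirs.user_cache_path("climate_ref").

```python
import os
import tempfile
from pathlib import Path

# Hypothetical variable name; not an existing climate_ref setting.
CACHE_DIR_ENV_VAR = "REF_IGNORE_DATASETS_CACHE_DIR"


def resolve_cache_dir(default: Path, env_var: str = CACHE_DIR_ENV_VAR) -> Path:
    """Decide where the file should live. Pure: does not touch the filesystem."""
    override = os.environ.get(env_var)
    return Path(override) if override else default


def ensure_writable_dir(path: Path) -> bool:
    """Try to create `path` and write into it; return False instead of raising."""
    try:
        path.mkdir(parents=True, exist_ok=True)
        # Creating a throwaway file catches read-only mounts (Errno 30)
        # that an existing directory would otherwise hide.
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


def choose_ignore_datasets_path(
    cache_dir: Path,
    bundled_copy: Path,
    filename: str = "ignore_datasets.yaml",  # hypothetical file name
) -> Path:
    """Prefer the cache location; fall back to a bundled copy if it is unusable."""
    cached = cache_dir / filename
    if cached.exists() or ensure_writable_dir(cache_dir):
        # Either already downloaded, or a valid download target.
        return cached
    return bundled_copy
```

With this shape, the download step only needs to run when the returned path is the cache path and the file does not exist yet. A read-only cache directory then results in the bundled copy being used rather than an exception.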

Context

The error surfaces as a key_validation_error during config loading because the mkdir call happens inside a default factory for the ignore_datasets_file field on Config, which runs during cattrs structuring.

key_validation_error containing unknown error (Errno 30) Read-only file system: '/ref/cache'
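One way to keep this out of config loading is to make the default factory return a path only and defer the mkdir and download to first use. The sketch below uses a stdlib dataclass as a stand-in for the attrs/cattrs Config class, a temp-directory path as a stand-in for the platformdirs default, and a caller-supplied fetch callable as a stand-in for the GitHub download. The names default_ignore_datasets_file and ensure_ignore_datasets_file are hypothetical.

```python
import tempfile
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable


def default_ignore_datasets_file() -> Path:
    """Return a path only: no mkdir and no network access at config-load time."""
    # Stand-in for platformdirs.user_cache_path("climate_ref").
    return Path(tempfile.gettempdir()) / "climate_ref" / "ignore_datasets.yaml"


@dataclass
class Config:
    # Stand-in for the real Config; structuring it now has no side effects.
    ignore_datasets_file: Path = field(default_factory=default_ignore_datasets_file)


def ensure_ignore_datasets_file(config: Config, fetch: Callable[[Path], None]) -> bool:
    """Create the parent directory and fetch the file on first use.

    Returns True if the file is available afterwards, and False if the
    location could not be written (for example a read-only filesystem).
    """
    target = config.ignore_datasets_file
    if target.exists():
        return True
    try:
        target.parent.mkdir(parents=True, exist_ok=True)
        fetch(target)
    except OSError:
        # Errno 30 and similar failures end up here instead of
        # surfacing as a validation error during config loading.
        return False
    return target.exists()
```

Callers that actually need the ignore list would call ensure_ignore_datasets_file and decide what to do when it returns False, so loading the config in a read-only container no longer fails.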
