diff --git a/docs/adr/2026.03.25-configuration-file-design.md b/docs/adr/2026.03.25-configuration-file-design.md
new file mode 100644
index 0000000..1370617
--- /dev/null
+++ b/docs/adr/2026.03.25-configuration-file-design.md
@@ -0,0 +1,180 @@
+# Configuration File Design
+
+* Status: `proposed`
+* Deciders: `checkup team`
+* Proposal date: 25/03/2026
+* Decision date:
+
+## Context and problem statement
+
+The [push-based architecture](./2026.03.25-push-based-architecture.md) introduces a CLI that reads configuration from YAML files. We need to design the configuration file format, supporting both single-repo and monorepo layouts, with a good developer experience.
+
+Key requirements:
+- Central teams should be able to define metrics for all products
+- Developers should be able to extend or override inherited config
+- Configuration should be easy to write and validate
+- Support both single-repo and monorepo setups
+
+## Considered options
+
+1. **Single config file with all settings**: One `checkup.yaml` per project with all configuration.
+
+2. **Hierarchical config with inheritance**: Config files can inherit from parent directories, allowing shared config at monorepo root.
+
+3. **Central config registry**: All config stored centrally, fetched by CLI at runtime.
+
+## Chosen option
+
+We are choosing **option 2 (hierarchical config with inheritance)** because:
+
+- Supports both single-repo and monorepo setups with the same mechanism
+- Central team can define shared config at monorepo root, or via project templates
+- Developers can extend or override at project level
+- Configuration lives in version control alongside the code
+
+**Option 1 not chosen** because it doesn't support monorepos well: you'd need to duplicate shared configuration across every project.
+
+**Option 3 not chosen** because it adds operational complexity and requires the CLI to fetch config from a service.
+
+## Consequences
+
+With the chosen option, we see the following consequences requiring extra effort:
+
+1. **Config resolution complexity**: Need to implement directory tree walking and config merging with clear precedence rules.
+
+2. **JSON Schema generation**: To provide intellisense and autocompletion in `checkup.yaml` files, we will generate a JSON Schema. Users can reference this schema in their IDE (e.g., via YAML Language Server).
+
+3. **Interactive config generation**: To provide an alternative user experience, we will provide `checkup init` and/or `checkup config` for an interactive form-based CLI setup.
+
+## More information
+
+### Configuration file schema
+
+All fields are optional. The CLI merges configuration from parent directories (child overrides parent).
+
+```yaml
+# checkup.yaml
+
+tags:
+  key: value
+
+providers:
+  - provider_name:
+      config_key: config_value
+  - simple_provider
+
+metrics:
+  - metric_name
+  - metric_name_with_config:
+      config_key: config_value
+
+materializer:
+  type: console | csv | html | sqlalchemy
+  # ... materializer-specific config
+```
+
+### Single-repo layout
+
+```
+my-data-product/
+├── checkup.yaml          # all config in one file
+├── dbt/
+│   └── ...
+└── ...
+```
+
+```yaml
+# checkup.yaml
+tags:
+  ...
+
+providers:
+  - dbt:
+      project_dir: ./dbt
+      ...
+
+metrics:
+  ...
+
+materializer:
+  ...
+```
+
+### Monorepo layout
+
+```
+monorepo/
+├── checkup.yaml              # shared: metrics + materializer
+├── products/
+│   ├── product-a/
+│   │   ├── checkup.yaml      # project-specific: tags + providers
+│   │   └── dbt/
+│   └── product-b/
+│       ├── checkup.yaml
+│       └── dbt/
+```
+
+Root config (shared):
+```yaml
+# monorepo/checkup.yaml
+metrics:
+  ...
+
+materializer:
+  ...
+```
+
+Project config (specific):
+```yaml
+# monorepo/products/product-a/checkup.yaml
+tags:
+  ...
+
+providers:
+  - dbt:
+      project_dir: .
+      ...
+```
+
+### Config resolution
+
+1. Find `checkup.yaml` in current directory
+2. Walk up directory tree, collecting parent configs
+3. Merge configs (child overrides parent)
+4. Apply CLI flag overrides
+
+Precedence: CLI flags > project config > parent config
+
+### JSON Schema for IDE support
+
+Generate a JSON Schema from the config model to enable IDE intellisense:
+
+```yaml
+# yaml-language-server: $schema=https://checkup.example.com/schema.json
+
+providers:
+  - provider_name:
+      config_key: config_value  # IDE provides autocomplete
+```
+
+The schema includes all valid provider names, metric names, and their config options based on installed plugins.
+
+### Interactive config generation (`checkup init` and `checkup config`)
+
+**`checkup init`** - Create a new config file:
+1. Select providers from installed plugins (dbt, git, python, etc.)
+2. Select metrics (filtered to those supported by chosen providers)
+3. Configure materializer
+4. Generate `checkup.yaml`
+
+**`checkup config`** - Modify an existing config file:
+- Add/remove providers or metrics interactively
+- Update materializer settings
+- Useful when new plugins are installed or requirements change
+
+Both commands allow developers to configure checkup without writing YAML manually, while still producing a standard config file they can edit directly if preferred.
+
+### Related ADRs
+
+- [Push-based Architecture](./2026.03.25-push-based-architecture.md)
+- [Credentials and Secrets](./2026.03.25-credentials-and-secrets.md)
diff --git a/docs/adr/2026.03.25-credentials-and-secrets.md b/docs/adr/2026.03.25-credentials-and-secrets.md
new file mode 100644
index 0000000..da9eb39
--- /dev/null
+++ b/docs/adr/2026.03.25-credentials-and-secrets.md
@@ -0,0 +1,97 @@
+# Credentials and Secrets
+
+* Status: `proposed`
+* Deciders: `checkup team`
+* Proposal date: 25/03/2026
+* Decision date:
+
+## Context and problem statement
+
+The [configuration file design](./2026.03.25-configuration-file-design.md) uses YAML files for checkup configuration. These files often need to reference sensitive values (database URLs, API tokens) that should not be committed to version control.
+
+We need a mechanism for injecting secrets into configuration at runtime.
+
+## Considered options
+
+1. **Explicit substitution with `${VAR}` syntax**: Reference environment variables explicitly in YAML using `${VAR}` syntax.
+
+2. **Naming convention**: Environment variables matching a naming convention (e.g., `CHECKUP__PROVIDER__DBT__...`) are automatically mapped to config values.
+
+3. **External secrets manager**: Integrate with secrets managers (Vault, AWS Secrets Manager, etc.) to fetch secrets at runtime.
+
+## Chosen option
+
+We are choosing **both option 1 and option 2** because they serve different use cases:
+
+**Option 1 (`${VAR}` syntax)** is explicit and familiar:
+- Clear which values come from environment
+- Flexible: any config value can reference any env var
+- Familiar from: Docker Compose, GitHub Actions, etc.
+
+**Option 2 (naming convention)** enables config-free overrides:
+- No YAML changes needed: just set env vars
+- Familiar from: dlt hub, etc.
+
+**Option 3 not chosen** for initial implementation because it adds complexity and external dependencies. Can be added later if needed.
+
+## Consequences
+
+With the chosen option, we see the following consequences requiring extra effort:
+
+1. **Two mechanisms to document**: Users need to understand both approaches and when to use each.
+
+2. **Precedence rules**: Need clear rules for what happens when both are used (`${VAR}` explicit references override naming convention defaults).
+
+## More information
+
+### Option 1: Explicit substitution in YAML
+
+Reference environment variables using `${VAR}` syntax:
+
+```yaml
+materializer:
+  type: sqlalchemy
+  connection_url: ${DATABASE_URL}
+```
+
+### Option 2: Naming convention
+
+Environment variables matching a naming convention are automatically mapped to config:
+
+```bash
+CHECKUP__MATERIALIZER__SQLALCHEMY__CONNECTION_URL=postgresql://...
+CHECKUP__PROVIDER__CONVEYOR__API_TOKEN=xxx
+```
+
+Structure: `CHECKUP__<SECTION>__<NAME>__<KEY>`
+
+### Precedence
+
+When both mechanisms are used:
+
+1. YAML file values are the base
+2. Naming convention env vars provide defaults for missing values
+3. `${VAR}` explicit substitution wins
+
+Example:
+```yaml
+# checkup.yaml
+materializer:
+  type: sqlalchemy
+  connection_url: ${DATABASE_URL}  # explicit reference wins
+```
+
+```bash
+# Environment
+DATABASE_URL=postgresql://explicit/db
+CHECKUP__MATERIALIZER__SQLALCHEMY__CONNECTION_URL=postgresql://convention/db
+```
+
+Result: `connection_url` = `postgresql://explicit/db` (`${VAR}` wins)
+
+The naming convention is useful for providing defaults or overriding values that aren't explicitly referenced in YAML, while `${VAR}` gives precise control when you need it.
+
+### Related ADRs
+
+- [Push-based Architecture](./2026.03.25-push-based-architecture.md)
+- [Configuration File Design](./2026.03.25-configuration-file-design.md)
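
The precedence rules described in the Credentials and Secrets ADR could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not checkup's implementation: the function names `substitute`, `apply_materializer_defaults`, and `resolve` are hypothetical, only `CHECKUP__MATERIALIZER__...` variables are handled, the mapping of the second segment onto `materializer.type` is assumed, and an unset `${VAR}` is assumed to raise `KeyError`.

```python
import re
from typing import Any, Mapping

# Matches ${VAR} references; the allowed variable-name characters are an assumption.
VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")
# Hypothetical: only the materializer section of the naming convention is sketched.
MATERIALIZER_PREFIX = "CHECKUP__MATERIALIZER__"


def substitute(value: Any, env: Mapping[str, str]) -> Any:
    """Replace ${VAR} references in strings, recursing into dicts and lists."""
    if isinstance(value, str):
        # Assumption: a reference to an unset variable raises KeyError.
        return VAR_PATTERN.sub(lambda m: env[m.group(1)], value)
    if isinstance(value, dict):
        return {k: substitute(v, env) for k, v in value.items()}
    if isinstance(value, list):
        return [substitute(v, env) for v in value]
    return value


def apply_materializer_defaults(config: dict, env: Mapping[str, str]) -> dict:
    """Fill materializer keys missing from YAML using naming-convention env vars."""
    materializer = dict(config.get("materializer") or {})
    mat_type = str(materializer.get("type", "")).upper()
    if not mat_type:
        return dict(config)
    prefix = f"{MATERIALIZER_PREFIX}{mat_type}__"
    for name, value in env.items():
        if name.startswith(prefix):
            # setdefault keeps values already present in YAML (rule 2: defaults only).
            materializer.setdefault(name[len(prefix):].lower(), value)
    return {**config, "materializer": materializer}


def resolve(config: dict, env: Mapping[str, str]) -> dict:
    """YAML is the base, ${VAR} is expanded, convention vars fill remaining gaps."""
    return apply_materializer_defaults(substitute(config, env), env)
```

With the environment from the ADR's example, `resolve` returns `postgresql://explicit/db` for `connection_url` when the YAML contains `${DATABASE_URL}`, and falls back to the naming-convention value only when the YAML omits `connection_url`. In practice `env` would be `os.environ`; passing it as a parameter keeps the sketch easy to test.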