180 changes: 180 additions & 0 deletions docs/adr/2026.03.25-configuration-file-design.md
@@ -0,0 +1,180 @@
# Configuration File Design

* Status: `proposed`
* Deciders: `checkup team`
* Proposal date: 25/03/2026
* Decision date:

## Context and problem statement

The [push-based architecture](./2026.03.25-push-based-architecture.md) introduces a CLI that reads configuration from YAML files. We need to design the configuration file format, supporting both single-repo and monorepo layouts, with a good developer experience.

Key requirements:
- Central teams should be able to define metrics for all products
- Developers should be able to extend or override inherited config
- Configuration should be easy to write and validate
- Support both single-repo and monorepo setups

## Considered options

1. **Single config file with all settings**: One `checkup.yaml` per project with all configuration.

2. **Hierarchical config with inheritance**: Config files can inherit from parent directories, allowing shared config at monorepo root.

3. **Central config registry**: All config stored centrally, fetched by CLI at runtime.

## Chosen option

We are choosing **option 2 (hierarchical config with inheritance)** because:

- Supports both single-repo and monorepo setups with the same mechanism
- Central team can define shared config at monorepo root, or via project templates
- Developers can extend or override at project level
- Configuration lives in version control alongside the code

**Option 1 not chosen** because it doesn't support monorepos well: shared configuration would have to be duplicated across every project.

**Option 3 not chosen** because it adds operational complexity and requires the CLI to fetch config from a service.

## Consequences

With the chosen option, we see the following consequences requiring extra effort:

1. **Config resolution complexity**: Need to implement directory tree walking and config merging with clear precedence rules.

2. **JSON Schema generation**: To provide intellisense and autocompletion in `checkup.yaml` files, we will generate a JSON Schema. Users can reference this schema in their IDE (e.g., via YAML Language Server).

3. **Interactive config generation**: As an alternative to writing YAML by hand, we will provide `checkup init` and/or `checkup config` for interactive, form-based setup in the CLI.

## More information

### Configuration file schema

All fields are optional. The CLI merges configuration from parent directories (child overrides parent).

```yaml
# checkup.yaml

tags:
  key: value

providers:
  - provider_name:
      config_key: config_value
  - simple_provider

metrics:
  - metric_name
  - metric_name_with_config:
      config_key: config_value

materializer:
  type: console | csv | html | sqlalchemy
  # ... materializer-specific config
```
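
Entries under `providers` and `metrics` come in two shapes: a bare name, or a single-key mapping from name to config. A minimal sketch of how the CLI might normalize both shapes after loading the YAML is shown here. The function name `normalize_entries` and the error handling are assumptions for illustration, not part of this proposal.

```python
from typing import Any


def normalize_entries(entries: list[Any]) -> list[tuple[str, dict[str, Any]]]:
    """Turn a `providers` or `metrics` list into (name, config) pairs.

    Hypothetical helper: the ADR only defines the two entry shapes,
    not how the CLI processes them.
    """
    normalized = []
    for entry in entries:
        if isinstance(entry, str):
            # Bare name, e.g. `- simple_provider`: no configuration.
            normalized.append((entry, {}))
        elif isinstance(entry, dict) and len(entry) == 1:
            # Single-key mapping, e.g. `- provider_name: {config_key: ...}`.
            name, config = next(iter(entry.items()))
            normalized.append((name, dict(config or {})))
        else:
            raise ValueError(
                f"expected a name or a single-key mapping, got {entry!r}"
            )
    return normalized
```

Normalizing early means the rest of the CLI only has to handle one shape, whichever way the user wrote the entry.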

### Single-repo layout

```
my-data-product/
├── checkup.yaml        # all config in one file
├── dbt/
│   └── ...
└── ...
```

```yaml
# checkup.yaml
tags:
  ...

providers:
  - dbt:
      project_dir: ./dbt
  ...

metrics:
  ...

materializer:
  ...
```

### Monorepo layout

```
monorepo/
├── checkup.yaml                # shared: metrics + materializer
├── products/
│   ├── product-a/
│   │   ├── checkup.yaml        # project-specific: tags + providers
│   │   └── dbt/
│   └── product-b/
│       ├── checkup.yaml
│       └── dbt/
```

Root config (shared):
```yaml
# monorepo/checkup.yaml
metrics:
  ...

materializer:
  ...
```

Project config (specific):
```yaml
# monorepo/products/product-a/checkup.yaml
tags:
  ...

providers:
  - dbt:
      project_dir: .
  ...
```

### Config resolution

1. Find `checkup.yaml` in current directory
2. Walk up directory tree, collecting parent configs
3. Merge configs (child overrides parent)
4. Apply CLI flag overrides

Precedence: CLI flags > project config > parent config
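
The resolution steps above can be sketched as follows. This is a minimal illustration, not the implementation: the function names are hypothetical, and the merge policy (mappings merge recursively, lists and scalars are replaced wholesale by the child) is an assumption, since this ADR only states that the child overrides the parent.

```python
from pathlib import Path
from typing import Any


def find_config_files(start: Path, filename: str = "checkup.yaml") -> list[Path]:
    """Collect config files from the outermost parent down to `start`."""
    start = start.resolve()
    found = [
        directory / filename
        for directory in (start, *start.parents)
        if (directory / filename).is_file()
    ]
    # Parents first, so later (more specific) files override earlier ones.
    return list(reversed(found))


def merge_configs(parent: dict[str, Any], child: dict[str, Any]) -> dict[str, Any]:
    """Merge two configs; values from `child` take precedence.

    Assumption: nested mappings are merged key by key, while lists and
    scalars from the child replace the parent's value entirely.
    """
    merged = dict(parent)
    for key, value in child.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_configs(merged[key], value)
        else:
            merged[key] = value
    return merged


def resolve_config(
    layers: list[dict[str, Any]], cli_overrides: dict[str, Any]
) -> dict[str, Any]:
    """Apply layers in order (outermost parent first), then CLI overrides."""
    resolved: dict[str, Any] = {}
    for layer in layers:
        resolved = merge_configs(resolved, layer)
    return merge_configs(resolved, cli_overrides)
```

Here `layers` would be the parsed contents of the files returned by `find_config_files`, in the same order. Whether lists such as `metrics` should be replaced or concatenated across levels is a design question this sketch does not settle.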

### JSON Schema for IDE support

Generate a JSON Schema from the config model to enable IDE intellisense:

```yaml
# yaml-language-server: $schema=https://checkup.example.com/schema.json

providers:
  - provider_name:
      config_key: config_value  # IDE provides autocomplete
```

The schema includes all valid provider names, metric names, and their config options based on installed plugins.
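
As a rough sketch of what schema generation could look like, the function below builds a draft-07 JSON Schema from lists of installed provider and metric names. Everything here is an assumption for illustration: the name `build_schema`, the way plugin names are passed in, and the fact that per-plugin config options are left as untyped objects rather than modeled in detail.

```python
import json


def build_schema(provider_names: list[str], metric_names: list[str]) -> dict:
    """Build a JSON Schema for checkup.yaml from known plugin names."""

    def entry_list(names: list[str]) -> dict:
        # Each list item is either a bare name or a single-key mapping
        # from a known name to that plugin's config object.
        return {
            "type": "array",
            "items": {
                "oneOf": [
                    {"enum": names},
                    {
                        "type": "object",
                        "properties": {name: {"type": "object"} for name in names},
                        "additionalProperties": False,
                        "minProperties": 1,
                        "maxProperties": 1,
                    },
                ]
            },
        }

    return {
        "$schema": "http://json-schema.org/draft-07/schema#",
        "type": "object",
        "properties": {
            "tags": {"type": "object"},
            "providers": entry_list(sorted(provider_names)),
            "metrics": entry_list(sorted(metric_names)),
            "materializer": {
                "type": "object",
                "properties": {
                    "type": {"enum": ["console", "csv", "html", "sqlalchemy"]}
                },
            },
        },
        "additionalProperties": False,
    }


if __name__ == "__main__":
    # Placeholder plugin names, for demonstration only.
    print(json.dumps(build_schema(["dbt", "git"], ["metric_name"]), indent=2))
```

Because the valid names depend on installed plugins, a schema generated this way is specific to one environment. A schema published at a fixed URL would have to cover a known plugin set.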

### Interactive config generation (`checkup init` and `checkup config`)

**`checkup init`** - Create a new config file:
1. Select providers from installed plugins (dbt, git, python, etc.)
2. Select metrics (filtered to those supported by chosen providers)
3. Configure materializer
4. Generate `checkup.yaml`

**`checkup config`** - Modify an existing config file:
- Add/remove providers or metrics interactively
- Update materializer settings
- Useful when new plugins are installed or requirements change

Both commands allow developers to configure checkup without writing YAML manually, while still producing a standard config file they can edit directly if preferred.

### Related ADRs

- [Push-based Architecture](./2026.03.25-push-based-architecture.md)
- [Credentials and Secrets](./2026.03.25-credentials-and-secrets.md)
97 changes: 97 additions & 0 deletions docs/adr/2026.03.25-credentials-and-secrets.md
@@ -0,0 +1,97 @@
# Credentials and Secrets

* Status: `proposed`
* Deciders: `checkup team`
* Proposal date: 25/03/2026
* Decision date:

## Context and problem statement

The [configuration file design](./2026.03.25-configuration-file-design.md) uses YAML files for checkup configuration. These files often need to reference sensitive values (database URLs, API tokens) that should not be committed to version control.

We need a mechanism for injecting secrets into configuration at runtime.

## Considered options

1. **Explicit substitution with `${VAR}` syntax**: Reference environment variables explicitly in YAML using `${VAR}` syntax.

2. **Naming convention**: Environment variables matching a naming convention (e.g., `CHECKUP__PROVIDER__DBT__...`) are automatically mapped to config values.

3. **External secrets manager**: Integrate with secrets managers (Vault, AWS Secrets Manager, etc.) to fetch secrets at runtime.

## Chosen option

We are choosing **both option 1 and option 2** because they serve different use cases:

**Option 1 (`${VAR}` syntax)** is explicit and familiar:
- Clear which values come from environment
- Flexible: any config value can reference any env var
- Familiar from: Docker Compose, GitHub Actions, etc.

**Option 2 (naming convention)** enables config-free overrides:
- No YAML changes needed: just set env vars
- Familiar from: dlt hub, etc.

**Option 3 not chosen** for initial implementation because it adds complexity and external dependencies. It can be added later if needed.

## Consequences

With the chosen option, we see the following consequences requiring extra effort:

1. **Two mechanisms to document**: Users need to understand both approaches and when to use each.

2. **Precedence rules**: Need clear rules for what happens when both are used (`${VAR}` explicit references override naming convention defaults).

## More information

### Option 1: Explicit substitution in YAML

Reference environment variables using `${VAR}` syntax:

```yaml
materializer:
  type: sqlalchemy
  connection_url: ${DATABASE_URL}
```
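
A minimal sketch of how `${VAR}` substitution might be applied to the parsed config is shown below. The function name `substitute_env` is hypothetical, and raising an error for an unset variable is an assumption: this ADR does not specify what happens when a referenced variable is missing.

```python
import re
from typing import Any, Mapping

# Matches ${NAME}, where NAME is a conventional environment variable name.
_VAR_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def substitute_env(value: Any, env: Mapping[str, str]) -> Any:
    """Recursively replace ${VAR} references in strings with values from `env`."""
    if isinstance(value, str):

        def replace(match: re.Match) -> str:
            name = match.group(1)
            if name not in env:
                # Assumption: fail loudly rather than substitute an empty string.
                raise KeyError(f"environment variable {name} is not set")
            return env[name]

        return _VAR_PATTERN.sub(replace, value)
    if isinstance(value, dict):
        return {key: substitute_env(item, env) for key, item in value.items()}
    if isinstance(value, list):
        return [substitute_env(item, env) for item in value]
    return value
```

In the CLI, `env` would typically be `os.environ`. Taking it as a parameter keeps the function easy to test.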

### Option 2: Naming convention

Environment variables matching a naming convention are automatically mapped to config:

```bash
CHECKUP__MATERIALIZER__SQLALCHEMY__CONNECTION_URL=postgresql://...
CHECKUP__PROVIDER__CONVEYOR__API_TOKEN=xxx
```

Structure: `CHECKUP__<SECTION>__<NAME>__<CONFIG_KEY>`
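
The mapping from variable names to config values could be sketched as follows. The function name `config_from_env`, the lowercasing of each segment, and the choice to silently skip variables that do not have exactly three segments after the prefix are all assumptions for illustration.

```python
from typing import Mapping

PREFIX = "CHECKUP__"


def config_from_env(env: Mapping[str, str]) -> dict[str, dict[str, dict[str, str]]]:
    """Parse CHECKUP__<SECTION>__<NAME>__<CONFIG_KEY> variables into nested dicts."""
    result: dict[str, dict[str, dict[str, str]]] = {}
    for key, value in env.items():
        if not key.startswith(PREFIX):
            continue
        parts = key[len(PREFIX):].split("__")
        if len(parts) != 3 or not all(parts):
            # Not of the form <SECTION>__<NAME>__<CONFIG_KEY>; ignore it.
            continue
        section, name, config_key = (part.lower() for part in parts)
        result.setdefault(section, {}).setdefault(name, {})[config_key] = value
    return result
```

Splitting on the double underscore leaves single underscores inside a segment intact, so `CONNECTION_URL` maps to `connection_url`.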

### Precedence

When both mechanisms are used:

1. YAML file values are the base
2. Naming convention env vars provide defaults for missing values
3. `${VAR}` explicit substitution wins

Example:
```yaml
# checkup.yaml
materializer:
  type: sqlalchemy
  connection_url: ${DATABASE_URL}  # explicit reference wins
```

```bash
# Environment
DATABASE_URL=postgresql://explicit/db
CHECKUP__MATERIALIZER__SQLALCHEMY__CONNECTION_URL=postgresql://convention/db
```

Result: `connection_url` = `postgresql://explicit/db` (`${VAR}` wins)

The naming convention is useful for providing defaults or overriding values that aren't explicitly referenced in YAML, while `${VAR}` gives precise control when you need it.
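
The three precedence steps can be illustrated with a small sketch that works on the flat settings of a single materializer or provider. The function name `resolve_settings` and its signature are hypothetical. It reads step 2 literally: naming convention values only fill keys that the YAML leaves out.

```python
import re
from typing import Any, Mapping

_VAR = re.compile(r"\$\{(\w+)\}")


def resolve_settings(
    yaml_values: dict[str, Any],
    convention_values: dict[str, Any],
    env: Mapping[str, str],
) -> dict[str, Any]:
    """Combine YAML values, naming convention values, and ${VAR} references."""
    # Steps 1-2: YAML is the base; convention values fill only missing keys.
    merged = {**convention_values, **yaml_values}
    # Step 3: explicit ${VAR} references are expanded from the environment.
    # A reference to an unset variable raises KeyError in this sketch.
    return {
        key: _VAR.sub(lambda match: env[match.group(1)], value)
        if isinstance(value, str)
        else value
        for key, value in merged.items()
    }


# The example above, expressed as inputs to the sketch:
settings = resolve_settings(
    yaml_values={"type": "sqlalchemy", "connection_url": "${DATABASE_URL}"},
    convention_values={"connection_url": "postgresql://convention/db"},
    env={"DATABASE_URL": "postgresql://explicit/db"},
)
print(settings["connection_url"])  # postgresql://explicit/db
```

If `connection_url` were removed from the YAML, the same call would return the naming convention value instead.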

### Related ADRs

- [Push-based Architecture](./2026.03.25-push-based-architecture.md)
- [Configuration File Design](./2026.03.25-configuration-file-design.md)