refactor(core): introduce staged pipeline by sigilmakes · Pull Request #18 · stfc/goldilocks-core

sigilmakes · 2026-06-10T08:40:13Z

What

Introduces the staged Goldilocks Core pipeline as the shared execution path for Python callers, the CLI, and future HTTP wrappers, and removes the pre-staged shared/ module and legacy aliases.

The Core graph is fixed and inspectable:

Load → Analyze → Advise → Kmesh → Select → Generate → Bundle

Modes stop at different points:

recommend → … Select
generate  → … Generate
bundle    → … Bundle

Public surface

from goldilocks_core import (
    CalculationHints,
    CalculationIntent,
    CoreJobRequest,
    CoreResult,
    Pipeline,
    generate,
    recommend,
    run_core_job,
    write_bundle,
)

CoreJobRequest is the serializable request boundary (data-only). CoreResult is the single result type returned by recommend, generate, write_bundle, and run_core_job — one accumulator of every stage record the graph produces, with Bundle's output carried as a BundleRecord stage record (result.bundle.path, result.bundle.manifest). The request is not echoed on the result; the CLI includes it in its JSON envelope itself.

Pipeline is a frozen dataclass of stage backends with the built-in implementations as field defaults. Composition is construction:

from goldilocks_core import Pipeline, run_core_job, CoreJobRequest
from goldilocks_core.advisors import ml_kmesh_advisor

pipeline = Pipeline(kmesh=ml_kmesh_advisor(spec))
result = run_core_job(CoreJobRequest(structure="Si.cif"), pipeline=pipeline)

Pipeline() is the default; override any field to swap that stage. No base classes, registries, or string resolution in Core. CoreJobRequest carries what to compute; Pipeline carries how.

Stage backends

Stage	Default backend	Output
Analyze	`analyze_structure`	`StructureAnalysisRecord`
Advise	`advise_parameters`	`ParameterAdvice`
Kmesh	`resolve_kpoints_from_advice`	`KPointSelection`
Select	`select_parameters`	`SelectionRecord`
Generate	`generate_inputs`	`tuple[GeneratedFile, ...]`
Bundle	`write_bundle_directory`	`BundleRecord`

Kmesh is its own stage (between Advise and Select) so ML k-point prediction is a first-class backend, not a hint hack: ml_kmesh_advisor(spec) returns a KMeshAdvisor that records provenance.source="model" while still letting operator k-point hints win. Generate currently writes Quantum ESPRESSO SCF input; Bundle writes a deterministic directory (manifest.json + inputs/) with path-traversal guards.

Contract invariants are enforced at construction: KPointAdvice requires exactly one of spacing/explicit_grid; CoreJobRequest validates mode and output_dir.

CLI

uv run goldilocks-core recommend STRUCTURE --json
uv run goldilocks-core generate STRUCTURE --pseudo-root PSEUDOS --json
uv run goldilocks-core bundle STRUCTURE --pseudo-root PSEUDOS --out RUN_DIR --json
uv run goldilocks-core recommend STRUCTURE --model model.joblib --json

--model resolves to Pipeline(kmesh=ml_kmesh_advisor(spec)) outside the request; the model path never appears on CoreJobRequest.

Why

One internal job runner keeps Python, CLI, and future HTTP entry points from drifting into separate recommendation logic. Making k-point resolution its own backend seam lets ML models participate with correct provenance. The result/Pipeline surface was finalized (single CoreResult, Pipeline with default field values, no shims) before merge since there are no consumers — no compatibility bridges were added.

How to test

uv sync --group dev
uv run pytest            # 81 passed
uv run ruff check src tests
uv run ruff format --check src tests
uv run pre-commit run --all-files

Out of scope

Kmesh is a hardcoded layer in the fixed pipeline graph: run_core_job() always calls Analyze → Advise → Kmesh → Select in that order, and Kmesh is the only sub-decision extracted as its own swappable backend. This is deliberate — it gives ML k-point prediction a first-class seam today without a registry or DAG — and it's the jump-off point for the larger future work: splitting the coarse advise/select fields into per-decision backends (advise_smearing, advise_magnetism, advise_pseudopotentials, etc.) so each scientific decision is independently swappable, with new decisions (Hubbard U, Van der Waals, band k-paths) slotting in as new Pipeline fields. That granularity work is tracked in #21 and is not in this PR. The current Pipeline — each stage a standalone callable with a typed signature — is the foundation for that evolution.

Closes #8, #19, #20, #22.

Written by an agent on behalf of Willow.

Add Attributes sections to all 22 dataclasses in contracts.py documenting every field's type, default, valid values, and scientific meaning. Add docstrings to all type aliases (ProvenanceSource, CodeName, CalcTask, etc.). Add class and per-field docstrings to PseudoMetadata and PseudoPolicy. Refs #19.

Add 14 focused documentation files covering the staged Core pipeline: - docs/provenance.md: source types, warning propagation, interpretation guide - docs/conventions.md: units, defaults, heuristics, k-spacing convention, SOC policy - docs/stages/analyze.md: composition, element classification, symmetry, electronic character, edge cases - docs/stages/advise.md: decision trees, defaults, hint precedence, validation - docs/stages/select.md: k-point resolution, pseudo ranking key, cutoff extraction, warnings - docs/stages/generate.md: QE sections, value sources, SOC flag logic, error conditions - docs/stages/bundle.md: directory layout, manifest schema, path traversal protection - docs/serialization.md: to_dict contract, type conversions, example output - docs/cli.md: complete flag reference, boolean options, output formats - docs/tutorial.md: start-to-finish Python walkthrough, common patterns - docs/extension.md: adding generators, codes, advisors, what not to extend - docs/migration.md: removed paths, renamed types, behavior changes - docs/decisions.md: design rationale for non-obvious choices - docs/changelog.md: 0.1.0 refactor entry Refs #19.

…sh stage - Add Pipeline dataclass with typed callable fields for each stage - Add default_pipeline() factory with built-in implementations - Add pipeline parameter to run_core_job(), recommend(), generate(), write_bundle() - Extract Kmesh as a first-class stage between Advise and Select - Move k-point resolution from Select into resolve_kpoints_from_advice() - Select now receives KPointSelection from Kmesh instead of deriving it - Add ml_kmesh_advisor(spec) factory for model-backed Kmesh backends - Add --model, --model-name, --model-version CLI flags - Fix k-point conflict warning propagation through Kmesh stage - CoreJobRequest stays data-only; backend choice lives in Pipeline - Add extensive docs: pipeline.md, backends.md, contracts.md, stages/kmesh.md - Update all existing docs for Kmesh stage and Pipeline injection - Add tests for Pipeline composition, Kmesh hint precedence, ML backends, CLI model flag Closes #20

Replace the CoreRecommendation/CoreJobResult split with a single CoreResult accumulator. Bundle's output is now a BundleRecord stage record on the result (result.bundle.path/.manifest) rather than a separate envelope, dissolving the return-type asymmetry across recommend/generate/write_bundle and the generated_files/warnings duplication. The request is no longer echoed on the result; the CLI includes it in its JSON envelope itself. Move Pipeline to jobs.py as a frozen dataclass with default backends as field values, so composition is Pipeline(kmesh=...) and the default is Pipeline(). Drop default_pipeline() and the dataclasses.replace swap idiom from the public surface. Delete pipeline.py and the load/analyze/advise/select/bundle_recommendation shims; route stage-by-stage usage through Pipeline fields and load_structure. Enforce contract invariants at construction: KPointAdvice requires exactly one of spacing/explicit_grid; CoreJobRequest validates mode and output_dir. Remove the now-dead defensive branch in resolve_kpoints_from_advice and the unreachable to_jsonable branch. No compatibility shims; there are no consumers. Plan and decisions in #22; documented in docs/decisions.md, docs/changelog.md, and docs/migration.md.

sigilmakes · 2026-06-18T12:29:13Z

Review: refactor/core — staged pipeline

Reviewed against main (28 commits, +7130/−681, 65 files). Local verification:

uv run pytest            # 81 passed
uv run ruff check src tests   # clean
uv run ruff format --check    # clean
uv run pre-commit run --all-files   # passed

No CI workflows on the branch, so this is local-only.

Verdict: ready — no critical or high issues. Two medium items warrant a decision before merge; the rest are low nits.

Findings

Severity	File	Issue
medium	`src/goldilocks_core/advice.py`	Smearing type without width produces an incomplete advice record. See note below.
medium	`tests/test_generation.py`	No test for the disordered-structure rejection path (`generate_quantum_espresso_scf_input` raises on `not structure.is_ordered`).
medium	`tests/test_generation.py`	No test for unsupported-code / unsupported-task rejection in `generate_inputs` (the two `Only … is implemented` branches).
low	`src/goldilocks_core/contracts.py`	`StructureAnalysisRecord.formula` docstring says e.g. `Fe2O31`; pymatgen actually returns `'Fe2 O3'` (space-separated). Example is misleading.
low	`src/goldilocks_core/advice.py`	`_advise_pseudopotentials` provenance `source` reflects only the relativistic sub-decision. When SOC forces `full` with no relativistic hint, source becomes `"analysis"` even if the operator supplied `pseudo_mode`/`pseudo_type` hints. Transitional — acknowledged by #21's per-decision backend split.
low	`src/goldilocks_core/advice.py`	`_advise_convergence` uses `hints.conv_thr or DEFAULT_CONV_THR`. Safe today because `_validate_hints` rejects `≤ 0`, but the `or` pattern would silently fall back if validation ever relaxes. Prefer explicit `is not None` checks.
low	`src/goldilocks_core/bundle.py`	`BundleRecord.path = str(output_dir)` carries the raw (possibly relative) input rather than the resolved path. Consistent and deterministic; just be aware `result.bundle.path` may be relative.
low	`src/goldilocks_core/selection.py`	`_metadata_matches_mode` treats SSSP pseudos with no explicit mode label as matching any `pseudo_mode`. Reasonable default but can mask a real efficiency/precision mismatch on incomplete metadata. Mitigated by the existing "does not explicitly match pseudo mode" warning.

Medium item — smearing type without width

When an operator hints smearing_type without smearing_width_ry, _advise_smearing returns:

SmearingAdvice(smearing_type="cold", width_ry=None, provenance=Provenance(source="user_hint", ...))

That record is internally inconsistent — cold/MP/Gaussian smearing requires a width to be meaningful — yet it passes through Kmesh and Select silently and only surfaces as a hard ValueError at Generate with a generator-layer message.

Per the stage separation, the generator refusing to invent a width is correct. The fix belongs in advice: either fill a conservative default width for the requested type (e.g. METALLIC_SMEARING_WIDTH_RY for cold/MP), or validate the type/width pair at _validate_hints and reject early with an advice-layer message. Right now advice records source="user_hint" for a record that isn't actually usable.

This is also the reason the "smearing enabled without width" generation path is unreachable from tests in a meaningful way — the incomplete record is the upstream cause.

Confirmed sound (checked against the dft-basics references)

SOC generation flags — noncolin = .true. + lspinorb = .true. without nspin=2 is correct QE semantics. noncolin implies nspin=4; nspin=2 is collinear-only and would be wrong under SOC.
SOC → fully-relativistic pseudo linkage — _advise_pseudopotentials sets relativistic_mode="full" when SOC is enabled, and Select filters all elements by that single mode. Consistent treatment across elements; no mixing.
No invented magnetic moments — generation writes no starting_magnetization/tot_magnetization. Consistent with not inventing magnetic ordering from structure facts alone.
K-point shift (0,0,0) — unshifted Monkhorst-Pack, Gamma included per QE K_POINTS automatic convention. Valid conservative default.
SOC + missing fully-relativistic pseudo — Select warns (fallback provenance), then Generate hard-stops on filename is None. Correct: warn at selection, refuse to write at generation.
Path-traversal guard in write_bundle_directory is correct and tested.
Contract invariants (KPointAdvice exactly-one, CoreJobRequest mode/output_dir) enforced at construction and tested.
Pipeline composition — frozen dataclass, typed signatures, no registry/base-class/plugins. Default-field override is the right seam for refactor(core): fine-grained Pipeline backends per scientific decision #21.
No leftover references to the removed shared/types.py or PseudoSelection; migration is complete with no shims.
to_jsonable is idempotent across BundleRecord.manifest re-conversion.

Suggested follow-up

Either address the smearing advice issue on this branch, or open a follow-up issue for it plus the two missing generation error-path tests, and let review proceed on the current code. I'd lean toward the follow-up issue since the physics-bearing behaviour (refusing to invent a width) is already correct; only the advice-layer ergonomics need polishing.

Written by an agent on behalf of Willow.

This comment has been minimized.

Sign in to view

sigilmakes force-pushed the test/baseline-coverage branch from a1f79d2 to f535fcd Compare June 10, 2026 09:25

sigilmakes added 18 commits June 10, 2026 10:25

feat(core): add staged pipeline contracts

05962f6

feat(core): add analyse and advise stages

ef12ddb

feat(core): add select and pipeline stages

f7d4a0f

style(core): apply ruff formatting

08e356a

docs(core): document staged pipeline

42ed7e6

refactor(core): centralize boundary contracts

8ed11da

refactor(core): remove legacy compatibility paths

6facea6

refactor(pseudo): remove unused selection model

b6eaac2

docs: polish staged pipeline documentation

8acf033

feat(core): expand analysis and job contracts

e0761f2

feat(core): add job runner and bundle stages

4ee0017

feat(cli): add staged core command

360b88d

feat(pseudo): rank selection candidates

78eeca7

docs(core): document job runner pipeline

39030e1

style(core): format staged pipeline updates

fefd809

fix(core): tighten serialization and qe soc output

eaff4d3

docs(core): add staged pipeline review

9a651b0

docs(core): move review to issue thread

89688a9

sigilmakes force-pushed the refactor/staged-core-pipeline branch from a26f136 to 89688a9 Compare June 10, 2026 09:25

sigilmakes mentioned this pull request Jun 10, 2026

refactor: introduce staged Core pipeline #8

Open

77 tasks

sigilmakes added 2 commits June 10, 2026 10:40

sigilmakes mentioned this pull request Jun 10, 2026

refactor(core): composable pipeline with swappable stage backends #20

Open

52 tasks

This comment has been minimized.

Sign in to view

sigilmakes added 2 commits June 10, 2026 11:30

docs(skills): add sub-issues usage to github-cli skill

84e2fcd

docs(skills): add goldilocks quickstart skill

e8582a9

sigilmakes requested a review from junwen94 June 17, 2026 10:30

sigilmakes marked this pull request as ready for review June 17, 2026 10:36

docs(core): complete staged pipeline docs cleanup

dba0882

sigilmakes mentioned this pull request Jun 18, 2026

refactor(core): staged Core API ergonomics — CoreResult accumulator, Pipeline defaults, shim removal #22

Open

52 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

refactor(core): introduce staged pipeline#18

refactor(core): introduce staged pipeline#18
sigilmakes wants to merge 25 commits into
test/baseline-coveragefrom
refactor/staged-core-pipeline

sigilmakes commented Jun 10, 2026 •

edited

Loading

Uh oh!

This comment has been minimized.

This comment has been minimized.

sigilmakes commented Jun 18, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

sigilmakes commented Jun 10, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

Public surface

Stage backends

CLI

Why

How to test

Out of scope

Uh oh!

This comment has been minimized.

This comment has been minimized.

sigilmakes commented Jun 18, 2026

Review: refactor/core — staged pipeline

Findings

Medium item — smearing type without width

Confirmed sound (checked against the dft-basics references)

Suggested follow-up

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

sigilmakes commented Jun 10, 2026 •

edited

Loading