Skip to content

refactor(core): introduce staged pipeline#18

Open
sigilmakes wants to merge 25 commits into
test/baseline-coveragefrom
refactor/staged-core-pipeline
Open

refactor(core): introduce staged pipeline#18
sigilmakes wants to merge 25 commits into
test/baseline-coveragefrom
refactor/staged-core-pipeline

Conversation

@sigilmakes

@sigilmakes sigilmakes commented Jun 10, 2026

Copy link
Copy Markdown
Collaborator

What

Introduces the staged Goldilocks Core pipeline as the shared execution path for Python callers, the CLI, and future HTTP wrappers, and removes the pre-staged shared/ module and legacy aliases.

The Core graph is fixed and inspectable:

Load → Analyze → Advise → Kmesh → Select → Generate → Bundle

Modes stop at different points:

recommend → … Select
generate  → … Generate
bundle    → … Bundle

Public surface

from goldilocks_core import (
    CalculationHints,
    CalculationIntent,
    CoreJobRequest,
    CoreResult,
    Pipeline,
    generate,
    recommend,
    run_core_job,
    write_bundle,
)

CoreJobRequest is the serializable request boundary (data-only). CoreResult is the single result type returned by recommend, generate, write_bundle, and run_core_job — one accumulator of every stage record the graph produces, with Bundle's output carried as a BundleRecord stage record (result.bundle.path, result.bundle.manifest). The request is not echoed on the result; the CLI includes it in its JSON envelope itself.

Pipeline is a frozen dataclass of stage backends with the built-in implementations as field defaults. Composition is construction:

from goldilocks_core import Pipeline, run_core_job, CoreJobRequest
from goldilocks_core.advisors import ml_kmesh_advisor

pipeline = Pipeline(kmesh=ml_kmesh_advisor(spec))
result = run_core_job(CoreJobRequest(structure="Si.cif"), pipeline=pipeline)

Pipeline() is the default; override any field to swap that stage. No base classes, registries, or string resolution in Core. CoreJobRequest carries what to compute; Pipeline carries how.

Stage backends

Stage Default backend Output
Analyze analyze_structure StructureAnalysisRecord
Advise advise_parameters ParameterAdvice
Kmesh resolve_kpoints_from_advice KPointSelection
Select select_parameters SelectionRecord
Generate generate_inputs tuple[GeneratedFile, ...]
Bundle write_bundle_directory BundleRecord

Kmesh is its own stage (between Advise and Select) so ML k-point prediction is a first-class backend, not a hint hack: ml_kmesh_advisor(spec) returns a KMeshAdvisor that records provenance.source="model" while still letting operator k-point hints win. Generate currently writes Quantum ESPRESSO SCF input; Bundle writes a deterministic directory (manifest.json + inputs/) with path-traversal guards.

Contract invariants are enforced at construction: KPointAdvice requires exactly one of spacing/explicit_grid; CoreJobRequest validates mode and output_dir.

CLI

uv run goldilocks-core recommend STRUCTURE --json
uv run goldilocks-core generate STRUCTURE --pseudo-root PSEUDOS --json
uv run goldilocks-core bundle STRUCTURE --pseudo-root PSEUDOS --out RUN_DIR --json
uv run goldilocks-core recommend STRUCTURE --model model.joblib --json

--model resolves to Pipeline(kmesh=ml_kmesh_advisor(spec)) outside the request; the model path never appears on CoreJobRequest.

Why

One internal job runner keeps Python, CLI, and future HTTP entry points from drifting into separate recommendation logic. Making k-point resolution its own backend seam lets ML models participate with correct provenance. The result/Pipeline surface was finalized (single CoreResult, Pipeline with default field values, no shims) before merge since there are no consumers — no compatibility bridges were added.

How to test

uv sync --group dev
uv run pytest            # 81 passed
uv run ruff check src tests
uv run ruff format --check src tests
uv run pre-commit run --all-files

Out of scope

Kmesh is a hardcoded layer in the fixed pipeline graph: run_core_job() always calls Analyze → Advise → Kmesh → Select in that order, and Kmesh is the only sub-decision extracted as its own swappable backend. This is deliberate — it gives ML k-point prediction a first-class seam today without a registry or DAG — and it's the jump-off point for the larger future work: splitting the coarse advise/select fields into per-decision backends (advise_smearing, advise_magnetism, advise_pseudopotentials, etc.) so each scientific decision is independently swappable, with new decisions (Hubbard U, Van der Waals, band k-paths) slotting in as new Pipeline fields. That granularity work is tracked in #21 and is not in this PR. The current Pipeline — each stage a standalone callable with a typed signature — is the foundation for that evolution.

Closes #8, #19, #20, #22.


Written by an agent on behalf of Willow.

@sigilmakes

This comment has been minimized.

@sigilmakes sigilmakes force-pushed the test/baseline-coverage branch from a1f79d2 to f535fcd Compare June 10, 2026 09:25
@sigilmakes sigilmakes force-pushed the refactor/staged-core-pipeline branch from a26f136 to 89688a9 Compare June 10, 2026 09:25
Add Attributes sections to all 22 dataclasses in contracts.py
documenting every field's type, default, valid values, and
scientific meaning. Add docstrings to all type aliases
(ProvenanceSource, CodeName, CalcTask, etc.). Add class and
per-field docstrings to PseudoMetadata and PseudoPolicy.

Refs #19.
Add 14 focused documentation files covering the staged Core
pipeline:

- docs/provenance.md: source types, warning propagation,
  interpretation guide
- docs/conventions.md: units, defaults, heuristics, k-spacing
  convention, SOC policy
- docs/stages/analyze.md: composition, element classification,
  symmetry, electronic character, edge cases
- docs/stages/advise.md: decision trees, defaults, hint
  precedence, validation
- docs/stages/select.md: k-point resolution, pseudo ranking key,
  cutoff extraction, warnings
- docs/stages/generate.md: QE sections, value sources, SOC flag
  logic, error conditions
- docs/stages/bundle.md: directory layout, manifest schema, path
  traversal protection
- docs/serialization.md: to_dict contract, type conversions,
  example output
- docs/cli.md: complete flag reference, boolean options, output
  formats
- docs/tutorial.md: start-to-finish Python walkthrough, common
  patterns
- docs/extension.md: adding generators, codes, advisors, what
  not to extend
- docs/migration.md: removed paths, renamed types, behavior
  changes
- docs/decisions.md: design rationale for non-obvious choices
- docs/changelog.md: 0.1.0 refactor entry

Refs #19.
…sh stage

- Add Pipeline dataclass with typed callable fields for each stage
- Add default_pipeline() factory with built-in implementations
- Add pipeline parameter to run_core_job(), recommend(), generate(), write_bundle()
- Extract Kmesh as a first-class stage between Advise and Select
- Move k-point resolution from Select into resolve_kpoints_from_advice()
- Select now receives KPointSelection from Kmesh instead of deriving it
- Add ml_kmesh_advisor(spec) factory for model-backed Kmesh backends
- Add --model, --model-name, --model-version CLI flags
- Fix k-point conflict warning propagation through Kmesh stage
- CoreJobRequest stays data-only; backend choice lives in Pipeline
- Add extensive docs: pipeline.md, backends.md, contracts.md, stages/kmesh.md
- Update all existing docs for Kmesh stage and Pipeline injection
- Add tests for Pipeline composition, Kmesh hint precedence, ML backends, CLI model flag

Closes #20
@sigilmakes

This comment has been minimized.

@sigilmakes sigilmakes requested a review from junwen94 June 17, 2026 10:30
@sigilmakes sigilmakes marked this pull request as ready for review June 17, 2026 10:36
Replace the CoreRecommendation/CoreJobResult split with a single
CoreResult accumulator. Bundle's output is now a BundleRecord stage
record on the result (result.bundle.path/.manifest) rather than a
separate envelope, dissolving the return-type asymmetry across
recommend/generate/write_bundle and the generated_files/warnings
duplication. The request is no longer echoed on the result; the CLI
includes it in its JSON envelope itself.

Move Pipeline to jobs.py as a frozen dataclass with default backends
as field values, so composition is Pipeline(kmesh=...) and the default
is Pipeline(). Drop default_pipeline() and the dataclasses.replace swap
idiom from the public surface. Delete pipeline.py and the
load/analyze/advise/select/bundle_recommendation shims; route
stage-by-stage usage through Pipeline fields and load_structure.

Enforce contract invariants at construction: KPointAdvice requires
exactly one of spacing/explicit_grid; CoreJobRequest validates mode and
output_dir. Remove the now-dead defensive branch in
resolve_kpoints_from_advice and the unreachable to_jsonable branch.

No compatibility shims; there are no consumers. Plan and decisions in
#22; documented in docs/decisions.md, docs/changelog.md, and
docs/migration.md.
@sigilmakes

Copy link
Copy Markdown
Collaborator Author

Review: refactor/core — staged pipeline

Reviewed against main (28 commits, +7130/−681, 65 files). Local verification:

uv run pytest            # 81 passed
uv run ruff check src tests   # clean
uv run ruff format --check    # clean
uv run pre-commit run --all-files   # passed

No CI workflows on the branch, so this is local-only.

Verdict: ready — no critical or high issues. Two medium items warrant a decision before merge; the rest are low nits.

Findings

Severity File Issue
medium src/goldilocks_core/advice.py Smearing type without width produces an incomplete advice record. See note below.
medium tests/test_generation.py No test for the disordered-structure rejection path (generate_quantum_espresso_scf_input raises on not structure.is_ordered).
medium tests/test_generation.py No test for unsupported-code / unsupported-task rejection in generate_inputs (the two Only … is implemented branches).
low src/goldilocks_core/contracts.py StructureAnalysisRecord.formula docstring says e.g. Fe2O31; pymatgen actually returns 'Fe2 O3' (space-separated). Example is misleading.
low src/goldilocks_core/advice.py _advise_pseudopotentials provenance source reflects only the relativistic sub-decision. When SOC forces full with no relativistic hint, source becomes "analysis" even if the operator supplied pseudo_mode/pseudo_type hints. Transitional — acknowledged by #21's per-decision backend split.
low src/goldilocks_core/advice.py _advise_convergence uses hints.conv_thr or DEFAULT_CONV_THR. Safe today because _validate_hints rejects ≤ 0, but the or pattern would silently fall back if validation ever relaxes. Prefer explicit is not None checks.
low src/goldilocks_core/bundle.py BundleRecord.path = str(output_dir) carries the raw (possibly relative) input rather than the resolved path. Consistent and deterministic; just be aware result.bundle.path may be relative.
low src/goldilocks_core/selection.py _metadata_matches_mode treats SSSP pseudos with no explicit mode label as matching any pseudo_mode. Reasonable default but can mask a real efficiency/precision mismatch on incomplete metadata. Mitigated by the existing "does not explicitly match pseudo mode" warning.

Medium item — smearing type without width

When an operator hints smearing_type without smearing_width_ry, _advise_smearing returns:

SmearingAdvice(smearing_type="cold", width_ry=None, provenance=Provenance(source="user_hint", ...))

That record is internally inconsistent — cold/MP/Gaussian smearing requires a width to be meaningful — yet it passes through Kmesh and Select silently and only surfaces as a hard ValueError at Generate with a generator-layer message.

Per the stage separation, the generator refusing to invent a width is correct. The fix belongs in advice: either fill a conservative default width for the requested type (e.g. METALLIC_SMEARING_WIDTH_RY for cold/MP), or validate the type/width pair at _validate_hints and reject early with an advice-layer message. Right now advice records source="user_hint" for a record that isn't actually usable.

This is also the reason the "smearing enabled without width" generation path is unreachable from tests in a meaningful way — the incomplete record is the upstream cause.

Confirmed sound (checked against the dft-basics references)

  • SOC generation flagsnoncolin = .true. + lspinorb = .true. without nspin=2 is correct QE semantics. noncolin implies nspin=4; nspin=2 is collinear-only and would be wrong under SOC.
  • SOC → fully-relativistic pseudo linkage_advise_pseudopotentials sets relativistic_mode="full" when SOC is enabled, and Select filters all elements by that single mode. Consistent treatment across elements; no mixing.
  • No invented magnetic moments — generation writes no starting_magnetization/tot_magnetization. Consistent with not inventing magnetic ordering from structure facts alone.
  • K-point shift (0,0,0) — unshifted Monkhorst-Pack, Gamma included per QE K_POINTS automatic convention. Valid conservative default.
  • SOC + missing fully-relativistic pseudo — Select warns (fallback provenance), then Generate hard-stops on filename is None. Correct: warn at selection, refuse to write at generation.
  • Path-traversal guard in write_bundle_directory is correct and tested.
  • Contract invariants (KPointAdvice exactly-one, CoreJobRequest mode/output_dir) enforced at construction and tested.
  • Pipeline composition — frozen dataclass, typed signatures, no registry/base-class/plugins. Default-field override is the right seam for refactor(core): fine-grained Pipeline backends per scientific decision #21.
  • No leftover references to the removed shared/types.py or PseudoSelection; migration is complete with no shims.
  • to_jsonable is idempotent across BundleRecord.manifest re-conversion.

Suggested follow-up

Either address the smearing advice issue on this branch, or open a follow-up issue for it plus the two missing generation error-path tests, and let review proceed on the current code. I'd lean toward the follow-up issue since the physics-bearing behaviour (refusing to invent a width) is already correct; only the advice-layer ergonomics need polishing.


Written by an agent on behalf of Willow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant