refactor(core): introduce staged pipeline#18
Conversation
This comment has been minimized.
This comment has been minimized.
a1f79d2 to
f535fcd
Compare
a26f136 to
89688a9
Compare
Add Attributes sections to all 22 dataclasses in contracts.py documenting every field's type, default, valid values, and scientific meaning. Add docstrings to all type aliases (ProvenanceSource, CodeName, CalcTask, etc.). Add class and per-field docstrings to PseudoMetadata and PseudoPolicy. Refs #19.
Add 14 focused documentation files covering the staged Core pipeline: - docs/provenance.md: source types, warning propagation, interpretation guide - docs/conventions.md: units, defaults, heuristics, k-spacing convention, SOC policy - docs/stages/analyze.md: composition, element classification, symmetry, electronic character, edge cases - docs/stages/advise.md: decision trees, defaults, hint precedence, validation - docs/stages/select.md: k-point resolution, pseudo ranking key, cutoff extraction, warnings - docs/stages/generate.md: QE sections, value sources, SOC flag logic, error conditions - docs/stages/bundle.md: directory layout, manifest schema, path traversal protection - docs/serialization.md: to_dict contract, type conversions, example output - docs/cli.md: complete flag reference, boolean options, output formats - docs/tutorial.md: start-to-finish Python walkthrough, common patterns - docs/extension.md: adding generators, codes, advisors, what not to extend - docs/migration.md: removed paths, renamed types, behavior changes - docs/decisions.md: design rationale for non-obvious choices - docs/changelog.md: 0.1.0 refactor entry Refs #19.
…sh stage - Add Pipeline dataclass with typed callable fields for each stage - Add default_pipeline() factory with built-in implementations - Add pipeline parameter to run_core_job(), recommend(), generate(), write_bundle() - Extract Kmesh as a first-class stage between Advise and Select - Move k-point resolution from Select into resolve_kpoints_from_advice() - Select now receives KPointSelection from Kmesh instead of deriving it - Add ml_kmesh_advisor(spec) factory for model-backed Kmesh backends - Add --model, --model-name, --model-version CLI flags - Fix k-point conflict warning propagation through Kmesh stage - CoreJobRequest stays data-only; backend choice lives in Pipeline - Add extensive docs: pipeline.md, backends.md, contracts.md, stages/kmesh.md - Update all existing docs for Kmesh stage and Pipeline injection - Add tests for Pipeline composition, Kmesh hint precedence, ML backends, CLI model flag Closes #20
This comment has been minimized.
This comment has been minimized.
Replace the CoreRecommendation/CoreJobResult split with a single CoreResult accumulator. Bundle's output is now a BundleRecord stage record on the result (result.bundle.path/.manifest) rather than a separate envelope, dissolving the return-type asymmetry across recommend/generate/write_bundle and the generated_files/warnings duplication. The request is no longer echoed on the result; the CLI includes it in its JSON envelope itself. Move Pipeline to jobs.py as a frozen dataclass with default backends as field values, so composition is Pipeline(kmesh=...) and the default is Pipeline(). Drop default_pipeline() and the dataclasses.replace swap idiom from the public surface. Delete pipeline.py and the load/analyze/advise/select/bundle_recommendation shims; route stage-by-stage usage through Pipeline fields and load_structure. Enforce contract invariants at construction: KPointAdvice requires exactly one of spacing/explicit_grid; CoreJobRequest validates mode and output_dir. Remove the now-dead defensive branch in resolve_kpoints_from_advice and the unreachable to_jsonable branch. No compatibility shims; there are no consumers. Plan and decisions in #22; documented in docs/decisions.md, docs/changelog.md, and docs/migration.md.
Review: refactor/core — staged pipelineReviewed against No CI workflows on the branch, so this is local-only. Verdict: ready — no critical or high issues. Two medium items warrant a decision before merge; the rest are low nits. Findings
Medium item — smearing type without widthWhen an operator hints SmearingAdvice(smearing_type="cold", width_ry=None, provenance=Provenance(source="user_hint", ...))That record is internally inconsistent — cold/MP/Gaussian smearing requires a width to be meaningful — yet it passes through Kmesh and Select silently and only surfaces as a hard Per the stage separation, the generator refusing to invent a width is correct. The fix belongs in advice: either fill a conservative default width for the requested type (e.g. This is also the reason the "smearing enabled without width" generation path is unreachable from tests in a meaningful way — the incomplete record is the upstream cause. Confirmed sound (checked against the dft-basics references)
Suggested follow-upEither address the smearing advice issue on this branch, or open a follow-up issue for it plus the two missing generation error-path tests, and let review proceed on the current code. I'd lean toward the follow-up issue since the physics-bearing behaviour (refusing to invent a width) is already correct; only the advice-layer ergonomics need polishing. Written by an agent on behalf of Willow. |
What
Introduces the staged Goldilocks Core pipeline as the shared execution path for Python callers, the CLI, and future HTTP wrappers, and removes the pre-staged
shared/module and legacy aliases.The Core graph is fixed and inspectable:
Modes stop at different points:
Public surface
CoreJobRequestis the serializable request boundary (data-only).CoreResultis the single result type returned byrecommend,generate,write_bundle, andrun_core_job— one accumulator of every stage record the graph produces, with Bundle's output carried as aBundleRecordstage record (result.bundle.path,result.bundle.manifest). The request is not echoed on the result; the CLI includes it in its JSON envelope itself.Pipelineis a frozen dataclass of stage backends with the built-in implementations as field defaults. Composition is construction:Pipeline()is the default; override any field to swap that stage. No base classes, registries, or string resolution in Core.CoreJobRequestcarries what to compute;Pipelinecarries how.Stage backends
analyze_structureStructureAnalysisRecordadvise_parametersParameterAdviceresolve_kpoints_from_adviceKPointSelectionselect_parametersSelectionRecordgenerate_inputstuple[GeneratedFile, ...]write_bundle_directoryBundleRecordKmesh is its own stage (between Advise and Select) so ML k-point prediction is a first-class backend, not a hint hack:
ml_kmesh_advisor(spec)returns aKMeshAdvisorthat recordsprovenance.source="model"while still letting operator k-point hints win. Generate currently writes Quantum ESPRESSO SCF input; Bundle writes a deterministic directory (manifest.json+inputs/) with path-traversal guards.Contract invariants are enforced at construction:
KPointAdvicerequires exactly one ofspacing/explicit_grid;CoreJobRequestvalidatesmodeandoutput_dir.CLI
--modelresolves toPipeline(kmesh=ml_kmesh_advisor(spec))outside the request; the model path never appears onCoreJobRequest.Why
One internal job runner keeps Python, CLI, and future HTTP entry points from drifting into separate recommendation logic. Making k-point resolution its own backend seam lets ML models participate with correct provenance. The result/Pipeline surface was finalized (single
CoreResult,Pipelinewith default field values, no shims) before merge since there are no consumers — no compatibility bridges were added.How to test
uv sync --group dev uv run pytest # 81 passed uv run ruff check src tests uv run ruff format --check src tests uv run pre-commit run --all-filesOut of scope
Kmesh is a hardcoded layer in the fixed pipeline graph:
run_core_job()always calls Analyze → Advise → Kmesh → Select in that order, and Kmesh is the only sub-decision extracted as its own swappable backend. This is deliberate — it gives ML k-point prediction a first-class seam today without a registry or DAG — and it's the jump-off point for the larger future work: splitting the coarseadvise/selectfields into per-decision backends (advise_smearing,advise_magnetism,advise_pseudopotentials, etc.) so each scientific decision is independently swappable, with new decisions (Hubbard U, Van der Waals, band k-paths) slotting in as newPipelinefields. That granularity work is tracked in #21 and is not in this PR. The currentPipeline— each stage a standalone callable with a typed signature — is the foundation for that evolution.Closes #8, #19, #20, #22.
Written by an agent on behalf of Willow.