Conventions for contributors and for AI-assisted edits in this repository.
- Minimal diffs: change only what the task requires; avoid drive-by refactors or unrelated formatting.
- Match existing code: naming, imports, type hints, and test style should blend with neighbouring modules.
- No committed artifacts: outputs go under `artifacts/` (gitignored). Bundled inputs live under `data/sample_operando/`.
- Version: `requires-python >= 3.10` (see `pyproject.toml`).
- Types: Prefer explicit annotations on public functions and complex internals; use `Protocol` for pluggable interfaces (e.g. tracking `CostFunction`).
- NumPy / SciPy: Use vectorised operations on hot paths; delegate heavy distance work to `cdist` / BLAS-style matrix ops where possible.
- Optional dependencies: `h5py`, `torch`, `transformers` may be absent (CI installs a minimal set). Use feature checks or env-driven backends (e.g. mock DINO) so tests stay deterministic.
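
The `Protocol` and optional-dependency guidance above can be combined in one small pattern. The sketch below is illustrative only: `EmbeddingBackend`, `MockBackend`, `has_module`, `select_backend`, and the `EXAMPLE_BACKEND` environment variable are hypothetical names invented for this example, not part of the repository's API.

```python
from __future__ import annotations

import importlib.util
import os
from typing import Mapping, Protocol, Sequence


class EmbeddingBackend(Protocol):
    """Hypothetical pluggable interface (not the repository's actual API)."""

    def embed(self, values: Sequence[float]) -> list[float]: ...


class MockBackend:
    """Deterministic stand-in that needs no optional dependency."""

    def embed(self, values: Sequence[float]) -> list[float]:
        # Normalise by the sum; fall back to 1.0 so an all-zero input is safe.
        total = sum(values) or 1.0
        return [v / total for v in values]


def has_module(name: str) -> bool:
    """Feature check: report whether `name` is importable without importing it."""
    return importlib.util.find_spec(name) is not None


def select_backend(env: Mapping[str, str] | None = None) -> EmbeddingBackend:
    """Pick a backend from the environment, defaulting to the mock."""
    env = os.environ if env is None else env
    # EXAMPLE_BACKEND is an assumed variable name used only in this sketch.
    if env.get("EXAMPLE_BACKEND", "mock") == "mock" or not has_module("torch"):
        return MockBackend()
    raise NotImplementedError("a real torch-backed backend is omitted from this sketch")
```

Because `MockBackend` satisfies the protocol structurally, it needs no inheritance, and tests that go through `select_backend({})` never touch `torch` or `transformers`.
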
- Dataset root: always a directory containing `scan*` subfolders; discovery is defined in `braggtrack/io/discovery.py`.
- Resolving CLI paths: use `braggtrack.io.paths.resolve_dataset_root(args.root)` so “no argument” prefers bundled sample data when present.
- Contracts: serialised tables (`features.csv`, `embeddings.npz`) should stay backward-compatible or bump a `schema_version` field in JSON summaries.
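
As a rough illustration of the `schema_version` contract above, a writer can stamp each JSON summary and a reader can refuse versions it does not understand. The helper names `write_summary` and `read_summary`, the `n_features` field, and the version value `2` are assumptions made for this sketch, not the repository's actual code.

```python
import json

# Assumed current version; bump it whenever a serialised layout changes incompatibly.
SCHEMA_VERSION = 2


def write_summary(payload: dict) -> str:
    """Serialise a summary with the schema version stamped in."""
    return json.dumps({**payload, "schema_version": SCHEMA_VERSION}, sort_keys=True)


def read_summary(text: str) -> dict:
    """Parse a summary and reject versions this reader does not support."""
    summary = json.loads(text)
    found = summary.get("schema_version")
    if found != SCHEMA_VERSION:
        raise ValueError(
            f"unsupported schema_version {found!r}; expected {SCHEMA_VERSION}"
        )
    return summary
```

Failing loudly on a mismatch keeps older readers from silently misinterpreting newer tables.
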
- Framework: `unittest` (`python -m unittest discover -s tests -v`).
- Before a PR: `python scripts/pre_pr_check.py` and `python scripts/ci_report.py` when dependencies are installed.
- H5 / NeXus tests: do not assume `h5py` is missing; use `unittest.mock` to exercise error paths (see `tests/test_nexus_dependency.py`).
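
One way to exercise a missing-`h5py` error path regardless of whether `h5py` is installed is to place a `None` entry in `sys.modules`, which makes the import raise `ImportError`. The helper `load_h5py` and the test class below are hypothetical and are not the contents of `tests/test_nexus_dependency.py`.

```python
import sys
import unittest
from unittest import mock


def load_h5py():
    """Hypothetical helper: import h5py or raise a clear, actionable error."""
    try:
        import h5py
    except ImportError as exc:
        raise RuntimeError("h5py is required to read NeXus/HDF5 files") from exc
    return h5py


class MissingH5pyTest(unittest.TestCase):
    def test_missing_h5py_raises_clear_error(self):
        # A None entry in sys.modules forces `import h5py` to fail,
        # so the test behaves the same with or without h5py installed.
        with mock.patch.dict(sys.modules, {"h5py": None}):
            with self.assertRaises(RuntimeError):
                load_h5py()
```

`mock.patch.dict` restores `sys.modules` when the block exits, so other tests still see the real environment.
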
- User-facing workflow: `README.md` quick starts.
- Design depth: `docs/architecture.md` (this repo’s structure and data flow).
- Contributor onboarding: `CONTRIBUTING.md` (links here and to architecture).
- Prefer clear, imperative commit subjects (`feat(tracking): …`, `fix(io): …`, `docs: …`).
- PR description should state what changed and why, and mention any schema or path changes (e.g. sample data location).