67 changes: 67 additions & 0 deletions PLANS.md
@@ -69,6 +69,73 @@ a protein construct has been chosen and a DNA expression strategy is needed.
FoldFoundry's valuable near-term layer is protein construct design before
structure prediction, not DNA design.

## Symmetry-Preserving Graft Module

Long-form design document: `docs/design/symmetry_preserving_graft.md`.

This feature is implemented as a first-party FoldForge module:
`src/foldfoundry/modules/symmetry_graft/`.

FoldForge core owns:

1. Module discovery and loading.
2. CLI, workflow, artifact, and report registration hooks.
3. Shared config, logging, error, and output conventions.

The `symmetry_graft` module owns:

1. Gemmi, NumPy, and optional Biotite dependency use.
2. PDB/mmCIF IO and normalized atom/residue indexing.
3. Biological assembly, NCS, crystal packing, and deposited-coordinate
   inspection.
4. Biological assembly expansion.
5. Kabsch fitting and source-frame graft atom replacement.
6. Future graft specs, packing shells, clash/contact reports, and model-backend
   handoff.

No symmetry or grafting code should live in root `structures/` or `grafting/`
packages. No `symmetry` or `graft` command should be wired directly in root
`cli.py`; the module registers those command groups through
`ModuleContext.cli`.
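Below is a minimal sketch of what module-owned registration could look like. It is hypothetical throughout: `CliRegistry`, `NameRegistry`, `ModuleContext`, `register`, and the workflow/artifact/report split are stand-ins written only for this illustration. The real FoldForge host types and method names, including `add_command_group`, may differ. The point is that the module, not root `cli.py`, names the `symmetry` and `graft` groups.

```python
from collections.abc import Callable
from dataclasses import dataclass, field

# Hypothetical stand-ins for FoldForge host objects; real names/signatures may differ.


@dataclass
class CliRegistry:
    groups: dict[str, Callable[[], object]] = field(default_factory=dict)

    def add_command_group(self, name: str, factory: Callable[[], object]) -> None:
        # Refuse silent overrides so two modules cannot claim the same group.
        if name in self.groups:
            raise ValueError(f"command group {name!r} is already registered")
        self.groups[name] = factory


@dataclass
class NameRegistry:
    names: set[str] = field(default_factory=set)

    def register(self, name: str) -> None:
        self.names.add(name)


@dataclass
class ModuleContext:
    cli: CliRegistry = field(default_factory=CliRegistry)
    workflows: NameRegistry = field(default_factory=NameRegistry)
    artifacts: NameRegistry = field(default_factory=NameRegistry)
    reports: NameRegistry = field(default_factory=NameRegistry)


def _placeholder_group(name: str) -> Callable[[], object]:
    # A real module would build a Typer app here; building it lazily keeps
    # optional scientific imports out of host start-up.
    return lambda: {"group": name}


def register(ctx: ModuleContext) -> None:
    """Module-owned registration; root cli.py never mentions these groups."""
    ctx.cli.add_command_group("symmetry", _placeholder_group("symmetry"))
    ctx.cli.add_command_group("graft", _placeholder_group("graft"))
    ctx.workflows.register("symmetry_preserving_graft")
    for kind in ("engineered_asu_cif", "engineered_assembly_cif", "engineered_packing_shell_cif"):
        ctx.artifacts.register(kind)
    ctx.reports.register("symmetry_report_json")
```

Registering the same group twice raises in this sketch. That mirrors the plan's concern that command groups stay unambiguous when several modules load.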

Implementation milestones:

1. Module-host boundary.
   - Keep `symmetry` and `graft` as module-owned CLI groups.
   - Register `symmetry_preserving_graft`, `engineered_asu_cif`,
     `engineered_assembly_cif`, `engineered_packing_shell_cif`, and
     `symmetry_report_json` through module registries.
   - Keep Gemmi/NumPy/Biotite out of required root runtime dependencies; expose
     them through the symmetry-graft optional extra and dev/test dependencies.
2. Symmetry inspection and reporting.
   - Detect mmCIF biological assemblies, PDB BIOMT, mmCIF/PDB NCS, and crystal
     unit-cell/space-group metadata, falling back to deposited coordinates.
   - Expose `foldforge symmetry inspect STRUCTURE --json --out report.json`.
   - Keep opt-in real-data smoke coverage for user-supplied RCSB structures
     `7LGE`, `1QBE`, `4V4M`, and `5UU5`. As of the live smoke test added in this
     change, `1QBE` and `4V4M` are crystallographic positive cases, while `7LGE`
     and `5UU5` exercise the non-crystallographic deposited-coordinate fallback.
     These must remain opt-in live smoke tests, not network-dependent default
     unit tests or structure-specific code paths.
3. Kabsch graft into source frame.
   - Add anchor mapping, rigid fitting, transform application, and strict
     scaffold-preservation tests without requiring any model backend.
4. Biological assembly expansion.
   - Apply original biological assembly operators to source-frame atom records.
   - Write a minimal mmCIF via `foldforge symmetry expand`.
   - Verify against deterministic fixtures, and use real RCSB structures such as
     `2MS2` only as smoke tests, not as structure-specific code paths.
5. Crystal packing shell.
   - Generate packing shells bounded by a distance cutoff around edited atoms.
6. Clash/contact report.
   - Add intra-ASU and symmetry-mate clash/contact checks, cutpoint summaries,
     graft RMSD, and structured warnings.
7. Model-backend integration.
   - Add `ModelJobSpec` bundles without making AlphaFold, ColabFold, Chai,
     Boltz, OpenFold, or any remote service mandatory.
8. GUI integration.
   - Add a thin Swift module/controller only after the CLI-backed module works.
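Biological assembly expansion (milestone 4) comes down to two steps: apply each operator x' = Rx + t to the source-frame coordinates, then record which copy came from which operator. The sketch below assumes the operators have already been parsed into NumPy rotation/translation pairs, for example from `_pdbx_struct_oper_list`. The names `AssemblyOperator` and `expand_with_operators` are hypothetical, as is the `chain-opid` naming scheme. The module's own `expand_biological_assembly` may name chains differently.

```python
from dataclasses import dataclass

import numpy as np


@dataclass(frozen=True)
class AssemblyOperator:
    """Hypothetical parsed operator: x' = rotation @ x + translation."""

    op_id: str
    rotation: np.ndarray  # shape (3, 3)
    translation: np.ndarray  # shape (3,)


def expand_with_operators(
    coords: np.ndarray, chain_ids: list[str], operators: list[AssemblyOperator]
) -> tuple[np.ndarray, list[str], list[str]]:
    """Return stacked coordinates, per-atom chain IDs, and per-atom operator IDs."""
    if coords.shape != (len(chain_ids), 3):
        raise ValueError("coords must be (N, 3) with one chain ID per atom")
    if not operators:
        raise ValueError("at least one operator is required")
    xyz_blocks, out_chains, out_ops = [], [], []
    for op in operators:
        # Row-vector form of R x + t, applied to all N atoms at once.
        xyz_blocks.append(coords @ op.rotation.T + op.translation)
        # Suffix with the operator ID so copies of the same chain never collide.
        out_chains.extend(f"{chain}-{op.op_id}" for chain in chain_ids)
        out_ops.extend([op.op_id] * len(chain_ids))
    return np.vstack(xyz_blocks), out_chains, out_ops
```

Per-atom operator IDs make it straightforward to assert operator counts and chain mapping in tests, without depending on atom order in a written file.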

### Roadmap Phases

#### Phase 1: Core Engine And UI Boundary
78 changes: 78 additions & 0 deletions docs/design/symmetry_preserving_graft.md
@@ -0,0 +1,78 @@
# Symmetry-Preserving Graft Plan

## Summary Of Current Repo

- FoldForge is a Python 3.12 package, managed with `uv`, using Typer, Pydantic, Rich, pytest, ruff, and strict mypy. The public CLI is `foldforge`; the package is `foldfoundry`.
- Existing architecture is CLI-first, with a thin Swift/SwiftUI macOS GUI over CLI-equivalent workflows. There is no TypeScript/React/web UI or API server today, so Mol* belongs only in a later browser UI.
- Relevant current core modules are `models.py` for `FoldSpec`, `io/` for file helpers, `backends/` for AF3/AlphaFold Server export, `results/` for AF3 output parsing, `hpc/` for reviewable bundle workflows, and `gui/macos/` for the thin app shell.
- FoldForge now has a Python module host with `ModuleMetadata`, `ModuleContext`, a `FoldForgeModule` protocol, CLI/workflow/artifact/report registries, built-in module refs, entry-point discovery under `foldforge.modules`, failure-tolerant loading, and `foldforge modules list/info`.
- Symmetry-graft should be the first real scientific module consuming that host. Structure/mmCIF/grafting code belongs under `src/foldfoundry/modules/symmetry_graft/`, not in root `src/foldfoundry/structures/` or `src/foldfoundry/grafting/`.
- Focused tests run from source passed: `uv run pytest tests/test_af3_results.py tests/test_af3_export.py -q`. The installed `foldforge` entry point is stale in this worktree, so implementation work should be verified from source or through the repo's known reinstall path until the launcher is repaired.

## Dependencies And Boundaries

- Add `gemmi` as a symmetry-graft module dependency, not a root runtime dependency. It directly supports PDB/mmCIF coordinate files, structure hierarchy, metadata, neighbor/contact search, NCS, biological assemblies from REMARK 350/mmCIF categories, and crystallographic symmetry ([Gemmi docs](https://gemmi.readthedocs.io/en/stable/mol.html)).
- Add `numpy` as a symmetry-graft module dependency for transform matrices, Kabsch/SVD fitting, RMSD, and stable coordinate arrays. Add `scipy` only when clash/contact performance needs `cKDTree`.
- Use `biotite` as an optional independent mmCIF biological-assembly verification path, especially for `_pdbx_struct_assembly_gen`, `_pdbx_struct_oper_list`, and `atom_site`; its `get_assembly()` consumes those categories directly ([Biotite get_assembly docs](https://www.biotite-python.org/latest/apidoc/biotite.structure.io.pdbx.get_assembly.html)).
- Keep ChimeraX strictly optional. Keep AlphaFold, ColabFold, Chai, Boltz, and OpenFold outside the symmetry/grafting core.
- If a web UI appears later, use Mol* for browser visualization and viewer-state output; Mol* is a modern open-source web toolkit for large molecular visualization ([Mol*](https://molstar.org/)).
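Gemmi, NumPy, and Biotite live in the `symmetry-graft` extra rather than in root dependencies. Module code therefore needs to import them lazily and fail with an actionable message. The sketch below assumes an error type like the `StructureError` added in `errors.py`, stubbed here so the block runs on its own. `require_optional` and the install-hint wording are illustrative, not existing FoldForge helpers.

```python
import importlib
from types import ModuleType


class StructureError(Exception):
    """Local stand-in for foldfoundry.errors.StructureError."""


INSTALL_HINT = "install the symmetry-graft extra (e.g. `uv sync --extra symmetry-graft`)"


def require_optional(module_name: str) -> ModuleType:
    """Import an optional dependency only when a symmetry/graft command needs it."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise StructureError(
            f"{module_name} is required for symmetry-graft commands; {INSTALL_HINT}"
        ) from exc


# Usage inside a command body, never at module import time:
# gemmi = require_optional("gemmi")
```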

## Key Implementation Changes

- Create a first-party module at `src/foldfoundry/modules/symmetry_graft/` for structure IO, atom/residue indexing, transforms, symmetry detection, assembly expansion, packing-shell generation, graft specs, anchor mapping, Kabsch fitting, atom replacement, cutpoint checks, clash/contact analysis, report generation, and orchestration.
- Register the module from `src/foldfoundry/modules/symmetry_graft/plugin.py`. It owns `symmetry` and `graft` CLI groups through `ctx.cli.add_command_group(...)`, and owns workflow, artifact, and report metadata through the module registries.
- Do not hardcode new symmetry-graft command groups in root `src/foldfoundry/cli.py`; root CLI should only load the module host.
- Use runtime dataclasses for non-serializable handles such as `StructureHandle` carrying Gemmi objects. Use Pydantic `StrictModel` style for serializable specs/reports: `Transform`, `SymmetryContext`, `EditSpec`, `ModelJobSpec`, `GraftResult`, and `SymmetryReport`.
- Add module-owned Typer subcommands matching current CLI conventions:
  - `foldforge symmetry inspect input.cif --json`
  - `foldforge graft prepare input.cif --chain A --mode graft_external_domain --anchor A:10-80 --context-distance 10 --out-dir runs/<job>/graft`
  - `foldforge graft apply input.cif model.cif --spec graft_spec.json --symmetry biological_assembly --assembly-id 1 --out-dir runs/<job>/graft`
  - `foldforge graft analyze engineered_asu.cif --assembly engineered_assembly.cif --out foldforge_symmetry_report.json`
- Keep the macOS GUI as a later thin module that shells out to those CLI commands and displays paths, reports, and warnings. Do not add Swift structure logic.
- Do not add API routes now; none exist. If a server API is later introduced, expose the same command-shaped workflow: inspect, prepare, apply, analyze, download bundle.
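The sketch below illustrates the split between runtime handles and serializable specs with a stdlib-only stand-in for a serializable `Transform`. The plan calls for a Pydantic `StrictModel`; this sketch swaps in a frozen dataclass so it runs without Pydantic. The field names (`rotation`, `translation`) and the row-major convention are assumptions. The object holds only plain numbers, so it round-trips through JSON, unlike a `StructureHandle` wrapping Gemmi objects.

```python
import json
from dataclasses import asdict, dataclass

Vec3 = tuple[float, float, float]


@dataclass(frozen=True)
class Transform:
    """Serializable rigid transform x' = R x + t with a row-major 3x3 R."""

    rotation: tuple[Vec3, Vec3, Vec3]
    translation: Vec3

    def __post_init__(self) -> None:
        if len(self.rotation) != 3 or any(len(row) != 3 for row in self.rotation):
            raise ValueError("rotation must be 3x3")
        if len(self.translation) != 3:
            raise ValueError("translation must have three components")

    def apply(self, xyz: Vec3) -> Vec3:
        # Matrix-vector product row by row, then add the translation.
        x, y, z = (
            sum(r * c for r, c in zip(row, xyz)) + t
            for row, t in zip(self.rotation, self.translation)
        )
        return (x, y, z)

    def to_json(self) -> str:
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, text: str) -> "Transform":
        data = json.loads(text)
        rotation = tuple(tuple(float(v) for v in row) for row in data["rotation"])
        translation = tuple(float(v) for v in data["translation"])
        return cls(rotation=rotation, translation=translation)  # re-validated in __post_init__
```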

## Algorithmic Flow

- Parse PDB/mmCIF with Gemmi, prefer mmCIF internally, and call Gemmi's `setup_entities()`. Normalize atom keys, chain IDs, residue IDs, insertion codes, and author-vs-label identifiers under an explicit altloc policy.
- Detect preservation modes: biological assemblies from mmCIF/REMARK 350, NCS from MTRIX/mmCIF NCS records, or crystal packing from unit cell/space group. Fall back to deposited coordinates when none apply.
- Classify equivalent chains using bounded evidence: sequence identity first, then optional backbone RMSD/environment comparison. Warn when sequence-identical chains are structurally non-equivalent.
- `prepare` extracts edited chain/local region plus nearby context and writes a local model-job bundle without choosing or requiring a modeler.
- `apply` imports modeled PDB/mmCIF, maps shared anchor atoms, computes a rigid Kabsch fit from modeled anchors to native anchors, transforms only replacement atoms, and preserves untouched scaffold coordinates exactly.
- Propagate the engineered source-frame unit through the selected original operators. For crystal packing, generate only a bounded shell within a user cutoff around edited atoms.
- Analyze heavy-atom clashes within source-frame coordinates and across mates, cutpoint geometry, anchor RMSD, minimum graft-to-mate distances, contact loss/gain, operator mapping, scaffold drift, and model confidence warnings when metadata exists.
- Write `engineered_asu.cif`, optional `engineered_assembly.cif`, optional `engineered_packing_shell.cif`, and `foldforge_symmetry_report.json`.
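The `apply` step can be sketched with the standard Kabsch algorithm:

1. Fit the modeled anchors onto the native anchors with an SVD.
2. Apply that one rigid transform to the replacement atoms only.
3. Pass the scaffold coordinates through untouched.

This minimal NumPy sketch assumes anchors are already mapped one-to-one and listed in matching order. `kabsch_rigid` and `graft_into_source_frame` are hypothetical names, not the module's `kabsch_fit`/`graft_replacement_atoms` API.

```python
import numpy as np


def kabsch_rigid(mobile: np.ndarray, target: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Return (R, t) minimizing RMSD of mobile @ R.T + t onto target; both (N, 3)."""
    if mobile.shape != target.shape or mobile.ndim != 2 or mobile.shape[1] != 3 or mobile.shape[0] < 3:
        raise ValueError("need matching (N, 3) arrays with N >= 3 anchors")
    mobile_center, target_center = mobile.mean(axis=0), target.mean(axis=0)
    covariance = (mobile - mobile_center).T @ (target - target_center)
    u, _, vt = np.linalg.svd(covariance)
    # Flip the last axis if needed so R is a proper rotation (det +1), not a reflection.
    sign = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0
    rotation = vt.T @ np.diag([1.0, 1.0, sign]) @ u.T
    translation = target_center - mobile_center @ rotation.T
    return rotation, translation


def graft_into_source_frame(
    scaffold: np.ndarray,
    native_anchors: np.ndarray,
    model_anchors: np.ndarray,
    model_replacement: np.ndarray,
) -> tuple[np.ndarray, float]:
    """Place modeled replacement atoms in the native frame; scaffold rows are copied verbatim."""
    rotation, translation = kabsch_rigid(model_anchors, native_anchors)
    placed = model_replacement @ rotation.T + translation
    fitted = model_anchors @ rotation.T + translation
    anchor_rmsd = float(np.sqrt(((fitted - native_anchors) ** 2).sum(axis=1).mean()))
    return np.vstack([scaffold, placed]), anchor_rmsd
```

Only `placed` is computed through the fit. The scaffold block is therefore bit-identical to the input, which is what the "preserves untouched scaffold coordinates exactly" requirement asks a test to check.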

## Test Plan

- Unit tests: Kabsch transform recovery, distance preservation, anchor RMSD, scaffold coordinate invariance outside replacement regions, replacement atom transform application, cutpoint checks, clash detection with bonded-neighbor exclusions, chain-equivalence classification, and chain/operator ID collision handling.
- Symmetry tests: small mmCIF assembly fixture, PDB BIOMT fixture, MTRIX/NCS fixture when supported, and a simple crystal cell for bounded packing-shell neighbor counts.
- Opt-in live RCSB smoke tests: download user-supplied structures `7LGE`, `1QBE`, `4V4M`, and `5UU5` and inspect them through the module-owned code path. `1QBE` and `4V4M` are crystallographic positive cases with unit-cell/space-group metadata; `7LGE` and `5UU5` currently exercise the non-crystallographic deposited-coordinate fallback path. These stay behind the live-test gate and must not become structure-specific production logic.
- Integration tests: mock model output by applying a known transform to a synthetic edited chain, verify graft returns to native frame, verify assembly expansion operator count/chain mapping, and validate JSON report schema.
- Prefer semantic assertions over fragile golden atom-order snapshots: atom counts, operator IDs, RMSD thresholds, chain mapping, clash counts, and unchanged coordinate hashes for preserved scaffold atoms.
- Required repo checks after implementation: `uv run pytest`, `uv run ruff check .`, `uv run ruff format --check .`, and `uv run mypy src`.

## Milestones

- Milestone 1: module-owned Gemmi-backed `symmetry inspect` with JSON report for assemblies, NCS, crystal metadata, deposited fallback, warnings, and tests.
- Milestone 2: module-owned Kabsch graft into source frame for `graft_external_domain` and `replace_local_region`, preserving scaffold coordinates.
- Milestone 3: module-owned biological assembly expansion with Gemmi as primary and Biotite as independent mmCIF verification.
- Milestone 4: Bounded crystal packing shell generation by cutoff around edited atoms.
- Milestone 5: Clash/contact/cutpoint report with machine-readable `SymmetryReport`.
- Milestone 6: Model-backend integration through `ModelJobSpec` bundles and existing backend/export boundaries, without making any modeler mandatory.
- Milestone 7: Swift GUI workflow module, plus Mol* viewer state/export only if a browser UI exists later.

## Risks And Human Review

- Human review needed for default thresholds: anchor RMSD fail/warn, clash distance, scaffold drift, contact-loss scoring, and packing-shell cutoff.
- Altloc, insertion-code, author-vs-label residue numbering, missing atoms, and model/native residue mismatch policy should be explicit in Milestone 1.
- Crystal packing can explode in size; source-frame ASU must remain the primary artifact and shell generation must be cutoff-bounded by default.
- `replace_chain` is useful but riskier than `graft_external_domain`; default the workflow to `graft_external_domain`.
- Avoid folding these models into `FoldSpec` until the interface is proven. Treat graft specs as separate local workflow specs that can reference a source structure and model-job artifacts.
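A cutoff-bounded shell can be sketched as a filter over candidate symmetry-mate copies. A copy is kept only if at least one of its atoms lies within the cutoff of an edited atom. This brute-force NumPy version is an illustration under assumptions. `select_packing_shell` is a hypothetical name. Keeping whole copies (rather than cutting per residue) is also an assumption, as is passing mates as a mapping from operator ID to coordinates. The plan's note about SciPy `cKDTree` applies once structures grow large.

```python
import numpy as np


def select_packing_shell(
    edited: np.ndarray, mates: dict[str, np.ndarray], cutoff: float
) -> dict[str, np.ndarray]:
    """Return only the mate copies with an atom within `cutoff` of any edited atom.

    edited: (M, 3) coordinates of edited atoms in the source frame.
    mates: operator ID -> (N, 3) coordinates of that symmetry copy.
    """
    if cutoff <= 0:
        raise ValueError("cutoff must be positive")
    kept: dict[str, np.ndarray] = {}
    for op_id, xyz in mates.items():
        if len(xyz) == 0 or len(edited) == 0:
            continue
        # All pairwise squared distances, shape (N, M); O(N*M) memory.
        d2 = ((xyz[:, None, :] - edited[None, :, :]) ** 2).sum(axis=-1)
        if d2.min() <= cutoff**2:
            kept[op_id] = xyz
    return kept
```

A positive cutoff is required, so an unbounded shell can only arise from passing a very large cutoff explicitly.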

## Recommended First PR

- Scope: module host first, then symmetry-graft as a module. Avoid mixing host infrastructure, GUI work, HPC/browser changes, and scientific implementation in one PR.
- Add or preserve module host tests for built-in discovery, entry-point discovery, failure-tolerant loading, duplicate-name behavior, module CLI registration, and `foldforge modules list/info`.
- Add `src/foldfoundry/modules/symmetry_graft/plugin.py` and module-owned `cli.py` before adding more algorithms. Dependencies should be module metadata and optional extras/dev test dependencies, not required root runtime dependencies.
- Place structure and grafting implementation under `src/foldfoundry/modules/symmetry_graft/`.
- Output only inspect/assembly summaries in early PRs; do not call modelers, launch viewers, or touch the Swift GUI until the CLI-backed module is stable.
10 changes: 10 additions & 0 deletions pyproject.toml
@@ -17,13 +17,23 @@ dependencies = [
    "typer>=0.12",
]

[project.optional-dependencies]
symmetry-graft = [
    "biotite>=1.6.0",
    "gemmi>=0.7.0",
    "numpy>=2.0",
]

[project.scripts]
foldforge = "foldfoundry.launcher:main"
foldfoundry = "foldfoundry.cli:main"

[dependency-groups]
dev = [
    "biotite>=1.6.0",
    "gemmi>=0.7.0",
    "mypy>=1.10",
    "numpy>=2.0",
    "pytest>=8.2",
    "pytest-cov>=6.0",
    "ruff>=0.5",
8 changes: 8 additions & 0 deletions src/foldfoundry/errors.py
@@ -30,6 +30,14 @@ class ResultParseError(FoldFoundryError):
    """Raised when backend results cannot be parsed."""


class StructureError(FoldFoundryError):
    """Raised when a structure file or its metadata cannot be inspected."""


class GraftError(FoldFoundryError):
    """Raised when a source-frame graft cannot be prepared or applied."""


_UNION_LABELS = {"protein", "dna", "rna", "ligand"}


32 changes: 32 additions & 0 deletions src/foldfoundry/io/json_io.py
@@ -0,0 +1,32 @@
"""JSON file writing helpers."""

import json
from pathlib import Path
from typing import Any

from foldfoundry.errors import ExportError


def write_json(
    payload: Any,
    output_path: Path,
    *,
    force: bool = False,
    sort_keys: bool = False,
    directory_message: str = "--out must be a file path, not a directory",
    error_label: str = "could not write JSON",
) -> None:
    """Write indented JSON with consistent overwrite and directory checks."""
    if output_path.exists() and output_path.is_dir():
        raise ExportError(f"{output_path}: {directory_message}")
    if output_path.exists() and not force:
        raise ExportError(f"{output_path}: file exists; pass --force to overwrite")

    try:
        output_path.parent.mkdir(parents=True, exist_ok=True)
        output_path.write_text(
            json.dumps(payload, indent=2, sort_keys=sort_keys) + "\n",
            encoding="utf-8",
        )
    except OSError as exc:
        raise ExportError(f"{output_path}: {error_label}: {exc}") from exc
5 changes: 4 additions & 1 deletion src/foldfoundry/modules/discovery.py
@@ -19,7 +19,10 @@
)

ENTRY_POINT_GROUP = "foldforge.modules"
-BUILTIN_MODULES: tuple[str, ...] = ("foldfoundry.modules.example_hello.plugin:module",)
+BUILTIN_MODULES: tuple[str, ...] = (
+    "foldfoundry.modules.example_hello.plugin:module",
+    "foldfoundry.modules.symmetry_graft.plugin:module",
+)

ModuleStatus = Literal["loaded", "disabled", "failed"]
ModuleSource = Literal["builtin", "entry_point", "local_path"]
53 changes: 53 additions & 0 deletions src/foldfoundry/modules/symmetry_graft/__init__.py
@@ -0,0 +1,53 @@
"""First-party symmetry-preserving graft module."""

from foldfoundry.modules.symmetry_graft.apply import (
    atom_key,
    fit_anchor_atoms,
    graft_replacement_atoms,
    map_anchor_atoms,
    transform_atom,
)
from foldfoundry.modules.symmetry_graft.assembly import (
    expand_biological_assembly,
    expand_biological_assembly_file,
    select_biological_assembly_context,
)
from foldfoundry.modules.symmetry_graft.fit import (
    apply_transform,
    coordinate_array,
    kabsch_fit,
    rmsd,
)
from foldfoundry.modules.symmetry_graft.graft_models import (
    AnchorFitResult,
    EditSpec,
    GraftAtomApplicationResult,
)
from foldfoundry.modules.symmetry_graft.structure_io import (
    detect_structure_format,
    load_structure,
    write_atom_records_mmcif,
)
from foldfoundry.modules.symmetry_graft.symmetry import inspect_structure

__all__ = [
    "AnchorFitResult",
    "EditSpec",
    "GraftAtomApplicationResult",
    "apply_transform",
    "atom_key",
    "coordinate_array",
    "detect_structure_format",
    "expand_biological_assembly",
    "expand_biological_assembly_file",
    "fit_anchor_atoms",
    "graft_replacement_atoms",
    "inspect_structure",
    "kabsch_fit",
    "load_structure",
    "map_anchor_atoms",
    "rmsd",
    "select_biological_assembly_context",
    "transform_atom",
    "write_atom_records_mmcif",
]