Ek/initial experimentation#49

Draft
ekiefl wants to merge 24 commits into CAODH:master from Arcadia-Science:ek/initial-experimentation

Conversation


@ekiefl ekiefl commented Mar 19, 2026

No description provided.

ekiefl added 24 commits March 17, 2026 15:20
…search

Remove tqdm progress bars, loguru logging, gc.collect calls, EMA,
tensorboard, and checkpoint saving from the training loop. Output is
now just epoch summaries plus a structured results block for agent parsing.

Replace brute-force bridge detection (deepcopy + is_connected per edge)
with nx.bridges(), which runs a single O(V+E) DFS.

Also fixes a bug for disconnected molecules (e.g. 1esz, which has a
106-atom component + 1 isolated atom). The old code checked
nx.is_connected(G2) after removing each edge, but if the graph was
already disconnected, *every* edge removal produced a disconnected G2,
so the code never hit `continue`. Then the smallest connected component
was always the pre-existing isolated atom (size 1), so every edge was
filtered by the `len(l) < 2` guard, returning [] even though the large
component had 37 valid bridge torsions. nx.bridges() correctly
identifies bridge edges within each connected component regardless of
the overall graph connectivity.

Verified equivalent output on 500 PDBbind ligands (499/500 match; the
1 difference is the bug fix above).
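
A minimal sketch of the nx.bridges()-based detection described above (the helper name and the exact shape of the terminal-bond filter are illustrative; the actual training code may differ):

```python
import networkx as nx

def bridge_torsion_edges(G):
    """Rotatable-bond candidates: bridge edges with >= 2 atoms on each
    side of the cut. nx.bridges() works per connected component, so a
    pre-existing isolated atom (as in 1esz) no longer masks valid
    bridges. Helper name is hypothetical."""
    kept = []
    for u, v in nx.bridges(G):  # single O(V+E) pass, all components
        H = G.copy()
        H.remove_edge(u, v)
        # Sizes of the two fragments created by cutting this bridge.
        small = min(len(nx.node_connected_component(H, u)),
                    len(nx.node_connected_component(H, v)))
        if small >= 2:  # drop terminal bonds (one side is a lone atom)
            kept.append((u, v))
    return kept

# A 5-atom chain plus an isolated atom, mimicking the 1esz situation:
G = nx.path_graph(5)
G.add_node(99)
print(sorted(bridge_torsion_edges(G)))  # -> [(1, 2), (2, 3)]
```

Note the fragment sizes are computed per cut edge, not from the smallest component of the whole graph, which is where the old guard went wrong.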

Add a --matching flag (none/original/improved) to select the ground-truth
conformer at training time. The cache now stores both DE-matched and
L-BFGS-B-matched conformer positions plus the crystal Mol object.
Epoch-end inference uses fresh RDKit conformers for realistic evaluation
and reports RMSD against both matched conformer and crystal pose.
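
The flag could be wired up roughly like this (the mapping of each choice to a conformer source is an assumption inferred from the description above):

```python
import argparse

parser = argparse.ArgumentParser()
# Choices from the commit message; exact semantics are assumed:
#   none     -> train directly against the crystal pose
#   original -> DE-matched conformer
#   improved -> L-BFGS-B-matched conformer
parser.add_argument(
    "--matching",
    choices=["none", "original", "improved"],
    default="none",
    help="which ground-truth conformer to train against",
)
args = parser.parse_args(["--matching", "improved"])
print(args.matching)  # -> improved
```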

Fix the RDKit 2025 EmbedMolecule crash via RemoveStereochemistry, use
fused_tp CUDA kernels for the torsion tensor product, and add diagnostic
print statements for cache-build failures.
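
A hedged sketch of the stereochemistry workaround; only the Chem.RemoveStereochemistry-before-EmbedMolecule ordering comes from the commit, while the helper name, seed, and hydrogen handling are assumptions:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def embed_fresh_conformer(mol, seed=0xF00D):
    """Generate a fresh 3D conformer. Stereochemistry is stripped first,
    which works around the EmbedMolecule crash seen with RDKit 2025.
    Helper name, seed, and H handling are illustrative."""
    mol = Chem.Mol(mol)              # work on a copy, not the caller's Mol
    Chem.RemoveStereochemistry(mol)  # the workaround from the commit
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise ValueError("conformer embedding failed")
    return Chem.RemoveHs(mol)

alanine = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
print(embed_fresh_conformer(alanine).GetNumConformers())  # -> 1
```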

When EMA is disabled (the default), the training loop was still allocating
shadow params, running per-batch EMA updates, and doing 5 full parameter
copies per epoch (store/copy_to/deepcopy/restore/state_dict). This caused
visible GPU utilization dips at epoch boundaries. Now the EMA object is
simply not created when disabled, making the non-EMA path zero-cost.
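
The zero-cost disabled path amounts to never constructing the EMA object; a minimal plain-Python sketch (class and helper names are illustrative, not the project's actual EMA implementation):

```python
class ShadowEma:
    """Minimal shadow-parameter EMA over plain floats (illustrative)."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = [float(p) for p in params]  # allocated only if enabled

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * float(p)
                       for s, p in zip(self.shadow, params)]

def make_ema(params, enabled):
    # When disabled (the default), no object is created, so the loop
    # never allocates shadow params or runs per-batch updates.
    return ShadowEma(params) if enabled else None

# The training loop then guards every EMA touch point:
ema = make_ema([1.0, 2.0], enabled=False)
if ema is not None:
    ema.update([0.0, 0.0])
```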

Validation inference now supports generating multiple diffusion
samples per complex and reporting the best RMSD. Defaults to 1
(existing behavior).
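
The multi-sample evaluation reduces to a min over per-sample RMSDs; a sketch with placeholder callables (sample_fn and rmsd_fn are hypothetical names, not the project's API):

```python
def best_rmsd(sample_fn, rmsd_fn, num_samples=1):
    """Draw num_samples diffusion poses and report the lowest RMSD.
    num_samples=1 reproduces the previous single-sample behavior.
    sample_fn and rmsd_fn are placeholder callables."""
    return min(rmsd_fn(sample_fn()) for _ in range(num_samples))

# Deterministic stand-ins for a pose sampler and an RMSD metric:
poses = iter([3.2, 1.1, 2.4])
print(best_rmsd(lambda: next(poses), lambda p: p, num_samples=3))  # -> 1.1
```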