Ek/initial experimentation#49

Draft
ekiefl wants to merge 24 commits into CAODH:master from Arcadia-Science:ek/initial-experimentation

Conversation


@ekiefl ekiefl commented Mar 19, 2026

No description provided.

ekiefl added 24 commits March 17, 2026 15:20
…search

Remove tqdm progress bars, loguru logging, gc.collect calls, EMA,
tensorboard, and checkpoint saving from the training loop. Output is
now just epoch summaries plus a structured results block for agent parsing.

Replace brute-force bridge detection (deepcopy + is_connected per edge)
with nx.bridges(), which runs a single O(V+E) DFS.

Also fixes a bug for disconnected molecules (e.g. 1esz, which has a
106-atom component + 1 isolated atom). The old code checked
nx.is_connected(G2) after removing each edge, but if the graph was
already disconnected, *every* edge removal produced a disconnected G2,
so the code never hit `continue`. Then the smallest connected component
was always the pre-existing isolated atom (size 1), so every edge was
filtered by the `len(l) < 2` guard, returning [] even though the large
component had 37 valid bridge torsions. nx.bridges() correctly
identifies bridge edges within each connected component regardless of
the overall graph connectivity.

Verified equivalent output on 500 PDBbind ligands (499/500 match; the
1 difference is the bug fix above).
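
A minimal sketch of the nx.bridges()-based detection described above (the helper name and the exact shape of the terminal-bond filter are illustrative; the actual training code may differ):

```python
import networkx as nx

def bridge_torsion_edges(G):
    """Rotatable-bond candidates: bridge edges with >= 2 atoms on each
    side of the cut. nx.bridges() works per connected component, so a
    pre-existing isolated atom (as in 1esz) no longer masks valid
    bridges. Helper name is hypothetical."""
    kept = []
    for u, v in nx.bridges(G):  # single O(V+E) pass, all components
        H = G.copy()
        H.remove_edge(u, v)
        # Sizes of the two fragments created by cutting this bridge.
        small = min(len(nx.node_connected_component(H, u)),
                    len(nx.node_connected_component(H, v)))
        if small >= 2:  # drop terminal bonds (one side is a lone atom)
            kept.append((u, v))
    return kept

# A 5-atom chain plus an isolated atom, mimicking the 1esz situation:
G = nx.path_graph(5)
G.add_node(99)
print(sorted(bridge_torsion_edges(G)))  # -> [(1, 2), (2, 3)]
```

Note the fragment sizes are computed per cut edge, not from the smallest component of the whole graph, which is where the old guard went wrong.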

Add a --matching flag (none/original/improved) to select the ground-truth
conformer at training time. The cache now stores both DE-matched and
L-BFGS-B-matched conformer positions plus the crystal Mol object.
Epoch-end inference uses fresh RDKit conformers for realistic evaluation
and reports RMSD against both matched conformer and crystal pose.
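
The flag could be wired up roughly like this (the mapping of each choice to a conformer source is an assumption inferred from the description above):

```python
import argparse

parser = argparse.ArgumentParser()
# Choices from the commit message; exact semantics are assumed:
#   none     -> train directly against the crystal pose
#   original -> DE-matched conformer
#   improved -> L-BFGS-B-matched conformer
parser.add_argument(
    "--matching",
    choices=["none", "original", "improved"],
    default="none",
    help="which ground-truth conformer to train against",
)
args = parser.parse_args(["--matching", "improved"])
print(args.matching)  # -> improved
```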

Fix the RDKit 2025 EmbedMolecule crash via RemoveStereochemistry, use
fused_tp CUDA kernels for the torsion tensor product, and add diagnostic
print statements for cache-build failures.
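
A hedged sketch of the stereochemistry workaround; only the Chem.RemoveStereochemistry-before-EmbedMolecule ordering comes from the commit, while the helper name, seed, and hydrogen handling are assumptions:

```python
from rdkit import Chem
from rdkit.Chem import AllChem

def embed_fresh_conformer(mol, seed=0xF00D):
    """Generate a fresh 3D conformer. Stereochemistry is stripped first,
    which works around the EmbedMolecule crash seen with RDKit 2025.
    Helper name, seed, and H handling are illustrative."""
    mol = Chem.Mol(mol)              # work on a copy, not the caller's Mol
    Chem.RemoveStereochemistry(mol)  # the workaround from the commit
    mol = Chem.AddHs(mol)
    if AllChem.EmbedMolecule(mol, randomSeed=seed) < 0:
        raise ValueError("conformer embedding failed")
    return Chem.RemoveHs(mol)

alanine = Chem.MolFromSmiles("C[C@H](N)C(=O)O")
print(embed_fresh_conformer(alanine).GetNumConformers())  # -> 1
```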

When EMA is disabled (the default), the training loop was still allocating
shadow params, running per-batch EMA updates, and doing 5 full parameter
copies per epoch (store/copy_to/deepcopy/restore/state_dict). This caused
visible GPU utilization dips at epoch boundaries. Now the EMA object is
simply not created when disabled, making the non-EMA path zero-cost.
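
The zero-cost disabled path amounts to never constructing the EMA object; a minimal plain-Python sketch (class and helper names are illustrative, not the project's actual EMA implementation):

```python
class ShadowEma:
    """Minimal shadow-parameter EMA over plain floats (illustrative)."""
    def __init__(self, params, decay=0.999):
        self.decay = decay
        self.shadow = [float(p) for p in params]  # allocated only if enabled

    def update(self, params):
        d = self.decay
        self.shadow = [d * s + (1.0 - d) * float(p)
                       for s, p in zip(self.shadow, params)]

def make_ema(params, enabled):
    # When disabled (the default), no object is created, so the loop
    # never allocates shadow params or runs per-batch updates.
    return ShadowEma(params) if enabled else None

# The training loop then guards every EMA touch point:
ema = make_ema([1.0, 2.0], enabled=False)
if ema is not None:
    ema.update([0.0, 0.0])
```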

Validation inference now supports generating multiple diffusion
samples per complex and reporting the best RMSD. Defaults to 1
(existing behavior).
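
The multi-sample evaluation reduces to a min over per-sample RMSDs; a sketch with placeholder callables (sample_fn and rmsd_fn are hypothetical names, not the project's API):

```python
def best_rmsd(sample_fn, rmsd_fn, num_samples=1):
    """Draw num_samples diffusion poses and report the lowest RMSD.
    num_samples=1 reproduces the previous single-sample behavior.
    sample_fn and rmsd_fn are placeholder callables."""
    return min(rmsd_fn(sample_fn()) for _ in range(num_samples))

# Deterministic stand-ins for a pose sampler and an RMSD metric:
poses = iter([3.2, 1.1, 2.4])
print(best_rmsd(lambda: next(poses), lambda p: p, num_samples=3))  # -> 1.1
```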