Add pLDDT analysis script for FoldX5 vs FoldX5.1 classification agree…#70

Open
angelikivliora wants to merge 3 commits into main from foldx-version-comparison-v2

@angelikivliora angelikivliora commented May 12, 2026

No description provided.

Comment on lines +80 to +82
except Exception as e:
    print(f" could not process {protein_name}: {e}")
    continue
Contributor

Here it would make more sense to exit with an error; otherwise you silently skip over cases that might have problems.

Author

Updated to raise a RuntimeError for unexpected failures, while still skipping, with a warning, proteins that fail because of a missing column. I checked all proteins in the FoldX5.1 dataset and found 7 cases where an expected column is missing from the CSV: 6 proteins (CNOT3, DEFB108B, FUS, GPC3, MLF1, ZNF738) are missing the pLDDT column, and 1 protein (STKLD1) is missing the ddG column. So I treat these as expected failures and skip them, while anything else raises an error.
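
For readers following the thread, the pattern described above can be sketched roughly as follows. This is a minimal illustration and not the code in this PR: the helper names (`MissingColumnError`, `load_rows`, `process_all`), the exact column names, and the in-memory CSV input are all assumptions made for the example.

```python
import csv
import io
import warnings

# Assumed column names; the real CSV headers may differ.
REQUIRED_COLUMNS = ("pLDDT", "ddG")


class MissingColumnError(Exception):
    """Hypothetical error type for the expected 'column is absent' case."""


def load_rows(csv_text, required=REQUIRED_COLUMNS):
    """Parse CSV text and fail with MissingColumnError if a column is absent."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    missing = [col for col in required if col not in header]
    if missing:
        raise MissingColumnError(", ".join(missing))
    return list(reader)


def process_all(csv_by_protein):
    """Skip expected failures with a warning; re-raise anything else."""
    loaded, skipped = {}, []
    for protein_name, csv_text in csv_by_protein.items():
        try:
            loaded[protein_name] = load_rows(csv_text)
        except MissingColumnError as exc:
            # Expected failure: warn, record the protein, and move on.
            warnings.warn(f"skipping {protein_name}: missing column(s) {exc}")
            skipped.append(protein_name)
        except Exception as exc:
            # Unexpected failure: stop instead of silently continuing.
            raise RuntimeError(f"could not process {protein_name}") from exc
    return loaded, skipped
```

Returning the list of skipped proteins alongside the loaded data keeps the expected failures visible in the output, so they can be compared against the known list.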

Comment on lines +216 to +219
plot_plddt_distribution(df_mutations, args.output_dir)
plot_plddt_scatter(df_mutations, args.output_dir)
print("\n[3] per-protein summary...")
plot_per_protein_delta(df_mutations, args.output_dir)
Contributor

More of a note for the future:

If a function is called only once, it usually doesn't need to be a function unless it has a more general use, e.g. being imported elsewhere.

Overusing function encapsulation can complicate things and make the script more opaque.

Author

I adapted the code according to the comment.

# Matching is done by UniProt ID + protein name
# RMSD is computed only on overlapping residues between the two structures
# Run: python rmsd_matrix.py --foldx5_dir /path/to/folder --foldx51_dir /path/to/folder --output_dir ./rmsd_results
# Run: python rmsd_matrix.py -f /data/user/shared_projects/mavisp_ensemble_sim_length/foldx5.1_evaluation/foldx5_initial_structures -i /data/user/shared_projects/mavisp_ensemble_sim_length/foldx5.1_evaluation/data_collection_foldx5.1 -o ./rmsd_results
Contributor

This should be in a README file, not in the code.

Author

I removed it, and also removed the scatter, violin, and bar plots as discussed previously.
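
If the usage notes move to a README, the "RMSD only on overlapping residues" rule from the header comment could be illustrated there with a small sketch like the one below. It is not taken from `rmsd_matrix.py`: the function name `rmsd_overlap` and the input format (one coordinate per residue number) are hypothetical, and it assumes the two structures are already superimposed in the same coordinate frame, which the thread does not state.

```python
import math


def rmsd_overlap(coords_a, coords_b):
    """RMSD over residues present in both structures.

    coords_a, coords_b: dicts mapping residue number -> (x, y, z),
    e.g. one representative atom per residue (an assumption here).
    Returns (rmsd, number_of_shared_residues).
    """
    # Only residues present in both structures contribute.
    shared = sorted(coords_a.keys() & coords_b.keys())
    if not shared:
        raise ValueError("no overlapping residues between the two structures")
    # Sum of squared distances between matched residues.
    squared = sum(math.dist(coords_a[r], coords_b[r]) ** 2 for r in shared)
    return math.sqrt(squared / len(shared)), len(shared)
```

Reporting the number of shared residues next to the RMSD makes it easier to spot pairs where the overlap is too small for the value to be meaningful.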
