This repository contains a set of Python scripts designed to automate the process of designing custom RNA nanocages based on desired secondary structure scaffolds. It combines sequence design using ViennaRNA, 3D structure prediction using RNAComposer (via a Selenium web driver interface), and analysis/visualization using Biopython and Matplotlib.
The pipeline aims to generate stable RNA sequences that fold into specific target secondary structures (scaffolds) decorated with stabilizing motifs (GNRA, UUCG tetraloops, and Kissing Loops).
- create_rna_data.py: Handles the sequence design and 3D structure prediction.
  - Defines several target RNA scaffolds (dot-bracket notation).
  - Implements motif-aware sequence design using ViennaRNA's inverse folding function (RNA.inverse_fold).
  - Submits the designed sequence and secondary structure to the RNAComposer web server via Selenium to obtain a 3D PDB structure.
- process_rna_data.py: Performs analysis and visualization of the resulting structures.
  - Parses PDB or MMCIF files using Biopython.
  - Predicts the Minimum Free Energy (MFE) secondary structure for the sequence using RNA.fold.
  - Generates arc plots of the secondary structure using Matplotlib.
- rna_visualizer.py: Displays the final 3D PDB structure in an interactive Mol*Star viewer using a Selenium web driver.
- demo.py: The main execution script that orchestrates the entire pipeline, from scaffold selection to final 3D visualization of the most stable candidate.
You need to have Python installed. The scripts rely on external libraries and tools:
- ViennaRNA Package: Used for folding and inverse folding.
- Biopython: Used for parsing PDB/CIF files.
- Selenium: Used to automate web interactions with RNAComposer and Mol*Star.
- Matplotlib: Used for 2D arc plot visualization.
- Requests: Used with Selenium for web interactions.
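To see which of these libraries are already importable in the active environment, a small check like the following can help. It is a minimal sketch: the import names (RNA for the ViennaRNA Python bindings, Bio for Biopython) are the usual ones for these packages, and the helper name missing_dependencies is hypothetical, not part of this repository.

```python
from importlib.util import find_spec

# Import name -> display name for the libraries listed above.
# RNA is the module exposed by the ViennaRNA Python bindings,
# Bio is the top-level Biopython package.
REQUIRED_MODULES = {
    "RNA": "ViennaRNA Package",
    "Bio": "Biopython",
    "selenium": "Selenium",
    "matplotlib": "Matplotlib",
    "requests": "Requests",
}


def missing_dependencies(modules=REQUIRED_MODULES):
    """Return display names of modules that cannot be located (hypothetical helper)."""
    return [name for mod, name in modules.items() if find_spec(mod) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All listed dependencies are importable.")
```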
It is highly recommended to use a virtual environment.
# Create a virtual environment
python3 -m venv .venv
# Activate the environment
source .venv/bin/activate # On Linux/macOS
# .venv\Scripts\activate.bat # On Windows
# Install the required libraries
pip install -r requirements.txt
The primary entry point for the entire pipeline is demo.py. Here is the file structure represented in a tree format, showing the directories and the files.
RNA_tools
├── demo.py
├── create_rna_data.py
├── process_rna_data.py
├── rna_visualizer.py
│
├── designed_sequences/
│   ├── z_tile_tetramer_cand1.txt
│   ├── z_tile_tetramer_cand2.txt
│   └── ...
│
├── pdb_files/
│   ├── new_RNA_1.pdb (Example file mentioned in process_rna_data.py)
│   ├── new_RNA_2.pdb
│   └── ...
│
└── MFE_test/
    ├── output_new_RNA_1.pdb/
    │   ├── RNA_structure_A_sequence.txt
    │   ├── RNA_structure_A_secondary_structure.txt
    │   └── RNA_structure_A_structure_arc_plot.png
    └── output_new_RNA_2.pdb/
        └── ... (Analysis files for other candidates)
- Make sure your virtual environment is activated (source .venv/bin/activate).
- Run the main script:
  python demo.py

- Scaffold Selection: A GUI prompt (via easygui) will ask you to select a target secondary structure scaffold (e.g., z_tile_tetramer, tetrahedron_wireframe, etc.).
- Candidate Generation: The script will prompt for the number of candidates to generate.
- Inverse Folding: create_rna_data.py designs multiple sequences that fit the target scaffold and the stabilizing motifs (GNRA, UUCG, Kissing Loops).
- 3D Modeling (RNAComposer): Each designed sequence and predicted structure is submitted to the RNAComposer web server. This step requires a live internet connection and is time-consuming because of a required 35-second waiting period per candidate. The resulting PDB files are saved in the pdb_files/ directory.
- Analysis: process_rna_data.py reads the generated PDB files, calculates their actual MFE and secondary structure, and saves the data.
- Selection: The candidate with the lowest (most negative) MFE is identified as the most stable design.
- Visualization: The most stable structure is opened in a Mol*Star web viewer for interactive 3D inspection.
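The 3D modeling step waits a fixed 35 seconds per candidate. If the results page offers some detectable completion signal (an assumption, since the page layout is not described here), a polling loop with a timeout could replace the fixed wait. The helper below is a generic sketch in plain Python; wait_until and its parameters are hypothetical names, not part of this repository.

```python
import time


def wait_until(is_ready, timeout=120.0, interval=2.0):
    """Call is_ready() every `interval` seconds until it returns True.

    Returns True if the condition was met, or False if `timeout`
    seconds elapsed first. Hypothetical helper for illustration.
    """
    deadline = time.monotonic() + timeout
    while True:
        if is_ready():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Here is_ready would be a small function that inspects the page through the Selenium driver and returns True once the result is available. Selenium's own WebDriverWait provides the same pattern for conditions on page elements. Whether either approach is more reliable than the fixed wait depends on how the server reports job completion.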
The script generates the following directories:
- designed_sequences/: Contains text files with sequence, target/predicted structure, MFE, and motif details for each generated candidate.
- pdb_files/: Contains the 3D structure files (PDB format) generated by RNAComposer.
- MFE_test/: Contains subdirectories for each candidate, holding their sequence, predicted secondary structure, MFE analysis, and a 2D arc plot visualization.
- analysis/: Temporary files used for MFE comparison.
- scaffold: Dictionary defining pre-configured dot-bracket scaffolds.
- find_hairpin_loops(dot_bracket): Identifies regions in the scaffold for motif insertion.
- sample_motif_configuration(...): Chooses whether a loop receives a stabilizing tetraloop (GNRA/UUCG) or is paired as a Kissing Loop.
- motifs_to_constraints(...): Converts the chosen motifs into base-level constraints (e.g., 'G' allowed at position $i_0$, 'A' allowed at position $i_0+3$ for a GNRA loop).
- inverse_fold_with_constraints(...): Performs the core sequence design using ViennaRNA's RNA.inverse_fold with custom base constraints.
- create_pdb_from_RNAComposer(...): Uses Selenium to interface with the RNAComposer server. Note: The fixed time.sleep(35) is a necessary, albeit crude, way to wait for the web server to process the job.
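To illustrate the first steps of this design chain, the sketch below locates hairpin loops in a dot-bracket string and builds GNRA constraints for a four-nucleotide loop. It is not the repository's implementation: the function names, the 0-based inclusive (start, end) return format, and the dictionary-of-sets constraint format are all assumptions made for the example.

```python
def find_hairpin_loops_sketch(dot_bracket):
    """Return (start, end) pairs, 0-based and inclusive, of hairpin loops.

    A hairpin loop is a run of unpaired positions enclosed directly by
    one '(' and its matching ')'. Hypothetical stand-in for the
    repository's find_hairpin_loops.
    """
    loops = []
    last_open = None  # most recent '(' with no bracket seen since
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            last_open = i
        elif ch == ")":
            # Only dots lie between last_open and i, so this is a hairpin.
            if last_open is not None and i - last_open > 1:
                loops.append((last_open + 1, i - 1))
            last_open = None
    return loops


def gnra_constraints(loop_start, loop_end):
    """Map each position of a 4-nt loop to its allowed bases for GNRA."""
    if loop_end - loop_start + 1 != 4:
        raise ValueError("GNRA needs a loop of exactly four unpaired bases")
    return {
        loop_start: {"G"},
        loop_start + 1: {"A", "C", "G", "U"},  # N: any base
        loop_start + 2: {"A", "G"},            # R: purine
        loop_start + 3: {"A"},
    }


if __name__ == "__main__":
    for start, end in find_hairpin_loops_sketch("((((....))))"):
        print(start, end, gnra_constraints(start, end))
```

A UUCG tetraloop would follow the same pattern with a single allowed base at each of the four positions.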
- plot_arc_diagram(...): Generates a 2D arc plot visualization of the secondary structure.
- process_structure_file(...): A unified function to parse PDB or MMCIF files, extract the sequence, predict its MFE secondary structure using RNA.fold, and save the analysis/visualization.
- represent(path_to_file): Uses Selenium to upload a local PDB file to the online Mol*Star viewer and display it for interactive 3D visualization.
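An arc plot draws one semicircle per base pair above the sequence axis. The sketch below shows one way to do this, assuming plain dot-bracket input with only '(', ')' and '.'; other characters are ignored. The names base_pairs and plot_arcs are hypothetical, and the repository's plot_arc_diagram may differ in both signature and styling.

```python
def base_pairs(dot_bracket):
    """Return sorted (i, j) base pairs, 0-based, for a dot-bracket string."""
    stack, pairs = [], []
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            pairs.append((stack.pop(), i))
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return sorted(pairs)


def plot_arcs(dot_bracket, out_path):
    """Save an arc diagram with one semicircle per base pair."""
    # Imported here so base_pairs() stays usable without Matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display required
    import matplotlib.pyplot as plt
    from matplotlib.patches import Arc

    pairs = base_pairs(dot_bracket)
    fig, ax = plt.subplots(figsize=(8, 3))
    for i, j in pairs:
        span = j - i
        # Centre the arc midway between the paired positions.
        ax.add_patch(Arc(((i + j) / 2, 0), span, span, theta1=0, theta2=180))
    ax.set_xlim(-1, len(dot_bracket))
    ax.set_ylim(0, max((j - i for i, j in pairs), default=2) / 2 + 1)
    ax.set_xlabel("Nucleotide position")
    ax.set_yticks([])
    fig.savefig(out_path, dpi=150, bbox_inches="tight")
    plt.close(fig)
```

Calling plot_arcs("((((....))))", "arc_plot.png") would write a PNG with four nested arcs. Nested pairs never cross in this layout, so any crossing arcs in a plot would indicate a pseudoknot-style interaction that plain dot-bracket notation cannot encode.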
