PaulVerot03/hackaton-code


🧬 RNA Parcel Service: Predicting and Designing RNA Nanocages

This repository contains a set of Python scripts designed to automate the process of designing custom RNA nanocages based on desired secondary structure scaffolds. It combines sequence design using ViennaRNA, 3D structure prediction using RNAComposer (via a Selenium web driver interface), and analysis/visualization using Biopython and Matplotlib.

🔬 Project Overview

The pipeline aims to generate stable RNA sequences that fold into specific target secondary structures (scaffolds) decorated with stabilizing motifs (GNRA, UUCG tetraloops, and Kissing Loops).
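The target scaffolds are written in dot-bracket notation: each "(" pairs with its matching ")" and "." marks an unpaired base. The sketch below handles only these three symbols (how the scaffolds encode kissing-loop interactions is not covered). It turns a scaffold into its list of base pairs and counts the base-pair distance between two structures, which is roughly the quantity inverse folding tries to drive to zero. ViennaRNA exposes the same distance as RNA.bp_distance; pair_table and bp_distance here are illustrative helpers, not functions from this repository.

```python
def pair_table(dot_bracket: str) -> dict[int, int]:
    """Map each paired position (0-based) to its partner.

    Only '(' , ')' and '.' are handled; any other symbol raises ValueError.
    """
    stack: list[int] = []
    pairs: dict[int, int] = {}
    for i, ch in enumerate(dot_bracket):
        if ch == "(":
            stack.append(i)
        elif ch == ")":
            if not stack:
                raise ValueError(f"unmatched ')' at position {i}")
            j = stack.pop()          # the most recent open bracket closes here
            pairs[i] = j
            pairs[j] = i
        elif ch != ".":
            raise ValueError(f"unsupported symbol {ch!r} at position {i}")
    if stack:
        raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


def bp_distance(a: str, b: str) -> int:
    """Number of base pairs present in exactly one of the two structures."""
    if len(a) != len(b):
        raise ValueError("structures must have the same length")
    pa = {(i, j) for i, j in pair_table(a).items() if i < j}
    pb = {(i, j) for i, j in pair_table(b).items() if i < j}
    return len(pa ^ pb)              # symmetric difference of the pair sets


target = "((((....))))"
print(sorted((i, j) for i, j in pair_table(target).items() if i < j))
# [(0, 11), (1, 10), (2, 9), (3, 8)]
print(bp_distance(target, "(((......)))"))  # 1
```

A designed candidate folds into its scaffold exactly when the distance between its predicted MFE structure and the scaffold is 0.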

Key Components

  1. create_rna_data.py: Handles the sequence design and 3D structure prediction.
    • Defines several target RNA scaffolds (dot-bracket notation).
    • Implements motif-aware sequence design using ViennaRNA's inverse folding function (RNA.inverse_fold).
    • Submits the designed sequence and secondary structure to the RNAComposer web server via Selenium to obtain a 3D PDB structure.
  2. process_rna_data.py: Performs analysis and visualization of the resulting structures.
    • Parses PDB or MMCIF files using Biopython.
    • Predicts the Minimum Free Energy (MFE) secondary structure for the sequence using RNA.fold.
    • Generates arc plots of the secondary structure using Matplotlib.
  3. rna_visualizer.py: Displays the final 3D PDB structure in an interactive Mol*Star viewer using a Selenium web driver.
  4. demo.py: The main execution script that orchestrates the entire pipeline, from scaffold selection to final 3D visualization of the most stable candidate.

🛠️ Setup and Installation

Prerequisites

You need Python 3 installed (the commands below invoke python3). The scripts also rely on the following external libraries and tools:

  • ViennaRNA Package: Used for folding and inverse folding.
  • Biopython: Used for parsing PDB/CIF files.
  • Selenium: Used to automate web interactions with RNAComposer and Mol*Star.
  • Matplotlib: Used for 2D arc plot visualization.
  • Requests: Used with Selenium for web interactions.
  • easygui: Used by demo.py for the scaffold-selection prompt.
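A quick way to confirm the environment before running the pipeline is to check that each library's import name resolves. The import names below are assumptions based on how these packages are normally imported (ViennaRNA as RNA, Biopython as Bio), and the script is a hypothetical helper, not part of the repository.

```python
import importlib.util

# Display name -> import name (assumed). easygui is included because demo.py
# uses it for the scaffold-selection prompt.
REQUIRED_MODULES = {
    "ViennaRNA": "RNA",
    "Biopython": "Bio",
    "Selenium": "selenium",
    "Matplotlib": "matplotlib",
    "Requests": "requests",
    "easygui": "easygui",
}


def missing_dependencies(modules: dict[str, str] = REQUIRED_MODULES) -> list[str]:
    """Return the display names of dependencies whose module cannot be found."""
    return [name for name, module in modules.items()
            if importlib.util.find_spec(module) is None]


if __name__ == "__main__":
    missing = missing_dependencies()
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("All dependencies found.")
```

find_spec only locates the module without importing it, so the check is fast and has no side effects.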

Install Python Dependencies

It is highly recommended to use a virtual environment.

# Create a virtual environment
python3 -m venv .venv

# Activate the environment
source .venv/bin/activate  # On Linux/macOS
# .venv\Scripts\activate.bat  # On Windows

# Install the required libraries
pip install -r requirements.txt

🚀 Usage

The primary entry point for the entire pipeline is demo.py. The project layout, including the directories the pipeline writes to, is shown below as a tree.

RNA_tools
├── demo.py
├── create_rna_data.py
├── process_rna_data.py
├── rna_visualizer.py
│
├── designed_sequences/
│   ├── z_tile_tetramer_cand1.txt
│   ├── z_tile_tetramer_cand2.txt
│   └── ...
│
├── pdb_files/
│   ├── new_RNA_1.pdb    (Example file mentioned in process_rna_data.py)
│   ├── new_RNA_2.pdb
│   └── ...
│
└── MFE_test/
    ├── output_new_RNA_1.pdb/
    │   ├── RNA_structure_A_sequence.txt
    │   ├── RNA_structure_A_secondary_structure.txt
    │   └── RNA_structure_A_structure_arc_plot.png
    └── output_new_RNA_2.pdb/
        └── ... (Analysis files for other candidates)
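The tree implies a naming convention: the analysis folder for pdb_files/<name>.pdb is MFE_test/output_<name>.pdb/. Assuming that convention holds, a small pathlib helper (hypothetical, not part of the repository) can list which PDB files have not been analysed yet.

```python
from pathlib import Path


def expected_analysis_dirs(root: Path) -> dict[Path, Path]:
    """Map each PDB in root/pdb_files to its analysis folder in root/MFE_test.

    Assumed naming, taken from the tree: new_RNA_1.pdb -> MFE_test/output_new_RNA_1.pdb/
    """
    pdb_dir = root / "pdb_files"
    return {
        pdb: root / "MFE_test" / f"output_{pdb.name}"
        for pdb in sorted(pdb_dir.glob("*.pdb"))
    }


def pending_analyses(root: Path) -> list[Path]:
    """PDB files whose analysis folder does not exist yet."""
    return [pdb for pdb, out in expected_analysis_dirs(root).items()
            if not out.is_dir()]


if __name__ == "__main__":
    for pdb in pending_analyses(Path(".")):
        print("not analysed yet:", pdb)
```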

Running the Demo

  1. Make sure your virtual environment is activated (source .venv/bin/activate).
  2. Run the main script:
python demo.py

Script Workflow (demo.py)

  1. Scaffold Selection: A GUI prompt (via easygui) will ask you to select a target secondary structure scaffold (e.g., z_tile_tetramer, tetrahedron_wireframe, etc.).
  2. Candidate Generation: The script will prompt for the number of candidates to generate.
  3. Inverse Folding: create_rna_data.py designs multiple sequences that fit the target scaffold and the stabilizing motifs (GNRA, UUCG, Kissing Loops).
  4. 3D Modeling (RNAComposer): Each designed sequence and its secondary structure are submitted to the RNAComposer web server, which requires a live internet connection. This step is time-consuming because the script waits a fixed 35 seconds per candidate for the server to finish. The resulting PDB files are saved in the pdb_files/ directory.
  5. Analysis: process_rna_data.py reads the generated PDB files, extracts each sequence, predicts its MFE and MFE secondary structure with RNA.fold, and saves the results.
  6. Selection: The candidate with the lowest (most negative) MFE is identified as the most stable design.
  7. Visualization: The most stable structure is opened in a Mol*Star web viewer for interactive 3D inspection.
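Step 4 waits a fixed 35 seconds per candidate. A common alternative is to poll for a completion condition with a timeout, so that short jobs return early and slow ones fail with a clear error. The helper below is a generic, hypothetical sketch; the check function (for example, one that looks for RNAComposer's result link in the page) is an assumption, and Selenium's own WebDriverWait offers the same pattern for browser conditions.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def wait_until(check: Callable[[], Optional[T]], timeout: float = 120.0,
               interval: float = 2.0) -> T:
    """Call `check` every `interval` seconds until it returns something other
    than None, then return that value. Raise TimeoutError after `timeout` s."""
    deadline = time.monotonic() + timeout
    while True:
        result = check()
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} s")
        time.sleep(interval)


# Hypothetical usage: find_result_link(driver) would return the PDB download
# URL once RNAComposer has finished, or None while the job is still running.
# url = wait_until(lambda: find_result_link(driver), timeout=300, interval=5)
```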

Output Files

The script generates the following directories:

  • designed_sequences/: Contains text files with sequence, target/predicted structure, MFE, and motif details for each generated candidate.
  • pdb_files/: Contains the 3D structure files (PDB format) generated by RNAComposer.
  • MFE_test/: Contains subdirectories for each candidate, holding their sequence, predicted secondary structure, MFE analysis, and a 2D arc plot visualization.
  • analysis/: Temporary files used for MFE comparison.
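The Selection step boils down to taking the minimum over the MFE values gathered for each candidate (in the pipeline these come from RNA.fold, which returns the structure together with its MFE in kcal/mol). A minimal sketch, with made-up values used only for illustration; most_stable is a hypothetical name.

```python
def most_stable(mfe_by_candidate: dict[str, float]) -> tuple[str, float]:
    """Return (candidate, MFE) with the lowest, i.e. most negative, MFE.

    On ties, the first candidate in insertion order wins.
    """
    if not mfe_by_candidate:
        raise ValueError("no candidates to compare")
    name = min(mfe_by_candidate, key=mfe_by_candidate.__getitem__)
    return name, mfe_by_candidate[name]


# Illustrative values in kcal/mol, not results from this pipeline.
mfes = {"new_RNA_1.pdb": -42.3, "new_RNA_2.pdb": -47.9, "new_RNA_3.pdb": -39.0}
print(most_stable(mfes))  # ('new_RNA_2.pdb', -47.9)
```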

📜 Code Structure & Details

create_rna_data.py

  • scaffold: Dictionary defining pre-configured dot-bracket scaffolds.
  • find_hairpin_loops(dot_bracket): Identifies regions in the scaffold for motif insertion.
  • sample_motif_configuration(...): Chooses whether a loop receives a stabilizing tetraloop (GNRA/UUCG) or is paired as a Kissing Loop.
  • motifs_to_constraints(...): Converts the chosen motifs into base-level constraints (e.g., 'G' allowed at position $i_0$, 'A' allowed at position $i_0+3$ for a GNRA loop).
  • inverse_fold_with_constraints(...): Performs the core sequence design using ViennaRNA's RNA.inverse_fold with custom base constraints.
  • create_pdb_from_RNAComposer(...): Uses Selenium to interface with the RNAComposer server. Note: the job is awaited with a fixed time.sleep(35). This is a simple but crude approach: it always waits the full 35 seconds, and it may miss the result if the server needs longer.
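The motif-aware design steps listed above can be sketched as follows. This is a hypothetical re-implementation under stated assumptions, not the repository's code: hairpin loops are taken to be runs of "." closed directly by a "(" ")" pair, constraints are expressed as the set of allowed bases per position (IUPAC N = any base, R = A or G), and kissing loops are left out. How the repository passes these constraints to RNA.inverse_fold is not shown here; in this sketch they only seed a random start sequence.

```python
import random
import re

# Allowed bases per tetraloop position (assumed representation).
TETRALOOPS = {
    "GNRA": ["G", "ACGU", "AG", "A"],   # G, N (any), R (A/G), A
    "UUCG": ["U", "U", "C", "G"],
}


def hairpin_loops(dot_bracket: str) -> list[tuple[int, int]]:
    """(start, end), 0-based inclusive, of each '.' run closed by one pair,
    i.e. the dots in '(....)'. Only '(' , ')' and '.' are expected."""
    return [(m.start(1), m.end(1) - 1)
            for m in re.finditer(r"\((\.+)\)", dot_bracket)]


def sample_motifs(dot_bracket: str, rng: random.Random) -> dict[int, str]:
    """Randomly assign GNRA or UUCG to every 4-nt hairpin loop."""
    return {start: rng.choice(sorted(TETRALOOPS))
            for start, end in hairpin_loops(dot_bracket) if end - start + 1 == 4}


def tetraloop_constraints(dot_bracket: str,
                          motifs: dict[int, str]) -> list[str]:
    """Allowed bases per position; `motifs` maps a loop start i0 to a motif."""
    allowed = ["ACGU"] * len(dot_bracket)
    loops = dict(hairpin_loops(dot_bracket))
    for i0, name in motifs.items():
        if i0 not in loops or loops[i0] - i0 + 1 != 4:
            raise ValueError(f"no 4-nt hairpin loop starts at {i0}")
        for offset, bases in enumerate(TETRALOOPS[name]):
            allowed[i0 + offset] = bases   # e.g. 'G' at i0, 'A' at i0+3 for GNRA
    return allowed


def random_start(allowed: list[str], rng: random.Random) -> str:
    """Draw a start sequence that satisfies the per-position constraints."""
    return "".join(rng.choice(bases) for bases in allowed)


rng = random.Random(0)
target = "((((....))))..((((....))))"
print(hairpin_loops(target))  # [(4, 7), (18, 21)]
allowed = tetraloop_constraints(target, sample_motifs(target, rng))
print(random_start(allowed, rng))
```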

process_rna_data.py

  • plot_arc_diagram(...): Generates a 2D arc plot visualization of the secondary structure.
  • process_structure_file(...): A unified function to parse PDB or MMCIF files, extract the sequence, predict its MFE secondary structure using RNA.fold, and save the analysis/visualization.
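The implementation of plot_arc_diagram is not reproduced in this README. One common way to draw an arc diagram, sketched here under that assumption, is a semicircle above the sequence axis for each base pair (i, j), centred at (i + j) / 2 with radius (j - i) / 2; matplotlib.patches.Arc draws each one. arcs and plot_arcs are hypothetical names, and only balanced "(" ")" and "." are handled.

```python
def arcs(dot_bracket: str) -> list[tuple[float, float]]:
    """(center, radius) of one semicircle per base pair, positions 1-based."""
    stack: list[int] = []
    out: list[tuple[float, float]] = []
    for pos, ch in enumerate(dot_bracket, start=1):
        if ch == "(":
            stack.append(pos)
        elif ch == ")":
            i = stack.pop()                      # partner of this ')'
            out.append(((i + pos) / 2, (pos - i) / 2))
    return sorted(out)


def plot_arcs(dot_bracket: str, path: str) -> None:
    """Draw the arcs above a baseline and save the figure to `path`."""
    import matplotlib
    matplotlib.use("Agg")                        # render without a display
    import matplotlib.pyplot as plt
    from matplotlib.patches import Arc

    n = len(dot_bracket)
    fig, ax = plt.subplots(figsize=(max(4, n / 8), 3))
    ax.plot([1, n], [0, 0], color="black", linewidth=1)   # sequence axis
    max_r = 1.0
    for center, radius in arcs(dot_bracket):
        # Upper half of a circle: theta from 0 to 180 degrees.
        ax.add_patch(Arc((center, 0), 2 * radius, 2 * radius, theta1=0, theta2=180))
        max_r = max(max_r, radius)
    ax.set_xlim(0, n + 1)
    ax.set_ylim(0, max_r * 1.1)
    ax.set_aspect("equal")
    ax.axis("off")
    fig.savefig(path, bbox_inches="tight")
    plt.close(fig)


print(arcs("((..))"))  # [(3.5, 1.5), (3.5, 2.5)]
```

Matplotlib is imported inside plot_arcs so the arc geometry can be computed and checked without it.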

rna_visualizer.py

  • represent(path_to_file): Uses Selenium to load a local PDB file into the online Mol*Star viewer and display it for interactive 3D inspection.
