1 change: 1 addition & 0 deletions models/rfd3/docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -22,6 +22,7 @@ Tutorials

tutorials/ppi_design_tutorial.md
tutorials/enzyme_design_tutorial.md
tutorials/na_binder_tutorial.md

Examples
--------
Expand Down
215 changes: 215 additions & 0 deletions models/rfd3/docs/tutorials/na_binder_tutorial.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,215 @@
# Nucleic Acid Binder Design in RFdiffusion3

## Before We Get Started...
This tutorial does not cover installing RFD3. Before continuing, you should make sure that RFdiffusion3 (RFD3) is installed and runnable on your system. See the [README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for installation instructions.

```{note}
You will need to clone the repository to access the tutorial files. Using the `pip` commands to install the model does not automatically download the files required to complete this tutorial.
```

RFD3 runs best on GPUs. We suggest following this tutorial on an interactive GPU node if you have access to one.

You will need the file `2r5z.pdb`. This is provided in [`foundry/models/rfd3/docs/input_pdbs`](../input_pdbs/2r5z.pdb). You can clone the [`foundry`](https://github.com/RosettaCommons/foundry) repository to easily access files related to this tutorial.

(na-learning-objectives)=
## Learning Objectives
In this tutorial, we will design a DNA-protein complex to explore the settings available in RFD3 that are useful for nucleic acid binder design.

(na-setup)=
## Setup
Create a directory named `rfd3_na_tutorial` and `cd` into it:
```bash
mkdir rfd3_na_tutorial && cd rfd3_na_tutorial
```
This is where you will be storing the files related to this tutorial.

If you would like to compare your outputs against those generated by the authors of this tutorial, you can find pre-generated output files in `foundry/models/rfd3/docs/tutorials/na_tutorial_files`.
The 'basic' zip file contains outputs that did not use the settings discussed in the [Additional Constraints](#na-additional-constraints) section. The 'hbond' zip file has the outputs resulting from adding [hydrogen bond conditioning constraints](#na-hydrogen-bond-conditioning), and the 'unfix' zip file has the outputs resulting from adding a [constraint that allows the input sequence to be modified](#na-unfix-sequence).

There is also a pre-made JSON file available in `foundry/models/rfd3/docs/tutorials/na_tutorial_files`. We recommend following the tutorial to create this file yourself to better understand the RFD3 options that are relevant to nucleic acid binder design.

(na-creating-the-yaml-file)=
## Creating the JSON file
This section briefly describes each of the settings used in this example.

1. Using your editor of choice, open a new file called `rfd3_na_tutorial.json`. This is where we will be storing the options we will use to constrain our nucleic acid binder design.
1. This is a JSON file, so all of the options in it must be enclosed in curly braces (`{}`). Go ahead and add a pair of these to your file.
1. Like all designs you will create using RFD3, we need to start by giving our calculation a name. It should be short, but descriptive, so let's call it `dsDNA_complex`. Add this name in quotes to your file and place a colon and another pair of curly braces after it. Your file should now look like:
```json
{
"dsDNA_complex":{

}
}
```
1. Next we need to specify the structure file (PDB, CIF, etc.) that contains information about any input structures related to our calculation:
```json
"input": "path/to/2r5z.pdb",
```
1. To define which portions of our final structure will be designed and which will be taken from our input structure file, we will use the `contig` option:
```json
"contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90",
```
Let's break down what's going on here a bit further:
- `C5-18`: Our final design will start with residues C5-C18 from our input PDB.
- `/0`: This indicates a chain break; the C5-18 residues will not be connected to anything coming next in our `contig` string.
- `D24-37`: After the chain break we have residues D24-37 from the input structure file.
- `/0,40-50`: After another chain break, RFD3 will design a segment with 40-50 residues.
- `A146-154`: Residues A146-154 from the input structure will be connected to the C-terminus of the designed residues.
- `80-90`: RFD3 will design 80-90 residues connected to the C-terminus of A154 from the input structure.
1. Since two portions of the designed structure have variable lengths, it is useful to specify the overall `length` of our design:
```json
"length": "157-177",
```
1. For the purposes of this design, we happen to know that residues B251-B255 are important to include in our design, but it does not matter where they end up in our final structure. This is referred to as an 'unindexed motif' in the documentation. To include them, we will add the `unindex` option:
```json
"unindex": "/0,/0,B251-B255",
```
Here we have two chain breaks before our unindexed motif to correspond to the contig string, so these residues will go in the third chain of the output structure.
1. The portions of our input structure that we specified in the `contig` string are automatically held fixed. However, it is useful to let some of these residues move in response to the designed portions of our structure. Here we want the middle sections of our DNA strands to be stationary while the portions towards either end of the double helix can relax:
```json
"select_fixed_atoms": {
"C9-14":"ALL",
"D28-33":"ALL",
"C5-8,C15-18": "",
"D24-27,D34-37": ""
},
```
```{figure} ../.assets/na_tutorial/select_fixed_atoms.png
:width: 60%

Image of the input structure with the fixed residues highlighted in cyan and the residues allowed to move highlighted in red.
```

```{note}
Explicitly holding residues C9-14 and D28-33 fixed is not necessary here; these entries were included for the sake of clarity.
```
1. To define where the center of mass of our designed structure should go, we will use an ORI (origin) token:

```json
"ori_token":[25,35,20],
```

```{important}
In this example the ORI token is placed close to the center of our input structure. When designing your own nucleic acid binders, you should try many ORI token placements. See the [RFdiffusion2 paper](https://www.nature.com/articles/s41592-025-02975-x) for more information about how ORI tokens impact the results of diffusion calculations.
```

```{figure} ../.assets/na_tutorial/pseudoatom.png
:width: 60%

The input structure with the addition of a white sphere to represent the location of the ORI token.
```

1. Last, but not least, we want our design to have minimal loops. We will use the `is_non_loopy` option for this. As of the creation of this tutorial, there are no other parameters to control the secondary structure of the designs from RFD3.
```json
"is_non_loopy": true
```
1. Save your file and close it. If you run RFD3 with this file now, your outputs should be similar to what is stored in `foundry/models/rfd3/docs/tutorials/na_tutorial_files/outputs.zip`.
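
As a sanity check on the `contig` and `length` values above, the following is a minimal Python sketch (not part of RFD3) that sums the residue counts implied by a contig string. It assumes that tokens beginning with a chain letter are inclusive residue ranges copied from the input, that bare numeric ranges are designed segments, and that tokens starting with `/` are chain breaks. The function name `contig_length_range` is hypothetical.

```python
import re


def contig_length_range(contig: str) -> tuple[int, int]:
    """Return the (min, max) residue count implied by a contig string.

    Assumed token meanings (taken from the tutorial text, not from RFD3 code):
      "C5-18"  -> residues 5..18 of chain C, copied from the input (inclusive)
      "40-50"  -> a designed segment of 40 to 50 residues
      "/0"     -> a chain break that adds no residues
    """
    low = high = 0
    for token in contig.split(","):
        token = token.strip()
        if not token or token.startswith("/"):
            continue  # chain break or empty token
        match = re.fullmatch(r"([A-Za-z]?)(\d+)(?:-(\d+))?", token)
        if match is None:
            raise ValueError(f"unrecognised contig token: {token!r}")
        chain = match.group(1)
        start = int(match.group(2))
        end = int(match.group(3)) if match.group(3) is not None else start
        if chain:
            # Fixed segment: the residue count is the inclusive range size.
            count = end - start + 1
            low += count
            high += count
        else:
            # Designed segment: the length may fall anywhere in [start, end].
            low += start
            high += end
    return low, high


print(contig_length_range("C5-18,/0,D24-37,/0,40-50,A146-154,80-90"))
# prints (157, 177)
```

Under these assumptions the contig alone spans 157 to 177 residues, which matches the `length` value of `"157-177"` used in this tutorial.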

(na-additional-constraints)=
### Additional Constraints
The steps in this section are optional. To skip them, go to [Running RFD3](#na-running-rfd3). If you would like to include them, reopen your JSON file and append to what you added in the previous section.

(na-hydrogen-bond-conditioning)=
#### Hydrogen Bond Conditioning
```{important}
Using hydrogen bond conditioning with RFdiffusion3 requires having HBPLUS installed. See the [RFdiffusion3 README](https://github.com/RosettaCommons/foundry/tree/production/models/rfd3) for more information.

```
Hydrogen bond conditioning can be useful in the design of nucleic acid binders. Here we will apply it to some backbone and base atoms for a few of the DNA bases:
```json
"select_hbond_acceptor": {"C16":"N7,O6", "D31-32":"N7", "D28-30":"OP1,OP2,O3',O5'"},
"select_hbond_donor": {"D31-32":"N6"}
```

(na-unfix-sequence)=
Collaborator

Stray text left over? Or is something off with the formatting here? It shows up like below:

Image

#### Unfix Sequence
The steps in this section are optional. To skip them, go to [Running RFD3](#na-running-rfd3). If you would like to include them, reopen your JSON file and add the following:
1. Add the backbone of the unindexed motif (B251-255) to the list of atoms being fixed:
```json
"select_fixed_atoms": {
"C9-14":"ALL",
"D28-33":"ALL",
"C5-8,C15-18": "",
"D24-27,D34-37": "",
"B251-255": "BKBN"
},
```
2. Unfix the sequence for the unindexed motif:
```json
"select_unfixed_sequence": "B251-255"
```
These constraints will keep this portion of the protein backbone in place while allowing the side chains to change.
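
If you prefer to apply these optional settings programmatically rather than editing the file by hand, the following is a minimal Python sketch using only the standard library. The helper name `merge_options` is hypothetical and is not part of RFD3; the option names and values are the ones shown above. Note that `select_fixed_atoms` is replaced as a whole, which is why the dictionary repeats the original entries.

```python
import copy
import json
from pathlib import Path


def merge_options(config: dict, design_name: str, extra: dict) -> dict:
    """Return a copy of `config` with `extra` merged into one design entry.

    Keys already present in the design entry (for example
    `select_fixed_atoms`) are overwritten by the values in `extra`.
    """
    updated = copy.deepcopy(config)
    updated[design_name].update(copy.deepcopy(extra))
    return updated


# Options from the "Unfix Sequence" steps above.
unfix_options = {
    "select_fixed_atoms": {
        "C9-14": "ALL",
        "D28-33": "ALL",
        "C5-8,C15-18": "",
        "D24-27,D34-37": "",
        "B251-255": "BKBN",
    },
    "select_unfixed_sequence": "B251-255",
}

json_path = Path("rfd3_na_tutorial.json")
if json_path.is_file():  # only touch the file if it exists in this directory
    original = json.loads(json_path.read_text())
    patched = merge_options(original, "dsDNA_complex", unfix_options)
    json_path.write_text(json.dumps(patched, indent=4) + "\n")
```

The same helper could be used for the hydrogen bond options by passing a dictionary with the `select_hbond_acceptor` and `select_hbond_donor` entries instead.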


(na-running-rfd3)=
## Running RFD3
To actually run RFD3 you need to know:
- the directory you want the outputs to be stored in
- the path to the JSON (or YAML) file that stores the specific settings for the calculation
- the location of your checkpoint files

Once you have these three things you can run something like this from the command line:
```bash
rfd3 design out_dir=na_tutorial_outputs/0 inputs=rfd3_na_tutorial.json ckpt_path=/path/to/your/checkpoint/files/rfd3_latest.ckpt
```

Your output files will be placed in a new directory `na_tutorial_outputs/0`. Your output files will be named `rfd3_na_tutorial_dsDNA_complex_0_model_n.cif.gz` where `n` is the number of the design. `rfd3_na_tutorial` comes from the name of the JSON file and `dsDNA_complex` comes from the name you gave your calculation in the JSON file.

```{note}
You may see several warning messages when you run RFD3. These should not interfere with the calculation.
```

(na-analyzing-the-outputs)=
## Analyzing the Outputs
You should end up with 8 designs, numbered 0-7, each with its own `.cif.gz` and `.json` file. If you want to adjust the number of output designs, add the configuration option `diffusion_batch_size` to your `rfd3 design` command.
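
For scripted runs, the command can be assembled in Python before being launched. This is a sketch under one loud assumption: that `diffusion_batch_size` is passed in the same `key=value` form as `out_dir`, `inputs`, and `ckpt_path`. This tutorial does not show the exact syntax, so check the RFD3 documentation before relying on it. The helper name `build_rfd3_command` is hypothetical.

```python
import shlex


def build_rfd3_command(out_dir, inputs, ckpt_path, diffusion_batch_size=None):
    """Build the argument list for an `rfd3 design` call."""
    command = [
        "rfd3",
        "design",
        f"out_dir={out_dir}",
        f"inputs={inputs}",
        f"ckpt_path={ckpt_path}",
    ]
    if diffusion_batch_size is not None:
        # ASSUMPTION: same key=value override style as the arguments above.
        command.append(f"diffusion_batch_size={diffusion_batch_size}")
    return command


command = build_rfd3_command(
    out_dir="na_tutorial_outputs/0",
    inputs="rfd3_na_tutorial.json",
    ckpt_path="/path/to/your/checkpoint/files/rfd3_latest.ckpt",
    diffusion_batch_size=4,
)
# Print the command so it can be inspected or pasted into a shell.
print(shlex.join(command))
```

The resulting list can be passed to `subprocess.run(command, check=True)` once the checkpoint path points at a real file.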

The JSON file has many details about your diffusion run, including the options in the JSON file you created. The compressed CIF file contains information about the final diffused structure that you can easily visualize with tools like PyMOL.
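
If your viewer does not open gzip-compressed files directly, the outputs can be decompressed with the Python standard library. This is a minimal sketch; the helper name `decompress_cif` is hypothetical, and the directory name is the `out_dir` used in the command above.

```python
import gzip
import shutil
from pathlib import Path


def decompress_cif(gz_path):
    """Write `name.cif` next to `name.cif.gz` and return the new path."""
    gz_path = Path(gz_path)
    out_path = gz_path.with_suffix("")  # drops only the trailing ".gz"
    with gzip.open(gz_path, "rb") as src, open(out_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
    return out_path


out_dir = Path("na_tutorial_outputs/0")
if out_dir.is_dir():  # skip quietly if RFD3 has not been run yet
    for gz_file in sorted(out_dir.glob("*.cif.gz")):
        print(decompress_cif(gz_file))
```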

Your results should look something like this:
```{figure} ../.assets/na_tutorial/example_output.png
Collaborator

@timkartar timkartar Feb 28, 2026

When I look at the md file on the PR branch, the images do not show up. Maybe something like below is needed?

<div align="center">
  <img src="../.assets/na_tutorial/example_output.png" alt="Protein-DNA complex prediction" width="400">
</div>

:width: 60%

Image of a possible output for this calculation visualized in PyMOL.
```

However, if we visualize the location of our original ORI token, it is nowhere near the center of our output structure! This is because RFD3 has completely moved our structure in coordinate space, and has moved the ORI token with it. In the output JSON files for your designs, the new location of the ORI token can be found by looking at the `diffused_com`.

```{figure} ../.assets/na_tutorial/ori_token_output.png
:width: 100%

The input structure is in green with the original location of the ORI token represented by a light pink sphere. An example output structure is shown in cyan with the adjusted location of the ORI token shown as a red sphere.
```

You can also align the 2R5Z structure with any of the outputs to see that the atoms selected in `select_fixed_atoms` have stayed in the same physical locations.

```{figure} ../.assets/na_tutorial/select_fixed_atoms_output.png
:width: 60%

The DNA strands from the input (orange, light green, light pink) and a possible output structure (dark green, dark pink). The portions held fixed are in pink and the portions allowed to move are in green.
```

Checking that our unindexed motif is present in our designs is a bit more difficult, but the information we need is provided in the JSON file that is created with each design. If you open one of these JSON files, the first piece of information you see is a `diffused_index_map` that connects the residues from the input to residues in the output design. You should see that residues B251-255 have been mapped to residues in your output structure.
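
The two fields mentioned above, `diffused_index_map` and `diffused_com`, can also be pulled out of an output JSON file with a short script. This sketch does not assume where in the file these keys sit; it searches the whole structure for the first occurrence of each name. The helper names `find_key` and `summarize_design` are hypothetical, and the layout of the printed values depends on RFD3.

```python
import json
from pathlib import Path


def find_key(obj, key):
    """Depth-first search of nested dicts/lists; return the first value
    stored under `key`, or None if the key is not found."""
    if isinstance(obj, dict):
        if key in obj:
            return obj[key]
        children = obj.values()
    elif isinstance(obj, list):
        children = obj
    else:
        return None
    for child in children:
        found = find_key(child, key)
        if found is not None:
            return found
    return None


def summarize_design(json_path):
    """Collect the index map and the moved ORI location from one design."""
    metadata = json.loads(Path(json_path).read_text())
    return {
        name: find_key(metadata, name)
        for name in ("diffused_index_map", "diffused_com")
    }


out_dir = Path("na_tutorial_outputs/0")
if out_dir.is_dir():  # skip quietly if RFD3 has not been run yet
    for json_file in sorted(out_dir.glob("*.json")):
        print(json_file.name, summarize_design(json_file))
```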

```{figure} ../.assets/na_tutorial/unfixed_motif_out.png
:width: 60%

Input structure (green) and model_0 (cyan) from the basic.zip outputs with the unindexed motif highlighted in magenta. For this output (model_0 in outputs.zip), the unindexed motif is found in residues C130-134.
```

If you added the additional hydrogen bonding constraints, the outputs should look very similar to those shown from the 'basic' calculation. The hydrogen bond conditioning is a statistical effect and difficult to visualize when so many other constraints have been applied.

If you added the `select_unfixed_sequence` constraint, you will see that your output JSON files still have a mapping between residues B251-255 in the input to the output structure and that the backbones have remained the same but the side chains have changed. For example:

```{figure} ../.assets/na_tutorial/unfix_sequence_output.png
:width: 60%

The input structure (green) and output structure (cyan) with the unindexed motif colored pink. Note how the backbone structures have remained the same, but the side chains have changed.
```

## What's Next?
For your actual projects, you will want to filter the designed structures based on metrics relevant to your design task. Then, even though RFD3 outputs come with a sequence, we still recommend redesigning the sequence with a sequence design tool such as [MPNN](https://rosettacommons.github.io/foundry/models/mpnn/index.html). Finally, you will want to check whether the sequence refolds into a structure similar to the one predicted by RFD3, using tools like [RoseTTAFold3](https://www.biorxiv.org/content/10.1101/2025.08.14.670328v2).

(na-references-and-further-reading)=
## References and Further Reading
- For more information on the different inference settings in RFD3, see [input.md](input.md)
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](intro_inference_calculations.md)
Binary file not shown.
Binary file not shown.
16 changes: 16 additions & 0 deletions models/rfd3/docs/tutorials/na_tutorial_files/rfd3_na_tutorial.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
{
"dsDNA_complex": {
"input": "./input_pdbs/2r5z.pdb",
"contig": "C5-18,/0,D24-37,/0,40-50,A146-154,80-90",
"length": "157-177",
"unindex": "/0,/0,B251-B255",
"select_fixed_atoms": {
"C9-14":"ALL",
"D28-33":"ALL",
"C5-8,C15-18": "",
"D24-27,D34-37": ""
},
"ori_token":[25,35,20],
"is_non_loopy": true
}
}
Binary file not shown.
6 changes: 3 additions & 3 deletions models/rfd3/docs/tutorials/ppi_design_tutorial.md
Original file line number Diff line number Diff line change
Expand Up @@ -154,9 +154,9 @@ Feel free to go through the [other tutorials](https://rosettacommons.github.io/f

(ppi-references-and-further-reading)=
## References and Further Reading
- For more information on the different inference settings in RFD3, see [input.md](../input.md)
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson, et al.
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](../intro_inference_calculations.md)
- For more information on the different inference settings in RFD3, see [input.md](input.md)
- For more information on the example used here, see [*De novo design of protein structure and function with RFdiffusion*](https://www.nature.com/articles/s41586-023-06415-8#Sec12) by Joseph L. Watson, et al.
- A more thorough discussion of the settings and configuration options in RFD3 can be found [here](intro_inference_calculations.md)


