ProQ_scripts/README at master · bjornwallner/ProQ_scripts · GitHub

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
PROQ SCRIPTS
Contact: bjornw@ifm.liu.se
git: https://github.com/bjornwallner/ProQ_scripts.git


ProQ is Model Quality Assessment Program that predictis the quality of
individual models. To do this ProQ uses both information that can be
calculated from the 3D coordinates of the model as well as information
that can be predicted from the sequence. Of course the information
that comes from the sequence is the same for all models of the same
sequence. Thus, before scoring models the sequence specific features
needs to be calculated. The ProQ_scripts folder contains the scripts
and programs needed to generate the sequence specific input features
to the ProQ2 and ProQM scoring functions in Rosetta.


INSTALLATION of ProQ2/ProQM scoring functions

1) git clone https://github.com/bjornwallner/ProQ_scripts
2) Download the latest weekly release of Rosetta from
https://www.rosettacommons.org/software/, you need a license but it's
free for academia.
3) compile rosetta according to the instructions.


INSTALLATION of ProQ_scripts to calculate sequence specific features

To install the programs to calculate all sequence specific features
you need to do the following steps

1) Set the $DB variable in bin/run_all_external.pl to a formated
sequence database e.g.  uniref90. (e.g. download the fasta sequences
and run formatdb from the blast package)
2) run ./configure.pl

TEST
To check that everything is set up
run ./run_test.sh
This should create all the files needed without any error messages.


RUNNING
Once set up, the main wrapper program 'bin/run_all_external.pl' will run
everything using either a fasta file or extracting the sequence from a
pdb

bash$ bin/run_all_external.pl
Usage: run_all_external.pl -pdb [pdbfile] -fasta [fastafile] -membrane 1 (if membrane protein) -overwrite 1 (if overwrite)

For water soluable proteins run:
bin/run_all_external.pl -pdb <pdbfile>
OR
bin/run_all_external.pl -fasta <fastafile>

For membrane proteins run:
bin/run_all_external.pl -pdb <pdbfile> -membrane 1
OR
bin/run_all_external.pl -fasta <fastafile> -membrane 1

To use the sequence specific files in rosetta you need to use the flag
-ProQ:basename <basename>, where <basename> is either
the input pdbfile or input fastafile used to generate the sequence
specific features. All features have the following naming: basename.*,
e.g. basename.ss2 for secondary structure prediction or basename.mtx
for psiblast profile etc.

Currently there are no functionality in Rosetta to account for models
that have missing residues. These have to be handled outside, by first
creating the sequence features for the full length sequence and than
use the script 'copy_features_from_master.pl'. For the script to work
you need to install the global alignment program 'needle' in the
EMBOSS package (http://emboss.sourceforge.net/, or sudo apt-get
install emboss)

The following example will create all sequence specifici features for
the full.seq sequence, and than copy the relevant features to the
model1-5.pdb, by aligning the sequences of model1-5.pdb to full.seq.

./run_all_external.pl -fasta full.seq
./copy_features_from_master.pl model1.pdb full.seq
./copy_features_from_master.pl model2.pdb full.seq
./copy_features_from_master.pl model3.pdb full.seq
./copy_features_from_master.pl model4.pdb full.seq
./copy_features_from_master.pl model5.pdb full.seq

RUNNING ProQ or ProQM on a pdb with the name 1abc.pdb

1) Create the sequence specific features as described above. You do this once for a given sequence.
   i.e. bin/run_all_external.pl -pdb 1abc.pdb, will create files with the basename 1abc.pdb
   For membrane proteins
   bin/run_all_external.pl -pdb 1abc.pdb -membrane 1

2a) to score all *.pdb run:
   score.linuxgccrelease -database <rosetta database> -in:file:fullatom -ProQ:basename 1abc.pdb -in:file:s *.pdb -out:file:scorefile ProQ.sc -score:weights ProQ2
For membrane protein:
   score.linuxgccrelease -database <rosetta database> -in:file:fullatom -ProQ:basename 1abc.pdb -in:file:s *.pdb -out:file:scorefile ProQM.sc -score:weights ProQM -ProQ:membrane

This will produce a scorefile with the a global score corresponding to
summed local scores. To get the usual ProQ/ProQM score divide by
length or run the scoring with -ProQ:normalize <target length> to get
normalized scores directly.

To get the local scores run the scoring with
-ProQ:output_local_prediction, this will generate a local quality
estimates with the name ProQM.<pdb>


2b) Side-chain sampling before score


USING PROQ2 (NON-MEMBRANE)
Generate 10 different side-chain conformations for each pdb:
relax.linuxgccrelease  -database <rosetta database> -in:file:fullatom -out:file:silent_struct_type binary -nstruct 10 -relax_script <dir_ProQ_scripts>/resampling/repack.script -in:file:s *.pdb -out:file:silent ProQ2.repacked.silent

Score each of these with ProQ2:
score.linuxgccrelease -database <rosetta database> -in:file:fullatom -score:weights ProQ2 -in:file:silent ProQ2.repacked.silent -out:file:scorefile ProQ2.repacked.silent.score

Select the model with the highest ProQ2 score.

USING PROQM (MEMBRANE)
Generate 10 different side-chain conformations for each pdb:
relax.linuxgccrelease  -database <rosetta database> -in:file:fullatom -out:file:silent_struct_type binary -relax:membrane -membrane:Membed_init -score:weights membrane_highres.wts -in:file:spanfile $basename.span -nstruct 10 -relax_script <dir_ProQ_scripts>/resampling/repack.script -in:file:s *.pdb -out:file:silent ProQM.repacked.silent

Score each of these with ProQM:
score.linuxgccrelease -database <rosetta database> -in:file:fullatom -ProQ:membrane -score:weights ProQM -in:file:silent ProQM.repacked.silent -out:file:scorefile ProQM.repacked.silent.score

Select the model with the highest ProQM score.


2c) ProQM-resample
Follow the instruction above to install ProQ_scripts

The command:
    resampling/resampling.pl <any pdb in the folder (this is only used to get the sequence)>

will do 1+2b (calculate sequence specific features+resampling+scoring)
on all *.pdb in the folder from where it is executed. All *.pdb needs
to be for the same sequence.