unexpected poor performance of e2efold

I tested e2efold on a set of 361 PDB chains. Secondary structures of RNAs shorter than 600 nucleotides were predicted with `e2efold_productive/e2efold_productive_short.py`, and those of RNAs longer than 600 nucleotides with `e2efold_productive/e2efold_productive_long.py`.
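The length-based split between the two scripts can be sketched as below. This is a minimal, hypothetical helper (`choose_script` and `partition_by_length` are not part of e2efold). It only decides which script a chain is routed to and does not run either script. It also assumes that a chain of exactly 600 nt goes to the long script, which the description above leaves open.

```python
# Script paths as given in the issue; the 600-nt threshold is also from the issue.
SHORT_SCRIPT = "e2efold_productive/e2efold_productive_short.py"
LONG_SCRIPT = "e2efold_productive/e2efold_productive_long.py"
LENGTH_THRESHOLD = 600


def choose_script(sequence: str) -> str:
    """Hypothetical helper: pick the e2efold script for one RNA sequence.

    Assumption: a sequence of exactly LENGTH_THRESHOLD nt is sent to the long script.
    """
    return SHORT_SCRIPT if len(sequence) < LENGTH_THRESHOLD else LONG_SCRIPT


def partition_by_length(sequences: dict) -> dict:
    """Group chain IDs by the script that should predict them."""
    groups = {SHORT_SCRIPT: [], LONG_SCRIPT: []}
    for chain_id, seq in sequences.items():
        groups[choose_script(seq)].append(chain_id)
    return groups
```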

To my surprise, when evaluated against the canonical base pairs that DSSR assigns for this dataset, the `*.ct` files predicted by e2efold have a very low average F1 of 0.2400 and MCC of 0.2401. This is far worse than the state-of-the-art methods listed in Table 2 of the e2efold paper (https://openreview.net/pdf?id=S1eALyrYDH). My benchmark results are below, sorted by F1 in ascending order.

| Method | F1 | MCC | Predicted base pairs per RNA |
| :---: | :---: | :---: | :---: |
| e2efold | 0.2400 | 0.2401 | 18.2133 |
| mfold | 0.6275 | 0.6285 | 32.4903 |
| RNAstructure (ProbablePair) | 0.6443 | 0.6475 | 29.4238 |
| CONTRAfold | 0.6617 | 0.6642 | 32.5845 |
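For reference, per-chain F1 and MCC over base-pair sets can be computed roughly as in the sketch below. This is an assumption about the evaluation, not necessarily the exact procedure behind the table. In particular, it takes the negative class to be every other candidate pair (i, j) with i < j, out of n(n-1)/2 possible pairs, and it returns 0.0 whenever a denominator is zero. The function names are hypothetical.

```python
import math
from typing import Iterable, Set, Tuple

Pair = Tuple[int, int]


def normalize(pairs: Iterable[Pair]) -> Set[Pair]:
    """Store each base pair as (i, j) with i < j so (5, 2) and (2, 5) compare equal."""
    return {(min(i, j), max(i, j)) for i, j in pairs}


def f1_and_mcc(predicted: Iterable[Pair], reference: Iterable[Pair], length: int):
    """Return (F1, MCC) for one chain of `length` nucleotides."""
    pred = normalize(predicted)
    ref = normalize(reference)
    tp = len(pred & ref)   # predicted pairs that are in the reference
    fp = len(pred - ref)   # predicted pairs that are not in the reference
    fn = len(ref - pred)   # reference pairs that were missed
    # Assumption: negatives are all remaining candidate pairs (i < j).
    total = length * (length - 1) // 2
    tn = total - tp - fp - fn

    f1_denom = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denom if f1_denom else 0.0

    mcc_denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denom if mcc_denom else 0.0
    return f1, mcc
```

Because TN is very large for RNA chains, this MCC is close to the geometric mean of precision and recall, while F1 is their harmonic mean. That is consistent with the F1 and MCC columns above being nearly equal. Averaging these values over all chains gives a per-chain mean, which is one plausible reading of "average F1".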

The predicted ct files are attached below. The archive also includes the 4 example sequences listed under `e2efold_productive/*_seqs/*seq`. I confirmed that my run generates ct files identical to those in the GitHub repository.
[e2e.zip](https://github.com/ml4bio/e2efold/files/4783708/e2e.zip)
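The "Predicted base pairs per RNA" column can be reproduced from ct files with a small parser like the sketch below. It assumes the standard CT layout: a header line starting with the sequence length, followed by one line per nucleotide whose fifth field is the pairing partner (0 if unpaired). It also assumes one structure per file. I have not checked whether every attached file follows exactly this layout. `parse_ct` and `mean_pairs_per_chain` are hypothetical helper names.

```python
from statistics import mean


def parse_ct(text: str) -> set:
    """Return base pairs (i, j), i < j, from the first structure in CT-format text.

    Assumes the standard CT layout: header "<length> <title>", then one line per
    nucleotide: index, base, prev index, next index, partner (0 = unpaired), numbering.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    n = int(lines[0].split()[0])  # sequence length from the header line
    pairs = set()
    for line in lines[1:n + 1]:
        fields = line.split()
        i, partner = int(fields[0]), int(fields[4])
        if partner > i:  # skips unpaired (0) and records each pair only once
            pairs.add((i, partner))
    return pairs


def mean_pairs_per_chain(ct_texts) -> float:
    """Average number of base pairs per chain across several CT files."""
    return mean(len(parse_ct(t)) for t in ct_texts)
```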

Could you check whether I ran e2efold incorrectly, which might explain such low performance? In particular, could you look into why e2efold predicts on average only 18.2133 base pairs per RNA chain, whereas the native structures contain on average 28.6648 canonical base pairs? Thank you.
