I benchmarked the FastPLMs implementations of ESM2 (650M), ESMC (600M), and E1 (600M) against the native ESMC 600M implementation from EvolutionaryScale on an internal dataset of mine, using a simple linear SVM for a classification task. For context: the task has two labeling granularities, one with 4 classes and one with 10. I split the data into 10 cross-validation splits for each homology-based threshold and trained a separate model on each, so the results are less sensitive to the exact splits or to the classifier's parameters.
It seems that the FastPLMs ESMC implementation performs worse than the native one on average (each dot is the average over 10 CV splits). I unfortunately can't share the data, but based on these results I would suggest running your own benchmark to ensure parity with the base models.
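For anyone wanting to reproduce this kind of comparison on their own data, the evaluation protocol above can be sketched roughly as follows. This is a minimal illustration, not my actual pipeline: the embedding array, labels, and SVM settings are placeholders, and the random data stands in for per-protein embeddings extracted from each model.

```python
# Hypothetical sketch of the benchmark: score fixed PLM embeddings with a
# linear SVM over 10 cross-validation splits. All data here is synthetic.
import numpy as np
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1152))   # placeholder for per-protein embeddings
y = rng.integers(0, 4, size=200)   # placeholder labels (4-class granularity)

# One linear SVM per split; compare mean accuracy across implementations.
clf = LinearSVC(C=1.0, max_iter=10_000)
scores = cross_val_score(clf, X, y, cv=10)
print(f"mean accuracy over 10 splits: {scores.mean():.3f}")
```

Averaging over the 10 split scores per model (rather than a single split) is what makes the comparison robust to split choice, as described above.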
