FastPLMs parity with native embedding models #27

@avivko

Description

I benchmarked the FastPLMs implementations of ESM2 (650M), ESMC (600M), and E1 (600M) against the native ESMC 600M implementation from EvolutionaryScale on an internal dataset of mine, using a simple linear SVM for a classification task. For context: the task has two labeling granularities, a 4-class and a 10-class classification. For each homology-based threshold, I split the data into 10 cross-validation splits and trained a model on each, so the results are statistically less sensitive to the exact splits or to the classifier's parameters.

The FastPLMs ESMC implementation appears to perform worse than the native one on average (each dot is the average over 10 CV splits). I unfortunately can't share the data, but based on this I would suggest running your own benchmark to verify parity with the base models.

[Image: benchmark plot comparing classification performance of the FastPLMs ESMC implementation vs the native EvolutionaryScale implementation]
