Description
As mentioned below, we wish to implement a generic evaluation method: "for one dataset, run all metrics, do a correlation."
Developing this for ASL Citizen would be good. We could then use it to run both the SignCLIP embeddings metric and the distance metrics in #4.
So, what this file does is get distances (specifically SignCLIP) for ASL Citizen?
That is a start, but it would be best if we could do the following: given a directory of poses in various classes, for example
poses/class/X, iterate over all of the metrics and run them to calculate the k-nearest neighbors for each sample, for classification.
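The per-metric classification step could be sketched roughly like this - a leave-one-out k-NN accuracy for a single distance metric. This is a minimal illustration, not code from the repo; `knn_accuracy` and the toy Euclidean metric below are hypothetical names, and real inputs would be pose objects rather than numbers:

```python
from collections import Counter


def knn_accuracy(samples, labels, distance_fn, k=1):
    """Leave-one-out k-NN classification accuracy for one distance metric."""
    correct = 0
    for i, query in enumerate(samples):
        # Distances from the held-out sample to every other sample
        neighbors = sorted(
            (distance_fn(query, other), labels[j])
            for j, other in enumerate(samples)
            if j != i
        )
        # Majority vote among the k nearest neighbors
        votes = Counter(label for _, label in neighbors[:k])
        if votes.most_common(1)[0][0] == labels[i]:
            correct += 1
    return correct / len(samples)


# Toy usage: two well-separated "classes" of 1-D samples
samples = [0.0, 0.1, 1.0, 1.1]
labels = ["a", "a", "b", "b"]
accuracy = knn_accuracy(samples, labels, lambda x, y: abs(x - y), k=1)
```

Running the loop over a dict of `{metric_name: distance_fn}` would then give one accuracy per metric, which is exactly the leaderboard-style comparison described above.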
Then, once we have around 8 metrics, we can run them all and see which one gives the best classification score (that one would be considered the best metric for form-based comparison of single signs) - see https://github.com/sign-language-processing/signwriting-evaluation/blob/main/signwriting_evaluation/evaluation/closest_matches.py#L93-L107
Another example would be to have a directory of poses poses-dgs/ where each pose has a .txt file associated with it. Let's assume 1000 sentences in German Sign Language and German.
Then, we can perform an all-to-all similarity between the poses, and an all-to-all similarity between the texts (using xCOMET, for example), and perform a correlation study. Whichever metric correlates best with xCOMET is the best metric for semantic sentence comparison. What I am trying to say is: we develop a generic evaluation that says something like "for one dataset type, run all metrics, correlate with something",
and then we can perform this on many datasets and inform the reader about the best metrics.
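The correlation part could be sketched as follows: flatten the off-diagonal entries of the two all-to-all similarity matrices and compute a Pearson correlation between them. The matrices here are toy inputs; in practice one would come from a pose metric and the other from a text metric such as xCOMET, and `metric_correlation` is a hypothetical helper name:

```python
import math


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def metric_correlation(pose_sims, text_sims):
    """Correlate the off-diagonal entries of two n x n similarity matrices."""
    n = len(pose_sims)
    pairs = [(i, j) for i in range(n) for j in range(n) if i != j]
    xs = [pose_sims[i][j] for i, j in pairs]
    ys = [text_sims[i][j] for i, j in pairs]
    return pearson(xs, ys)
```

The metric whose similarity matrix yields the highest correlation against the text-side matrix would be reported as the best semantic metric for that dataset.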
Then, when someone comes and says "I developed a new metric", they run it on everything, like GLUE basically, and we can see the upsides and downsides.
Originally posted by @AmitMY in #5 (comment)