Hello,
In the paper it is stated that
... given a random list of n documents [d1, d2, ..., dn], we extract randomly from each a pair of spans, [s11, s12, ..., sn1, sn2].
I was wondering how the spans are extracted from a document. Are they sentences, split with `nltk.sent_tokenize`? Are they equally sized chunks extracted with a sliding window? Or are they the same as the Condenser pretraining blocks, annotated with the id of the document they belong to?
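To make the question concrete, here is a minimal sketch of the first two interpretations. All function names are my own stand-ins, and the regex is a crude substitute for a real sentence tokenizer, not anything from the paper:

```python
import re
import random

def sentence_spans(doc):
    # Interpretation 1: spans are sentences.
    # Crude regex stand-in for nltk.sent_tokenize.
    return [s for s in re.split(r'(?<=[.!?])\s+', doc) if s]

def window_spans(doc, size=5, stride=5):
    # Interpretation 2: spans are equally sized token chunks
    # produced by a sliding window over the document.
    tokens = doc.split()
    return [' '.join(tokens[i:i + size])
            for i in range(0, len(tokens), stride)]

def sample_span_pair(doc, spans_fn, rng=random):
    # Under either interpretation, a pair of spans is then
    # drawn at random from the same document.
    spans = spans_fn(doc)
    return rng.sample(spans, 2)
```

Is either of these close to what the paper does, or is the extraction tied to the Condenser block preprocessing instead?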
Thank you.