
Bert clips text when word is unknown token #168

Open
amalal-mazrua wants to merge 2 commits into CAMeL-Lab:master from amalal-mazrua:BERT_clips_text_when_word_is_unknown_token


Conversation


amalal-mazrua commented Apr 10, 2026

Fixes #159: [BUG] BERTUnfactoredDisambiguator.pretrained() clips text when special character � exists.

When the BERT tokenizer does not recognize a word, it returns an empty list (`[]`), which results in clipped text.

For example, when a sentence contains the special character � (or any other unrecognized word), the last tokens of the text are clipped.
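
The clipping described above can be illustrated with a minimal sketch. This is not the code changed in this PR: `toy_tokenize`, `label_words`, and `UNK_TOKEN` are hypothetical names, and the toy tokenizer only imitates the reported behavior (an empty list for a word it cannot handle). It assumes per-word predictions are paired back with the input words, so one missing prediction shortens the output and shifts every label after it.

```python
from typing import List, Tuple

UNK_TOKEN = "[UNK]"  # assumed placeholder for words the tokenizer drops


def toy_tokenize(word: str) -> List[str]:
    """Hypothetical stand-in for a subword tokenizer.

    Returns [] for the replacement character U+FFFD, imitating the
    behavior reported in the issue; every other word is one piece.
    """
    if word == "\ufffd":
        return []
    return [word]


def label_words(words: List[str], pad_empty: bool) -> List[Tuple[str, str]]:
    """Produce one fake prediction per word and pair it with the word.

    With pad_empty=False, a word that tokenizes to [] yields no
    prediction, so zip() stops early and the tail of the text is lost.
    With pad_empty=True, the empty result is replaced by UNK_TOKEN so
    every word keeps exactly one prediction.
    """
    predictions = []
    for word in words:
        pieces = toy_tokenize(word)
        if not pieces and pad_empty:
            pieces = [UNK_TOKEN]
        if pieces:
            # Use the first subword piece as the word's representative.
            predictions.append(f"label({pieces[0]})")
    return list(zip(words, predictions))


words = ["hello", "\ufffd", "world"]
clipped = label_words(words, pad_empty=False)  # last word is dropped
kept = label_words(words, pad_empty=True)      # all three words survive
```

In the unpadded case the result is also misaligned: the label computed for `world` ends up attached to `�`. Substituting an unknown-token placeholder is one way to keep the word and prediction lists the same length; whether this PR takes that exact approach is not stated in the description.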

owo assigned owo and go-inoue Apr 12, 2026
