bug: Warning: Duplicate word in word2vec file #887

@bact

Description

Hundreds of warnings like this one are logged when the unit tests run:

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first

Expected results

No warning.
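As a stopgap while the duplicate entries remain in the model file, the test run could raise the log level of the logger named in the warning (`gensim.models.keyedvectors`). This is only a sketch: the helper name `silence_gensim_duplicate_warnings` is hypothetical, and it hides the message rather than fixing the underlying file. It also hides any other warnings from that logger.

```python
import logging


def silence_gensim_duplicate_warnings(level: int = logging.ERROR) -> logging.Logger:
    """Raise the threshold of the gensim keyedvectors logger.

    Hypothetical helper: WARNING-level records such as
    "duplicate word ... in word2vec file" are dropped once the
    logger level is set above WARNING.
    """
    logger = logging.getLogger("gensim.models.keyedvectors")
    logger.setLevel(level)
    return logger
```

Calling this once before the model is loaded (for example in the test setup) should be enough, since logger levels are process-wide.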

Current results

(partial)

2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'ต่าง' in word2vec file, ignoring all but first
2023-12-11:03:40:47 WARNING  [gensim.models.keyedvectors:1909] duplicate word '	' in word2vec file, ignoring all but first
...
2023-12-11:03:40:57 WARNING  [gensim.models.keyedvectors:1909] duplicate word '' in word2vec file, ignoring all but first
2023-12-11:03:40:58 WARNING  [gensim.models.keyedvectors:1909] duplicate word 'หยับ' in word2vec file, ignoring all but first
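
To list every entry that would trigger this warning, something like the sketch below could scan the vocabulary for repeated tokens. It assumes the word2vec *text* format (an optional `count dims` header line, then one `word v1 v2 ...` line per entry); the model file used by the tests may be binary, in which case it would need to be exported to text first. The function name `find_duplicate_words` is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable


def find_duplicate_words(lines: Iterable[str]) -> Dict[str, int]:
    """Return tokens that appear more than once, with their counts."""
    counts: Counter = Counter()
    for i, line in enumerate(lines):
        line = line.rstrip("\n")
        if i == 0:
            # Skip the optional "<vocab_size> <dimensions>" header line.
            parts = line.split()
            if len(parts) == 2 and all(p.isdigit() for p in parts):
                continue
        if not line:
            continue
        # The token is everything before the first space; this keeps
        # odd tokens such as a tab or an empty string visible.
        word = line.split(" ", 1)[0]
        counts[word] += 1
    return {word: n for word, n in counts.items() if n > 1}
```

Passing an open file handle (opened with `encoding="utf-8"`) works, because the function only iterates over lines.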

Steps to reproduce

Run the unit tests.

PyThaiNLP version

dev

Python version

3.8

Operating system and version

n/a

More info

No response

Possible solution

No response

Files

No response

Metadata

Labels

bug (bugs in the library)
