What feature would you like to request?
Currently, fastembed supports BM25 sparse embeddings, but its tokenization is not suited to Chinese text. Chinese does not put spaces between words, so standard whitespace tokenization cannot split a sentence into individual terms. As a result, retrieval performance is poor when using fastembed's BM25 implementation on Chinese datasets. It would be helpful if the BM25 implementation offered Chinese-aware tokenization.
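To illustrate the problem, here is a minimal standalone sketch in plain Python. It does not call fastembed or reproduce its internals. `whitespace_tokenize` and `cjk_bigram_tokenize` are hypothetical helpers written for this example, and character bigrams are only a simple stand-in for a real Chinese word segmenter.

```python
import re

# Hypothetical helper: plain whitespace splitting, as described in this issue.
def whitespace_tokenize(text: str) -> list[str]:
    return text.split()

# Hypothetical helper: overlapping character bigrams for runs of CJK
# ideographs, lowercased whole tokens for ASCII alphanumeric runs.
# This is an assumption for illustration, not fastembed's behavior.
def cjk_bigram_tokenize(text: str) -> list[str]:
    tokens: list[str] = []
    for run in re.findall(r"[\u4e00-\u9fff]+|[A-Za-z0-9]+", text):
        if re.match(r"[\u4e00-\u9fff]", run):
            if len(run) == 1:
                tokens.append(run)
            else:
                tokens.extend(run[i:i + 2] for i in range(len(run) - 1))
        else:
            tokens.append(run.lower())
    return tokens

doc = "我喜欢自然语言处理"   # "I like natural language processing"
query = "自然语言"           # "natural language"

# Whitespace splitting treats each sentence as one opaque token,
# so the query shares no term with the document.
ws_overlap = set(whitespace_tokenize(doc)) & set(whitespace_tokenize(query))

# Bigram splitting yields shared terms that BM25 could score.
bigram_overlap = set(cjk_bigram_tokenize(doc)) & set(cjk_bigram_tokenize(query))

print("whitespace overlap:", ws_overlap)
print("bigram overlap:", bigram_overlap)
```

With whitespace splitting the query and document have no terms in common, so a term-matching scorer such as BM25 has nothing to match on, even though the query text appears verbatim inside the document.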
Is there any additional information you would like to provide?
No response