What happened?
A bug happened!
I am using embedding_model = LateInteractionTextEmbedding("colbert-ir/colbertv2.0") in fastembed==0.3.0 and
following the code in https://qdrant.github.io/fastembed/examples/ColBERT_with_FastEmbed/#colbert-in-fastembed.
I created embeddings for some documents. However, I got an error on this part of the code when running on a large collection of documents:
sorted_indices = compute_relevance_scores(
np.array(query_embeddings[0]), np.array(document_embeddings), k=3
)
The error says that it cannot create an np.array from document_embeddings. Looking into it, I realized that the individual embeddings in document_embeddings have different sizes. For instance, for 442 documents, the first ~260 documents have an embedding size of (182, 128) and the remaining documents have an embedding size of (164, 128). Could you help me with this? Thanks.
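One possible workaround is to skip stacking the document embeddings into one array and instead score each document on its own. The sketch below assumes each entry of document_embeddings is a (n_tokens, 128) array whose n_tokens can vary between documents, and that relevance is the ColBERT-style MaxSim score (for each query token, take the best-matching document token and sum over query tokens). The function name compute_relevance_scores_ragged is hypothetical and is not part of fastembed or the tutorial.

```python
import numpy as np


def compute_relevance_scores_ragged(query_embedding, document_embeddings, k):
    """Hypothetical helper: rank documents by MaxSim without stacking them.

    query_embedding: array of shape (n_query_tokens, dim)
    document_embeddings: sequence of arrays, each of shape (n_doc_tokens_i, dim);
        n_doc_tokens_i may differ from document to document.
    Returns the indices of the top-k documents, best first.
    """
    query = np.asarray(query_embedding)
    scores = np.empty(len(document_embeddings))
    for i, doc in enumerate(document_embeddings):
        doc = np.asarray(doc)
        # (n_query_tokens, n_doc_tokens_i) similarity matrix for this document only
        sim = query @ doc.T
        # MaxSim: best document token per query token, summed over query tokens
        scores[i] = sim.max(axis=1).sum()
    # Sort by descending score and keep the first k document indices
    return np.argsort(-scores)[:k]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    query = rng.normal(size=(32, 128))
    # Documents with different token counts, like (182, 128) and (164, 128)
    docs = [rng.normal(size=(182, 128)) for _ in range(3)]
    docs += [rng.normal(size=(164, 128)) for _ in range(3)]
    print(compute_relevance_scores_ragged(query, docs, k=3))
```

Because each document is scored independently, the differing shapes never have to fit into one rectangular array, at the cost of a Python-level loop over documents.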
What Python version are you on? e.g. python --version
Python 3.12
Version
0.2.7 (Latest)
What os are you seeing the problem on?
MacOS
Relevant stack traces and/or logs
Traceback (most recent call last):
np.array(query_embeddings[0]), np.array(document_embeddings), k=3
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 1 dimensions. The detected shape was (442,) + inhomogeneous part.