Conversation
✅ Deploy Preview for poetic-froyo-8baba7 ready!
To edit notification comments on pull requests, go to your Netlify project configuration. |
There was a problem hiding this comment.
Pull request overview
This PR prevents NaN values in query vectors from being sent via the gRPC client, aligning gRPC behavior with the local and HTTP clients to avoid returning results with nan scores.
Changes:
- Add NaN validation to REST→gRPC vector conversions and raise an error before issuing gRPC requests.
- Add/extend tests to ensure gRPC queries (and conversion helpers) reject NaN vectors.
- Minor formatting/whitespace cleanups in tests, docs tooling, proto comments, and docs assets.
Reviewed changes
Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
qdrant_client/conversions/conversion.py |
Adds NaN validation for dense/sparse vectors during REST→gRPC conversion. |
tests/congruence_tests/test_query.py |
Updates congruence test to assert gRPC also rejects NaN queries. |
tests/conversions/test_validate_conversions.py |
Adds unit tests ensuring conversion helpers raise on NaN vectors. |
tests/test_migrate.py |
Minor formatting fix (trailing comma / compact init). |
tests/congruence_tests/test_sparse_updates.py |
Minor formatting simplification. |
tests/congruence_tests/test_multivector_updates.py |
Minor formatting simplification. |
tools/generate_docs.sh |
Minor formatting/line normalization. |
qdrant_client/proto/points_service.proto |
Whitespace cleanup in comments. |
docs/images/api-icon.svg |
Line normalization (no functional change). |
docs/.gitignore |
Line normalization (no functional change). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| UnexpectedResponse: If the vector contains NaN values | ||
| """ | ||
| if any(math.isnan(val) for val in vector): | ||
| from qdrant_client.http.exceptions import UnexpectedResponse | ||
| from httpx import Headers | ||
| import json | ||
|
|
||
| raise UnexpectedResponse( | ||
| status_code=400, | ||
| reason_phrase="Bad Request", | ||
| content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(), | ||
| headers=Headers({}), | ||
| ) | ||
|
|
There was a problem hiding this comment.
_validate_no_nan raises qdrant_client.http.exceptions.UnexpectedResponse by manually constructing an HTTP-like response (status_code/reason_phrase/headers/content) even though this is client-side validation in the gRPC conversion layer. This is semantically misleading and couples gRPC-only usage to HTTP error types; consider raising a dedicated client-side validation exception (or ValueError) and letting higher-level client code translate it, or otherwise centralize this error construction so it matches the rest of the client's error semantics.
| UnexpectedResponse: If the vector contains NaN values | |
| """ | |
| if any(math.isnan(val) for val in vector): | |
| from qdrant_client.http.exceptions import UnexpectedResponse | |
| from httpx import Headers | |
| import json | |
| raise UnexpectedResponse( | |
| status_code=400, | |
| reason_phrase="Bad Request", | |
| content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(), | |
| headers=Headers({}), | |
| ) | |
| ValueError: If the vector contains NaN values | |
| """ | |
| if any(math.isnan(val) for val in vector): | |
| raise ValueError("Query vector must not contain NaN values") |
📝 WalkthroughWalkthroughThis pull request introduces NaN validation for vector conversions in the qdrant_client library. A new private utility function Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes 🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
tests/congruence_tests/test_query.py (1)
1550-1551: Consider adding sparse and multi-dense NaN coverage to this congruence test.The congruence test
test_query_with_nancurrently exercises only the dense vector NaN path for gRPC. Sparse and multi-dense NaN rejections are only covered at the unit level (test_validate_conversions.py). Adding end-to-end congruence assertions for those paths here would provide stronger integration confidence.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/congruence_tests/test_query.py` around lines 1550 - 1551, The congruence test only covers the dense-vector NaN path; extend test_query_with_nan to also exercise sparse and multi-dense NaN rejection by invoking grpc_client.query_points with queries that include NaN in a sparse vector and in a multi-dense payload and asserting they raise UnexpectedResponse (same as the existing dense case using COLLECTION_NAME and query), i.e., add additional pytest.raises(UnexpectedResponse) calls around grpc_client.query_points for the sparse and multi-dense query variants so the end-to-end gRPC behavior matches the unit-level tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@qdrant_client/conversions/conversion.py`:
- Around line 3601-3619: _in _validate_no_nan replace the NaN-only check with a
finite check by using math.isfinite (i.e., raise when not math.isfinite(val)) so
±Inf are rejected too, and move the lazy imports (json, Headers from httpx, and
UnexpectedResponse from qdrant_client.http.exceptions) to the module top-level
instead of importing them inside the exception branch; keep the same raised
UnexpectedResponse construction and message but use the module-level names.
---
Nitpick comments:
In `@tests/congruence_tests/test_query.py`:
- Around line 1550-1551: The congruence test only covers the dense-vector NaN
path; extend test_query_with_nan to also exercise sparse and multi-dense NaN
rejection by invoking grpc_client.query_points with queries that include NaN in
a sparse vector and in a multi-dense payload and asserting they raise
UnexpectedResponse (same as the existing dense case using COLLECTION_NAME and
query), i.e., add additional pytest.raises(UnexpectedResponse) calls around
grpc_client.query_points for the sparse and multi-dense query variants so the
end-to-end gRPC behavior matches the unit-level tests.
| @staticmethod | ||
| def _validate_no_nan(vector: list[float]) -> None: | ||
| """validate that vector does not contain NaN values | ||
| Args: | ||
| vector: List of float values to validate | ||
| Raises: | ||
| UnexpectedResponse: If the vector contains NaN values | ||
| """ | ||
| if any(math.isnan(val) for val in vector): | ||
| from qdrant_client.http.exceptions import UnexpectedResponse | ||
| from httpx import Headers | ||
| import json | ||
|
|
||
| raise UnexpectedResponse( | ||
| status_code=400, | ||
| reason_phrase="Bad Request", | ||
| content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(), | ||
| headers=Headers({}), | ||
| ) |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
#!/bin/bash
# Check top-level imports in conversion.py
echo "=== First 50 lines of conversion.py (checking imports) ==="
head -50 qdrant_client/conversions/conversion.py
echo ""
echo "=== Check if json is already imported at module level ==="
rg -n "^import json" qdrant_client/conversions/conversion.py
echo ""
echo "=== Check if Headers is imported at module level ==="
rg -n "from httpx import|import httpx" qdrant_client/conversions/conversion.py
echo ""
echo "=== Check if UnexpectedResponse is imported at module level ==="
rg -n "from qdrant_client.http.exceptions import" qdrant_client/conversions/conversion.pyRepository: qdrant/qdrant-client
Length of output: 2204
🏁 Script executed:
#!/bin/bash
# Check how other parts of the project validate vectors or handle NaN/Inf
echo "=== Check for any existing Inf validation in the codebase ==="
rg "isfinite|isinf|math.isnan" qdrant_client/ --type=py -C2
echo ""
echo "=== Check HTTP client exception usage ==="
rg "UnexpectedResponse" qdrant_client/ --type=py | head -20Repository: qdrant/qdrant-client
Length of output: 2435
🏁 Script executed:
#!/bin/bash
# Check if Headers is used elsewhere in the conversion.py file
echo "=== Check all uses of Headers in conversion.py ==="
rg "Headers" qdrant_client/conversions/conversion.py -n
echo ""
echo "=== Check all uses of json in conversion.py ==="
rg "json\." qdrant_client/conversions/conversion.py -n | head -10Repository: qdrant/qdrant-client
Length of output: 353
Broaden validation to reject ±Inf and move imports to module level.
Two valid observations on _validate_no_nan:
-
Infinity values (
float('inf'),float('-inf')) are equally problematic in distance computations and will propagate silently like NaN. The codebase already validates this pattern elsewhere (seelocal_collection.py:isfinite(),distances.py:isinf(),hybrid/formula.py:isfinite()). Replacemath.isnan(val)withnot math.isfinite(val)for consistency. -
Lazy imports for
json,Headers, andUnexpectedResponseinside the error branch reduce readability and add unnecessary coupling. These should be at module level with the other imports, following standard Python practices.
Suggested refactor
At the top of the file, add:
import json
from httpx import Headers
from qdrant_client.http.exceptions import UnexpectedResponseThen simplify the method:
`@staticmethod`
def _validate_no_nan(vector: list[float]) -> None:
- """validate that vector does not contain NaN values
+ """Validate that vector does not contain NaN or Inf values.
Args:
vector: List of float values to validate
Raises:
- UnexpectedResponse: If the vector contains NaN values
+ UnexpectedResponse: If the vector contains NaN or Inf values
"""
- if any(math.isnan(val) for val in vector):
- from qdrant_client.http.exceptions import UnexpectedResponse
- from httpx import Headers
- import json
-
- raise UnexpectedResponse(
- status_code=400,
- reason_phrase="Bad Request",
- content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
- headers=Headers({}),
- )
+ if any(not math.isfinite(val) for val in vector):
+ raise UnexpectedResponse(
+ status_code=400,
+ reason_phrase="Bad Request",
+ content=json.dumps(
+ {"error": "Query vector must not contain NaN or Inf values"}
+ ).encode(),
+ headers=Headers({}),
+ )🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@qdrant_client/conversions/conversion.py` around lines 3601 - 3619, _in
_validate_no_nan replace the NaN-only check with a finite check by using
math.isfinite (i.e., raise when not math.isfinite(val)) so ±Inf are rejected
too, and move the lazy imports (json, Headers from httpx, and UnexpectedResponse
from qdrant_client.http.exceptions) to the module top-level instead of importing
them inside the exception branch; keep the same raised UnexpectedResponse
construction and message but use the module-level names.
3aea359 to
7f42dc8
Compare
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
🧹 Nitpick comments (1)
tests/congruence_tests/test_multivector_updates.py (1)
56-63: Inconsistent subscript formatting betweenlocal_old_pointandremote_old_point.Line 60 now uses compact
)[0][0], butremote_old_point(lines 61–63) still uses the split)[0][\n 0\n]style. The same asymmetry exists forremote_new_point(lines 82–84).♻️ Proposed consistency fix
- remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][ - 0 - ] + remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][0]Apply the same change to
remote_new_pointat lines 82–84.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@tests/congruence_tests/test_multivector_updates.py` around lines 56 - 63, The subscripting style is inconsistent: change the split multi-line subscripts for remote_old_point and remote_new_point to the compact form used for local_old_point (i.e., replace the current ")[0][\n 0\n]" style with ")[0][0]") so both local_* and remote_* assignments use the same compact indexing; update the occurrences for remote_old_point and remote_new_point to match local_old_point's ")[0][0]" pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Nitpick comments:
In `@tests/congruence_tests/test_multivector_updates.py`:
- Around line 56-63: The subscripting style is inconsistent: change the split
multi-line subscripts for remote_old_point and remote_new_point to the compact
form used for local_old_point (i.e., replace the current ")[0][\n 0\n]" style
with ")[0][0]") so both local_* and remote_* assignments use the same compact
indexing; update the occurrences for remote_old_point and remote_new_point to
match local_old_point's ")[0][0]" pattern.
description
validate query vectors for nan values in grpc client before sending to the server
before : grpc client would return points with nan scores when given vector that contains nan, while local and http clients properly rejected it
those nan values in query vectors would propagate through distance calculations, this can causes confusion for users who receives invalid results without clear error message
All Submissions:
devbranch. Did you create your branch fromdev?New Feature Submissions:
pre-commitwithpip3 install pre-commitand set up hooks withpre-commit install?Changes to Core Features: