Skip to content

Fix/validate nan query vectors#1156

Open
Goodnight77 wants to merge 3 commits intoqdrant:masterfrom
Goodnight77:fix/validate-nan-query-vectors
Open

Fix/validate nan query vectors#1156
Goodnight77 wants to merge 3 commits intoqdrant:masterfrom
Goodnight77:fix/validate-nan-query-vectors

Conversation

@Goodnight77
Copy link
Copy Markdown

description

validate query vectors for nan values in grpc client before sending to the server
before : grpc client would return points with nan scores when given vector that contains nan, while local and http clients properly rejected it
those nan values in query vectors would propagate through distance calculations, this can causes confusion for users who receives invalid results without clear error message

All Submissions:

  • Contributions should target the dev branch. Did you create your branch from dev?
  • Have you followed the guidelines in our Contributing document?
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?

New Feature Submissions:

  1. Does your submission pass tests?
  2. Have you installed pre-commit with pip3 install pre-commit and set up hooks with pre-commit install?

Changes to Core Features:

  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully ran tests with your changes locally?

Copilot AI review requested due to automatic review settings February 19, 2026 22:43
@netlify
Copy link
Copy Markdown

netlify bot commented Feb 19, 2026

Deploy Preview for poetic-froyo-8baba7 ready!

Name Link
🔨 Latest commit 98f2b54
🔍 Latest deploy log https://app.netlify.com/projects/poetic-froyo-8baba7/deploys/69979425e198260008f6752e
😎 Deploy Preview https://deploy-preview-1156--poetic-froyo-8baba7.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify project configuration.

Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prevents NaN values in query vectors from being sent via the gRPC client, aligning gRPC behavior with the local and HTTP clients to avoid returning results with nan scores.

Changes:

  • Add NaN validation to REST→gRPC vector conversions and raise an error before issuing gRPC requests.
  • Add/extend tests to ensure gRPC queries (and conversion helpers) reject NaN vectors.
  • Minor formatting/whitespace cleanups in tests, docs tooling, proto comments, and docs assets.

Reviewed changes

Copilot reviewed 6 out of 10 changed files in this pull request and generated 3 comments.

Show a summary per file
File Description
qdrant_client/conversions/conversion.py Adds NaN validation for dense/sparse vectors during REST→gRPC conversion.
tests/congruence_tests/test_query.py Updates congruence test to assert gRPC also rejects NaN queries.
tests/conversions/test_validate_conversions.py Adds unit tests ensuring conversion helpers raise on NaN vectors.
tests/test_migrate.py Minor formatting fix (trailing comma / compact init).
tests/congruence_tests/test_sparse_updates.py Minor formatting simplification.
tests/congruence_tests/test_multivector_updates.py Minor formatting simplification.
tools/generate_docs.sh Minor formatting/line normalization.
qdrant_client/proto/points_service.proto Whitespace cleanup in comments.
docs/images/api-icon.svg Line normalization (no functional change).
docs/.gitignore Line normalization (no functional change).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines +3607 to +3620
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json

raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)

Copy link

Copilot AI Feb 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_validate_no_nan raises qdrant_client.http.exceptions.UnexpectedResponse by manually constructing an HTTP-like response (status_code/reason_phrase/headers/content) even though this is client-side validation in the gRPC conversion layer. This is semantically misleading and couples gRPC-only usage to HTTP error types; consider raising a dedicated client-side validation exception (or ValueError) and letting higher-level client code translate it, or otherwise centralize this error construction so it matches the rest of the client's error semantics.

Suggested change
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json
raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)
ValueError: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
raise ValueError("Query vector must not contain NaN values")

Copilot uses AI. Check for mistakes.
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Feb 19, 2026

📝 Walkthrough

Walkthrough

This pull request introduces NaN validation for vector conversions in the qdrant_client library. A new private utility function _validate_no_nan is added to check float vectors for NaN values and raise an HTTP 400 UnexpectedResponse when detected. This validation is integrated into convert_dense_vector, convert_sparse_vector, and related conversion functions. Corresponding test cases verify the NaN validation behavior across dense, sparse, and multi-dense vector types. The PR also includes numerous formatting improvements across test files and utility modules, such as single-line indexing expressions, import reorganization, and trailing whitespace cleanup.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 22.22% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Title check ✅ Passed The title 'Fix/validate nan query vectors' directly describes the main change: adding NaN validation to query vectors, which aligns perfectly with the core functionality implemented across the conversion module and tests.
Description check ✅ Passed The description clearly explains the problem (gRPC client not validating NaN values while local/HTTP clients do), the solution (adding validation before server requests), and the impact (prevents invalid results and confusion), all of which are directly related to the changeset.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
tests/congruence_tests/test_query.py (1)

1550-1551: Consider adding sparse and multi-dense NaN coverage to this congruence test.

The congruence test test_query_with_nan currently exercises only the dense vector NaN path for gRPC. Sparse and multi-dense NaN rejections are only covered at the unit level (test_validate_conversions.py). Adding end-to-end congruence assertions for those paths here would provide stronger integration confidence.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/congruence_tests/test_query.py` around lines 1550 - 1551, The
congruence test only covers the dense-vector NaN path; extend
test_query_with_nan to also exercise sparse and multi-dense NaN rejection by
invoking grpc_client.query_points with queries that include NaN in a sparse
vector and in a multi-dense payload and asserting they raise UnexpectedResponse
(same as the existing dense case using COLLECTION_NAME and query), i.e., add
additional pytest.raises(UnexpectedResponse) calls around
grpc_client.query_points for the sparse and multi-dense query variants so the
end-to-end gRPC behavior matches the unit-level tests.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@qdrant_client/conversions/conversion.py`:
- Around line 3601-3619: _in _validate_no_nan replace the NaN-only check with a
finite check by using math.isfinite (i.e., raise when not math.isfinite(val)) so
±Inf are rejected too, and move the lazy imports (json, Headers from httpx, and
UnexpectedResponse from qdrant_client.http.exceptions) to the module top-level
instead of importing them inside the exception branch; keep the same raised
UnexpectedResponse construction and message but use the module-level names.

---

Nitpick comments:
In `@tests/congruence_tests/test_query.py`:
- Around line 1550-1551: The congruence test only covers the dense-vector NaN
path; extend test_query_with_nan to also exercise sparse and multi-dense NaN
rejection by invoking grpc_client.query_points with queries that include NaN in
a sparse vector and in a multi-dense payload and asserting they raise
UnexpectedResponse (same as the existing dense case using COLLECTION_NAME and
query), i.e., add additional pytest.raises(UnexpectedResponse) calls around
grpc_client.query_points for the sparse and multi-dense query variants so the
end-to-end gRPC behavior matches the unit-level tests.

Comment on lines +3601 to +3619
@staticmethod
def _validate_no_nan(vector: list[float]) -> None:
"""validate that vector does not contain NaN values
Args:
vector: List of float values to validate
Raises:
UnexpectedResponse: If the vector contains NaN values
"""
if any(math.isnan(val) for val in vector):
from qdrant_client.http.exceptions import UnexpectedResponse
from httpx import Headers
import json

raise UnexpectedResponse(
status_code=400,
reason_phrase="Bad Request",
content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
headers=Headers({}),
)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Check top-level imports in conversion.py
echo "=== First 50 lines of conversion.py (checking imports) ==="
head -50 qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if json is already imported at module level ==="
rg -n "^import json" qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if Headers is imported at module level ==="
rg -n "from httpx import|import httpx" qdrant_client/conversions/conversion.py

echo ""
echo "=== Check if UnexpectedResponse is imported at module level ==="
rg -n "from qdrant_client.http.exceptions import" qdrant_client/conversions/conversion.py

Repository: qdrant/qdrant-client

Length of output: 2204


🏁 Script executed:

#!/bin/bash
# Check how other parts of the project validate vectors or handle NaN/Inf
echo "=== Check for any existing Inf validation in the codebase ==="
rg "isfinite|isinf|math.isnan" qdrant_client/ --type=py -C2

echo ""
echo "=== Check HTTP client exception usage ==="
rg "UnexpectedResponse" qdrant_client/ --type=py | head -20

Repository: qdrant/qdrant-client

Length of output: 2435


🏁 Script executed:

#!/bin/bash
# Check if Headers is used elsewhere in the conversion.py file
echo "=== Check all uses of Headers in conversion.py ==="
rg "Headers" qdrant_client/conversions/conversion.py -n

echo ""
echo "=== Check all uses of json in conversion.py ==="
rg "json\." qdrant_client/conversions/conversion.py -n | head -10

Repository: qdrant/qdrant-client

Length of output: 353


Broaden validation to reject ±Inf and move imports to module level.

Two valid observations on _validate_no_nan:

  1. Infinity values (float('inf'), float('-inf')) are equally problematic in distance computations and will propagate silently like NaN. The codebase already validates this pattern elsewhere (see local_collection.py:isfinite(), distances.py:isinf(), hybrid/formula.py:isfinite()). Replace math.isnan(val) with not math.isfinite(val) for consistency.

  2. Lazy imports for json, Headers, and UnexpectedResponse inside the error branch reduce readability and add unnecessary coupling. These should be at module level with the other imports, following standard Python practices.

Suggested refactor

At the top of the file, add:

import json
from httpx import Headers
from qdrant_client.http.exceptions import UnexpectedResponse

Then simplify the method:

     `@staticmethod`
     def _validate_no_nan(vector: list[float]) -> None:
-        """validate that vector does not contain NaN values
+        """Validate that vector does not contain NaN or Inf values.
         Args:
             vector: List of float values to validate
         Raises:
-            UnexpectedResponse: If the vector contains NaN values
+            UnexpectedResponse: If the vector contains NaN or Inf values
         """
-        if any(math.isnan(val) for val in vector):
-            from qdrant_client.http.exceptions import UnexpectedResponse
-            from httpx import Headers
-            import json
-
-            raise UnexpectedResponse(
-                status_code=400,
-                reason_phrase="Bad Request",
-                content=json.dumps({"error": "Query vector must not contain NaN values"}).encode(),
-                headers=Headers({}),
-            )
+        if any(not math.isfinite(val) for val in vector):
+            raise UnexpectedResponse(
+                status_code=400,
+                reason_phrase="Bad Request",
+                content=json.dumps(
+                    {"error": "Query vector must not contain NaN or Inf values"}
+                ).encode(),
+                headers=Headers({}),
+            )
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@qdrant_client/conversions/conversion.py` around lines 3601 - 3619, _in
_validate_no_nan replace the NaN-only check with a finite check by using
math.isfinite (i.e., raise when not math.isfinite(val)) so ±Inf are rejected
too, and move the lazy imports (json, Headers from httpx, and UnexpectedResponse
from qdrant_client.http.exceptions) to the module top-level instead of importing
them inside the exception branch; keep the same raised UnexpectedResponse
construction and message but use the module-level names.

@Goodnight77 Goodnight77 force-pushed the fix/validate-nan-query-vectors branch from 3aea359 to 7f42dc8 Compare February 19, 2026 22:50
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
tests/congruence_tests/test_multivector_updates.py (1)

56-63: Inconsistent subscript formatting between local_old_point and remote_old_point.

Line 60 now uses compact )[0][0], but remote_old_point (lines 61–63) still uses the split )[0][\n 0\n] style. The same asymmetry exists for remote_new_point (lines 82–84).

♻️ Proposed consistency fix
-    remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][
-        0
-    ]
+    remote_old_point = remote_client.scroll(COLLECTION_NAME, scroll_filter=id_filter, limit=1)[0][0]

Apply the same change to remote_new_point at lines 82–84.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@tests/congruence_tests/test_multivector_updates.py` around lines 56 - 63, The
subscripting style is inconsistent: change the split multi-line subscripts for
remote_old_point and remote_new_point to the compact form used for
local_old_point (i.e., replace the current ")[0][\n    0\n]" style with
")[0][0]") so both local_* and remote_* assignments use the same compact
indexing; update the occurrences for remote_old_point and remote_new_point to
match local_old_point's ")[0][0]" pattern.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Nitpick comments:
In `@tests/congruence_tests/test_multivector_updates.py`:
- Around line 56-63: The subscripting style is inconsistent: change the split
multi-line subscripts for remote_old_point and remote_new_point to the compact
form used for local_old_point (i.e., replace the current ")[0][\n    0\n]" style
with ")[0][0]") so both local_* and remote_* assignments use the same compact
indexing; update the occurrences for remote_old_point and remote_new_point to
match local_old_point's ")[0][0]" pattern.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants