feat: add fine-grained SPARQL query interfaces#56

Closed
odysa wants to merge 3 commits into main from vk/395e-feat-query

@odysa odysa commented Feb 4, 2026

Summary

Adds type-safe, fine-grained SPARQL query methods (select(), ask(), construct(), describe()) to the repository API. Each method returns a specific type instead of the previous generic union type, improving IDE support and type safety.

Why

The existing query() method returns a union type (QuerySolutions | QueryBoolean | QueryTriples), which:

  • Requires runtime type checking to determine the actual result type
  • Provides poor IDE autocomplete and type inference
  • Doesn't align with the RDF4J REST API's support for distinct SPARQL query types

This change provides a more ergonomic API that matches developer expectations and the underlying SPARQL specification.
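
To illustrate the first bullet above, here is a minimal sketch of the narrowing a caller has to do with a union return type, and how a typed wrapper can do it once. The three result classes and both functions are stand-ins written for this example, not the library's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Union


# Stand-in result types (assumptions for illustration only).
@dataclass
class QuerySolutions:
    rows: list[dict[str, str]] = field(default_factory=list)


@dataclass
class QueryBoolean:
    value: bool = False


@dataclass
class QueryTriples:
    triples: list[tuple[str, str, str]] = field(default_factory=list)


QueryResult = Union[QuerySolutions, QueryBoolean, QueryTriples]


def query(sparql: str) -> QueryResult:
    """Hypothetical generic entry point: the static return type is a union."""
    if sparql.lstrip().upper().startswith("ASK"):
        return QueryBoolean(True)
    return QuerySolutions([{"s": "http://example.org/alice"}])


def ask(sparql: str) -> bool:
    """Hypothetical typed wrapper: narrows once so callers get a plain bool."""
    result = query(sparql)
    if not isinstance(result, QueryBoolean):
        raise TypeError(f"Expected QueryBoolean but got {type(result).__name__}")
    return result.value
```

With `query()` every call site needs its own `isinstance` check; with `ask()` the check lives in one place and the type checker sees `bool`.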

Key Changes

New Query Methods

  • select(query, infer, bindings, strict) → QuerySolutions
  • ask(query, infer, bindings, strict) → bool
  • construct(query, infer, bindings, strict) → QueryTriples
  • describe(query, infer, bindings, strict) → QueryTriples

New Features

  • Variable Bindings: All query methods support a bindings parameter for parameterized queries (maps variable names to RDF terms)
  • Strict Mode: Optional strict=True validates query type before execution, raising QueryTypeMismatchError if mismatched
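
The PR page does not show how strict mode detects the query type. One plausible approach, sketched here as an assumption, is to skip comments and the PREFIX/BASE prologue and read the first keyword. `detect_query_type` and `check_query_type` are hypothetical helper names, and the exception class is a stand-in for the one this PR adds.

```python
import re

# Matches one PREFIX or BASE declaration at the start of the remaining text.
_PROLOGUE = re.compile(
    r"(?:PREFIX\s+[^\s:]*:\s*<[^>]*>|BASE\s*<[^>]*>)", re.IGNORECASE
)
_KEYWORD = re.compile(r"[A-Za-z]+")


class QueryTypeMismatchError(Exception):
    """Stand-in for the exception added by this PR."""


def detect_query_type(query: str) -> str:
    """Return the first keyword after comments and the prologue, upper-cased."""
    rest = query
    while True:
        rest = rest.lstrip()
        if rest.startswith("#"):
            # Drop a comment line. '#' inside <...> IRIs is never reached here
            # because whole prologue declarations are consumed below.
            _, _, rest = rest.partition("\n")
            continue
        match = _PROLOGUE.match(rest)
        if match is None:
            break
        rest = rest[match.end():]
    keyword = _KEYWORD.match(rest)
    return keyword.group(0).upper() if keyword else "UNKNOWN"


def check_query_type(expected: str, query: str) -> None:
    """Raise before any request is sent if the query form does not match."""
    actual = detect_query_type(query)
    if actual != expected:
        raise QueryTypeMismatchError(
            f"Expected {expected} query but detected {actual}."
        )
```

A keyword check like this is cheap but not a full SPARQL parse, so it should be treated as a guard against obvious mistakes rather than as validation.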

New Types

  • QueryTypeMismatchError - Exception for strict mode validation errors
  • Term - Type alias for IRI | BlankNode | Literal
  • QueryBindings - Type alias for dict[str, Term]

Updated Methods

  • query() - Added bindings parameter (backward compatible)
  • update() - Simplified signature, added bindings parameter

Files Changed

  • rdf4j_python/_driver/_async_repository.py - Core implementation (+302 lines)
  • rdf4j_python/exception/repo_exception.py - New exception type
  • rdf4j_python/model/term.py - New type aliases
  • tests/test_fine_grained_queries.py - Comprehensive test suite

Impact

  • No breaking changes - Existing query() method signature is backward compatible
  • New dev dependencies - Added pytest and pytest-asyncio for testing

🤖 Generated with Claude Code

Co-Authored-By: Claude Opus 4.5 noreply@anthropic.com


Important

Adds type-safe SPARQL query methods to AsyncRdf4JRepository with strict mode validation and comprehensive tests.

  • Behavior:
    • Adds select(), ask(), construct(), describe() methods to AsyncRdf4JRepository for type-safe SPARQL queries.
    • Each method returns a specific type: select() returns QuerySolutions, ask() returns bool, construct() and describe() return QueryTriples.
    • Introduces strict mode to validate query type before execution, raising QueryTypeMismatchError if mismatched.
  • Models:
    • Adds QueryTypeMismatchError in repo_exception.py for strict mode validation errors.
    • Adds Term and QueryBindings type aliases in term.py.
  • Misc:
    • Updates query() and update() methods in _async_repository.py to support bindings parameter.
    • Adds pytest and pytest-asyncio as dev dependencies in pyproject.toml.
    • Comprehensive test suite added in test_fine_grained_queries.py.

This description was created by Ellipsis for e6585f8. You can customize this summary. It will automatically update as commits are pushed.

odysa and others added 2 commits February 3, 2026 19:31
Add type-safe query methods (select, ask, construct, describe) with specific
return types instead of the generic union type from query(). Each method
returns a specific type for better IDE support and type safety.

New features:
- select() returns QuerySolutions
- ask() returns bool
- construct() returns QueryTriples
- describe() returns QueryTriples
- Variable bindings support via bindings parameter
- Optional strict mode to validate query type before execution

New types:
- QueryTypeMismatchError exception for strict mode validation
- Term type alias for IRI | BlankNode | Literal
- QueryBindings type alias for dict[str, Term]

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
- `pyproject.toml` - contains dev dependency additions (pytest, pytest-asyncio) that were needed for testing
- `requirements-dev.lock` and `requirements.lock` - lock files generated by rye

These are development setup changes, not part of the feature implementation. Would you like me to commit these as well, or should they be discarded?

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed everything up to 8e55dc4 in 11 seconds. Click for details.
  • Reviewed 938 lines of code in 10 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_Utreag8ycovEHIql

@qodo-code-review

CI Feedback 🧐

A test triggered by this PR failed. Here is an AI-generated analysis of the failure:

Action: Lint and Test (3.12)

Failed stage: Lint with Ruff [❌]

Failed test name: ""

Failure summary:

The action failed because the linter (Ruff) reported unused imports (F401) and exited with code 1.
- rdf4j_python/_driver/_async_repository.py:23:5: rdf4j_python.model.term.Context imported but unused (F401)
- rdf4j_python/_driver/_async_repository.py:24:5: rdf4j_python.model.term.Literal imported but unused (F401)
The log indicates these are fixable (e.g., remove the unused imports or run the linter with --fix), but the CI job treats lint errors as failures.

Relevant error logs:
1:  ##[group]Runner Image Provisioner
2:  Hosted Compute Agent
...

191:  |     ^^^^^^^^^ F401
192:  23 |     Context,
193:  24 |     Literal,
194:  |
195:  = help: Remove unused import
196:  rdf4j_python/_driver/_async_repository.py:24:5: F401 [*] `rdf4j_python.model.term.Literal` imported but unused
197:  |
198:  22 |     BlankNode,
199:  23 |     Context,
200:  24 |     Literal,
201:  |     ^^^^^^^ F401
202:  25 |     Object,
203:  26 |     Predicate,
204:  |
205:  = help: Remove unused import
206:  Found 2 errors.
207:  [*] 2 fixable with the `--fix` option.
208:  ##[error]Process completed with exit code 1.
209:  Post job cleanup.

@odysa odysa changed the title from "feat: finer-grained query interfaces (vibe-kanban)" to "feat: add fine-grained SPARQL query interfaces" on Feb 4, 2026
@qodo-code-review

qodo-code-review bot commented Feb 4, 2026

PR Compliance Guide 🔍

Below is a summary of compliance checks for this PR:

Security Compliance
Unbounded memory consumption

Description: The new construct()/describe() implementations parse the full N-Triples HTTP response into
an in-memory og.Store() (for quad in og.parse(response.text, ...) then store.add(quad)),
which can be abused to cause memory/CPU exhaustion if the endpoint (or a MitM/compromised
server) returns an extremely large payload.
_async_repository.py [152-417]

Referred Code
def _build_query_params(
    query: str,
    infer: bool,
    bindings: Optional[QueryBindings] = None,
) -> dict[str, str]:
    """Build query parameters for a SPARQL query request.

    Args:
        query: The SPARQL query string.
        infer: Whether to include inferred statements.
        bindings: Optional variable bindings.

    Returns:
        dict: Query parameters for the HTTP request.
    """
    params: dict[str, str] = {
        "query": query,
        "infer": str(infer).lower(),
    }

    if bindings:


 ... (clipped 245 lines)
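
One possible mitigation, sketched under the assumption that the response can be consumed line by line (N-Triples puts one statement per line), is to cap the number of statements accepted before they reach the parser. `limit_lines` and `ResultTooLargeError` are hypothetical names and are not part of this PR.

```python
from typing import Iterable, Iterator


class ResultTooLargeError(Exception):
    """Hypothetical error raised when a result exceeds the configured cap."""


def limit_lines(lines: Iterable[str], max_lines: int) -> Iterator[str]:
    """Yield lines lazily and fail as soon as the cap is exceeded."""
    for count, line in enumerate(lines, start=1):
        if count > max_lines:
            raise ResultTooLargeError(
                f"Result exceeds the limit of {max_lines} statements"
            )
        yield line
```

The cap only bounds memory if the lines themselves come from a streamed response rather than from a body that has already been read in full.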
Sensitive data in logs

Description: QueryTypeMismatchError embeds up to the first 100 characters of the raw SPARQL query in
the exception message, which can unintentionally expose sensitive data (e.g., embedded
credentials, tokens, PII in literals) if exceptions are logged or surfaced to untrusted
clients.
repo_exception.py [52-70]

Referred Code
class QueryTypeMismatchError(QueryError):
    """Exception raised when query type doesn't match the method called.

    For example, calling select() with an ASK query when strict=True.

    Attributes:
        expected: The expected query type (e.g., "SELECT").
        actual: The detected query type (e.g., "ASK").
        query: The original query string.
    """

    def __init__(self, expected: str, actual: str, query: str):
        self.expected = expected
        self.actual = actual
        self.query = query
        truncated = query[:100] + "..." if len(query) > 100 else query
        super().__init__(
            f"Expected {expected} query but detected {actual}. Query: {truncated}"
        )
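
A minimal sketch of one way to address this finding: keep the query available as an attribute for debugging, but leave it out of the message that ends up in logs. This is an illustrative variant, not the code in the PR, and `QueryError` here is a stand-in base class.

```python
class QueryError(Exception):
    """Stand-in for the library's base query exception."""


class QueryTypeMismatchError(QueryError):
    """Variant whose message omits the raw query text."""

    def __init__(self, expected: str, actual: str, query: str):
        self.expected = expected
        self.actual = actual
        # Still accessible to callers who explicitly ask for it.
        self.query = query
        super().__init__(f"Expected {expected} query but detected {actual}.")
```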
Ticket Compliance
🎫 No ticket provided
Codebase Duplication Compliance
Codebase context is not defined

Follow the guide to enable codebase context checks.

Custom Compliance
🟢
Generic: Meaningful Naming and Self-Documenting Code

Objective: Ensure all identifiers clearly express their purpose and intent, making code
self-documenting

Status: Passed

Learn more about managing compliance generic rules or creating your own custom rules

Generic: Secure Logging Practices

Objective: To ensure logs are useful for debugging and auditing without exposing sensitive
information like PII, PHI, or cardholder data.

Status: Passed

🔴
Generic: Robust Error Handling and Edge Case Management

Objective: Ensure comprehensive error handling that provides meaningful context and graceful
degradation

Status:
Missing binding validation: The new bindings handling does not validate variable names or handle serialization edge
cases (e.g., invalid variable keys or unescaped characters), which can lead to
hard-to-diagnose failures and poor edge-case behavior.

Referred Code
def _build_query_params(
    query: str,
    infer: bool,
    bindings: Optional[QueryBindings] = None,
) -> dict[str, str]:
    """Build query parameters for a SPARQL query request.

    Args:
        query: The SPARQL query string.
        infer: Whether to include inferred statements.
        bindings: Optional variable bindings.

    Returns:
        dict: Query parameters for the HTTP request.
    """
    params: dict[str, str] = {
        "query": query,
        "infer": str(infer).lower(),
    }

    if bindings:


 ... (clipped 5 lines)
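
A minimal sketch of the kind of key validation this finding asks for. The pattern below is a simplified, ASCII-only subset of SPARQL variable names (the full grammar also allows many Unicode characters), and `clean_binding_name` is a hypothetical helper, not a function from the PR.

```python
import re

# Simplified assumption: ASCII letters, digits and underscore only.
_VAR_NAME = re.compile(r"[A-Za-z0-9_]+")


def clean_binding_name(var_name: str) -> str:
    """Strip one leading '?' or '$' and reject anything that is not a plain name."""
    name = var_name[1:] if var_name[:1] in ("?", "$") else var_name
    if not _VAR_NAME.fullmatch(name):
        raise ValueError(f"Invalid SPARQL variable name: {var_name!r}")
    return name
```

Rejecting bad keys up front turns a confusing server-side failure into an immediate, local `ValueError`.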

Generic: Secure Error Handling

Objective: To prevent the leakage of sensitive system information through error messages while
providing sufficient detail for internal debugging.

Status:
Error leaks response body: The newly added error paths raise QueryError messages containing raw response.text and
QueryTypeMismatchError includes query text, which can expose internal details or sensitive
data to callers.

Referred Code
if response.status_code != httpx.codes.OK:
    raise QueryError(f"Query failed: {response.status_code} - {response.text}")

result = og.parse_query_results(response.text, format=og.QueryResultsFormat.JSON)
if not isinstance(result, og.QuerySolutions):
    raise QueryError(f"Expected QuerySolutions but got {type(result).__name__}")

Generic: Security-First Input Validation and Data Handling

Objective: Ensure all data inputs are validated, sanitized, and handled securely to prevent
vulnerabilities

Status:
Unvalidated external inputs: User-provided bindings keys/values are forwarded into HTTP query parameters (e.g.,
$<var>), but the code does not validate variable names or comprehensively escape
literal content, increasing risk of malformed requests and injection-like behaviors at the
SPARQL endpoint layer.

Referred Code
def _build_query_params(
    query: str,
    infer: bool,
    bindings: Optional[QueryBindings] = None,
) -> dict[str, str]:
    """Build query parameters for a SPARQL query request.

    Args:
        query: The SPARQL query string.
        infer: Whether to include inferred statements.
        bindings: Optional variable bindings.

    Returns:
        dict: Query parameters for the HTTP request.
    """
    params: dict[str, str] = {
        "query": query,
        "infer": str(infer).lower(),
    }

    if bindings:


 ... (clipped 5 lines)

Compliance status legend
🟢 - Fully Compliant
🟡 - Partial Compliant
🔴 - Not Compliant
⚪ - Requires Further Human Verification
🏷️ - Compliance label

@qodo-code-review

qodo-code-review bot commented Feb 4, 2026

PR Code Suggestions ✨

Explore these optional code suggestions:

Category | Suggestion | Impact
General
Escape control characters in literals

In _serialize_binding_value, add escaping for control characters (\n, \r, \t) in
literal values to ensure valid N-Triples serialization.

rdf4j_python/_driver/_async_repository.py [141]

-escaped = term.value.replace("\\", "\\\\").replace('"', '\\"')
+escaped = (
+    term.value
+        .replace("\\", "\\\\")
+        .replace('"', '\\"')
+        .replace("\n", "\\n")
+        .replace("\r", "\\r")
+        .replace("\t", "\\t")
+)
Suggestion importance[1-10]: 8

Why: The suggestion fixes a bug in the N-Triples serialization by correctly escaping control characters in literal values, which is crucial for preventing malformed data and ensuring correctness.
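
As a self-contained illustration of the same escaping, the sketch below uses a translation table instead of chained `replace()` calls, so the result does not depend on the order in which characters are replaced. `escape_ntriples_literal` is a hypothetical helper name; the PR's own function is `_serialize_binding_value`, whose full body is not shown on this page.

```python
# Map each special character to its N-Triples escape sequence.
_NT_ESCAPES = str.maketrans(
    {
        "\\": "\\\\",
        '"': '\\"',
        "\n": "\\n",
        "\r": "\\r",
        "\t": "\\t",
    }
)


def escape_ntriples_literal(value: str) -> str:
    """Escape a string for use between double quotes in an N-Triples literal."""
    return value.translate(_NT_ESCAPES)
```

Because `str.translate` visits each input character exactly once, a backslash produced by one escape is never escaped a second time.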

Medium
Improve memory efficiency in query methods

Refactor the construct and describe methods to use a generator for parsing
N-Triples, improving memory efficiency by streaming results instead of building
an intermediate store.

rdf4j_python/_driver/_async_repository.py [363-367]

-# Parse N-Triples response and convert to QueryTriples
-store = og.Store()
-for quad in og.parse(response.text, format=og.RdfFormat.N_TRIPLES):
-    store.add(quad)
-return store.query("CONSTRUCT { ?s ?p ?o } WHERE { ?s ?p ?o }")
+# Parse N-Triples response and yield triples without building a store
+def triple_generator():
+    for quad in og.parse(response.text, format=og.RdfFormat.N_TRIPLES):
+        yield og.Triple(quad.subject, quad.predicate, quad.object)
 
+return triple_generator()
+
Suggestion importance[1-10]: 7

Why: The suggestion correctly identifies a potential memory issue by loading all results into an in-memory store and proposes a more efficient generator-based approach, which is a significant performance improvement.

Medium
Raise specific error for malformed queries

In the update method, check for a 400 Bad Request status code and raise a
QueryError to provide more specific feedback for malformed SPARQL update
queries.

rdf4j_python/_driver/_async_repository.py [481-521]

 async def update(
     self,
     sparql_update: str,
     bindings: Optional[QueryBindings] = None,
 ) -> None:
     """Executes a SPARQL UPDATE command.
 
     Args:
         sparql_update: The SPARQL update string (INSERT, DELETE, CLEAR, etc.).
         bindings: Optional variable bindings for parameterized updates.
             Keys are variable names (without ?), values are RDF terms.
 
     Raises:
         RepositoryNotFoundException: If the repository doesn't exist.
-        RepositoryUpdateException: If the update fails.
+        QueryError: If the update query is malformed.
+        RepositoryUpdateException: If the update fails for other reasons.
 
     Example:
         >>> await repo.update('''
         ...     PREFIX ex: <http://example.org/>
         ...     INSERT DATA { ex:alice ex:age 30 }
         ... ''')
     """
     # SPARQL UPDATE operations return HTTP 204 No Content on success.
     # No result data is returned as per SPARQL 1.1 UPDATE specification.
     path = f"/repositories/{self._repository_id}/statements"
     headers: dict[str, str] = {"Content-Type": Rdf4jContentType.SPARQL_UPDATE}
 
     # Build params for bindings if provided
     params: Optional[dict[str, str]] = None
     if bindings:
         params = {}
         for var_name, term in bindings.items():
             clean_name = var_name.lstrip("?")
             params[f"${clean_name}"] = _serialize_binding_value(term)
 
     response = await self._client.post(
         path, content=sparql_update, headers=headers, params=params
     )
     self._handle_repo_not_found_exception(response)
+
+    if response.status_code == httpx.codes.BAD_REQUEST:
+        raise QueryError(f"Malformed SPARQL update: {response.text}")
     if response.status_code != httpx.codes.NO_CONTENT:
         raise RepositoryUpdateException(f"Failed to update: {response.text}")

[To ensure code accuracy, apply this suggestion manually]

Suggestion importance[1-10]: 6

Why: The suggestion correctly proposes to raise a more specific QueryError on a 400 Bad Request in the update method, which improves error handling and provides better feedback to the user.

Low
Strip both $ and ? prefixes

Modify binding variable name cleaning to strip both ? and $ prefixes for greater
flexibility.

rdf4j_python/_driver/_async_repository.py [175]

-clean_name = var_name.lstrip("?")
+clean_name = var_name.lstrip("?$")
Suggestion importance[1-10]: 4

Why: The suggestion improves flexibility by allowing binding variables to be specified with either a ? or $ prefix, which can enhance user experience.
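
A short sketch of what the suggested change does in practice. Note that `str.lstrip` treats its argument as a set of characters rather than as a literal prefix; `clean_variable_name` is a hypothetical wrapper used only for this example.

```python
def clean_variable_name(var_name: str) -> str:
    """Drop every leading '?' or '$' character, in any order."""
    return var_name.lstrip("?$")
```

Because the argument is a character set, mixed or repeated prefixes such as "$?name" are also reduced to "name", which may or may not be the intended leniency.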

Low

Add exhaustive test coverage for strict mode query type validation:

- TestStrictModeMismatch: 13 tests covering all query type mismatches
  (SELECT/ASK/CONSTRUCT/DESCRIBE cross-combinations)
- TestStrictModeBlocksUpdateQueries: 12 tests ensuring INSERT/DELETE/
  CLEAR/DROP queries are blocked on read methods when strict=True

These tests verify that strict mode properly validates query types
before sending requests to the server, preventing accidental use of
UPDATE queries on read endpoints.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

@ellipsis-dev ellipsis-dev bot left a comment

Important

Looks good to me! 👍

Reviewed e6585f8 in 26 seconds. Click for details.
  • Reviewed 279 lines of code in 1 files
  • Skipped 0 files when reviewing.
  • Skipped posting 0 draft comments. View those below.

Workflow ID: wflow_c4wDg3mEgOxBCYwK

@odysa odysa closed this Mar 3, 2026