Skip to content

@201 Add RO-Crate entity filters for targeted lookups#273

Open
arjlai221 wants to merge 1 commit intomainfrom
201-optimize-checking-for-existing-records
Open

@201 Add RO-Crate entity filters for targeted lookups#273
arjlai221 wants to merge 1 commit intomainfrom
201-optimize-checking-for-existing-records

Conversation

@arjlai221
Copy link
Copy Markdown
Collaborator

Summary

This MR fixes the RO-Crate existence-check path by extending the existing
GET /workflows/{id}/ro_crate_entities API with optional filters for:

  • file_id
  • entity_id

Instead of repeatedly fetching the full RO-Crate entity list and scanning it
client-side, callers can now perform targeted lookups against the server.

Problem

The previous implementation relied on listing all RO-Crate entities for a
workflow and then filtering in client code to determine whether a matching
record already existed.

That has two problems:

  • it does unnecessary work for large workflows
  • it does not scale well for repeated existence checks

What Changed

Server / API

  • Added optional file_id and entity_id query parameters to
    GET /workflows/{id}/ro_crate_entities
  • Plumbed those filters through the live router, server contract, HTTP server,
    and RO-Crate transport
  • Updated the RO-Crate list query to apply those filters in SQL
  • Applied the same filters to the count query so pagination metadata stays
    consistent

Client

  • Updated the generated Rust client to pass the new query parameters
  • Updated the checked-in Python and Julia generated clients for parity
  • Added Rust helper methods for filtered lookup:
    • list_ro_crate_entities_with_filters
    • find_ro_crate_entity_by_file_id
    • find_ro_crate_entity_by_entity_id
  • Switched find_entity_for_file() to use a targeted filtered lookup instead of
    full-list fetch + scan
  • Added warning behavior if the Rust find_* helpers observe multiple matches
    and return the first result

Data Integrity

  • Added a migration creating a partial unique index on
    (workflow_id, file_id) where file_id IS NOT NULL
  • Migration deduplicates existing conflicting (workflow_id, file_id) rows
    before creating the unique index

Error Handling

  • Duplicate RO-Crate inserts for the same workflow/file link now return a
    structured 422 response instead of falling through as a generic server error

Tests

Added and strengthened RO-Crate coverage for:

  • filtering by file_id
  • filtering by entity_id
  • combined filters
  • empty / non-matching filtered results
  • pagination metadata on filtered listing
  • duplicate workflow/file link rejection with explicit 422 assertion

Validation

Ran successfully:

  • cargo fmt --all --check
  • cargo check --tests
  • cargo clippy --all --all-targets --all-features -- -D warnings
  • dprint check

Note: full integration test execution is still limited in this environment
because sandboxed test server port binding is restricted.

Notes / Risk

  • The new migration is intentionally data-shaping: if duplicate
    (workflow_id, file_id) rows already exist, it keeps the newest row before
    adding the unique index.
  • The transport trait signature changed to carry the new optional filters, so
    any out-of-tree implementors will need to update accordingly.

@arjlai221 arjlai221 self-assigned this Apr 22, 2026
@arjlai221 arjlai221 requested a review from daniel-thom April 22, 2026 17:24
@arjlai221 arjlai221 changed the title Add RO-Crate entity filters for targeted lookups #203 Add RO-Crate entity filters for targeted lookups Apr 23, 2026
@arjlai221 arjlai221 changed the title #203 Add RO-Crate entity filters for targeted lookups #201 Add RO-Crate entity filters for targeted lookups Apr 23, 2026
@arjlai221 arjlai221 changed the title #201 Add RO-Crate entity filters for targeted lookups @201 Add RO-Crate entity filters for targeted lookups Apr 23, 2026
@@ -0,0 +1,112 @@
# RO-Crate Filter Staged Review
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete?

configuration,
workflow_id,
Some(0),
Some(2),
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why 2?

{
Ok(result) => result,
Err(e) => {
let error_string = e.to_string();
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants