Endpoint /disambiguate with LLM Inference #72

conrabeatriz · 2026-01-28T15:56:12Z

Summary by Sourcery

Add a new LLM-powered entity disambiguation endpoint and supporting utilities for canonical entity refinement and mapping back into NER predictions.

New Features:

Introduce /anonymizer/disambiguatev2 endpoint that combines fuzzy pre-clustering with LLM-based canonical entity curation and role assignment.
Add utilities to enrich canonical entities with contextual windows and perform LLM inference with batching and token-limit handling for entity disambiguation.
Expose mapping logic to synchronize LLM-refined canonical entities back into document-level NER predictions, including role subclasses.

Enhancements:

Adjust CanonicalEntity model by disabling unused relations field for the current LLM disambiguation phase.
Extend experimentation notebooks to define prompts, run LLM-based disambiguation, and grid-search hyperparameters for models and prompt configurations.
Add reusable prompt templates for person entity disambiguation to support the new LLM endpoint.

…int and its own name

…uments

…lar to the NER predictions, now enriched with role and entity_id fields where applicable.

… the DocumentAnnotations list with the role and the canonincal_entity_id where applicable. When there is a prediction that wasn't mapped the program generates a canonical_entity_id

sourcery-ai · 2026-01-28T15:56:36Z

Reviewer's Guide

Adds a new /anonymizer/disambiguatev2 endpoint that uses an LLM-backed inference pipeline to refine canonical entity disambiguation and role assignment, including utilities for token-budgeted batching, context enrichment, and NER prediction remapping, plus supporting prompt templates and experimental notebooks.

Sequence diagram for the new /anonymizer/disambiguatev2 LLM-backed disambiguation endpoint

sequenceDiagram
    actor Client
    participant Router as anonymizer_disambiguate_v2
    participant Utils as anonymizer_utils
    participant LLM as OllamaLLMProvider
    participant HF as AutoTokenizer

    Client->>Router: POST /anonymizer/disambiguatev2
    activate Router

    note over Router: Validate query params (threshold, scorer, processor,
    note over Router: system_prompts, user_prompt_templates,
    note over Router: tokenizer_model, token_limit_frac)

    Router->>Utils: resolve_processor(processor)
    Router->>Utils: build_canonical_entities(labels, target_labels,
    Router-->>Router: canonical_entities_pre_cluster

    loop For each target_label in user_prompt_templates
        Router->>Utils: validate_canonical_entities(canonical_entities_pre_cluster, target_label)
        Router-->>Router: canonical_entities_val

        Router->>Utils: llm_canonical_entities_inference(
        activate Utils
        Utils->>Utils: add_canonical_entities_context(paragraphs,
        Utils-->>Utils: canonical_entities_with_context

        Utils->>HF: from_pretrained(tokenizer_model)
        HF-->>Utils: tokenizer

        loop If decompose_by is None
            Utils->>HF: apply_chat_template + encode
            HF-->>Utils: token_count
            Utils-->>Utils: adjust decompose_by within token_limit
        end

        Utils->>Utils: split canonical_entities_prompt into batches

        loop For each batch
            Utils->>HF: apply_chat_template + encode
            HF-->>Utils: len_tokens
            alt len_tokens <= model_context
                Utils->>LLM: generate(messages, options, format)
                LLM-->>Utils: response.text (JSON CanonicalEntities)
                Utils-->>Utils: parse canonical_entities from response
            else len_tokens > model_context
                Utils-->>Utils: skip batch
            end
        end

        Utils->>Utils: validate_canonical_entities(all_raw_outputs)
        Utils-->>Router: {
        deactivate Utils

        Router-->>Router: append canonical_entities_llm for target_label
    end

    Router->>Utils: map_canonical_entities_NER_preds(paragraphs, canonical_entities_llm)
    Utils-->>Router: predictions_llm

    Router-->>Client: DocumentAnnotations(data=predictions_llm)
    deactivate Router

Class diagram for updated canonical entity and document annotation models

classDiagram
    class CanonicalEntity {
        +UUID entity_id
        +str aymurai_label
        +str canonical_text
        +list~str~ aliases
        +dict~str, Any~ attributes
        +validate_entity_id() CanonicalEntity
    }

    class CanonicalEntities {
        +list~CanonicalEntity~ canonical_entities
    }

    class DocumentInformation {
        +str document
        +list~DocLabel~ labels
    }

    class DocLabel {
        +Any attrs
        +int start_char
        +int end_char
        +str text
    }

    class DocumentAnnotations {
        +list~DocumentInformation~ data
    }

    CanonicalEntities "1" --> "*" CanonicalEntity
    DocumentAnnotations "1" --> "*" DocumentInformation
    DocumentInformation "1" --> "*" DocLabel
    CanonicalEntity "1" --> "0..*" DocLabel : linked via
    CanonicalEntity --> DocumentAnnotations : used to enrich

File-Level Changes

Change	Details	Files
Introduce LLM-based canonical entity disambiguation utilities and mapping logic reused by the new endpoint.	Extend anonymizer utils imports to include DocumentInformation, CanonicalEntities, JSON pretty-print helper, and Ollama LLM provider. Add validate_canonical_entities helper to normalize/validate CanonicalEntity objects and expose hex entity_id. Add add_canonical_entities_context to attach per-entity context windows from paragraph predictions, with optional label filtering and window sizing. Add get_model_tokens to compute token counts for a chat prompt using a configurable tokenizer model. Add llm_canonical_entities_inference to batch canonical entities, respect model context/token limits, invoke Ollama with structured JSON schema, and aggregate LLM-refined canonical entities and prompts. Add map_canonical_entities_NER_preds to sync LLM canonical entities with NER predictions by setting canonical_entity_id and label_subclass (role) or generating new IDs when no match exists. Comment out unused CanonicalEntity.relations to simplify the model for the LLM phase.	`aymurai/api/endpoints/routers/anonymizer/utils.py` `aymurai/meta/entities.py`
Expose a new /anonymizer/disambiguatev2 API endpoint that orchestrates fuzzy pre-clustering, LLM refinement per label, and mapping back into DocumentAnnotations.	Define anonymizer_disambiguate_v2 FastAPI route accepting paragraph-level predictions, per-label system and user prompts, fuzzy clustering parameters, LLM model and tokenizer parameters, and batching controls. Validate query/body parameters including similarity threshold bounds, supported scorer/processor, presence of prompts, tokenizer model, and token_limit_frac range. Reuse existing fuzzy clustering (build_canonical_entities) to build pre-clustered CanonicalEntity list filtered by optional target_labels. Iterate over provided user_prompt_templates by label, invoke llm_canonical_entities_inference for each label with corresponding system prompt, and accumulate LLM canonical entities across labels. Apply map_canonical_entities_NER_preds to propagate canonical_entity_id and role subclasses back into the DocumentInformation predictions and return them wrapped in DocumentAnnotations.	`aymurai/api/endpoints/routers/anonymizer/anonymizer.py`
Add reusable prompt templates for person-role and direction disambiguation and wire them into experiments and environment configuration.	Create prompt_templates.py with Spanish system and user prompts for PER-role assignment and canonical cleanup. Introduce extensive prompt sets and templates for PER and DIRECCION experiments within new LLM disambiguation notebook, plus utilities for token counting, context enrichment, LLM-based grid search, and evaluation. Update smoke-test notebook to support the v2 disambiguation endpoint, PER and DIRECCION labels, and LLM/ tokenizer configuration via environment variables. Add disambiguate_v2 client helper in the smoke notebook sending paragraphs, prompt dicts, and LLM parameters to /anonymizer/disambiguatev2, and adjust document iteration slices and runtime config. Change default API_BASE_URL and data paths in notebooks to new ports/directories and expand target labels to include DIRECCION.	`aymurai/api/endpoints/routers/anonymizer/prompt_templates.py` `notebooks/experiments/entity-disambiguation/07-LLM-entities-disambiguation.ipynb` `notebooks/experiments/entity-disambiguation/08-disambiguation-endpoint-smoke.ipynb`

Tips and commands

Interacting with Sourcery

Trigger a new review: Comment @sourcery-ai review on the pull request.
Continue discussions: Reply directly to Sourcery's review comments.
Generate a GitHub issue from a review comment: Ask Sourcery to create an
issue from a review comment by replying to it. You can also reply to a
review comment with @sourcery-ai issue to create an issue from it.
Generate a pull request title: Write @sourcery-ai anywhere in the pull
request title to generate a title at any time. You can also comment
@sourcery-ai title on the pull request to (re-)generate the title at any time.
Generate a pull request summary: Write @sourcery-ai summary anywhere in
the pull request body to generate a PR summary at any time exactly where you
want it. You can also comment @sourcery-ai summary on the pull request to
(re-)generate the summary at any time.
Generate reviewer's guide: Comment @sourcery-ai guide on the pull
request to (re-)generate the reviewer's guide at any time.
Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
pull request to resolve all Sourcery comments. Useful if you've already
addressed all the comments and don't want to see them anymore.
Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
request to dismiss all existing Sourcery reviews. Especially useful if you
want to start fresh with a new review - don't forget to comment
@sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

Enable or disable review features such as the Sourcery-generated pull request
summary, the reviewer's guide, and others.
Change the review language.
Add, remove or edit custom review instructions.
Adjust other review settings.

Getting Help

Contact our support team for questions or feedback.
Visit our documentation for detailed guides and information.
Keep in touch with the Sourcery team by following us on X/Twitter, LinkedIn or GitHub.

sourcery-ai

Hey - I've found 4 issues, and left some high level feedback:

In utils.validate_canonical_entities the type hints say it returns list[CanonicalEntity], but it actually returns a list of dicts (model_dump() plus entity_id); either adjust the annotations/return type or avoid model_dump and return actual models to keep the contract consistent.
get_model_tokens in utils.py uses AutoTokenizer but the module never imports it (unlike the notebook); add the appropriate from transformers import AutoTokenizer import or wire the tokenizer in via dependency injection to avoid runtime errors.
llm_canonical_entities_inference is annotated to return a dict but may return None when even one entity exceeds the token limit; consider making the return type dict | None (and handling None at the call site) or raising an exception instead, so the behavior is explicit and type-safe.

Prompt for AI Agents

Please address the comments from this code review:

## Overall Comments
- In `utils.validate_canonical_entities` the type hints say it returns `list[CanonicalEntity]`, but it actually returns a list of dicts (`model_dump()` plus `entity_id`); either adjust the annotations/return type or avoid `model_dump` and return actual models to keep the contract consistent.
- `get_model_tokens` in `utils.py` uses `AutoTokenizer` but the module never imports it (unlike the notebook); add the appropriate `from transformers import AutoTokenizer` import or wire the tokenizer in via dependency injection to avoid runtime errors.
- `llm_canonical_entities_inference` is annotated to return a `dict` but may return `None` when even one entity exceeds the token limit; consider making the return type `dict | None` (and handling `None` at the call site) or raising an exception instead, so the behavior is explicit and type-safe.

## Individual Comments

### Comment 1
<location> `aymurai/api/endpoints/routers/anonymizer/utils.py:240-242` </location>
<code_context>
     return canonical_entities
+
+
+def validate_canonical_entities(
+    canonical_entities_raw: list[CanonicalEntity], target_label: str | None = None
+) -> list[CanonicalEntity]:
+
+    if target_label:
</code_context>

<issue_to_address>
**issue (bug_risk):** The return type annotation of `validate_canonical_entities` does not match its actual return value (dicts vs `CanonicalEntity` instances).

Currently the function converts inputs to `CanonicalEntity`, immediately `model_dump()`s them, adds an `entity_id` hex, and returns those dicts. That means the real return type is `list[dict]`, not `list[CanonicalEntity]`. Please either (a) return `CanonicalEntity` instances and do `model_dump`/`entity_id` decoration at the call sites that need dicts, or (b) change the annotation to something like `list[dict[str, Any]]` (or a named alias) so it matches the actual output and doesn’t mislead callers or static analysis.
</issue_to_address>

### Comment 2
<location> `aymurai/api/endpoints/routers/anonymizer/utils.py:482-491` </location>
<code_context>
+    new_ids_map = {}
</code_context>

<issue_to_address>
**issue (bug_risk):** `map_canonical_entities_NER_preds` calls `uuid.uuid4` but `uuid` is not imported in this module.

This will raise a `NameError` when that code path runs. Please add the missing `uuid` import in this module, or route ID generation through a helper that already handles `uuid` usage.
</issue_to_address>

### Comment 3
<location> `aymurai/api/endpoints/routers/anonymizer/utils.py:407-408` </location>
<code_context>
+
+            current_batch_size -= 1
+
+        if not decompose_by:  # Safety check if even 1 entity is too large
+            return None
+
+    # 4. Prepare batches. If decompose_by is 0 or less, we put all entities in one single list (one batch)
</code_context>

<issue_to_address>
**issue (bug_risk):** `llm_canonical_entities_inference` can return `None` despite being annotated as returning `dict`, and the caller assumes a dict.

When `decompose_by` is `None` and even a single-entity batch exceeds the token limit, this branch returns `None`, but the function is annotated as `-> dict` and the caller in `anonymizer_disambiguate_v2` immediately does `llm_response_dict["canonical_entities_llm"]`. That will raise when `None` is returned. Please either (1) always return a dict (with an explicit status/empty list) or (2) update the type to allow `None` and handle that case explicitly at the call site (e.g., skip the label or raise a controlled HTTP error).
</issue_to_address>

### Comment 4
<location> `aymurai/api/endpoints/routers/anonymizer/anonymizer.py:287-294` </location>
<code_context>
+
+    canonical_entities_llm = []
+
+    for target_label, user_prompt_template in user_prompt_templates.items():
+        system_prompt = system_prompts[target_label]
+
+        canonical_entities_val = validate_canonical_entities(
+            canonical_entities_raw=canonical_entities, target_label=target_label
+        )
+
+        llm_response_dict = llm_canonical_entities_inference(
+            paragraphs=paragraphs,
+            canonical_entities_pre_cluster=canonical_entities_val,
</code_context>

<issue_to_address>
**issue (bug_risk):** The loop over `user_prompt_templates` does not guard against empty canonical-entity sets, which can trigger the `None` return path in `llm_canonical_entities_inference`.

You’re calling `validate_canonical_entities(..., target_label=target_label)` and `llm_canonical_entities_inference` for every label, even when `canonical_entities_val` is empty. In that case, `llm_canonical_entities_inference` can return `None` (e.g., when `decompose_by` is `None`), causing `for ce in llm_response_dict["canonical_entities_llm"]` to fail. Please either skip labels with empty `canonical_entities_val` or explicitly handle a `None`/empty response before iterating or appending, especially for labels present in `system_prompts` / `user_prompt_templates` but not in `target_labels` or predictions.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨

_{Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.}

aymurai/api/endpoints/routers/anonymizer/utils.py

sourcery-ai · 2026-01-28T15:58:40Z

aymurai/api/endpoints/routers/anonymizer/utils.py

+    new_ids_map = {}
+
+    for document in predictions_llm:
+        if not document.labels:
+            continue
+
+        for label in document.labels:
+            label_text = label.attrs.aymurai_alt_text
+            if (
+                label.attrs.canonical_entity_id is None


issue (bug_risk): map_canonical_entities_NER_preds calls uuid.uuid4 but uuid is not imported in this module.

This will raise a NameError when that code path runs. Please add the missing uuid import in this module, or route ID generation through a helper that already handles uuid usage.

sourcery-ai · 2026-01-28T15:58:40Z

aymurai/api/endpoints/routers/anonymizer/utils.py

+        if not decompose_by:  # Safety check if even 1 entity is too large
+            return None


issue (bug_risk): llm_canonical_entities_inference can return None despite being annotated as returning dict, and the caller assumes a dict.

When decompose_by is None and even a single-entity batch exceeds the token limit, this branch returns None, but the function is annotated as -> dict and the caller in anonymizer_disambiguate_v2 immediately does llm_response_dict["canonical_entities_llm"]. That will raise when None is returned. Please either (1) always return a dict (with an explicit status/empty list) or (2) update the type to allow None and handle that case explicitly at the call site (e.g., skip the label or raise a controlled HTTP error).

sourcery-ai · 2026-01-28T15:58:40Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+    for target_label, user_prompt_template in user_prompt_templates.items():
+        system_prompt = system_prompts[target_label]
+
+        canonical_entities_val = validate_canonical_entities(
+            canonical_entities_raw=canonical_entities, target_label=target_label
+        )
+
+        llm_response_dict = llm_canonical_entities_inference(


issue (bug_risk): The loop over user_prompt_templates does not guard against empty canonical-entity sets, which can trigger the None return path in llm_canonical_entities_inference.

You’re calling validate_canonical_entities(..., target_label=target_label) and llm_canonical_entities_inference for every label, even when canonical_entities_val is empty. In that case, llm_canonical_entities_inference can return None (e.g., when decompose_by is None), causing for ce in llm_response_dict["canonical_entities_llm"] to fail. Please either skip labels with empty canonical_entities_val or explicitly handle a None/empty response before iterating or appending, especially for labels present in system_prompts / user_prompt_templates but not in target_labels or predictions.

aymurai/api/endpoints/routers/anonymizer/utils.py

jedzill4 · 2026-01-28T19:45:37Z

aymurai/api/endpoints/routers/anonymizer/utils.py

+    token_limit_frac: float = 2 / 3,
+    temperature: int = 0,
+    decompose_by: int | None = 0,
+) -> dict:


about this function:

avoid returning dict objects, it would be better for manteneability to use pydantic objects (if validation is required) or dataclasses instead

use google format for doctrings

avoid adding unnecesary llm coments overexplaning the code. If its necessary to explain a block of code its better to add a comment as a note. i.e # NOTE: this block of code do ...

jedzill4 · 2026-01-28T19:53:59Z

.env.common

+API_BASE_URL=http://localhost:8000
 REQUEST_TIMEOUT=300


be aware that this changes over the .env.common affect everyone

jedzill4 · 2026-01-28T19:58:49Z

aymurai/api/endpoints/routers/anonymizer/utils.py

+    return canonical_entities_val
+
+
+def add_canonical_entities_context(


too much complexity of this function. too much for loops & if statements, if they are really necessary, extract the code and create a helper function that run the code inside the loop/statement

avoid returning dict. use instead pydantic basemodels (if validation is required) or dataclasses

jedzill4 · 2026-01-28T20:09:45Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py


+# MARK: Disambiguate
+@router.post("/disambiguatev2", response_model=DocumentAnnotations)
+async def anonymizer_disambiguate_v2(


just use one version, we cant have 2 endpoints. You can add a new query paramater named backend that can be a Literal["fuzzyregex", "llm"] to manage which route should be taken

system_prompt/user_prompt should be optional. by default should load the promnts on prompt_templates

scorer/processor looks like choices ? there are any other options different to "ligth_normalizer" ? if so, use Literal["ligth_normalizer", ...] or Enums to declare literal options (if they are 2+ i would prefer Enums). If not, remove parameter

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

jedzill4 · 2026-01-28T20:28:05Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+    if threshold < 0 or threshold > 100:
+        raise HTTPException(status_code=400, detail="threshold must be 0-100.")
+
+    scorer_fn = SCORER_MAP.get(scorer.lower())


Now i see. there are other options. So yes, instead of validating manually, use Literal or Enums. specially in this case I would use Enums

from enum import StrEnum, auto class Scorer(StrEnum): ratio = auto() partial_ratio = auto() ... _SCORER_MAPPER = { 'ratio': ratio_fn, ... } def get_scorer(name: str) -> Callable[...]: # This would raise an exception if scorer is wrong, we like fast fails instead of try/excepts. Also the endpoint already handle that the option exists return _SCORER_MAPPER[name]

from aymurai.endpoints.router.anonimizer.utils import Scorer, get_scorer async def anonymizer_disambiguate_v2( ... scorer: Scorer = Query(...), ... ) scorer_fn = get_scorer(scorer)

Also, we will support all those alternative scorers ? remember this is a production endpoint. I like flexibility but we need to think on what we also can mantain

jedzill4 · 2026-01-28T20:28:19Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+    if scorer_fn is None:
+        raise HTTPException(status_code=400, detail=f"Unsupported scorer: {scorer}")
+
+    if processor.lower() not in PROCESSOR_MAP:


same as above

jedzill4 · 2026-01-28T20:53:48Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+        )
+    processor_fn = resolve_processor(processor)
+
+    for label, prompt in system_prompts.items():


system_prompt needs to be a pydantic base model instead of dict, so we can handle the validation automatically

from pydantic import BaseModel class NamedPrompt(BaseModel): label: str # Actually, we should expose the possible labels instead of just let the user handle it as str prompt: str class NamedPrompts(RootModel): _root: list[NamedPrompt] = Field(default_factory=list) @cached_property def as_dict(self) -> dict[str, NamedPrompt]: return {p.label: p for p in self._root} def get(self, label: str) -> NamedPrompt | None: return self.as_dict.get(label) def get_default_prompt(label: str) -> str: <load the default prompt by label from a file (i would suggest use .yaml instead of .py, but its a matter of choice)> return prompt

then

async def anonymizer_disambiguate_v2( ... # be aware that we are using plural so its more readable what its the object system_prompts: NamedPrompts = Query(default_factory=NamedPrompts, ...) user_template_prompts: NamedPrompts = Query(default_factory=NamedPrompts, ...) ... ): ... system_prompts = [ system_prompts.get(label) or get_default_prompt(label) for label in Labels ] # Same for user_template_prompt

Note: this all is just psudo code

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

jedzill4 · 2026-01-28T20:55:59Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+                detail=f"user_prompt_template for label {label} doesn't exist.",
+            )
+
+    if not tokenizer_model:


why one would use a tokenizer different than the model ? its a needed paramater ?

jedzill4 · 2026-01-28T21:02:58Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+        "microsoft/phi-4",
+        description="Tokenizer instance to get the tokens of our prompt",
+    ),
+    decompose_by: int


that is the decompose_by parameter? its needed for the user ?

jedzill4 · 2026-01-28T21:04:30Z

aymurai/api/endpoints/routers/anonymizer/anonymizer.py

+    context_window_length: Optional[int] = Query(
+        120, description="Length of context window. Use None for full paragraph."
+    ),
+    token_limit_frac: float = Query(


Does the user really needs to be able to change this parameter ? if not, can we make it an env variable instead ? define it as Setting

conrabeatriz added 10 commits January 14, 2026 13:02

Changes in old 07 notebook adding the usage of the disambiguate endpo…

ab0cd55

…int and its own name

New token counter to check if the LLM inference won't allucinate

1d4feed

New tokenizer function for token counting and proessing specifics doc…

568b9e9

…uments

Batch optimization feature in llm-inference function

4e25df0

Mapping feature added to llm-inference function

f7a4644

Updated the /disambiguate endpoint to return DocumentAnnotations simi…

2975da5

…lar to the NER predictions, now enriched with role and entity_id fields where applicable.

New /disambiguatev2 endpoint which makes the LLM inference and return…

e176b1e

… the DocumentAnnotations list with the role and the canonincal_entity_id where applicable. When there is a prediction that wasn't mapped the program generates a canonical_entity_id

New /disambiguatev2 endpoint which makes the LLM inference and return…

575af2a

… the DocumentAnnotations list with the role and the canonincal_entity_id where applicable. When there is a prediction that wasn't mapped the program generates a canonical_entity_id

New /disambiguatev2 endpoint which makes the LLM inference and return…

3a528a4

… the DocumentAnnotations list with the role and the canonincal_entity_id where applicable. When there is a prediction that wasn't mapped the program generates a canonical_entity_id

New updates on endpoint /disambiguatev2 and notebook 07

11b81e9

sourcery-ai bot reviewed Jan 28, 2026

View reviewed changes

jedzill4 reviewed Jan 28, 2026

View reviewed changes

aymurai/api/endpoints/routers/anonymizer/utils.py Show resolved Hide resolved

jedzill4 reviewed Jan 28, 2026

View reviewed changes

aymurai/api/endpoints/routers/anonymizer/anonymizer.py Show resolved Hide resolved

jedzill4 reviewed Jan 28, 2026

View reviewed changes

aymurai/api/endpoints/routers/anonymizer/anonymizer.py Show resolved Hide resolved

jedzill4 reviewed Jan 28, 2026

View reviewed changes

aymurai/api/endpoints/routers/anonymizer/anonymizer.py Show resolved Hide resolved

jedzill4 reviewed Jan 28, 2026

View reviewed changes

aymurai/api/endpoints/routers/anonymizer/anonymizer.py Show resolved Hide resolved

jedzill4 reviewed Jan 28, 2026

View reviewed changes

		if not decompose_by: # Safety check if even 1 entity is too large
		return None

		return canonical_entities_val


		def add_canonical_entities_context(

		API_BASE_URL=http://localhost:8000
		REQUEST_TIMEOUT=300

Endpoint /disambiguate with LLM Inference #72

Are you sure you want to change the base?

Endpoint /disambiguate with LLM Inference #72

Uh oh!

Conversation

conrabeatriz commented Jan 28, 2026 • edited by sourcery-ai bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by Sourcery

Uh oh!

sourcery-ai bot commented Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviewer's Guide

Sequence diagram for the new /anonymizer/disambiguatev2 LLM-backed disambiguation endpoint

Class diagram for updated canonical entity and document annotation models

File-Level Changes

Interacting with Sourcery

Customizing Your Experience

Getting Help

Uh oh!

sourcery-ai bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

sourcery-ai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

sourcery-ai bot Jan 28, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

jedzill4 Jan 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

conrabeatriz commented Jan 28, 2026 •

edited by sourcery-ai bot

Loading

sourcery-ai bot commented Jan 28, 2026 •

edited

Loading

jedzill4 Jan 28, 2026 •

edited

Loading