Skip to content

feat(wpp-retry): PR-1 — idempotency_key migration + silent dedup#78

Open
mt-alarcon wants to merge 2 commits into
evolution-foundation:mainfrom
mt-alarcon:feat/wpp-retry-pr1-clean
Open

feat(wpp-retry): PR-1 — idempotency_key migration + silent dedup#78
mt-alarcon wants to merge 2 commits into
evolution-foundation:mainfrom
mt-alarcon:feat/wpp-retry-pr1-clean

Conversation

@mt-alarcon
Copy link
Copy Markdown

@mt-alarcon mt-alarcon commented May 13, 2026

Context

First of a 3-PR series implementing the WhatsApp retry pattern. Tracks an in-flight feature that adds idempotency to WhatsApp message delivery via the Evolution Go integration.

This PR establishes the foundation: schema migration for idempotency_key and silent dedup on the trigger handler. PR-2 (backoff + jitter) and PR-3 (DLQ + UI Replay + instrumentation) follow.

Changes

dashboard/backend/models.py:

  • Add idempotency_key column to triggers table (nullable, unique index)

dashboard/backend/app.py:

  • Run idempotency migration at startup if column doesn't exist
  • Migration is idempotent (safe to re-run)
  • Includes `EXPLAIN QUERY PLAN` validation to confirm index is used

dashboard/backend/routes/triggers.py:

  • Extract idempotency_key from incoming events (6 patterns covered)
  • Silent dedup on second arrival (IntegrityError → 200 OK, log only)
  • Handles race condition via DB unique constraint

Verification

  • Migration runs cleanly on a fresh DB and on a DB with existing trigger rows
  • Duplicate event arrival returns 200 OK instead of retry storm
  • `EXPLAIN QUERY PLAN` shows index hit for idempotency_key lookups

Series

  • PR-1 (this): migration + silent dedup
  • PR-2: exponential backoff + jitter (depends on PR-1 — may have conflicts to resolve)
  • PR-3: DLQ classification + UI Replay + instrumentation (depends on PR-2)

Summary by Sourcery

Add idempotency support and foundational retry metadata for trigger executions, enabling silent deduplication of duplicate WhatsApp webhook events.

New Features:

  • Introduce idempotency_key, error_category, and last_replay_at fields on trigger executions for retry and classification workflows.
  • Implement silent deduplication in the webhook trigger handler using an extracted idempotency key from incoming events.

Enhancements:

  • Auto-migrate the trigger_executions table at app startup to add new columns and indexes, including a partial unique index enforcing per-trigger idempotency.

…ps 1+2)

Step 1 — Migration (models.py + app.py):
- TriggerExecution ganha 3 colunas nullable: idempotency_key, error_category, last_replay_at
- to_dict() exposto com os 3 campos novos
- Auto-migrate idempotente no startup: ALTER TABLE + IF NOT EXISTS em cada bloco
- Partial unique index uq_trigger_idem (trigger_id, idempotency_key) WHERE NOT NULL
- Basic index ix_trigger_executions_idem_key para lookups por key
- SQLite 3.51 confirmado — partial index nativo; EXPLAIN QUERY PLAN confirma uso do índice

Step 2 — Silent dedup (triggers.py):
- webhook_receiver extrai idem_key de idempotency_key / messageId / data.messageId
- Se key já existe: log idempotent_replay + 200 OK silencioso (pattern F6)
- Race condition: IntegrityError no db.commit() → rollback + 200 OK silencioso
- Legado (GitHub, Stripe, Linear): sem key → idem_key=None → fluxo normal inalterado
- Limpeza: current_app movido para import no topo; imports inline removidos

Testes passados: migration up/down idempotente, partial index unicidade, NULLs livres,
extração de key (6 casos), race condition via IntegrityError, EXPLAIN QUERY PLAN.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@sourcery-ai
Copy link
Copy Markdown

sourcery-ai Bot commented May 13, 2026

Reviewer's Guide

Implements the first step of the WhatsApp retry pattern by adding idempotency-related columns and indexes to TriggerExecution, running an auto-migration at app startup, and wiring webhook handling to extract an idempotency key and silently deduplicate duplicate executions using both application logic and a DB-level unique constraint.

Sequence diagram for webhook_receiver idempotent deduplication

sequenceDiagram
    actor Source
    participant Flask as webhook_receiver
    participant TE as TriggerExecution
    participant DB as db.session

    Source->>Flask: POST /triggers/<id>/webhook_receiver
    Flask->>Flask: extract idem_key from event_data

    alt [idem_key present]
        Flask->>TE: TriggerExecution.query.filter_by(trigger_id, idempotency_key)
        TE-->>Flask: existing or None
        alt [existing found]
            Flask->>Flask: current_app.logger.info(evt=idempotent_replay)
            Flask-->>Source: 200 OK
        else [no existing]
            Flask->>DB: add(TriggerExecution(..., idempotency_key))
            Flask->>DB: commit()
            alt [commit ok]
                DB-->>Flask: success
                Flask-->>Source: 200 OK (execution created)
            else IntegrityError
                DB-->>Flask: IntegrityError
                Flask->>DB: rollback()
                Flask->>Flask: current_app.logger.info(evt=idempotent_replay_race)
                Flask-->>Source: 200 OK
            end
        end
    else [no idem_key]
        Flask->>DB: add(TriggerExecution(..., idempotency_key=None))
        Flask->>DB: commit()
        DB-->>Flask: success
        Flask-->>Source: 200 OK (execution created)
    end
Loading

File-Level Changes

Change Details Files
Add idempotency and retry-related fields to TriggerExecution and its serialization.
  • Extend TriggerExecution model with nullable idempotency_key, error_category, and last_replay_at columns so existing code continues to work without them set.
  • Index idempotency_key at the ORM level for query performance.
  • Expose idempotency_key, error_category, and last_replay_at via TriggerExecution.to_dict, formatting timestamps consistently with existing fields.
  • Document rollback steps and semantics for the new fields in model comments.
dashboard/backend/models.py
Auto-migrate trigger_executions table at startup to support idempotent executions and future retry metadata.
  • Inspect trigger_executions columns via PRAGMA table_info and conditionally add idempotency_key, error_category, and last_replay_at as nullable columns so the migration is safe and idempotent.
  • Create a non-unique index on idempotency_key for lookups by key alone, guarding with IF NOT EXISTS and try/except for resilience.
  • Create a partial unique index uq_trigger_idem on (trigger_id, idempotency_key) where idempotency_key IS NOT NULL to enforce uniqueness only when a key is present and to guard race conditions.
  • Document rollback procedures (dropping both indexes) and explain how these indices fit into the WhatsApp retry pattern.
dashboard/backend/app.py
Implement application-level idempotency key extraction and silent deduplication in the webhook receiver, backed by DB constraints.
  • Import current_app at the module level (also removing redundant inner imports) and add IntegrityError handling from SQLAlchemy.
  • Extract an idempotency key from incoming event payloads by checking idempotency_key, messageId, and data.messageId fields when event_data is a dict, leaving the key as None when not found.
  • Before creating a new TriggerExecution, query for an existing execution with the same trigger_id and idempotency_key; if found, log an idempotent_replay event and return 200 OK without re-running the trigger.
  • Attach the derived idempotency_key to new TriggerExecution rows so deduplication and indexing work as intended.
  • Wrap the execution insert/commit in a try/except IntegrityError to handle race conditions where the partial unique index rejects a duplicate insert; on error, roll back, log idempotent_replay_race, and return 200 OK.
  • Keep webhook semantics unchanged for requests without an idempotency key by skipping the deduplication logic entirely in that case.
dashboard/backend/routes/triggers.py

Tips and commands

Interacting with Sourcery

  • Trigger a new review: Comment @sourcery-ai review on the pull request.
  • Continue discussions: Reply directly to Sourcery's review comments.
  • Generate a GitHub issue from a review comment: Ask Sourcery to create an
    issue from a review comment by replying to it. You can also reply to a
    review comment with @sourcery-ai issue to create an issue from it.
  • Generate a pull request title: Write @sourcery-ai anywhere in the pull
    request title to generate a title at any time. You can also comment
    @sourcery-ai title on the pull request to (re-)generate the title at any time.
  • Generate a pull request summary: Write @sourcery-ai summary anywhere in
    the pull request body to generate a PR summary at any time exactly where you
    want it. You can also comment @sourcery-ai summary on the pull request to
    (re-)generate the summary at any time.
  • Generate reviewer's guide: Comment @sourcery-ai guide on the pull
    request to (re-)generate the reviewer's guide at any time.
  • Resolve all Sourcery comments: Comment @sourcery-ai resolve on the
    pull request to resolve all Sourcery comments. Useful if you've already
    addressed all the comments and don't want to see them anymore.
  • Dismiss all Sourcery reviews: Comment @sourcery-ai dismiss on the pull
    request to dismiss all existing Sourcery reviews. Especially useful if you
    want to start fresh with a new review - don't forget to comment
    @sourcery-ai review to trigger a new review!

Customizing Your Experience

Access your dashboard to:

  • Enable or disable review features such as the Sourcery-generated pull request
    summary, the reviewer's guide, and others.
  • Change the review language.
  • Add, remove or edit custom review instructions.
  • Adjust other review settings.

Getting Help

Copy link
Copy Markdown

@sourcery-ai sourcery-ai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 2 issues, and left some high level feedback:

  • The except IntegrityError in webhook_receiver will treat any integrity violation as an idempotent replay; consider constraining this to the idempotency_key unique index (e.g., by checking idem_key is set and/or inspecting the error message) to avoid masking other DB issues.
  • You’re creating an explicit ix_trigger_executions_idem_key index in the startup migration and also declare idempotency_key with index=True on the model; this will lead to two separate indexes—consider relying on just the partial unique index plus either the model-level index or the raw SQL one to avoid redundant indexes.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- The `except IntegrityError` in `webhook_receiver` will treat any integrity violation as an idempotent replay; consider constraining this to the idempotency_key unique index (e.g., by checking `idem_key` is set and/or inspecting the error message) to avoid masking other DB issues.
- You’re creating an explicit `ix_trigger_executions_idem_key` index in the startup migration and also declare `idempotency_key` with `index=True` on the model; this will lead to two separate indexes—consider relying on just the partial unique index plus either the model-level index or the raw SQL one to avoid redundant indexes.

## Individual Comments

### Comment 1
<location path="dashboard/backend/routes/triggers.py" line_range="353-355" />
<code_context>
+        _cur.execute("ALTER TABLE trigger_executions ADD COLUMN last_replay_at TIMESTAMP")
+        _conn.commit()
+    # Basic index for idempotency lookups by key alone
+    try:
+        _cur.execute(
+            "CREATE INDEX IF NOT EXISTS ix_trigger_executions_idem_key "
</code_context>
<issue_to_address>
**issue (bug_risk):** Differentiate idempotency unique constraint violations from other IntegrityError cases

Catching all IntegrityError instances and treating them as idempotent replays can hide unrelated DB issues. Please either inspect the underlying exception/constraint to confirm it’s specifically the uq_trigger_idem violation, or at least log the full exception and re-raise for non-idempotency cases so real data problems aren’t returned as 200 OK.
</issue_to_address>

### Comment 2
<location path="dashboard/backend/app.py" line_range="630-638" />
<code_context>
+        _cur.execute("ALTER TABLE trigger_executions ADD COLUMN last_replay_at TIMESTAMP")
+        _conn.commit()
+    # Basic index for idempotency lookups by key alone
+    try:
+        _cur.execute(
+            "CREATE INDEX IF NOT EXISTS ix_trigger_executions_idem_key "
+            "ON trigger_executions (idempotency_key)"
</code_context>
<issue_to_address>
**suggestion (bug_risk):** Avoid silently ignoring failures to create the partial unique index for idempotency

The except block currently swallows index-creation failures (e.g., unsupported SQLite version), leaving only best-effort app-level dedup and a race window where duplicates can slip through unnoticed. At minimum, log the failure (including SQLite version and error message) so it’s visible when the DB-level safeguard isn’t active, or consider failing the migration outright if partial indexes aren’t supported on the target runtime.

```suggestion
    # Basic index for idempotency lookups by key alone
    try:
        _cur.execute(
            "CREATE INDEX IF NOT EXISTS ix_trigger_executions_idem_key "
            "ON trigger_executions (idempotency_key)"
        )
        _conn.commit()
    except Exception as exc:
        # Do not silently ignore failures to create this index; log enough context
        # (including SQLite versions) so it's visible when the DB-level safeguard
        # is not active and we are relying solely on app-level deduplication.
        import logging
        import sqlite3

        library_version = getattr(sqlite3, "sqlite_version", None)
        try:
            runtime_version = _cur.execute("select sqlite_version()").fetchone()[0]
        except Exception:
            runtime_version = None

        logger = logging.getLogger(__name__)
        logger.warning(
            "Failed to create ix_trigger_executions_idem_key index; "
            "falling back to best-effort app-level idempotency only. "
            "sqlite_library_version=%r sqlite_runtime_version=%r error=%r",
            library_version,
            runtime_version,
            exc,
        )
```
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

Comment thread dashboard/backend/routes/triggers.py Outdated
Comment thread dashboard/backend/app.py Outdated
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
…ência

Endereça review do Sourcery em PR evolution-foundation#78:

Antes, `except IntegrityError` no `webhook_receiver` capturava qualquer
violação de integridade e respondia 200 OK silencioso — tratando como
"replay idempotente". Qualquer outro problema (NOT NULL, FK quebrada,
novo unique constraint adicionado depois) seria mascarado como sucesso
e o cliente nunca saberia que o evento não foi processado.

Agora o catch só absorve o erro se:
  1. `idem_key` está definido (sem chave de idempotência não há como
     ser violação de uq_trigger_idem)
  2. A mensagem do erro do driver menciona `uq_trigger_idem` ou
     `idempotency_key` (constraint específica)

Para os demais IntegrityError, loga com contexto completo (trigger_id,
key, mensagem original) e re-raise. O webhook retornará 500 e o erro
fica visível em logs em vez de virar 200 silencioso.

Não há teste dedicado pro path nessa branch (testes WPP só entram no
PR-2/PR-3); sintaxe validada com py_compile.
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
…iggers

Endereça review do Sourcery em PR evolution-foundation#80 (Comment 1) e estende a mesma
proteção para o endpoint de replay (não citado pelo Sourcery, mas
contém o mesmo bug latente — `except IntegrityError` genérico).

Dois pontos corrigidos em dashboard/backend/routes/triggers.py:

1. `webhook_receiver` (linha ~389): mesmo bug do PR evolution-foundation#78. Capturava
   qualquer IntegrityError e respondia 200 OK silencioso. Agora só
   absorve se a mensagem do driver menciona `uq_trigger_idem` ou
   `idempotency_key` e `idem_key` está definido. Caso contrário,
   loga erro completo e re-raise.

2. Endpoint `/triggers/executions/<id>/replay` (linha ~476): cria
   nova execution reusando `ex.idempotency_key`. Tinha o mesmo
   catch genérico, mascarando outros IntegrityError como
   "idempotent_skip". Aplicada a mesma proteção, usando
   `ex.idempotency_key` no check.

Sem teste dedicado pra esse path nessa branch — sintaxe validada
com py_compile.
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
Polish do review do Sourcery em PR evolution-foundation#78 (sugestões code-quality):

1. Remove `index=True` de `TriggerExecution.idempotency_key` em
   models.py. Estávamos criando 3 índices para a mesma coluna:
   - `ix_trigger_executions_idempotency_key` (auto-gerado por
     SQLAlchemy via index=True)
   - `ix_trigger_executions_idem_key` (raw SQL no startup)
   - `uq_trigger_idem` (partial unique, raw SQL)
   Mantemos só os dois últimos — explicitamente criados pela
   migration de startup, com nomes versionados no rollback plan.

2. Substitui `except Exception: pass` por log estruturado nos dois
   `CREATE INDEX` (basic e partial unique). Antes, qualquer falha
   (ex.: versão de SQLite sem suporte a partial index) era engolida
   silenciosamente — operador só descobriria no primeiro race.
   Agora loga `sqlite_lib`, `sqlite_runtime` e a exceção original.

Sem mudança de comportamento em runtime saudável; só melhora
observabilidade quando algo falha.
… índice + logs

Consolidado dos fixes pedidos pelo Sourcery no review do PR evolution-foundation#78:

1. `except IntegrityError` restrito à violação de idempotência
   (dashboard/backend/routes/triggers.py)

   Antes, qualquer IntegrityError no `webhook_receiver` virava 200 OK
   silencioso — tratado como "replay idempotente". Outros problemas
   (NOT NULL, FK quebrada, novo unique constraint adicionado depois)
   ficariam mascarados como sucesso e o cliente nunca saberia que o
   evento não foi processado.

   Agora o catch só absorve o erro se:
     a) `idem_key` está definido (sem chave de idempotência não há como
        ser violação de uq_trigger_idem)
     b) A mensagem do erro do driver menciona `uq_trigger_idem` ou
        `idempotency_key` (constraint específica)

   Para os demais IntegrityError, loga com contexto completo
   (trigger_id, key, mensagem original) e re-raise — webhook retorna
   500 e o erro fica visível em logs em vez de virar 200 silencioso.

2. Remove índice redundante (models.py)

   `TriggerExecution.idempotency_key` estava criando 3 índices pra
   mesma coluna:
     - `ix_trigger_executions_idempotency_key` (auto-gerado via
       index=True no model)
     - `ix_trigger_executions_idem_key` (raw SQL no startup)
     - `uq_trigger_idem` (partial unique, raw SQL)

   Mantemos só os dois últimos — explicitamente criados pela migration
   de startup, com nomes versionados no rollback plan. `index=True`
   removido da definição do model.

3. Loga falha de CREATE INDEX em vez de silenciar (app.py)

   Substitui `except Exception: pass` por log estruturado nos dois
   `CREATE INDEX` (basic e partial unique). Antes, qualquer falha
   (ex.: versão de SQLite sem suporte a partial index) era engolida
   silenciosamente — operador só descobriria no primeiro race.
   Agora loga `sqlite_lib`, `sqlite_runtime` e a exceção original.

Sintaxe validada com `python3 -m py_compile`. Sem teste dedicado pro
path nessa branch (testes WPP só entram nos PRs evolution-foundation#79/evolution-foundation#80).
@mt-alarcon mt-alarcon force-pushed the feat/wpp-retry-pr1-clean branch from db9e954 to 9c7cd5e Compare May 13, 2026 23:23
mt-alarcon pushed a commit to mt-alarcon/evo-nexus that referenced this pull request May 13, 2026
… SQL, ApiError

Consolidado dos fixes pedidos pelo Sourcery no review do PR evolution-foundation#80:

1. `except IntegrityError` restrito à violação de idempotência
   (dashboard/backend/routes/triggers.py — 2 pontos)

   Mesmo bug do PR evolution-foundation#78 estendido a este PR. Dois pontos no triggers.py:

   a) `webhook_receiver` (~linha 389): mesma correção do PR evolution-foundation#78 —
      só absorve o erro se a mensagem do driver menciona
      `uq_trigger_idem` ou `idempotency_key` E `idem_key` está set;
      caso contrário loga com contexto completo e re-raise.

   b) Endpoint `/triggers/executions/<id>/replay` (~linha 476):
      cria nova execution reusando `ex.idempotency_key`. Tinha o
      mesmo catch genérico, mascarando outros IntegrityError como
      "idempotent_skip". Aplicada a mesma proteção usando
      `ex.idempotency_key` no check. (Sourcery não citou este
      segundo ponto, mas é o mesmo bug — corrigido por simetria.)

2. `_classify_error` — tighten typing

   - Define alias `ErrorCategory = Literal["transient", "permanent"]`.
   - Atualiza return type pra refletir o que a função realmente
     retorna.
   - Atualiza comentário em `TriggerExecution.error_category`
     (models.py) que listava 4 categorias (validation/unknown) que
     nunca eram emitidas pelo classificador — consumidores não vão
     mais depender de valores inexistentes.

3. `trigger_stats` — parametriza SQL

   - `datetime('now', '-{days} days')` interpolado vira `:cutoff`
     bindado (cutoff calculado em Python com `timedelta`).
   - `WHERE trigger_id IN ({id_list})` vira `IN :wpp_ids` com
     `bindparam("wpp_ids", expanding=True)` — SQLAlchemy expande pra
     `(:id_1, :id_2, ...)` e binda cada valor.
   - Inputs hoje são "seguros" (ints + days constrained), mas
     parametrizar é mais robusto contra regressões e mais legível.

4. `confirmReplay` (Triggers.tsx) — error handling estruturado

   - Cria `ApiError` em `lib/api.ts` carregando `status` (número HTTP)
     e `code` (string do JSON do backend, ex.: `rate_limited`).
   - `buildError` agora retorna `ApiError` em vez de `Error` simples.
   - `Triggers.tsx::confirmReplay` checa `e instanceof ApiError &&
     (e.status === 429 || e.code === 'rate_limited')` em vez de
     `msg.includes('429')`. Mais resiliente a mudanças no formato da
     mensagem (ex.: tradução, prefixo).
   - `.message` preservado no mesmo formato
     (`"<status> <text>: <detail>"`) pra não quebrar os 3 outros
     call sites que usam `msg.includes(...)` (Backups.tsx,
     StepBrainConnect.tsx, RestoreSelectRepo.tsx) — migração desses
     fica pra um PR dedicado.

Sintaxe Python validada com `python3 -m py_compile`. Typecheck do
frontend deixado pro CI do PR (deps não instaladas localmente).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant