Fix analytics accuracy using tenant-scoped queries and cache-based deduplication #469

Open
Yogender-verma wants to merge 1 commit into Kuldeeep18:main from Yogender-verma:main
Conversation

Yogender-verma (Contributor) commented Jun 29, 2026

Pull Request

🔗 Related Issue

Closes #20


📝 Summary of Changes

  • Fixed campaign analytics accuracy issues
  • Enforced strict tenant isolation across analytics queries and webhook processing
  • Added cache-based deduplication for webhook and async event handling
  • Prevented duplicate event processing across campaigns

🏷️ Type of Change

  • 🐛 Bug fix
  • ✨ New feature
  • ♻️ Refactor
  • 📝 Documentation update
  • 🎨 UI / Style change
  • 🔧 Chore

🧪 Testing

How this was tested:

  1. Ran full Django test suite (python manage.py test)
  2. Verified tenant isolation via automated tests
  3. Tested webhook deduplication using repeated event simulation
  4. Confirmed analytics stability across duplicate event triggers

✅ Checklist

  • No merge conflicts
  • Changes follow project guidelines
  • Related issue linked
  • Tests passed locally
  • No UI/API breaking changes

Summary by CodeRabbit

  • Bug Fixes
    • Improved email tracking reliability by preventing duplicate bounce, reply, open, and click events from being processed more than once.
    • Reduced accidental cross-tenant updates when multiple records could match the same email event.
    • Kept lead activity timestamps accurate by ignoring repeated webhook and click submissions instead of overwriting existing activity.

coderabbitai Bot (Contributor) commented Jun 29, 2026

📝 Walkthrough

Adds cache-based event deduplication (48-hour TTL) for bounce, reply, open, and click email events. A new process_email_event helper centralizes deduplication and CampaignLead state updates in views.py. WebhookView and ClickTrackingView are refactored to use it, with added tenant-isolation filtering. Background tasks gain parallel dedup guards for bounce and reply loops. New tests cover tenant isolation and duplicate suppression.
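
For readers unfamiliar with the pattern, here is a minimal sketch of `cache.add`-style deduplication. It is not the code from this PR: `InMemoryCache` is a hypothetical stand-in that models only the documented behavior of Django's `cache.add()` (store the value only if the key is absent, and return whether it was stored), and the key layout and the `is_first_delivery` name are assumptions for illustration.

```python
import time

DEDUPE_TTL_SECONDS = 48 * 60 * 60  # the 48-hour window described in the walkthrough


class InMemoryCache:
    """Hypothetical stand-in for a cache backend; only add() is modeled."""

    def __init__(self):
        self._store = {}  # key -> (value, expiry time on the monotonic clock)

    def add(self, key, value, timeout):
        """Store value only if key is absent or expired; return True if stored."""
        now = time.monotonic()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return False  # a live marker exists, so this is a duplicate
        self._store[key] = (value, now + timeout)
        return True


def is_first_delivery(cache, campaign_lead_id, event_type):
    # Assumed key layout: one marker per CampaignLead + event type.
    key = f"evt_dedupe:{campaign_lead_id}:{event_type}"
    return cache.add(key, 1, timeout=DEDUPE_TTL_SECONDS)


cache = InMemoryCache()
opens = 0
for _ in range(3):  # the provider delivers the same "open" event three times
    if is_first_delivery(cache, campaign_lead_id=42, event_type="open"):
        opens += 1
```

Only the first delivery increments the counter; the two repeats find the marker and are skipped. Because the check and the write happen in a single `add` call, there is no separate get-then-set step for two concurrent deliveries to race through, provided the real backend implements `add` atomically.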

Changes

Email Event Deduplication

| Layer / File(s) | Summary |
| --- | --- |
| process_email_event helper and webhook refactor (backend/campaigns/views.py) | Adds process_email_event with 48-hour cache.add deduplication keyed per CampaignLead + event. WebhookView.post now derives provider_event_id, filters by campaign_id/organization_id to prevent cross-tenant leakage, and delegates all state updates to process_email_event. ClickTrackingView.get routes click analytics through the same helper. |
| Bounce and reply deduplication in tasks (backend/campaigns/tasks.py) | Imports cache and adds cache.add guards in both the bounce-marking loop and the Gmail reply-polling loop; duplicate events are logged and skipped. |
| Analytics deduplication and tenant isolation tests (backend/campaigns/tests.py) | Adds CampaignAnalyticsPatchTests with setUp clearing cache, and tests for webhook tenant isolation, webhook open deduplication, and click-tracking deduplication. |
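
Since the helper in views.py and the guards in tasks.py both rely on `cache.add`, they only suppress each other's duplicates if they build identical keys for the same event. The sketch below shows one way to centralize that. The function name `build_dedupe_key`, the key layout, and the identifier precedence (`last_sent_message_id` first, `provider_event_id` as fallback) are assumptions for illustration, not the scheme this PR implements.

```python
def build_dedupe_key(campaign_lead_id, event_type,
                     last_sent_message_id=None, provider_event_id=None):
    """Build a single dedupe key for use by both the webhook and task paths.

    Assumed precedence: the message id the task loop already has access to
    wins, and the provider's event id is used only when it is missing.
    """
    identifier = last_sent_message_id or provider_event_id or "unknown"
    return f"evt_dedupe:{campaign_lead_id}:{event_type}:{identifier}"


# The webhook path may know both identifiers; the task path only the message id.
webhook_key = build_dedupe_key(
    7, "bounce", last_sent_message_id="msg-1", provider_event_id="prov-9"
)
task_key = build_dedupe_key(7, "bounce", last_sent_message_id="msg-1")
```

With one shared builder, a bounce seen first by the webhook and later by the polling task maps to the same marker, so the second path treats it as already handled.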

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Poem

🐇 Hop hop, no double stamps today,
The cache says "seen this!" — bounce away!
Each open, click, reply tracked once,
No duplicate metrics from this bunny's stunts.
Tenants stay separate, events stay true,
Clean analytics, through and through! ✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit's high-level summary is enabled. |
| Title check | ✅ Passed | The title clearly matches the main change: tenant-scoped analytics fixes with cache-based deduplication. |
| Linked Issues check | ✅ Passed | The PR addresses the linked campaign analytics inaccuracy by improving tenant isolation and preventing duplicate event processing. |
| Out of Scope Changes check | ✅ Passed | The changes stay focused on analytics accuracy, deduplication, tenant scoping, and tests, with no obvious unrelated additions. |
coderabbitai Bot (Contributor) left a comment

Actionable comments posted: 6

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@backend/campaigns/tasks.py`:
- Around line 452-458: The bounce and reply task loops set the dedupe cache key
before the state transition finishes, so if processing fails the event is lost
on retry; update the bounce path around _mark_campaign_lead_bounced() and the
reply-processing block to remove the dedupe key whenever an exception occurs
during marking, saving, or routing, then re-raise/continue so the next run can
retry. Use the existing dedupe_key and cache.add() flow in
backend/campaigns/tasks.py, and apply the same cleanup pattern in both task
loops.
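
The cleanup pattern requested above could look roughly like the following sketch. `DictCache` and `process_once` are hypothetical names introduced here, and the cache stand-in ignores TTL expiry; in the real task code the calls would go to the project's existing `cache` object and `dedupe_key`.

```python
import logging

logger = logging.getLogger(__name__)


class DictCache:
    """Hypothetical cache stand-in with add()/delete(); TTL expiry is omitted."""

    def __init__(self):
        self._store = {}

    def add(self, key, value, timeout=None):
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def delete(self, key):
        self._store.pop(key, None)


def process_once(cache, dedupe_key, handler):
    """Run handler at most once per key; release the key if handler fails.

    Returns True if handler ran, False if the event was a duplicate.
    """
    if not cache.add(dedupe_key, 1, timeout=48 * 60 * 60):
        logger.info("Skipping duplicate event %s", dedupe_key)
        return False
    try:
        handler()
    except Exception:
        cache.delete(dedupe_key)  # drop the marker so the next run can retry
        raise
    return True


cache = DictCache()
attempts = []


def flaky_mark_bounced():
    # Fails on the first call only, simulating a transient error.
    attempts.append(1)
    if len(attempts) == 1:
        raise RuntimeError("transient database error")


try:
    process_once(cache, "evt_dedupe:7:bounce:msg-1", flaky_mark_bounced)
except RuntimeError:
    pass  # the marker was released, so a retry is still possible

retried = process_once(cache, "evt_dedupe:7:bounce:msg-1", flaky_mark_bounced)
duplicate = process_once(cache, "evt_dedupe:7:bounce:msg-1", flaky_mark_bounced)
```

The failed first attempt leaves no marker behind, the retry succeeds and keeps its marker, and the third call is skipped as a duplicate. A loop that should keep going after a failure would log and `continue` instead of re-raising.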

In `@backend/campaigns/tests.py`:
- Around line 1976-1985: The test setup in the User.objects.create_user calls is
using hardcoded password literals that are never used, so remove the password
argument from the user_a and user_b fixture creation in this test class. Keep
the create_user calls for the existing User model setup, but eliminate the
unused password= values to satisfy Ruff and avoid unnecessary test data.

In `@backend/campaigns/views.py`:
- Around line 389-392: The dedupe marker is being set too early in the webhook
flow, so a failure after cache.add() in the campaign event handler can cause
valid retries to be treated as duplicates. Update the dedupe handling around the
relevant campaign webhook processing path in views.py (the logic using
dedupe_key, cache.add, and the subsequent save/branch helpers) so the marker is
only committed after successful processing, or is explicitly removed if an
exception occurs before completion. Keep the duplicate check behavior in place,
but ensure failures do not leave a stale 48-hour marker behind.
- Around line 384-390: The deduplication key for bounce/reply terminal events is
inconsistent between the webhook path and the task path because `dedupe_key` in
the webhook logic prefers `provider_event_id` while the task guards use
`last_sent_message_id`. Update the dedupe-key construction in the webhook
handling around `cache.add` to use the same identifier precedence as the task
logic, so both paths generate identical `evt_dedupe` keys for the same event.
Verify the symbols `dedupe_key`, `provider_event_id`, `message_id`, and
`last_sent_message_id` all resolve to one shared scheme for bounce/reply events.
- Around line 501-516: Require a single trusted lead match before any processing
in the webhook handler. In the view logic around the `org_id` filtering and
`cleads = list(base_qs)`, do not treat unauthenticated payload tenant fields
(`organization_id`, `org_id`, `orgId`) as sufficient to resolve ambiguity; only
narrow the queryset when a trusted server-generated identifier like `message_id`
or `campaign_id` is present. Update the `CampaignLead` selection path so that if
the filtered result still yields multiple rows, the request is ignored rather
than iterating through all matches in the `for cl in cleads` block, and keep the
existing ambiguity warning/response for `AllowAny` requests.
- Around line 854-859: The click redirect flow in the code path that calls
process_email_event() should remain best-effort even if analytics processing
fails. Update the click handling block around process_email_event in the lead
redirect logic to catch and ignore transient exceptions from
analytics/cache/database work, while still only special-casing
CampaignLead.DoesNotExist as needed, so the destination redirect can continue
even when event processing fails.
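
Two of the views.py findings above (ignore ambiguous lead matches, keep the click redirect best-effort) can be illustrated with a small sketch. The helper names `pick_single_lead` and `track_click_then_redirect` are hypothetical, and plain lists and callables stand in for the `CampaignLead` queryset and `process_email_event`.

```python
import logging

logger = logging.getLogger(__name__)


def pick_single_lead(candidates):
    """Return the only candidate, or None when the match is missing or ambiguous."""
    if len(candidates) > 1:
        logger.warning(
            "Ambiguous webhook match (%d leads); ignoring event", len(candidates)
        )
        return None
    return candidates[0] if candidates else None


def track_click_then_redirect(record_click, destination_url):
    """Return the redirect target even when analytics recording fails."""
    try:
        record_click()
    except Exception:
        # Analytics is best-effort: log the failure and keep the user moving.
        logger.exception("Click analytics failed; redirecting anyway")
    return destination_url


def broken_analytics():
    raise ConnectionError("cache backend unavailable")


target = track_click_then_redirect(broken_analytics, "https://example.com/pricing")
```

Returning `None` for multiple matches means the caller never iterates over leads that may belong to different tenants, and wrapping only the analytics call keeps a cache or database outage from turning into a broken link for the recipient.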
ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro Plus

Run ID: f6ae228c-25bc-410b-8282-9e2ac36171f0

📥 Commits

Reviewing files that changed from the base of the PR and between 4a33158 and 039bf98.

📒 Files selected for processing (3)
  • backend/campaigns/tasks.py
  • backend/campaigns/tests.py
  • backend/campaigns/views.py


Development

Successfully merging this pull request may close these issues.

campaign analytics
