feat lead scoring and sorting#375
Conversation
📝 WalkthroughWalkthroughAdds an end-to-end lead scoring system. Django signal handlers on ChangesLead Scoring Engine
Sequence Diagram(s)sequenceDiagram
participant User as User<br/>(Frontend)
participant Frontend as leads.html<br/>(JS Filter)
participant API as LeadViewSet<br/>(Backend)
participant Signals as CampaignLead<br/>Signals
participant Lead as Lead Model
User->>Frontend: Set min_score, max_score, sort_by
Frontend->>Frontend: getFilterParams()
Frontend->>Frontend: updateFilterIndicator()
Frontend->>API: GET /leads?min_score=X&max_score=Y&sort_by=Z
API->>API: parse min_score, max_score, sort_by
API->>Lead: filter(score__gte=min_score, score__lte=max_score).order_by(...)
API-->>Frontend: filtered, sorted lead list
Note over Signals: Score maintained via signals
Signals->>Lead: _update_lead_score(lead)
Signals->>Lead: save(update_fields=['score'])
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes Possibly related issues
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (2)
backend/campaigns/signals.py (1)
60-79: ⚡ Quick winConsider aggregating counts in a single query for better performance.
_calculate_lead_scoreexecutes 5 separateCOUNTqueries againstCampaignLeadfor each lead. While acceptable in signal handlers (single lead at a time), this becomes an N+1 issue in the backfill command where it's called in a loop.A single aggregated query would reduce database round-trips:
♻️ Proposed optimization using aggregate
def _calculate_lead_score(lead): """ Derive a simple engagement score for a lead from related CampaignLead rows. """ - qs = CampaignLead.objects.filter(lead=lead) - open_count = qs.filter(last_opened_at__isnull=False).count() - click_count = qs.filter(last_clicked_at__isnull=False).count() - reply_count = qs.filter(status='REPLIED').count() - bounced_count = qs.filter(status='BOUNCED').count() - active_count = qs.filter(status='ACTIVE').count() + from django.db.models import Count, Q + + agg = CampaignLead.objects.filter(lead=lead).aggregate( + open_count=Count('id', filter=Q(last_opened_at__isnull=False)), + click_count=Count('id', filter=Q(last_clicked_at__isnull=False)), + reply_count=Count('id', filter=Q(status='REPLIED')), + bounced_count=Count('id', filter=Q(status='BOUNCED')), + active_count=Count('id', filter=Q(status='ACTIVE')), + ) score = ( - (open_count * 5) - + (click_count * 7) - + (reply_count * 10) - + (active_count * 2) - - (bounced_count * 5) + (agg['open_count'] * 5) + + (agg['click_count'] * 7) + + (agg['reply_count'] * 10) + + (agg['active_count'] * 2) + - (agg['bounced_count'] * 5) - (5 if lead.global_unsubscribe else 0) ) return max(score, 0)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/campaigns/signals.py` around lines 60 - 79, The _calculate_lead_score function performs 5 separate database COUNT queries on the CampaignLead queryset (for open_count, click_count, reply_count, bounced_count, and active_count), which causes N+1 query problems when called in loops. Replace these 5 separate filter().count() calls with a single aggregate() query using Django's Count, Case, and When functions to calculate all metrics in one database round-trip, then extract the aggregated values to compute the score.backend/leads/management/commands/backfill_lead_scores.py (1)
18-30: ⚖️ Poor tradeoffBackfill will be slow for large datasets; consider batching and progress feedback.
For organizations with many leads, this command will be slow (5+ queries per lead, no batched writes). Consider:
- Adding periodic progress output so operators know backfill is progressing
- Using
bulk_updatefor batched writes♻️ Proposed improvement with progress feedback
def handle(self, *args, **options): lead_id = options.get('lead_id') leads = Lead.objects.all() if lead_id: leads = leads.filter(id=lead_id) processed = 0 for lead in leads.iterator(): lead.score = _calculate_lead_score(lead) lead.save(update_fields=['score']) processed += 1 + if processed % 1000 == 0: + self.stdout.write(f'Processed {processed} leads...') self.stdout.write(self.style.SUCCESS(f'Successfully backfilled scores for {processed} lead(s).'))🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/management/commands/backfill_lead_scores.py` around lines 18 - 30, The handle method processes leads one at a time with individual save() calls, which is inefficient for large datasets. Replace the current loop that iterates through leads and calls save(update_fields=['score']) for each lead with a batched approach using bulk_update. Collect the updated leads in batches (e.g., every 100-1000 leads) and call bulk_update on them together, which will reduce the number of database queries. Additionally, add periodic progress output within the loop (using self.stdout.write) at regular intervals to provide feedback to operators that the backfill is progressing, such as every 100 or 1000 processed leads.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/leads/views.py`:
- Around line 95-99: The issue is that when sort_by is empty (the default
frontend selection), the code returns qs.distinct() without applying any
ordering, resulting in arbitrary database order instead of the expected
newest-first ordering. To fix this, modify the logic to check if sort_by is
empty or not in the allowed list, and apply a default ordering of '-created_at'
in those cases. Keep the existing allowed set check for valid sort options, but
add an elif condition (or modify the return logic) to handle the empty/invalid
sort_by case by returning qs.order_by('-created_at').distinct() instead of just
qs.distinct().
---
Nitpick comments:
In `@backend/campaigns/signals.py`:
- Around line 60-79: The _calculate_lead_score function performs 5 separate
database COUNT queries on the CampaignLead queryset (for open_count,
click_count, reply_count, bounced_count, and active_count), which causes N+1
query problems when called in loops. Replace these 5 separate filter().count()
calls with a single aggregate() query using Django's Count, Case, and When
functions to calculate all metrics in one database round-trip, then extract the
aggregated values to compute the score.
In `@backend/leads/management/commands/backfill_lead_scores.py`:
- Around line 18-30: The handle method processes leads one at a time with
individual save() calls, which is inefficient for large datasets. Replace the
current loop that iterates through leads and calls save(update_fields=['score'])
for each lead with a batched approach using bulk_update. Collect the updated
leads in batches (e.g., every 100-1000 leads) and call bulk_update on them
together, which will reduce the number of database queries. Additionally, add
periodic progress output within the loop (using self.stdout.write) at regular
intervals to provide feedback to operators that the backfill is progressing,
such as every 100 or 1000 processed leads.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 5c28677e-9a7f-47ff-b1db-1666bd14159c
📒 Files selected for processing (6)
backend/campaigns/signals.pybackend/leads/management/__init__.pybackend/leads/management/commands/__init__.pybackend/leads/management/commands/backfill_lead_scores.pybackend/leads/views.pyfrontend/leads.html
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/campaigns/tests.py (1)
1467-1496: ⚡ Quick winMissing
Lead.scoreassertions despite PR objectives.The PR objective explicitly calls for lead scoring, and
signals.pycalls_update_lead_score(instance.lead)on save/delete. However, none of these tests assert onself.lead.score. Consider adding assertions to verify the lead score is updated correctly when engagement events change:Suggested additions
def test_campaignlead_create_updates_cached_counts_incrementally(self): with patch('campaigns.signals._update_campaign_counters', side_effect=AssertionError('full recount should not run')): CampaignLead.objects.create( organization=self.organization, campaign=self.campaign, lead=self.lead, status='ACTIVE', last_opened_at=timezone.now(), ) self.campaign.refresh_from_db() self.assertEqual(self.campaign.leads_count, 1) # ... existing assertions ... + + self.lead.refresh_from_db() + self.assertGreater(self.lead.score, 0, "Lead score should increase on engagement") def test_campaignlead_update_and_delete_apply_delta_changes(self): # ... existing test code ... campaign_lead.status = 'REPLIED' campaign_lead.last_opened_at = timezone.now() campaign_lead.last_clicked_at = timezone.now() campaign_lead.save() self.campaign.refresh_from_db() # ... existing campaign assertions ... + + self.lead.refresh_from_db() + score_before_delete = self.lead.score + self.assertGreater(score_before_delete, 0) campaign_lead.delete() self.campaign.refresh_from_db() # ... existing assertions ... + + self.lead.refresh_from_db() + # Score should be recalculated (likely 0 if no other campaign engagements)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/campaigns/tests.py` around lines 1467 - 1496, The test method test_campaignlead_update_and_delete_apply_delta_changes verifies delta changes to campaign counts but is missing assertions for lead score updates. The signals.py module calls _update_lead_score(instance.lead) on save and delete operations, but the test does not verify this scoring behavior. After updating the campaign_lead with engagement events (status change to REPLIED, last_opened_at, last_clicked_at), refresh self.lead from the database and add assertions to verify the score has been incremented correctly based on the engagement activity. After deleting the campaign_lead, refresh self.lead again and add assertions to verify the score has been decremented or adjusted appropriately.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/campaigns/signals.py`:
- Around line 191-193: The try/except block that catches all Exception types is
too broad and masks actual database errors. Remove the broad except Exception
clause entirely since the guard in _apply_campaign_counter_delta already handles
the case of a missing campaign_id, or if you must keep exception handling,
narrow it to only catch specific expected exceptions like DatabaseError from the
database module. This ensures that actual programming errors and database
connectivity issues are not silently suppressed.
---
Nitpick comments:
In `@backend/campaigns/tests.py`:
- Around line 1467-1496: The test method
test_campaignlead_update_and_delete_apply_delta_changes verifies delta changes
to campaign counts but is missing assertions for lead score updates. The
signals.py module calls _update_lead_score(instance.lead) on save and delete
operations, but the test does not verify this scoring behavior. After updating
the campaign_lead with engagement events (status change to REPLIED,
last_opened_at, last_clicked_at), refresh self.lead from the database and add
assertions to verify the score has been incremented correctly based on the
engagement activity. After deleting the campaign_lead, refresh self.lead again
and add assertions to verify the score has been decremented or adjusted
appropriately.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: 6462c75a-2ba2-4274-82a7-6fd774a50099
📒 Files selected for processing (2)
backend/campaigns/signals.pybackend/campaigns/tests.py
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
backend/leads/management/commands/backfill_lead_scores.py (1)
27-28: 🏗️ Heavy liftBackfill still does one score-aggregation query per lead.
bulk_updateimproves writes, butLine 28calls_calculate_lead_score(lead), which runs an aggregate query per lead. On large datasets this becomes an N-query read bottleneck. Consider replacing per-lead score calculation with a set-based aggregation keyed bylead_id, then batch-apply computed scores.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@backend/leads/management/commands/backfill_lead_scores.py` around lines 27 - 28, The loop iterating through leads calls `_calculate_lead_score(lead)` for each individual lead, triggering an aggregate query per lead which creates a read bottleneck on large datasets. Refactor the code to perform a single set-based aggregation keyed by lead_id to compute scores for all leads at once, then batch-apply these pre-computed scores to the leads collection instead of calculating scores one-by-one within the iteration loop.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@backend/leads/tests.py`:
- Line 469: The list comprehension at line 469 uses the ambiguous
single-character loop variable `l` which triggers the Ruff E741 lint error and
reduces code readability. Rename the loop variable `l` to a descriptive name
like `lead` throughout the list comprehension `[l['email'] for l in resp.data]`
so it becomes `[lead['email'] for lead in resp.data]` to satisfy linting
requirements and improve clarity.
---
Nitpick comments:
In `@backend/leads/management/commands/backfill_lead_scores.py`:
- Around line 27-28: The loop iterating through leads calls
`_calculate_lead_score(lead)` for each individual lead, triggering an aggregate
query per lead which creates a read bottleneck on large datasets. Refactor the
code to perform a single set-based aggregation keyed by lead_id to compute
scores for all leads at once, then batch-apply these pre-computed scores to the
leads collection instead of calculating scores one-by-one within the iteration
loop.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro Plus
Run ID: fd4c0704-01ef-4789-bf90-dbfa0cd9f000
📒 Files selected for processing (5)
backend/campaigns/signals.pybackend/campaigns/tests.pybackend/leads/management/commands/backfill_lead_scores.pybackend/leads/tests.pybackend/leads/views.py
🚧 Files skipped from review as they are similar to previous changes (3)
- backend/leads/views.py
- backend/campaigns/tests.py
- backend/campaigns/signals.py
| def test_no_filter_defaults_to_newest_first(self): | ||
| resp = self._get() | ||
| self.assertEqual(resp.status_code, status.HTTP_200_OK) | ||
| emails = [l['email'] for l in resp.data] |
There was a problem hiding this comment.
Rename ambiguous loop variable to satisfy lint and readability
Line 469 uses l, which triggers Ruff E741. Rename it to a descriptive name (e.g., lead) to keep CI lint green.
Suggested patch
- emails = [l['email'] for l in resp.data]
+ emails = [lead['email'] for lead in resp.data]📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| emails = [l['email'] for l in resp.data] | |
| emails = [lead['email'] for lead in resp.data] |
🧰 Tools
🪛 Ruff (0.15.17)
[error] 469-469: Ambiguous variable name: l
(E741)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@backend/leads/tests.py` at line 469, The list comprehension at line 469 uses
the ambiguous single-character loop variable `l` which triggers the Ruff E741
lint error and reduces code readability. Rename the loop variable `l` to a
descriptive name like `lead` throughout the list comprehension `[l['email'] for
l in resp.data]` so it becomes `[lead['email'] for lead in resp.data]` to
satisfy linting requirements and improve clarity.
Source: Linters/SAST tools
|
Hi @saurabhhhcodes 👋 LeadOrbit Bot here 🤖 A repository star ⭐ is required for PR review and merge. We noticed that you haven't starred the repository yet. Please star the repo and reply once completed so we can continue with the review process. Thanks for contributing to LeadOrbit! 🚀 |
|
Hi @saurabhhhcodes 👋 LeadOrbit Bot here 🤖 We noticed you've opened a Pull Request but haven't starred the repository yet. ⭐ Starring the repository is mandatory for PR review and merge. Please:
Once you've done that, the bot will continue processing your PR. Note: PRs from contributors who haven't starred the repository will remain pending until this requirement is completed. Thanks for contributing to LeadOrbit! 🚀 |
What changed
Why
Validation
python3 -m py_compile /tmp/leadorbit-335/backend/campaigns/signals.py /tmp/leadorbit-335/backend/leads/views.py /tmp/leadorbit-335/backend/leads/tests.py /tmp/leadorbit-335/backend/leads/management/commands/backfill_lead_scores.pygit -C /tmp/leadorbit-335 diff --checkCloses #311
Closes #181
Closes #91
Summary by CodeRabbit