Skip to content

fix(insights): history dedup uses title fallback, limits after dedup#461

Merged
izadoesdev merged 1 commit into
stagingfrom
izadoesdev/fix-history-dedup
May 28, 2026
Merged

fix(insights): history dedup uses title fallback, limits after dedup#461
izadoesdev merged 1 commit into
stagingfrom
izadoesdev/fix-history-dedup

Conversation

@izadoesdev
Copy link
Copy Markdown
Member

@izadoesdev izadoesdev commented May 28, 2026

Summary

Fixes three issues from PR #460 review:

  • Empty subjectKey no longer collapses unrelated insights into one (P1)
  • Fetch 50 rows then dedup to 12 unique subjects (was LIMIT 12 before dedup)
  • Remove unused runId from select

Test plan

  • bun test in apps/insights

Summary by cubic

Fix insight history dedup to prevent empty subjectKey from merging unrelated insights, and apply the 12-item cap after dedup by fetching 50 rows first. Remove the unused runId from the query and clean up HTML spacing in packages/evals/ui.

Written for commit 50b1ddc. Summary will update on new commits.

Review in cubic

- Empty subjectKey no longer collapses unrelated insights (P1)
- Fetch 50 rows then dedup to 12 unique subjects (was LIMIT 12 before dedup)
- Remove unused runId from select
@izadoesdev izadoesdev merged commit 5693333 into staging May 28, 2026
3 of 4 checks passed
@vercel
Copy link
Copy Markdown

vercel Bot commented May 28, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

Project Deployment Actions Updated (UTC)
databuddy-status Building Building Preview, Comment May 28, 2026 2:36pm
2 Skipped Deployments
Project Deployment Actions Updated (UTC)
dashboard Skipped Skipped May 28, 2026 2:36pm
documentation Skipped Skipped May 28, 2026 2:36pm

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: a63108f9-fdff-4db4-bac4-a017271cd64a

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch izadoesdev/fix-history-dedup

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 28, 2026

Greptile Summary

This PR fixes three issues from review #460: it prevents empty subjectKey from collapsing unrelated insights during dedup, fetches 50 rows before applying the 12-subject limit, and drops the unused runId from the select.

  • The subjectKey || title fallback correctly prevents empty-key collisions, and the LIMIT 50 → dedup → 12 pipeline ordering is now right.
  • The dedup loop's early-exit guard (lines.length >= RECENT_INSIGHTS_PROMPT_LIMIT) counts total output lines rather than unique subjects; since each insight can contribute up to 3 lines (title, description, rootCause), the loop can stop after as few as ~4 subjects instead of the intended 12.
  • The packages/evals/ui/index.html change is cosmetic indentation only.

Confidence Score: 3/5

The core dedup fix is correct, but the early-exit guard in the dedup loop still misbehaves when insights carry description and rootCause fields.

An org with fully-populated insights would see only 4-6 subjects in the history context instead of the intended 12, quietly degrading AI prompt quality on every run.

apps/insights/src/prompts.ts — the dedup loop and its early-exit condition

Important Files Changed

Filename Overview
apps/insights/src/prompts.ts Fixes empty-subjectKey dedup collapse and moves LIMIT after dedup, but the limit guard checks total output lines instead of unique-subject count, which can prematurely stop iteration at far fewer than 12 subjects.
packages/evals/ui/index.html Whitespace/indentation-only changes; no functional impact.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[DB query LIMIT 50 ordered by createdAt desc] --> B{rows empty?}
    B -- Yes --> C[return empty string]
    B -- No --> D[Build subjectCounts map using subjectKey OR title fallback]
    D --> E[Loop over rows]
    E --> F{lines.length >= RECENT_INSIGHTS_PROMPT_LIMIT?}
    F -- Yes --> G[break]
    F -- No --> H{key already in seen set?}
    H -- Yes --> E
    H -- No --> I[Add key to seen and push 1 to 3 lines per row]
    I --> E
    G --> J[Return formatted context block]
    I --> J
Loading

Reviews (1): Last reviewed commit: "fix(insights): history dedup uses title ..." | Re-trigger Greptile

Comment on lines 126 to +136
const seen = new Set<string>();
const lines: string[] = [];
for (const row of rows) {
if (seen.has(row.subjectKey)) {
if (lines.length >= RECENT_INSIGHTS_PROMPT_LIMIT) {
break;
}
const key = row.subjectKey || row.title;
if (seen.has(key)) {
continue;
}
seen.add(row.subjectKey);
seen.add(key);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Use a dedicated counter for unique subjects rather than lines.length, so insights with description and rootCause don't erroneously reduce the number of distinct subjects shown. The PR description explicitly calls for "12 unique subjects" but the current guard measures output lines.

Suggested change
const seen = new Set<string>();
const lines: string[] = [];
for (const row of rows) {
if (seen.has(row.subjectKey)) {
if (lines.length >= RECENT_INSIGHTS_PROMPT_LIMIT) {
break;
}
const key = row.subjectKey || row.title;
if (seen.has(key)) {
continue;
}
seen.add(row.subjectKey);
seen.add(key);
const seen = new Set<string>();
const lines: string[] = [];
let subjectCount = 0;
for (const row of rows) {
if (subjectCount >= RECENT_INSIGHTS_PROMPT_LIMIT) {
break;
}
const key = row.subjectKey || row.title;
if (seen.has(key)) {
continue;
}
seen.add(key);
subjectCount++;

Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant