
feat(llma): add evaluation report models and API#54363

Open
andrewm4894 wants to merge 5 commits into master from andy/llma-eval-reports-1-models-api

Conversation

@andrewm4894 (Member) commented Apr 13, 2026

Problem

LLM Analytics needs scheduled evaluation reports — an LLM agent periodically analyzes evaluation results and delivers a detailed, example-grounded report. This PR adds the data layer foundation.

Part 1 of 5 in a stacked PR series. See reference PR #53400 for full context.

Stack:

  1. Models + API (this PR)
  2. Report Agent (feat(llma): add evaluation report agent #54364)
  3. Delivery + Workflows (feat(llma): add evaluation report delivery and temporal workflows #54365)
  4. Frontend (feat(llma): add evaluation reports frontend #54366)
  5. Wiring + Generated (feat(llma): wire up auto-create hook, generated types, and MCP tools #54367)

Changes

  • Add EvaluationReport and EvaluationReportRun Django models with 5 migrations
  • Add DRF viewset with CRUD endpoints and a manual generation trigger action
  • Register routes on the environments router
  • Add semgrep IDOR rules for new models
  • Add comprehensive model and API tests

How did you test this code?

  • Unit tests for model creation, field defaults, and constraints
  • API tests for CRUD operations, permissions, and edge cases
  • All tests pass locally

Docs update

No (skip-inkeep-docs)

🤖 LLM context

Agent-assisted PR stack creation from a single reference branch. Code was written iteratively on andy/llma-eval-reports and split into this stack for review.

@github-actions (Contributor) commented

Hey @andrewm4894! 👋
This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.


@greptile-apps bot (Contributor) commented Apr 13, 2026

This is a comment left during a code review.
Path: products/llm_analytics/backend/api/evaluation_reports.py
Line: 80-93

Comment:
**Empty `delivery_targets` list passes validation**

`validate_delivery_targets` only validates the structure of each item but never checks that at least one target is present. An empty list silently returns, causing `test_create_requires_delivery_targets` to receive a `201` instead of the expected `400`.

```suggestion
    def validate_delivery_targets(self, value: list) -> list:
        if not isinstance(value, list):
            raise serializers.ValidationError("Delivery targets must be a list.")
        if not value:
            raise serializers.ValidationError("At least one delivery target is required.")
        for target in value:
            if not isinstance(target, dict):
                raise serializers.ValidationError("Each delivery target must be an object.")
            target_type = target.get("type")
            if target_type not in ("email", "slack"):
                raise serializers.ValidationError(f"Invalid delivery target type: {target_type}")
            if target_type == "email" and not target.get("value"):
                raise serializers.ValidationError("Email delivery target must include a 'value' field.")
            if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
                raise serializers.ValidationError("Slack delivery target must include 'integration_id' and 'channel'.")
        return value
```

How can I resolve this? If you propose a fix, please make it concise.
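The suggested check can be exercised outside DRF. A minimal standalone sketch of the proposed validator, with `serializers.ValidationError` swapped for `ValueError` so it runs without the framework (target shapes taken from the suggestion above):

```python
def validate_delivery_targets(value: list) -> list:
    """Standalone sketch of the suggested validator (framework-free)."""
    if not isinstance(value, list):
        raise ValueError("Delivery targets must be a list.")
    if not value:
        raise ValueError("At least one delivery target is required.")
    for target in value:
        if not isinstance(target, dict):
            raise ValueError("Each delivery target must be an object.")
        target_type = target.get("type")
        if target_type not in ("email", "slack"):
            raise ValueError(f"Invalid delivery target type: {target_type}")
        if target_type == "email" and not target.get("value"):
            raise ValueError("Email delivery target must include a 'value' field.")
        if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
            raise ValueError("Slack delivery target must include 'integration_id' and 'channel'.")
    return value


# An email target and a slack target pass; an empty list is rejected.
ok = validate_delivery_targets([
    {"type": "email", "value": "team@example.com"},
    {"type": "slack", "integration_id": "123", "channel": "#llm-reports"},
])
```

Note the team later chose the opposite behavior for the empty-list case (empty `delivery_targets` are allowed by design, per a later commit), so only the per-item structure checks survived.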

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/models/evaluation_reports.py
Line: 103-115

Comment:
**`rrule` raises `KeyError` for `EVERY_N` frequency**

`freq_map` covers only `HOURLY`, `DAILY`, and `WEEKLY`. Accessing `freq_map[self.frequency]` on an `EVERY_N` report raises `KeyError`. `set_next_delivery_date` guards against this call, but the property itself has no guard — any future caller (scheduler code, admin tooling) would get an unhandled exception.

```suggestion
    @property
    def rrule(self):
        if self.is_count_triggered:
            raise ValueError("rrule is not available for count-triggered reports (frequency='every_n').")
        freq_map: dict[str, int] = {
            self.Frequency.HOURLY: HOURLY,
            self.Frequency.DAILY: DAILY,
            self.Frequency.WEEKLY: WEEKLY,
        }
        freq = cast(Literal[0, 1, 2, 3, 4, 5, 6], freq_map[self.frequency])
        return rrule(
            freq=freq,
            dtstart=self.start_date,
            byweekday=_to_rrule_weekdays(self.byweekday) if self.byweekday else None,
        )
```

How can I resolve this? If you propose a fix, please make it concise.
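For callers that do hold a schedule-triggered report, the property returns a standard `dateutil` rrule, so a scheduler can query the next occurrence directly. A small usage sketch (the dates are illustrative, not from the PR):

```python
from datetime import datetime
from dateutil.rrule import rrule, DAILY

# A daily schedule starting 2026-04-13 09:00, shaped like what the
# property builds for a DAILY-frequency report.
schedule = rrule(freq=DAILY, dtstart=datetime(2026, 4, 13, 9, 0))

# A scheduler asks for the first occurrence strictly after "now".
next_run = schedule.after(datetime(2026, 4, 14, 0, 0))
print(next_run)  # 2026-04-14 09:00:00
```

With the guard in place, the same call on an `every_n` report fails fast with a `ValueError` instead of a confusing `KeyError` from the frequency map.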

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/api/evaluation_reports.py
Line: 179-187

Comment:
**`asyncio.run()` inside a sync view fails under ASGI**

`asyncio.run()` creates a new event loop, but if this view is dispatched under an ASGI server (uvicorn/daphne) there is already a running event loop and the call raises `RuntimeError: This event loop is already running`. The same Temporal workflow kick-off pattern elsewhere in PostHog uses a thread-based workaround or `sync_to_async`. Consider checking how other `sync_connect` callers in the codebase handle this.

How can I resolve this? If you propose a fix, please make it concise.
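One stdlib-only version of the thread-based workaround the comment alludes to: run the coroutine on a fresh event loop in a worker thread, which is safe even when the calling thread is already inside a running loop. The Temporal client call is stubbed here; all names are illustrative.

```python
import asyncio
import threading


def run_coro_in_thread(coro):
    """Run a coroutine on a fresh event loop in a worker thread.

    Safe to call from sync code even when the calling thread already has a
    running event loop (e.g. a sync DRF view dispatched under ASGI), where
    a bare asyncio.run() would raise RuntimeError.
    """
    result = {}

    def runner():
        # The worker thread has no event loop, so asyncio.run() is safe here.
        result["value"] = asyncio.run(coro)

    t = threading.Thread(target=runner)
    t.start()
    t.join()
    return result["value"]


async def start_workflow():
    # Stand-in for the async Temporal workflow kick-off.
    return "workflow-started"


print(run_coro_in_thread(start_workflow()))  # prints workflow-started
```

`asgiref.sync.async_to_sync` is the more idiomatic Django-side fix (and is what a later commit on this PR adopts); this sketch just shows why a dedicated thread sidesteps the nested-loop problem.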

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/migrations/0026_alter_evaluationreport_frequency_and_more.py
Line: 1-36

Comment:
**Five separate migrations for a single feature**

Migrations 0023–0027 all relate to the same `EvaluationReport` table introduced in this PR. Before merging, squash them into a single migration so the history is clean and the table is created in its final state in one shot. This avoids unnecessary intermediate `AlterField` operations on a table that has no rows yet.

How can I resolve this? If you propose a fix, please make it concise.


@chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e80736bfb9


Commit messages:

  1. Add EvaluationReport and EvaluationReportRun Django models with migrations, a DRF viewset with CRUD + manual generation trigger, and comprehensive tests. This provides the data layer for scheduled evaluation reports that will be wired to a Temporal workflow in a follow-up PR.
  2. Address review feedback:
     • Add guard for rrule property on count-triggered reports to prevent KeyError when frequency is 'every_n'
     • Replace asyncio.run() with async_to_sync() for Temporal workflow start to avoid RuntimeError under ASGI servers
     • Update test to reflect design intent: empty delivery_targets are allowed (reports are viewable in the UI without delivery configured)
  3. Fix trigger_threshold validation on PATCH: the truthiness check treated 0 as absent, falling back to the instance value and bypassing min-threshold validation.
  4. Consolidate migrations 0023-0027 into a single 0023 that creates both tables in their final state. No intermediate AlterField ops needed since the tables have no existing rows.
  5. Fix the same bug as trigger_threshold in another field: attrs.get with a default fallback ignored the instance's current value during PATCH, validating against the default instead.
@andrewm4894 andrewm4894 force-pushed the andy/llma-eval-reports-1-models-api branch from 7b3cefa to 628c96b on April 13, 2026 at 22:34
@andrewm4894 andrewm4894 added the team/llm-analytics LLM Analytics label Apr 13, 2026