
feat(llma): add evaluation report models and API#54363

Open
andrewm4894 wants to merge 5 commits into master from andy/llma-eval-reports-1-models-api

Conversation

@andrewm4894 (Member) commented Apr 13, 2026

Problem

LLM Analytics needs scheduled evaluation reports — an LLM agent periodically analyzes evaluation results and delivers a detailed, example-grounded report. This PR adds the data layer foundation.

Part 1 of 5 in a stacked PR series. See reference PR #53400 for full context.

Stack:

  1. Models + API (this PR)
  2. Report Agent (feat(llma): add evaluation report agent #54364)
  3. Delivery + Workflows (feat(llma): add evaluation report delivery and temporal workflows #54365)
  4. Frontend (feat(llma): add evaluation reports frontend #54366)
  5. Wiring + Generated (feat(llma): wire up auto-create hook, generated types, and MCP tools #54367)

Changes

  • Add EvaluationReport and EvaluationReportRun Django models with 5 migrations
  • Add DRF viewset with CRUD endpoints and a manual generation trigger action
  • Register routes on the environments router
  • Add semgrep IDOR rules for new models
  • Add comprehensive model and API tests

How did you test this code?

  • Unit tests for model creation, field defaults, and constraints
  • API tests for CRUD operations, permissions, and edge cases
  • All tests pass locally

Docs update

No (skip-inkeep-docs)

🤖 LLM context

Agent-assisted PR stack creation from a single reference branch. Code was written iteratively on andy/llma-eval-reports and split into this stack for review.

@github-actions (Contributor) commented

Hey @andrewm4894! 👋
This pull request seems to contain no description. Please add useful context, rationale, and/or any other information that will help make sense of this change now and in the distant Mars-based future.


@greptile-apps bot (Contributor) commented Apr 13, 2026

This is a comment left during a code review.
Path: products/llm_analytics/backend/api/evaluation_reports.py
Line: 80-93

Comment:
**Empty `delivery_targets` list passes validation**

`validate_delivery_targets` only validates the structure of each item but never checks that at least one target is present. An empty list silently returns, causing `test_create_requires_delivery_targets` to receive a `201` instead of the expected `400`.

```suggestion
    def validate_delivery_targets(self, value: list) -> list:
        if not isinstance(value, list):
            raise serializers.ValidationError("Delivery targets must be a list.")
        if not value:
            raise serializers.ValidationError("At least one delivery target is required.")
        for target in value:
            if not isinstance(target, dict):
                raise serializers.ValidationError("Each delivery target must be an object.")
            target_type = target.get("type")
            if target_type not in ("email", "slack"):
                raise serializers.ValidationError(f"Invalid delivery target type: {target_type}")
            if target_type == "email" and not target.get("value"):
                raise serializers.ValidationError("Email delivery target must include a 'value' field.")
            if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
                raise serializers.ValidationError("Slack delivery target must include 'integration_id' and 'channel'.")
        return value
```

How can I resolve this? If you propose a fix, please make it concise.
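The suggested check can be exercised outside DRF. A minimal standalone sketch of the proposed validator, with `serializers.ValidationError` swapped for `ValueError` so it runs without the framework (target shapes taken from the suggestion above):

```python
def validate_delivery_targets(value: list) -> list:
    """Standalone sketch of the suggested validator (framework-free)."""
    if not isinstance(value, list):
        raise ValueError("Delivery targets must be a list.")
    if not value:
        raise ValueError("At least one delivery target is required.")
    for target in value:
        if not isinstance(target, dict):
            raise ValueError("Each delivery target must be an object.")
        target_type = target.get("type")
        if target_type not in ("email", "slack"):
            raise ValueError(f"Invalid delivery target type: {target_type}")
        if target_type == "email" and not target.get("value"):
            raise ValueError("Email delivery target must include a 'value' field.")
        if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
            raise ValueError("Slack delivery target must include 'integration_id' and 'channel'.")
    return value


# An email target and a slack target pass; an empty list is rejected.
ok = validate_delivery_targets([
    {"type": "email", "value": "team@example.com"},
    {"type": "slack", "integration_id": "123", "channel": "#llm-reports"},
])
```

Note the team later chose the opposite behavior for the empty-list case (empty `delivery_targets` are allowed by design, per a later commit), so only the per-item structure checks survived.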

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/models/evaluation_reports.py
Line: 103-115

Comment:
**`rrule` raises `KeyError` for `EVERY_N` frequency**

`freq_map` covers only `HOURLY`, `DAILY`, and `WEEKLY`. Accessing `freq_map[self.frequency]` on an `EVERY_N` report raises `KeyError`. `set_next_delivery_date` guards against this call, but the property itself has no guard — any future caller (scheduler code, admin tooling) would get an unhandled exception.

```suggestion
    @property
    def rrule(self):
        if self.is_count_triggered:
            raise ValueError("rrule is not available for count-triggered reports (frequency='every_n').")
        freq_map: dict[str, int] = {
            self.Frequency.HOURLY: HOURLY,
            self.Frequency.DAILY: DAILY,
            self.Frequency.WEEKLY: WEEKLY,
        }
        freq = cast(Literal[0, 1, 2, 3, 4, 5, 6], freq_map[self.frequency])
        return rrule(
            freq=freq,
            dtstart=self.start_date,
            byweekday=_to_rrule_weekdays(self.byweekday) if self.byweekday else None,
        )
```

How can I resolve this? If you propose a fix, please make it concise.
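For callers that do hold a schedule-triggered report, the property returns a standard `dateutil` rrule, so a scheduler can query the next occurrence directly. A small usage sketch (the dates are illustrative, not from the PR):

```python
from datetime import datetime
from dateutil.rrule import rrule, DAILY

# A daily schedule starting 2026-04-13 09:00, shaped like what the
# property builds for a DAILY-frequency report.
schedule = rrule(freq=DAILY, dtstart=datetime(2026, 4, 13, 9, 0))

# A scheduler asks for the first occurrence strictly after "now".
next_run = schedule.after(datetime(2026, 4, 14, 0, 0))
print(next_run)  # 2026-04-14 09:00:00
```

With the guard in place, the same call on an `every_n` report fails fast with a `ValueError` instead of a confusing `KeyError` from the frequency map.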

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/api/evaluation_reports.py
Line: 179-187

Comment:
**`asyncio.run()` inside a sync view fails under ASGI**

`asyncio.run()` creates a new event loop, but if this view is dispatched under an ASGI server (uvicorn/daphne) there is already a running event loop and the call raises `RuntimeError: This event loop is already running`. The same Temporal workflow kick-off pattern elsewhere in PostHog uses a thread-based workaround or `sync_to_async`. Consider checking how other `sync_connect` callers in the codebase handle this.

How can I resolve this? If you propose a fix, please make it concise.
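One stdlib-only version of the thread-based workaround the comment alludes to: run the coroutine on a fresh event loop in a worker thread, which is safe even when the calling thread is already inside a running loop. The Temporal client call is stubbed here; all names are illustrative.

```python
import asyncio
import threading


def run_coro_in_thread(coro):
    """Run a coroutine on a fresh event loop in a worker thread.

    Safe to call from sync code even when the calling thread already has a
    running event loop (e.g. a sync DRF view dispatched under ASGI), where
    a bare asyncio.run() would raise RuntimeError.
    """
    result = {}

    def runner():
        # The worker thread has no event loop, so asyncio.run() is safe here.
        result["value"] = asyncio.run(coro)

    t = threading.Thread(target=runner)
    t.start()
    t.join()
    return result["value"]


async def start_workflow():
    # Stand-in for the async Temporal workflow kick-off.
    return "workflow-started"


print(run_coro_in_thread(start_workflow()))  # prints workflow-started
```

`asgiref.sync.async_to_sync` is the more idiomatic Django-side fix (and is what a later commit on this PR adopts); this sketch just shows why a dedicated thread sidesteps the nested-loop problem.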

---

This is a comment left during a code review.
Path: products/llm_analytics/backend/migrations/0026_alter_evaluationreport_frequency_and_more.py
Line: 1-36

Comment:
**Five separate migrations for a single feature**

Migrations 0023–0027 all relate to the same `EvaluationReport` table introduced in this PR. Before merging, squash them into a single migration so the history is clean and the table is created in its final state in one shot. This avoids unnecessary intermediate `AlterField` operations on a table that has no rows yet.

How can I resolve this? If you propose a fix, please make it concise.


@chatgpt-codex-connector bot left a comment
💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: e80736bfb9


Commit messages:

  1. Add EvaluationReport and EvaluationReportRun Django models with migrations, a DRF viewset with CRUD + manual generation trigger, and comprehensive tests. This provides the data layer for scheduled evaluation reports that will be wired to a Temporal workflow in a follow-up PR.
  2. Address review feedback:
     • Add guard for rrule property on count-triggered reports to prevent KeyError when frequency is 'every_n'
     • Replace asyncio.run() with async_to_sync() for Temporal workflow start to avoid RuntimeError under ASGI servers
     • Update test to reflect design intent: empty delivery_targets are allowed (reports are viewable in the UI without delivery configured)
  3. Fix trigger_threshold validation on PATCH: the truthiness check treated 0 as absent, falling back to the instance value and bypassing min-threshold validation.
  4. Consolidate migrations 0023-0027 into a single 0023 that creates both tables in their final state. No intermediate AlterField ops needed since the tables have no existing rows.
  5. Fix the same bug as trigger_threshold in another field: attrs.get with a default fallback ignored the instance's current value during PATCH, validating against the default instead.
@andrewm4894 andrewm4894 force-pushed the andy/llma-eval-reports-1-models-api branch from 7b3cefa to 628c96b on April 13, 2026 at 22:34
@andrewm4894 andrewm4894 added the team/llm-analytics LLM Analytics label Apr 13, 2026