feat(llma): add evaluation report models and API #54363
Open
andrewm4894 wants to merge 7 commits into master from andy/llma-eval-reports-1-models-api
Commits
1bd7745 feat(llma): add evaluation report models and API (andrewm4894)
fb79a9a fix(llma): address bot review feedback on evaluation reports (andrewm4894)
bcf24d1 fix(llma): use explicit key check for trigger_threshold validation (andrewm4894)
af308a8 chore(llma): squash evaluation report migrations into single migration (andrewm4894)
3dec016 fix(llma): use instance value for cooldown_minutes on partial update (andrewm4894)
1c08631 fix: correct mcp_store import path after rebase (andrewm4894)
5a83f6d chore: update OpenAPI generated types (tests-posthog[bot])
products/llm_analytics/backend/api/evaluation_reports.py (201 additions, 0 deletions)
```python
"""API endpoints for evaluation report configuration and report run history."""

import datetime as dt

from django.conf import settings
from django.db.models import QuerySet

import structlog
from asgiref.sync import async_to_sync
from drf_spectacular.utils import extend_schema
from rest_framework import serializers, status, viewsets
from rest_framework.decorators import action
from rest_framework.request import Request
from rest_framework.response import Response

from posthog.api.routing import TeamAndOrgViewSetMixin
from posthog.permissions import AccessControlPermission

from products.llm_analytics.backend.api.metrics import llma_track_latency
from products.llm_analytics.backend.models.evaluation_reports import EvaluationReport, EvaluationReportRun

logger = structlog.get_logger(__name__)


class EvaluationReportSerializer(serializers.ModelSerializer):
    class Meta:
        model = EvaluationReport
        fields = [
            "id",
            "evaluation",
            "frequency",
            "byweekday",
            "start_date",
            "next_delivery_date",
            "delivery_targets",
            "max_sample_size",
            "enabled",
            "deleted",
            "last_delivered_at",
            "report_prompt_guidance",
            "trigger_threshold",
            "cooldown_minutes",
            "daily_run_cap",
            "created_by",
            "created_at",
        ]
        read_only_fields = ["id", "next_delivery_date", "last_delivered_at", "created_by", "created_at"]

    def validate_evaluation(self, value):
        # Prevent creating a report in team A that references team B's evaluation:
        # the FK queryset is unscoped, so a user with access to multiple teams could
        # otherwise cross tenant boundaries by passing a foreign evaluation id.
        team = self.context["get_team"]()
        if value.team_id != team.id:
            raise serializers.ValidationError("Evaluation does not belong to this team.")
        return value

    def validate(self, attrs):
        attrs = super().validate(attrs)
        frequency = attrs.get("frequency") or (self.instance.frequency if self.instance else None)
        if frequency == EvaluationReport.Frequency.EVERY_N:
            threshold = (
                attrs.get("trigger_threshold")
                if "trigger_threshold" in attrs
                else (self.instance.trigger_threshold if self.instance else None)
            )
            if threshold is None:
                raise serializers.ValidationError({"trigger_threshold": "Required when frequency is 'every_n'."})
            if threshold < EvaluationReport.TRIGGER_THRESHOLD_MIN:
                raise serializers.ValidationError(
                    {"trigger_threshold": f"Minimum is {EvaluationReport.TRIGGER_THRESHOLD_MIN}."}
                )
            if threshold > EvaluationReport.TRIGGER_THRESHOLD_MAX:
                raise serializers.ValidationError(
                    {"trigger_threshold": f"Maximum is {EvaluationReport.TRIGGER_THRESHOLD_MAX}."}
                )
            cooldown = (
                attrs.get("cooldown_minutes")
                if "cooldown_minutes" in attrs
                else (self.instance.cooldown_minutes if self.instance else EvaluationReport.COOLDOWN_MINUTES_DEFAULT)
            )
            if cooldown < EvaluationReport.COOLDOWN_MINUTES_MIN:
                raise serializers.ValidationError(
                    {"cooldown_minutes": f"Minimum is {EvaluationReport.COOLDOWN_MINUTES_MIN} minutes."}
                )
        return attrs

    def validate_delivery_targets(self, value: list) -> list:
        if not isinstance(value, list):
            raise serializers.ValidationError("Delivery targets must be a list.")
        for target in value:
            if not isinstance(target, dict):
                raise serializers.ValidationError("Each delivery target must be an object.")
            target_type = target.get("type")
            if target_type not in ("email", "slack"):
                raise serializers.ValidationError(f"Invalid delivery target type: {target_type}")
            if target_type == "email" and not target.get("value"):
                raise serializers.ValidationError("Email delivery target must include a 'value' field.")
            if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
                raise serializers.ValidationError("Slack delivery target must include 'integration_id' and 'channel'.")
        return value

    def create(self, validated_data):
        request = self.context["request"]
        team = self.context["get_team"]()
        validated_data["team"] = team
        validated_data["created_by"] = request.user
        return super().create(validated_data)


class EvaluationReportRunSerializer(serializers.ModelSerializer):
    class Meta:
        model = EvaluationReportRun
        fields = [
            "id",
            "report",
            "content",
            "metadata",
            "period_start",
            "period_end",
            "delivery_status",
            "delivery_errors",
            "created_at",
        ]
        read_only_fields = fields


class EvaluationReportViewSet(TeamAndOrgViewSetMixin, viewsets.ModelViewSet):
    """CRUD for evaluation report configurations + report run history."""

    scope_object = "llm_analytics"
    permission_classes = [AccessControlPermission]
    serializer_class = EvaluationReportSerializer
    queryset = EvaluationReport.objects.all()

    def safely_get_queryset(self, queryset: QuerySet[EvaluationReport]) -> QuerySet[EvaluationReport]:
        queryset = queryset.filter(team_id=self.team_id).order_by("-created_at")
        if self.action not in ("update", "partial_update"):
            queryset = queryset.filter(deleted=False)
        return queryset

    @llma_track_latency("llma_evaluation_reports_list")
    def list(self, request: Request, *args, **kwargs) -> Response:
        return super().list(request, *args, **kwargs)

    @llma_track_latency("llma_evaluation_reports_create")
    def create(self, request: Request, *args, **kwargs) -> Response:
        return super().create(request, *args, **kwargs)

    @llma_track_latency("llma_evaluation_reports_retrieve")
    def retrieve(self, request: Request, *args, **kwargs) -> Response:
        return super().retrieve(request, *args, **kwargs)

    @llma_track_latency("llma_evaluation_reports_update")
    def update(self, request: Request, *args, **kwargs) -> Response:
        return super().update(request, *args, **kwargs)

    @llma_track_latency("llma_evaluation_reports_partial_update")
    def partial_update(self, request: Request, *args, **kwargs) -> Response:
        return super().partial_update(request, *args, **kwargs)

    def perform_destroy(self, instance):
        instance.deleted = True
        instance.save(update_fields=["deleted"])

    @action(detail=True, methods=["get"], url_path="runs")
    @llma_track_latency("llma_evaluation_report_runs_list")
    def runs(self, request: Request, **kwargs) -> Response:
        """List report runs (history) for this report."""
        report = self.get_object()
        runs = EvaluationReportRun.objects.filter(report=report).order_by("-created_at")[:50]
        serializer = EvaluationReportRunSerializer(runs, many=True)
        return Response(serializer.data)

    @extend_schema(request=None, responses={202: None})
    @action(detail=True, methods=["post"], url_path="generate")
    @llma_track_latency("llma_evaluation_report_generate")
    def generate(self, request: Request, **kwargs) -> Response:
        """Trigger immediate report generation."""
        report = self.get_object()

        try:
            from posthog.temporal.common.client import sync_connect
            from posthog.temporal.llm_analytics.eval_reports.constants import GENERATE_EVAL_REPORT_WORKFLOW_NAME
            from posthog.temporal.llm_analytics.eval_reports.types import GenerateAndDeliverEvalReportWorkflowInput

            client = sync_connect()
            async_to_sync(client.start_workflow)(
                GENERATE_EVAL_REPORT_WORKFLOW_NAME,
                GenerateAndDeliverEvalReportWorkflowInput(report_id=str(report.id), manual=True),
                id=f"eval-report-manual-{report.id}-{dt.datetime.now(tz=dt.UTC).timestamp():.0f}",
                task_queue=settings.GENERAL_PURPOSE_TASK_QUEUE,
            )
        except Exception:
            logger.exception("Failed to trigger evaluation report generation", report_id=str(report.id))
            return Response(
                {"error": "Failed to trigger report generation"},
                status=status.HTTP_500_INTERNAL_SERVER_ERROR,
            )

        return Response(status=status.HTTP_202_ACCEPTED)
```
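The `delivery_targets` rules can also be exercised outside DRF. This is an illustrative re-statement as a plain function, with `ValueError` standing in for `serializers.ValidationError`; the field names (`type`, `value`, `integration_id`, `channel`) come from the diff above, while the example payload values are made up:

```python
def check_delivery_targets(value):
    """Mirror of EvaluationReportSerializer.validate_delivery_targets."""
    if not isinstance(value, list):
        raise ValueError("Delivery targets must be a list.")
    for target in value:
        if not isinstance(target, dict):
            raise ValueError("Each delivery target must be an object.")
        target_type = target.get("type")
        if target_type not in ("email", "slack"):
            raise ValueError(f"Invalid delivery target type: {target_type}")
        # Each target type carries its own required fields.
        if target_type == "email" and not target.get("value"):
            raise ValueError("Email delivery target must include a 'value' field.")
        if target_type == "slack" and (not target.get("integration_id") or not target.get("channel")):
            raise ValueError("Slack delivery target must include 'integration_id' and 'channel'.")
    return value


# A payload that satisfies both branches (values are illustrative):
ok_targets = [
    {"type": "email", "value": "llm-reports@example.com"},
    {"type": "slack", "integration_id": 42, "channel": "#eval-reports"},
]
```

Anything with an unknown `type` (e.g. `{"type": "webhook"}`) or a Slack target missing `integration_id`/`channel` is rejected before the report configuration is saved.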