chore(llma): Add PostHog LLM Analytics instrumentation to PR approval agent#52984
chore(llma): Add PostHog LLM Analytics instrumentation to PR approval agent#52984andrewm4894 wants to merge 6 commits into
Conversation
Adds PostHog LLM Analytics tracking to the PR approval agent so every stamphog run is visible in the LLM Analytics dashboard with cost, token usage, latency, and verdict data. - New analytics.py module with TraceRecorder that captures $ai_generation and $ai_trace events via the posthoganalytics SDK - Instrument the Reviewer to record generation-level metrics (model, tokens, cost, cache metrics, duration, num_turns) from ResultMessage - Instrument the Pipeline to emit a trace-level event on completion with aggregate stats and PR metadata (tier, verdict, gate results) - Pass STAMPHOG_POSTHOG_API_KEY and STAMPHOG_POSTHOG_HOST secrets in the GitHub Actions workflow - Analytics is a no-op when the API key is not configured https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es
Use the standard internal project API key (sTMFPsFhdP1Ssg) by default so no new GitHub secrets are needed. Analytics is always on unless OPT_OUT_CAPTURE=1 is set. The STAMPHOG_POSTHOG_API_KEY env var still works as an override. https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es
Prompt To Fix All With AIThis is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 139-143
Comment:
**Analytics failures can crash the pipeline or trigger spurious retries**
The PR description says analytics is "non-blocking (analytics failures won't affect PR review)", but the code does not enforce this. `self.client.capture(...)` is called without any error handling.
- In `reviewer.py`, if `record_generation()` raises (e.g. network error, PostHog auth failure), the exception propagates through `asyncio.run(self._review(...))` and is caught by the retry loop in `_llm_review()`, causing the reviewer to be retried unnecessarily — up to 3 times — due to an analytics error, not a review failure.
- In `review_pr.py`, `_emit_trace()` is called directly in `run()` with no try/except. If it raises, the entire pipeline crashes with an unhandled exception.
To actually make analytics non-blocking, both `record_generation` and `record_trace` should swallow exceptions internally:
```python
try:
self.client.capture(
event="$ai_generation",
distinct_id=DISTINCT_ID,
properties=properties,
)
except Exception:
pass # analytics must never break the pipeline
```
The same pattern should be applied to `record_trace` and `flush`.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 15
Comment:
**API key hardcoded in a public repository**
The internal project API key is hardcoded as a string literal in a public open-source repository, and is also documented verbatim in `README.md`. Even if this is a write-only ingest key, committing credentials in source is against best practices — especially in a public repo where any reader can see it.
A `STAMPHOG_POSTHOG_API_KEY` env-var override already exists, so the hardcoded fallback could be removed entirely. The caller would need to supply the key via the environment variable, keeping credentials out of source.
How can I resolve this? If you propose a fix, please make it concise.
---
This is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 73-86
Comment:
**`structured_output` parameter is accepted but never used**
The `structured_output: Any = None` parameter is declared in `record_generation`'s signature but is never referenced anywhere in the method body. This is a superfluous part — remove it to keep the interface minimal and avoid confusion about whether the parameter has any effect.
```suggestion
def record_generation(
self,
*,
model: str,
input_messages: list[dict[str, str]],
output_text: str,
usage: dict[str, Any] | None = None,
model_usage: dict[str, Any] | None = None,
duration_ms: int = 0,
total_cost_usd: float | None = None,
num_turns: int = 0,
stop_reason: str | None = None,
) -> None:
```
The call site in `reviewer.py` would need to drop the `structured_output=structured_output` keyword argument accordingly.
How can I resolve this? If you propose a fix, please make it concise.Reviews (1): Last reviewed commit: "chore(llma): use internal PostHog projec..." | Re-trigger Greptile |
oxfmt requires aligned table column separators. https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es
| self.client.capture( | ||
| event="$ai_generation", | ||
| distinct_id=DISTINCT_ID, | ||
| properties=properties, | ||
| ) |
There was a problem hiding this comment.
Analytics failures can crash the pipeline or trigger spurious retries
The PR description says analytics is "non-blocking (analytics failures won't affect PR review)", but the code does not enforce this. self.client.capture(...) is called without any error handling.
- In
reviewer.py, ifrecord_generation()raises (e.g. network error, PostHog auth failure), the exception propagates throughasyncio.run(self._review(...))and is caught by the retry loop in_llm_review(), causing the reviewer to be retried unnecessarily — up to 3 times — due to an analytics error, not a review failure. - In
review_pr.py,_emit_trace()is called directly inrun()with no try/except. If it raises, the entire pipeline crashes with an unhandled exception.
To actually make analytics non-blocking, both record_generation and record_trace should swallow exceptions internally:
try:
self.client.capture(
event="$ai_generation",
distinct_id=DISTINCT_ID,
properties=properties,
)
except Exception:
pass # analytics must never break the pipelineThe same pattern should be applied to record_trace and flush.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 139-143
Comment:
**Analytics failures can crash the pipeline or trigger spurious retries**
The PR description says analytics is "non-blocking (analytics failures won't affect PR review)", but the code does not enforce this. `self.client.capture(...)` is called without any error handling.
- In `reviewer.py`, if `record_generation()` raises (e.g. network error, PostHog auth failure), the exception propagates through `asyncio.run(self._review(...))` and is caught by the retry loop in `_llm_review()`, causing the reviewer to be retried unnecessarily — up to 3 times — due to an analytics error, not a review failure.
- In `review_pr.py`, `_emit_trace()` is called directly in `run()` with no try/except. If it raises, the entire pipeline crashes with an unhandled exception.
To actually make analytics non-blocking, both `record_generation` and `record_trace` should swallow exceptions internally:
```python
try:
self.client.capture(
event="$ai_generation",
distinct_id=DISTINCT_ID,
properties=properties,
)
except Exception:
pass # analytics must never break the pipeline
```
The same pattern should be applied to `record_trace` and `flush`.
How can I resolve this? If you propose a fix, please make it concise.|
|
||
| from posthoganalytics import Posthog | ||
|
|
||
| _INTERNAL_PROJECT_API_KEY = "sTMFPsFhdP1Ssg" |
There was a problem hiding this comment.
API key hardcoded in a public repository
The internal project API key is hardcoded as a string literal in a public open-source repository, and is also documented verbatim in README.md. Even if this is a write-only ingest key, committing credentials in source is against best practices — especially in a public repo where any reader can see it.
A STAMPHOG_POSTHOG_API_KEY env-var override already exists, so the hardcoded fallback could be removed entirely. The caller would need to supply the key via the environment variable, keeping credentials out of source.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 15
Comment:
**API key hardcoded in a public repository**
The internal project API key is hardcoded as a string literal in a public open-source repository, and is also documented verbatim in `README.md`. Even if this is a write-only ingest key, committing credentials in source is against best practices — especially in a public repo where any reader can see it.
A `STAMPHOG_POSTHOG_API_KEY` env-var override already exists, so the hardcoded fallback could be removed entirely. The caller would need to supply the key via the environment variable, keeping credentials out of source.
How can I resolve this? If you propose a fix, please make it concise.| def record_generation( | ||
| self, | ||
| *, | ||
| model: str, | ||
| input_messages: list[dict[str, str]], | ||
| output_text: str, | ||
| usage: dict[str, Any] | None = None, | ||
| model_usage: dict[str, Any] | None = None, | ||
| duration_ms: int = 0, | ||
| total_cost_usd: float | None = None, | ||
| num_turns: int = 0, | ||
| stop_reason: str | None = None, | ||
| structured_output: Any = None, | ||
| ) -> None: |
There was a problem hiding this comment.
structured_output parameter is accepted but never used
The structured_output: Any = None parameter is declared in record_generation's signature but is never referenced anywhere in the method body. This is a superfluous part — remove it to keep the interface minimal and avoid confusion about whether the parameter has any effect.
| def record_generation( | |
| self, | |
| *, | |
| model: str, | |
| input_messages: list[dict[str, str]], | |
| output_text: str, | |
| usage: dict[str, Any] | None = None, | |
| model_usage: dict[str, Any] | None = None, | |
| duration_ms: int = 0, | |
| total_cost_usd: float | None = None, | |
| num_turns: int = 0, | |
| stop_reason: str | None = None, | |
| structured_output: Any = None, | |
| ) -> None: | |
| def record_generation( | |
| self, | |
| *, | |
| model: str, | |
| input_messages: list[dict[str, str]], | |
| output_text: str, | |
| usage: dict[str, Any] | None = None, | |
| model_usage: dict[str, Any] | None = None, | |
| duration_ms: int = 0, | |
| total_cost_usd: float | None = None, | |
| num_turns: int = 0, | |
| stop_reason: str | None = None, | |
| ) -> None: |
The call site in reviewer.py would need to drop the structured_output=structured_output keyword argument accordingly.
Prompt To Fix With AI
This is a comment left during a code review.
Path: tools/pr-approval-agent/analytics.py
Line: 73-86
Comment:
**`structured_output` parameter is accepted but never used**
The `structured_output: Any = None` parameter is declared in `record_generation`'s signature but is never referenced anywhere in the method body. This is a superfluous part — remove it to keep the interface minimal and avoid confusion about whether the parameter has any effect.
```suggestion
def record_generation(
self,
*,
model: str,
input_messages: list[dict[str, str]],
output_text: str,
usage: dict[str, Any] | None = None,
model_usage: dict[str, Any] | None = None,
duration_ms: int = 0,
total_cost_usd: float | None = None,
num_turns: int = 0,
stop_reason: str | None = None,
) -> None:
```
The call site in `reviewer.py` would need to drop the `structured_output=structured_output` keyword argument accordingly.
How can I resolve this? If you propose a fix, please make it concise.Wrap capture/flush calls in try/except so analytics errors never crash the pipeline or trigger spurious reviewer retries. Remove unused structured_output parameter from record_generation. https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: e81b374ef8
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| Uses the internal PostHog project key by default so no extra secrets | ||
| are needed. Set OPT_OUT_CAPTURE=1 to disable. | ||
| """ | ||
| if os.environ.get("OPT_OUT_CAPTURE"): |
There was a problem hiding this comment.
Check OPT_OUT_CAPTURE value before disabling capture
create_client currently turns analytics off whenever OPT_OUT_CAPTURE is set to any non-empty value, so OPT_OUT_CAPTURE=0 (a common “false” convention used in this repo’s workflows) will still disable telemetry. In environments where that variable is inherited, this silently drops all $ai_generation/$ai_trace events and defeats the instrumentation. Compare against an explicit opt-out value (for example "1") or parse a boolean string instead of relying on truthiness.
Useful? React with 👍 / 👎.
OPT_OUT_CAPTURE=0 or OPT_OUT_CAPTURE=false should not disable analytics. Check against explicit truthy values (true/yes/1) to match the convention in posthog/settings/base_variables.py. https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es
|
Size Change: 0 B Total Size: 127 MB ℹ️ View Unchanged
|
Storybook visual regression tests failedIf your changes intentionally update UI snapshots: add the If you didn't change any UI: this is likely a flaky snapshot — wait for a fix to land on master. |
webjunkie
left a comment
There was a problem hiding this comment.
Some of the bot reviews seem worth looking into.
|
All four bot review comments have been addressed:
Generated by Claude Code |
|
Closing in favor of a cleaner approach. Instead of hand-rolling analytics in stamphog, we built a proper PostHog SDK integration for the Claude Agent SDK:
Once posthog-python#477 is released, #53008 will light up stamphog analytics automatically — no custom analytics module needed. |
Problem
The PR approval agent currently has no observability into LLM performance, costs, and behavior. We need to capture telemetry about reviewer calls and pipeline execution to understand model performance, token usage, and costs in production.
Changes
Added comprehensive LLM Analytics instrumentation using PostHog's standard events:
New
analytics.pymodule: ProvidesTraceRecorderclass to capture$ai_generationevents (per LLM call) and$ai_traceevents (pipeline completion). Uses PostHog's internal project API key by default with opt-out support viaOPT_OUT_CAPTURE=1.Updated
review_pr.py:TraceRecorderin the pipeline orchestratorReviewerfor generation-level instrumentationUpdated
reviewer.py:TraceRecorderin constructorResultMessageUpdated
README.md: Documents the LLM Analytics feature, configuration options, and captured events.Updated dependencies: Added
posthoganalyticsto the script dependencies comment.All events include custom
stamphog_*properties (PR number, author, tier, verdict, etc.) for filtering and analysis in the PostHog dashboard.How did you test this code?
The implementation follows PostHog's standard LLM Analytics event schema (
$ai_generationand$ai_trace). Code review of:client=None)No manual testing performed as this is an agent-authored change. The instrumentation is non-blocking (analytics failures won't affect PR review) and can be validated through PostHog dashboard inspection once deployed.
Publish to changelog?
No
https://claude.ai/code/session_01NCn85Cf1g54u29vQj5W9es