feat(slack-bot): bill slack-initiated posthog code tasks under AI credits#59040
Conversation
|
🎭 Playwright didn't run on this PR — your changes touch code that could affect E2E behavior, but Playwright is opt-in via label now to keep CI cost down. Add the Most PRs don't need this. Real regressions still get caught on master and fix-forward. |
Prompt To Fix All With AIFix the following 2 code review issues. Work through them one at a time, proposing concise fixes.
---
### Issue 1 of 2
services/llm-gateway/tests/callbacks/test_posthog.py:350-374
The failure-path test only covers `slack_app`, while the success-path equivalent is parametrized over both `slack_app` and `slack_app_routing`. The team rule is to always prefer parametrized tests, and both products have `billable=True`, so both should be exercised here.
```suggestion
@pytest.mark.asyncio
@pytest.mark.parametrize("product", ["slack_app", "slack_app_routing"])
async def test_on_failure_marks_slack_products_billable(
self,
callback: PostHogCallback,
auth_user: AuthenticatedUser,
mock_posthog_client: tuple,
product: str,
) -> None:
_, mock_client = mock_posthog_client
kwargs = {
"standard_logging_object": {
"model": "claude-sonnet-4-6",
"custom_llm_provider": "anthropic",
"error_str": "boom",
},
"litellm_params": {},
}
with (
patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
patch("llm_gateway.callbacks.posthog.get_product", return_value=product),
):
await callback._on_failure(kwargs, None, 0.0, 1.0, end_user_id=None)
props = mock_client.capture.call_args.kwargs["properties"]
assert props["$ai_billable"] is True
```
### Issue 2 of 2
services/llm-gateway/src/llm_gateway/rate_limiting/billable_credits_throttle.py:55-66
`self._now()` is called twice: once for the limit check and again for `retry_after`. If real-world clock time advances between calls, the computed `retry_after` will be slightly shorter than intended, and in a pathological edge case where `score` is within a fraction of a second of `now`, the first call could evaluate `score > now` (block) while the second call makes `score - now` negative (before `max` saves it). Capturing the timestamp once is safer.
```suggestion
score = await self._redis.zscore(_AI_CREDITS_LIMIT_KEY, context.user.team_api_token)
now = self._now()
if score is None or score <= now:
return ThrottleResult.allow()
return ThrottleResult.deny(
detail=(
"Your team has used its monthly PostHog AI credits. "
"Top up at https://us.posthog.com/organization/billing to continue."
),
scope=self.scope,
retry_after=max(int(score - now), 1),
)
```
Reviews (1): Last reviewed commit: "feat(llm-gateway): gate billable product..." | Re-trigger Greptile |
| @pytest.mark.asyncio | ||
| async def test_on_failure_marks_slack_app_billable( | ||
| self, | ||
| callback: PostHogCallback, | ||
| auth_user: AuthenticatedUser, | ||
| mock_posthog_client: tuple, | ||
| ) -> None: | ||
| _, mock_client = mock_posthog_client | ||
| kwargs = { | ||
| "standard_logging_object": { | ||
| "model": "claude-sonnet-4-6", | ||
| "custom_llm_provider": "anthropic", | ||
| "error_str": "boom", | ||
| }, | ||
| "litellm_params": {}, | ||
| } | ||
|
|
||
| with ( | ||
| patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user), | ||
| patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app"), | ||
| ): | ||
| await callback._on_failure(kwargs, None, 0.0, 1.0, end_user_id=None) | ||
|
|
||
| props = mock_client.capture.call_args.kwargs["properties"] | ||
| assert props["$ai_billable"] is True |
There was a problem hiding this comment.
The failure-path test only covers
slack_app, while the success-path equivalent is parametrized over both slack_app and slack_app_routing. The team rule is to always prefer parametrized tests, and both products have billable=True, so both should be exercised here.
| @pytest.mark.asyncio | |
| async def test_on_failure_marks_slack_app_billable( | |
| self, | |
| callback: PostHogCallback, | |
| auth_user: AuthenticatedUser, | |
| mock_posthog_client: tuple, | |
| ) -> None: | |
| _, mock_client = mock_posthog_client | |
| kwargs = { | |
| "standard_logging_object": { | |
| "model": "claude-sonnet-4-6", | |
| "custom_llm_provider": "anthropic", | |
| "error_str": "boom", | |
| }, | |
| "litellm_params": {}, | |
| } | |
| with ( | |
| patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user), | |
| patch("llm_gateway.callbacks.posthog.get_product", return_value="slack_app"), | |
| ): | |
| await callback._on_failure(kwargs, None, 0.0, 1.0, end_user_id=None) | |
| props = mock_client.capture.call_args.kwargs["properties"] | |
| assert props["$ai_billable"] is True | |
| @pytest.mark.asyncio | |
| @pytest.mark.parametrize("product", ["slack_app", "slack_app_routing"]) | |
| async def test_on_failure_marks_slack_products_billable( | |
| self, | |
| callback: PostHogCallback, | |
| auth_user: AuthenticatedUser, | |
| mock_posthog_client: tuple, | |
| product: str, | |
| ) -> None: | |
| _, mock_client = mock_posthog_client | |
| kwargs = { | |
| "standard_logging_object": { | |
| "model": "claude-sonnet-4-6", | |
| "custom_llm_provider": "anthropic", | |
| "error_str": "boom", | |
| }, | |
| "litellm_params": {}, | |
| } | |
| with ( | |
| patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user), | |
| patch("llm_gateway.callbacks.posthog.get_product", return_value=product), | |
| ): | |
| await callback._on_failure(kwargs, None, 0.0, 1.0, end_user_id=None) | |
| props = mock_client.capture.call_args.kwargs["properties"] | |
| assert props["$ai_billable"] is True |
Prompt To Fix With AI
This is a comment left during a code review.
Path: services/llm-gateway/tests/callbacks/test_posthog.py
Line: 350-374
Comment:
The failure-path test only covers `slack_app`, while the success-path equivalent is parametrized over both `slack_app` and `slack_app_routing`. The team rule is to always prefer parametrized tests, and both products have `billable=True`, so both should be exercised here.
```suggestion
@pytest.mark.asyncio
@pytest.mark.parametrize("product", ["slack_app", "slack_app_routing"])
async def test_on_failure_marks_slack_products_billable(
self,
callback: PostHogCallback,
auth_user: AuthenticatedUser,
mock_posthog_client: tuple,
product: str,
) -> None:
_, mock_client = mock_posthog_client
kwargs = {
"standard_logging_object": {
"model": "claude-sonnet-4-6",
"custom_llm_provider": "anthropic",
"error_str": "boom",
},
"litellm_params": {},
}
with (
patch("llm_gateway.callbacks.posthog.get_auth_user", return_value=auth_user),
patch("llm_gateway.callbacks.posthog.get_product", return_value=product),
):
await callback._on_failure(kwargs, None, 0.0, 1.0, end_user_id=None)
props = mock_client.capture.call_args.kwargs["properties"]
assert props["$ai_billable"] is True
```
How can I resolve this? If you propose a fix, please make it concise.Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!
PR overviewTwo security issues remain open, both allowing over-quota callers to bypass AI-credit throttling for billable LLM gateway products such as Slack app routing. The bypasses are reachable through supported authentication paths or authorization failures, so attacker action is confirmed but the impact is limited to quota and billing enforcement rather than broader data access or code execution. One issue has already been addressed, indicating the PR is moving in the right direction but still needs quota enforcement fixes before it is secure. Open issues (2)
Fixed/addressed: 1 · PR risk: 6/10 |
joshsny
left a comment
There was a problem hiding this comment.
django & the gateway don't share a redis instance, so you'll need to go a different route here :)
| class BillableCreditThrottle(Throttle): | ||
| """Gate billable-product LLM calls on the team's AI credits balance. | ||
|
|
||
| Reads the same Redis sorted set Django populates via |
There was a problem hiding this comment.
these are different redis instances, so this approach won't work - instead there should be an endpoint already on the gateway that posthog code uses to check if folks are limited, we should use the same endpoint (we might need to make it more general, should be something like /limits that should expose a users limits based on their OAuth token, or if a personal api key is provided they can get limits based on the user they specify, so long as they have access to that user)
dc3e14e to
bc7ff29
Compare
|
@joshsny updated • new django endpoint |
| headers={"Authorization": auth_header}, | ||
| timeout=2.0, | ||
| ) | ||
| if resp.status_code >= 400: |
There was a problem hiding this comment.
Medium: Quota bypass on authorization failures
/api/projects/{team_id}/quota_limits/ requires project:read, but LLM gateway authentication only requires llm_gateway:read. An over-quota user can call a billable API-key product such as slack_app_routing with a normal llm_gateway:read key; Django returns 403 here, this branch caches limited=False, and BillableCreditThrottle allows the LLM call. Make the quota endpoint/resolver compatible with the gateway scope, or fail closed for authorization failures instead of treating them as unlimited.
|
Size Change: 0 B Total Size: 80.2 MB ℹ️ View Unchanged
|
joshsny
left a comment
There was a problem hiding this comment.
LGTM, some things to fix but then we can get this in 🚢
| def list(self, request: Request, *args: Any, **kwargs: Any) -> Response: | ||
| limited = { | ||
| QuotaResource.AI_CREDITS.value: { | ||
| "limited": is_team_limited( |
There was a problem hiding this comment.
what does this function call? is it a clickhouse query, or hitting redis? (I think redis in which case great)
| ), | ||
| responses={200: QuotaLimitsResponseSerializer}, | ||
| ) | ||
| def list(self, request: Request, *args: Any, **kwargs: Any) -> Response: |
There was a problem hiding this comment.
this should be general given the endpoints registration, lets show them quota limiting status for all products?
There was a problem hiding this comment.
this is technically only data that should be available to admins since it's directed off of billing data, but given we're just exposing is limited or not this seems fine to be available to anyone with project:read imo
cc @PostHog/team-billing in case one of them disagrees, but you can assume not
|
|
||
|
|
||
| @activity.defn | ||
| def enforce_posthog_code_billing_quota_activity( |
There was a problem hiding this comment.
we should pull activities into their own file together with their input / output types as a general rule of thumb
There was a problem hiding this comment.
yeah, don't like it either, but it's going to be a bigger change, planning to do it as follow up and move all activities to own file(s)
| return slack_user_id | ||
|
|
||
|
|
||
| def _refuse_mention_if_team_over_quota( |
There was a problem hiding this comment.
is this a job of the workflow or the API, it should be a job of one of those things, but not both (probably the workflow)?
There was a problem hiding this comment.
don't feel super strong about this one, so happy to do one of those, but why not both?
| # Cache window for the fail-open path (4xx from Django, e.g. expired token). | ||
| # Short enough that a fixed-up token recovers quickly; long enough to keep a | ||
| # misconfigured client off Django's neck during an auth-failure storm. | ||
| _FAIL_OPEN_CACHE_TTL_SECONDS = 5 |
There was a problem hiding this comment.
we can probs fail open for longer, say 60 seconds
|
|
||
| posthog_api_base_url: str = "https://us.posthog.com" | ||
| plan_cache_ttl: int = 900 # 15 minutes | ||
| # AI credits quota state is fast-moving — Django itself caches the underlying |
There was a problem hiding this comment.
we should check this with billing, but we used to quota limit on an hourly schedule, not sure if that's still the case, in any case we can bump the ttl here to 5 mins, we are okay if teams go slightly over their limit
| end_user_id = await _extract_end_user_id_from_body(request) | ||
|
|
||
| plan_info = await resolve_plan_info(request, user.user_id, product) | ||
| # Plan + quota are independent Django roundtrips on cache miss — overlap them. |
There was a problem hiding this comment.
this logic is in two places, can we pull into a function?
| return cached | ||
|
|
||
| try: | ||
| status, ttl = await self._fetch(resource_key, team_id, auth_header) |
There was a problem hiding this comment.
we should do exponential backoff here up to the fail open ttl and then fail closed
eb4f9d5 to
96084d4
Compare
| """Resolve the team's AI credits quota state, falling open on errors.""" | ||
| if team_id is None: | ||
| return QuotaResourceStatus(limited=False) | ||
| auth_header = request.headers.get("Authorization", "") |
There was a problem hiding this comment.
Medium: AI-credit quota bypass with X-API-Key auth
AuthService accepts API keys from X-API-Key, but this resolver treats requests without Authorization as unlimited. An over-quota caller can use the supported X-API-Key header against a billable API-key product such as slack_app_routing and avoid the AI-credit throttle.
| auth_header = request.headers.get("Authorization", "") | |
| auth_header = request.headers.get("Authorization", "") | |
| if not auth_header and (api_key := request.headers.get("x-api-key")): | |
| auth_header = f"Bearer {api_key}" |
…edge Adds BillableCreditThrottle that, for any product with ProductConfig.billable=True, ZSCOREs the team's API token against @posthog/quota-limits/ai_credits — the same Redis set Django's daily aggregator writes to. Over-quota teams get a 429 with a friendly denial before the request reaches the upstream LLM provider. Closes the cross-surface gap for slack-origin tasks: PostHog Code desktop replies bypass the Temporal pre-flight, but cannot bypass the gateway. Plumbs team_api_token through AuthenticatedUser (LEFT JOIN posthog_team in both authenticators) so the throttle has the key it needs to look up Redis. Fails open when Redis or team_api_token is missing — matches the rest of the throttle chain. A startup warning surfaces when Redis isn't configured so we notice silent disablement in production.
The existing gate fires deep inside `create_posthog_code_task_for_repo_activity`, after the classifier has already been called against the llm-gateway. For over-quota teams the classifier 429s and the bot's catch-all error handler steers it into the repo-picker, posting in the wrong thread. Two new entry points: - Webhook: refuse `app_mention` before `start_workflow` so no Temporal execution is created. Saves Slack roundtrips and a billable classifier call. - Workflow: new `enforce_posthog_code_billing_quota_activity`, first call in `PostHogCodeSlackMentionWorkflow.run()`, guarded by `workflow.patched( "posthog-code-slack-billing-gate")` so in-flight workflows stay deterministic. Existing in-activity gates stay as defense in depth (and to keep the direct-activity unit tests passing).
1cc02bf to
51f0a58
Compare
51f0a58 to
9d08641
Compare
Problem
PostHog Code Slack-bot task processing has no metering or quota attribution today. The agent the bot runs in the sandbox calls the LLM gateway as
posthog_code, so slack-initiated traffic rides desktop's plan-tier path instead of being usage-billed.Wire slack-initiated runs into the existing PHAI billing pipeline:
$ai_billable→$ai_generation→ dailyusage_report.pyaggregator →QuotaResource.AI_CREDITS. Pairs with PostHog/code#2225, which switches the agent to sendslack_appas the gateway product based on the task'sorigin_product.Changes
billableflag onProductConfig.slack_app(agent) andslack_app_routing(the haiku classifier, renamed fromslack-posthog-codewith aliases for back-compat) are marked billable. The PostHog callback tags their$ai_generationevents with$ai_billable=true.is_team_limitedcheck on bothcreate_posthog_code_task_for_repo_activityandforward_posthog_code_followup_activity. Posts a friendly denial in-thread and bails before any side effects.BillableCreditThrottlekeyed onProductConfig.billable. ZSCOREs the team's API token against the same Redis set Django writes to. Returns 429 to over-quota teams. Closes the cross-surface gap — PostHog Code desktop replies to slack-origin tasks bypass the Temporal pre-flight but cannot bypass the gateway.team_api_tokenis plumbed throughAuthenticatedUserto make this possible. Fails open if Redis is unreachable.How did you test this code?
Automated
uv run pytest services/llm-gateway/tests/test_product_config.py services/llm-gateway/tests/callbacks/test_posthog.py services/llm-gateway/tests/test_billable_credits_throttle.py— covers product registration,$ai_billabletagging, alias resolution, and the 6 throttle branches (non-billable, not limited, limit expired, limit active, no Redis, no team token).hogli test products/slack_app/backend/tests/test_followup_forwarding.py— quota-blocked task creation and quota-blocked follow-up.Manual end-to-end against local dev
Triggered
@posthogin Slack and confirmed the gateway emits two$ai_generationevents tagged$ai_billable=true— one forslack_app(the agent) and one forslack_app_routing(the classifier). Verified via HogQL on team 2:Simulated an over-quota team by writing directly to the Redis set the daily aggregator populates:
Confirmed:
@posthogmention short-circuits increate_posthog_code_task_for_repo_activity— no sandbox provisioned.forward_posthog_code_followup_activity— no message forwarded.Removed the limit (
ZREM) and re-mentioned — back to normal.Showcase
Deployment order
Deploy this PR first, then PostHog/code#2225. The agent in #2225 starts calling the gateway as
slack_app, which doesn't exist as a registered product until this PR lands. Reversed, slack-initiated mentions would 400.Publish to changelog?
no
Docs update
No docs touched.
🤖 Agent context
Agent-authored with Claude Code, supervised.
billablelives onProductConfig(one place to flip per product) and the gateway throttle keys off it (forward-compat for any future billable product). Pre-flight in Temporal stays in place as belt-and-suspenders — it saves sandbox provisioning when already over quota; the throttle catches anything that gets past it.