
Commit 5052d89

feat(webapp,core): add a public HTTP API for errors (#4005)
## Summary

Adds an environment-scoped HTTP API over the Errors feature, mirroring the runs API. Task-run failures are grouped by a fingerprint into "error groups," and this exposes everything you can do with them in the dashboard:

- `GET /api/v1/errors` lists error groups, with `filter[taskIdentifier]`, `filter[version]`, `filter[status]` (`unresolved`/`resolved`/`ignored`), `filter[search]`, a time range, and cursor pagination.
- `GET /api/v1/errors/{errorId}` retrieves a single group (summary, lifecycle state, affected versions).
- `POST /api/v1/errors/{errorId}/{resolve,ignore,unresolve}` changes its state.
- `GET /api/v1/runs?filter[error]={errorId}` lists the runs behind a group.

Request and response schemas are exported from `@trigger.dev/core/v3` so the SDK can reuse them, and all endpoints are documented in the API reference (OpenAPI). `errorId` is the `error_<fingerprint>` friendly id.

## Attribution

State changes record who made them. A plain environment API key has no user, so `resolvedBy`/`ignoredByUserId` stay null. When the caller uses an environment JWT obtained by exchanging a personal access token or a delegated user token at `POST /api/v1/projects/:ref/:env/jwt`, that exchange now stamps an `act` delegation claim, and the write endpoints read `act.sub` to attribute the change to the acting user.

This is the first endpoint to consume the `act` claim, so two small pieces of plumbing ride along: the exchange stamps `act` for personal-access-token subjects too (it was delegated-token-only), and the public-JWT bearer-auth path surfaces `act.sub` to the handler.

Built on the delegated-token work in #3997.
1 parent 135c7e9 commit 5052d89

25 files changed

Lines changed: 1384 additions & 8 deletions

.changeset/errors-api-schemas.md

Lines changed: 5 additions & 0 deletions
@@ -0,0 +1,5 @@
---
"@trigger.dev/core": patch
---

Add request and response schemas for the new Errors API (error groups). These back the env-scoped HTTP endpoints for listing error groups, retrieving a single group, and changing its state (resolve, ignore, unresolve), plus a `filter[error]` option on the runs list to fetch the runs behind a group. Exported from `@trigger.dev/core/v3` so the SDK can reuse them.
Lines changed: 199 additions & 0 deletions
@@ -0,0 +1,199 @@
---
name: errors-api-e2e
description: End-to-end smoke test for the public Errors HTTP API (error groups). Seeds failed runs into ClickHouse so the error materialized views populate, then drives the real endpoints against the running webapp — list (with filters + pagination), retrieve, resolve/ignore/unresolve, the `filter[error]` runs filter, user attribution via the `trigger.dev mint-token` -> JWT exchange, and the 401/403/404 negatives. Use for "smoke test the errors API", "test the errors API e2e", "prove the errors endpoints work", or to re-verify after changes.
allowed-tools: Read, Bash
---

# Errors API — end-to-end smoke test

Proves the public Errors API against the **running** webapp with real HTTP. No
mocks. The error data plane is ClickHouse (`errors_v1` + `error_occurrences_v1`,
both materialized-view-fed from `task_runs_v2`) plus Postgres `ErrorGroupState`
for lifecycle status; this skill seeds straight into `task_runs_v2` and lets the
MVs do the rest.

Code under test:
- `apps/webapp/app/routes/api.v1.errors.ts`: `GET /api/v1/errors` (list).
- `apps/webapp/app/routes/api.v1.errors.$errorId.ts`: `GET /api/v1/errors/:errorId` (detail).
- `apps/webapp/app/routes/api.v1.errors.$errorId.{resolve,ignore,unresolve}.ts`: state actions.
- `apps/webapp/app/presenters/v3/ApiErrorListPresenter.server.ts` / `ApiErrorGroupPresenter.server.ts`.
- `apps/webapp/app/presenters/v3/ApiRunListPresenter.server.ts`: the `filter[error]` addition on `GET /api/v1/runs`.
- `apps/webapp/app/v3/services/errorGroupActions.server.ts`: resolve/ignore/unresolve (nullable `userId`).
- Attribution: `api.v1.projects.$projectRef.$env.jwt.ts` stamps `act:{sub}` for PAT **and** UAT exchanges; `@trigger.dev/rbac` surfaces `act.sub` through bearer auth; the action handlers read `authentication.actor?.sub`.

`errorId` is `error_<fingerprint>` (round-trips via `ErrorId` in `@trigger.dev/core/v3/isomorphic`).
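
When scripting the legs below, it helps to convert between the two forms in one place. Below is a minimal bash sketch. It assumes only the `error_<fingerprint>` shape stated above. The helpers `errid_from_fp` and `fp_from_errid` are hypothetical: they are not part of the CLI and they do not call `ErrorId`.

```bash
# Hypothetical helpers for the error_<fingerprint> friendly-id shape.
# They only add or strip the prefix; they do not use ErrorId from @trigger.dev/core.
errid_from_fp() { printf 'error_%s\n' "$1"; }

fp_from_errid() {
  case "$1" in
    error_?*) printf '%s\n' "${1#error_}" ;;
    *) echo "not an error id: $1" >&2; return 1 ;;
  esac
}
```

For example, `ERRID_A=$(errid_from_fp "$FP_A")` produces the same value as the literal assignment in step 1.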

## Prerequisites

- Webapp running on http://localhost:3030 (`pnpm run dev --filter webapp`). Confirm with `curl -s http://localhost:3030/healthcheck`.
- DB seeded (`pnpm run db:seed`), and a local ClickHouse reachable at `CLICKHOUSE_URL` (the `pnpm run docker` stack).
- The CLI built and logged in to localhost:3030 (`pnpm run build --filter trigger.dev`; profile `default` points at localhost:3030). Needed only for the attribution leg.
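
The legs below shell out to `psql`, `curl`, `python3` and `node`. A small preflight catches a missing tool up front, before you are halfway through a step. Below is a sketch of one. `require_cmds` is a hypothetical helper and is not part of the repo.

```bash
# Hypothetical preflight: report every missing command, then fail once.
require_cmds() {
  local missing=0 cmd
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing required command: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Usage: `require_cmds psql curl python3 node && curl -sf http://localhost:3030/healthcheck >/dev/null && echo "prereqs OK"`.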

> Important wiring facts the seed relies on (verified):
> - The MVs read the error type/message from `error.data.*`, so the seeded
>   `error` JSON column **must** be wrapped: `{"data": {"type": ..., "message": ..., "stack": ...}}`.
> - The MVs only fire for failed statuses: `SYSTEM_FAILURE | CRASHED | INTERRUPTED | COMPLETED_WITH_ERRORS | TIMED_OUT`, and require a non-empty `error_fingerprint`.
> - `GET /api/v1/runs` lists run **ids** from ClickHouse but **hydrates from Postgres** `TaskRun`. So the error-list/detail/action legs work from a ClickHouse-only seed, but the `filter[error]` leg needs a **paired** Postgres `TaskRun` row whose `id` equals the ClickHouse `run_id`.
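
The MVs silently skip a row with any other status, or with an empty fingerprint. A guard that runs before inserting turns a confusing `N = 0` in step 1 into an explicit error. Below is a minimal sketch. `mv_will_fire` is a hypothetical name. Its status list is copied from the fact above, so update it if the MV definitions change.

```bash
# Hypothetical guard: succeed only if the errors MVs would pick up a row
# with this status and fingerprint (per the wiring facts above).
mv_will_fire() {
  local status=$1 fingerprint=$2
  [ -n "$fingerprint" ] || return 1
  case "$status" in
    SYSTEM_FAILURE|CRASHED|INTERRUPTED|COMPLETED_WITH_ERRORS|TIMED_OUT) return 0 ;;
    *) return 1 ;;
  esac
}
```

For example, `mv_will_fire COMPLETED_WITH_ERRORS "$FP_A" || echo "row would be ignored by the MVs"`.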

Run everything from the repo root in one shell. Invoke the built CLI via a
function (a `CLI="node …"` variable won't word-split under zsh):
```bash
cli() { node packages/cli-v3/dist/esm/index.js "$@"; }
PROFILE=default
```

## Setup — resolve a dev environment + connection strings

```bash
cd apps/webapp
CHURL=$(grep -E "^CLICKHOUSE_URL=" .env | head -1 | cut -d= -f2- | tr -d '"')
DBURL=$(grep -E "^DATABASE_URL=" .env | head -1 | cut -d= -f2- | tr -d '"' | tr -d "'" | sed 's/?.*//')

# Pick the seeded hello-world dev env (proj_rrkpdguyagvsoktglnod). Adjust the
# WHERE if you want a different project.
read ENV ORG PROJ REF < <(psql "$DBURL" -t -A -F' ' -c "
SELECT re.id, re.\"organizationId\", re.\"projectId\", p.\"externalRef\"
FROM \"RuntimeEnvironment\" re
JOIN \"Project\" p ON p.id = re.\"projectId\"
WHERE re.slug='dev' AND p.\"externalRef\"='proj_rrkpdguyagvsoktglnod' LIMIT 1;")
APIKEY=$(psql "$DBURL" -t -A -c "SELECT \"apiKey\" FROM \"RuntimeEnvironment\" WHERE id='$ENV';")
cd ../..  # back to the repo root; cli() and later steps use repo-relative paths
H="Authorization: Bearer $APIKEY"
B="http://localhost:3030"
```
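
Each of the two `grep | head | cut | tr` pipelines above reads one key out of `apps/webapp/.env`. If you need more keys, a helper keeps the quoting rules in one place. Below is a sketch under the same assumptions as those pipelines: simple `KEY=value` lines with optional quotes. `dotenv_get` is a hypothetical name. Like the `DBURL` line, it deletes every quote character in the value.

```bash
# Hypothetical helper: print the value of KEY from a dotenv file with all
# single/double quotes removed. Mirrors the pipelines above; it does not
# handle `export KEY=...`, inline comments, or multi-line values.
dotenv_get() {
  local key=$1 file=${2:-.env}
  grep -E "^${key}=" "$file" | head -1 | cut -d= -f2- | tr -d "\"'"
}
```

For example, run `DBURL=$(dotenv_get DATABASE_URL apps/webapp/.env | sed 's/?.*//')` from the repo root. The `sed` drops any query string, as the original `DBURL` line does.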

## Steps

### 1. Seed two error groups (ClickHouse, MV-fed)

```bash
RUN=$(node -e 'console.log(Date.now().toString(36))')
TASK="errors-api-e2e-$RUN"; FP_A="fpA${RUN}"; FP_B="fpB${RUN}"
ERRID_A="error_$FP_A"; ERRID_B="error_$FP_B"
NOW_CH=$(node -e 'console.log(new Date().toISOString().replace("T"," ").replace("Z","").slice(0,23))')
NOW_MS=$(node -e 'console.log(Date.now())')
Q=$(python3 -c "import urllib.parse;print(urllib.parse.quote('INSERT INTO trigger_dev.task_runs_v2 FORMAT JSONEachRow'))")

mkrow() { # status fingerprint errorType message runId
  echo "{\"environment_id\":\"$ENV\",\"organization_id\":\"$ORG\",\"project_id\":\"$PROJ\",\"run_id\":\"$5\",\"friendly_id\":\"run_$5\",\"status\":\"$1\",\"environment_type\":\"DEVELOPMENT\",\"engine\":\"V2\",\"task_identifier\":\"$TASK\",\"created_at\":\"$NOW_CH\",\"updated_at\":\"$NOW_CH\",\"error\":{\"data\":{\"type\":\"$3\",\"message\":\"$4\",\"stack\":\"at x (a.ts:1:1)\"}},\"error_fingerprint\":\"$2\",\"task_version\":\"20240101.1\",\"_version\":\"$NOW_MS\",\"_is_deleted\":0}"
}
ROWS="$(mkrow COMPLETED_WITH_ERRORS $FP_A AlphaBoom 'alpha boom happened' r_a1_$RUN)
$(mkrow COMPLETED_WITH_ERRORS $FP_A AlphaBoom 'alpha boom happened' r_a2_$RUN)
$(mkrow CRASHED $FP_B BetaCrash 'beta crash happened' r_b1_$RUN)"
printf '%s' "$ROWS" | curl -s "$CHURL/?query=$Q" --data-binary @-

# Poll until both fingerprints appear in errors_v1 (the MV is near-instant locally).
for i in $(seq 1 10); do
  N=$(curl -s "$CHURL" --data-binary "SELECT count() FROM (SELECT 1 FROM trigger_dev.errors_v1 WHERE environment_id='$ENV' AND error_fingerprint IN ('$FP_A','$FP_B') GROUP BY error_fingerprint)")
  [ "$N" = "2" ] && break; sleep 1
done
echo "seeded fingerprints in errors_v1: $N (want 2)"
```
PASS: `N = 2`. Alpha has 2 occurrences, beta 1.
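
`mkrow` splices its arguments into the JSON verbatim. That is fine for the fixed messages above. A message you add that contains a double quote or a backslash, though, would produce an invalid JSONEachRow line, and the insert would likely fail. Below is a minimal escaping sketch. It assumes single-line messages. `json_escape` is a hypothetical helper and does not escape control characters.

```bash
# Hypothetical helper: escape backslashes first, then double quotes, so the
# result can sit inside a JSON string literal. Control characters (newlines,
# tabs) are NOT handled; keep seeded messages on one line.
json_escape() {
  local s
  s=${1//\\/\\\\}
  s=${s//\"/\\\"}
  printf '%s' "$s"
}
```

Inside `mkrow`, you would then use `$(json_escape "$4")` in place of `$4` for the message.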

### 2. List + filters + pagination

```bash
curl -s "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bperiod%5D=1d" -H "$H" \
  | python3 -c "import sys,json;d=json.load(sys.stdin);print('count',len(d['data']),[(e['id'],e['status'],e['count']) for e in d['data']])"
```
PASS: 2 groups, both `status=unresolved`, alpha `count=2`, beta `count=1`, ids `error_<fp>`.
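
The `%5B`/`%5D` sequences are URL-encoded `[` and `]`. Encoding them stops `curl` from treating the brackets as a URL glob (`curl -g` would also do that). If the encoded form is hard to read, a helper can build the query from the readable `filter[...]` names. Below is a sketch. `qs` is hypothetical and encodes only brackets, so values containing spaces, `&`, or other reserved characters still need proper encoding.

```bash
# Hypothetical helper: join key=value pairs with '&', percent-encoding only
# the square brackets in each pair. Values are otherwise passed through as-is.
qs() {
  local out="" pair
  for pair in "$@"; do
    pair=${pair//\[/%5B}
    pair=${pair//\]/%5D}
    out+="${out:+&}$pair"
  done
  printf '%s' "$out"
}
```

Usage: `curl -s "$B/api/v1/errors?$(qs "filter[taskIdentifier]=$TASK" 'filter[status]=unresolved' 'filter[period]=1d')" -H "$H"`.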

Assert each filter narrows correctly (each should return the noted shape):
```bash
curl -s "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bstatus%5D=unresolved&filter%5Bperiod%5D=1d" -H "$H" | python3 -c "import sys,json;print('unresolved:',len(json.load(sys.stdin)['data']))" # 2
curl -s "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bsearch%5D=AlphaBoom&filter%5Bperiod%5D=1d" -H "$H" | python3 -c "import sys,json;print('search:',[e['errorType'] for e in json.load(sys.stdin)['data']])" # ['AlphaBoom']
curl -s "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bperiod%5D=1d&page%5Bsize%5D=1" -H "$H" | python3 -c "import sys,json;d=json.load(sys.stdin);print('page size 1:',len(d['data']),'next?',bool(d['pagination'].get('next')))" # 1 / True
```
PASS: `unresolved: 2`, `search: ['AlphaBoom']`, `page size 1: 1 next? True`.
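
The pagination check above only confirms that a `next` cursor exists. To prove the cursor actually reaches the second group, you can walk every page. Below is a sketch. It **assumes** the cursor is sent back as `page[after]`, as on `GET /api/v1/runs`; confirm this against the OpenAPI reference. `fetch_errors_page` and `list_all_error_ids` are hypothetical names.

```bash
# Hypothetical page fetcher. ASSUMPTION: the cursor goes back as page[after],
# mirroring the runs API. Override this function to test the walker offline.
fetch_errors_page() {
  local url="$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bperiod%5D=1d&page%5Bsize%5D=1"
  if [ -n "$1" ]; then url="$url&page%5Bafter%5D=$1"; fi
  curl -sf "$url" -H "$H"
}

# Hypothetical walker: print every error-group id, one per line, following
# pagination.next until it is empty or null.
list_all_error_ids() {
  local cursor="" page
  while :; do
    page=$(fetch_errors_page "$cursor") || return 1
    # Emit this page's ids.
    python3 -c 'import sys,json
for e in json.load(sys.stdin)["data"]: print(e["id"])' <<<"$page"
    # Read the next cursor; an empty string means this was the last page.
    cursor=$(python3 -c 'import sys,json
print(json.load(sys.stdin).get("pagination", {}).get("next") or "")' <<<"$page")
    [ -n "$cursor" ] || break
  done
}
```

If the cursor assumption holds, `list_all_error_ids` should print both seeded ids, one page each.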

### 3. Retrieve detail

```bash
curl -s "$B/api/v1/errors/$ERRID_A" -H "$H" \
  | python3 -c "import sys,json;d=json.load(sys.stdin);print(d['id'],d['errorType'],d['status'],d['count'],d['affectedVersions'],d['resolvedBy'])"
```
PASS: `error_<fpA> AlphaBoom unresolved 2 ['20240101.1'] None`.
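
Each step pipes the response through a one-off `python3 -c` to pluck fields. For your own extra checks, a reusable extractor saves retyping that each time. Below is a sketch. `json_get` is a hypothetical helper. It walks a dot-separated path in which numeric segments index arrays, and it prints JSON `null` as `null`.

```bash
# Hypothetical helper: read JSON on stdin and print the value at a dotted
# path such as "status" or "data.0.id". Python prints booleans as True/False.
json_get() {
  python3 -c '
import sys, json
v = json.load(sys.stdin)
for k in sys.argv[1].split("."):
    v = v[int(k)] if isinstance(v, list) else v[k]
print("null" if v is None else v)
' "$1"
}
```

Usage: `curl -s "$B/api/v1/errors/$ERRID_A" -H "$H" | json_get status`.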

### 4. Resolve / ignore / unresolve (env API key — `resolvedBy` null)

```bash
st(){ python3 -c "import sys,json;d=json.load(sys.stdin);print('status',d['status'],'| resolvedInVersion',d['resolvedInVersion'],'| resolvedBy',d['resolvedBy'],'| ignoredUntil',bool(d['ignoredUntil']),'| reason',d['ignoredReason'])"; }

curl -s -X POST "$B/api/v1/errors/$ERRID_A/resolve" -H "$H" -H 'Content-Type: application/json' -d '{"resolvedInVersion":"20240101.1"}' >/dev/null
curl -s "$B/api/v1/errors/$ERRID_A" -H "$H" | st # status resolved | resolvedInVersion 20240101.1 | resolvedBy None

curl -s -X POST "$B/api/v1/errors/$ERRID_B/ignore" -H "$H" -H 'Content-Type: application/json' -d '{"duration":3600000,"reason":"known flake"}' >/dev/null
curl -s "$B/api/v1/errors/$ERRID_B" -H "$H" | st # status ignored | ignoredUntil True | reason known flake

curl -s -X POST "$B/api/v1/errors/$ERRID_A/unresolve" -H "$H" >/dev/null
curl -s "$B/api/v1/errors/$ERRID_A" -H "$H" | st # status unresolved
```
PASS: each transition reflected; `filter[status]=ignored` returns only beta:
```bash
curl -s "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK&filter%5Bstatus%5D=ignored&filter%5Bperiod%5D=1d" -H "$H" | python3 -c "import sys,json;print([e['id'] for e in json.load(sys.stdin)['data']])" # [error_<fpB>]
```
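
The three actions share one URL shape, `POST /api/v1/errors/{errorId}/{resolve,ignore,unresolve}`. A wrapper can therefore reject a mistyped action name and return the HTTP status rather than discarding the response. Below is a sketch. `error_action` is a hypothetical name, and it reuses `$B`/`$H` from Setup.

```bash
# Hypothetical wrapper around POST /api/v1/errors/{errorId}/{action}.
# Prints the HTTP status code. The optional third argument is a JSON body
# (resolvedInVersion, duration/reason, ...); without it no body is sent,
# matching the unresolve call above. Uses $B and $H from Setup.
error_action() {
  local id=$1 action=$2 body=${3-}
  case "$action" in
    resolve|ignore|unresolve) ;;
    *) echo "unknown action: $action" >&2; return 2 ;;
  esac
  local -a args
  args=(-s -o /dev/null -w '%{http_code}' -X POST "$B/api/v1/errors/$id/$action" -H "$H")
  if [ -n "$body" ]; then
    args+=(-H 'Content-Type: application/json' -d "$body")
  fi
  curl "${args[@]}"
}
```

Usage: `error_action "$ERRID_B" ignore '{"duration":3600000,"reason":"known flake"}'` prints the status code of the ignore call.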

### 5. `filter[error]` on the runs list (paired PG + CH seed)

The runs list hydrates from Postgres, so seed a matching `TaskRun` row + a CH row
that share `run_id`/`id` and carry a fingerprint:
```bash
RID="re2e${RUN}"; FRID="run_${RID}"; FP_R="fpR${RUN}"
psql "$DBURL" -v ON_ERROR_STOP=1 -c "
INSERT INTO \"TaskRun\" (id, \"friendlyId\", \"taskIdentifier\", payload, \"traceId\", \"spanId\", \"runtimeEnvironmentId\", \"projectId\", queue, status, \"createdAt\", \"updatedAt\")
VALUES ('$RID','$FRID','$TASK','{}','trace_$RID','span_$RID','$ENV','$PROJ','task/$TASK','COMPLETED_WITH_ERRORS', now(), now())
ON CONFLICT (id) DO NOTHING;" >/dev/null
ROW="{\"environment_id\":\"$ENV\",\"organization_id\":\"$ORG\",\"project_id\":\"$PROJ\",\"run_id\":\"$RID\",\"friendly_id\":\"$FRID\",\"status\":\"COMPLETED_WITH_ERRORS\",\"environment_type\":\"DEVELOPMENT\",\"engine\":\"V2\",\"task_identifier\":\"$TASK\",\"created_at\":\"$NOW_CH\",\"updated_at\":\"$NOW_CH\",\"error\":{\"data\":{\"type\":\"RunsFilterErr\",\"message\":\"for runs filter\",\"stack\":\"at x\"}},\"error_fingerprint\":\"$FP_R\",\"task_version\":\"20240101.1\",\"_version\":\"$NOW_MS\",\"_is_deleted\":0}"
printf '%s' "$ROW" | curl -s "$CHURL/?query=$Q" --data-binary @-
sleep 1
curl -s "$B/api/v1/runs?filter%5Berror%5D=error_$FP_R" -H "$H" | python3 -c "import sys,json;d=json.load(sys.stdin);print('runs:',[r['id'] for r in d['data']])"
```
PASS: one run, `run_<RID>` (status maps to `FAILED`). Proves `filter[error]` -> fingerprint -> CH -> PG hydration.
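
The fixed `sleep 1` works because the MVs are near-instant locally, but on a slower machine it can make this leg flaky. A generic retry helper in the spirit of step 1's polling loop waits only as long as needed. Below is a sketch. `wait_for` is hypothetical, and its default attempt count and interval are arbitrary.

```bash
# Hypothetical helper: run a command up to $WAIT_TRIES times (default 10),
# sleeping $WAIT_INTERVAL seconds (default 1) between failed attempts.
# Succeeds as soon as the command does; fails if every attempt fails.
wait_for() {
  local tries=${WAIT_TRIES:-10} interval=${WAIT_INTERVAL:-1} i
  for ((i = 1; i <= tries; i++)); do
    "$@" && return 0
    ((i < tries)) && sleep "$interval"
  done
  return 1
}
```

For example, define a predicate `runs_has_one` that exits 0 once the `filter[error]` query returns one row. Then replace `sleep 1` with `wait_for runs_has_one`.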

### 6. Attribution — `mint-token` -> JWT exchange records the acting user

```bash
TOKEN=$(cli mint-token --profile $PROFILE --client errors-api-e2e 2>/dev/null) # UAT
ENVJWT=$(curl -sS -X POST "$B/api/v1/projects/$REF/dev/jwt" -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' -d '{"claims":{"scopes":["read:errors","write:errors"]}}' \
  | python3 -c "import sys,json;print(json.load(sys.stdin)['token'])")
# Decoded env JWT carries act.sub = the user id.
node -e 'const p=JSON.parse(Buffer.from(process.argv[1].split(".")[1],"base64url").toString());console.log("act:",JSON.stringify(p.act))' "$ENVJWT"

curl -s -X POST "$B/api/v1/errors/$ERRID_A/resolve" -H "Authorization: Bearer $ENVJWT" \
  -H 'Content-Type: application/json' -d '{"resolvedInVersion":"20240101.2"}' >/dev/null
curl -s "$B/api/v1/errors/$ERRID_A" -H "$H" | python3 -c "import sys,json;d=json.load(sys.stdin);print('resolvedBy:',d['resolvedBy'])"
```
PASS: `act.sub` is the user id (matches `cli whoami`), and `detail.resolvedBy` equals that user id (not null). A plain env key leaves it null (step 4). A **PAT** exchanged the same way also stamps `act` — repeat with the stored PAT to confirm `ignoredByUserId` attribution.
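
The `node -e` one-liner above decodes the JWT payload. If Node isn't on the `PATH`, for example in a minimal CI shell, the same check works with coreutils alone. Below is a sketch. It assumes a `base64` that accepts `-d`, such as GNU coreutils. `jwt_payload` is a hypothetical name, and it only decodes the payload without verifying the signature.

```bash
# Hypothetical helper: print the decoded payload (middle segment) of a JWT.
# base64url uses '-' and '_' and drops '=' padding, so translate back to the
# standard alphabet and re-pad before decoding. No signature verification.
jwt_payload() {
  local seg
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  case $(( ${#seg} % 4 )) in
    2) seg="$seg==" ;;
    3) seg="$seg=" ;;
  esac
  printf '%s' "$seg" | base64 -d
}
```

Usage: `jwt_payload "$ENVJWT"` prints the payload JSON, which should include the `act` claim with the acting user's id in `act.sub`.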

### 7. Negatives

```bash
curl -s -o /dev/null -w 'unknown id: %{http_code} (404)\n' "$B/api/v1/errors/error_doesnotexist0000" -H "$H"
curl -s -o /dev/null -w 'no auth list: %{http_code} (401)\n' "$B/api/v1/errors"
curl -s -o /dev/null -w 'no auth resolve: %{http_code} (401)\n' -X POST "$B/api/v1/errors/$ERRID_B/resolve" -H 'Content-Type: application/json' -d '{}'

# read-only JWT must be denied on write, allowed on read
READJWT=$(curl -sS -X POST "$B/api/v1/projects/$REF/dev/jwt" -H "Authorization: Bearer $TOKEN" \
  -H 'Content-Type: application/json' -d '{"claims":{"scopes":["read:errors"]}}' | python3 -c "import sys,json;print(json.load(sys.stdin)['token'])")
curl -s -o /dev/null -w 'read JWT write: %{http_code} (403)\n' -X POST "$B/api/v1/errors/$ERRID_B/resolve" -H "Authorization: Bearer $READJWT" -H 'Content-Type: application/json' -d '{}'
curl -s -o /dev/null -w 'read JWT read: %{http_code} (200)\n' "$B/api/v1/errors?filter%5BtaskIdentifier%5D=$TASK" -H "Authorization: Bearer $READJWT"
```
PASS: `404`, `401`, `401`, `403`, `200` respectively.
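
Step 7 prints each code next to its expectation and leaves the comparison to the reader. To make the leg self-checking, compare inside the script and keep a failure count for the Result section. Below is a sketch. `expect_status` and the `FAILS` counter are hypothetical names.

```bash
# Hypothetical assertion: compare an HTTP status code with the expected one,
# print PASS/FAIL, and bump the global FAILS counter on a mismatch.
# Call it directly (not inside $(...)) so the counter update is kept.
FAILS=0
expect_status() {
  local label=$1 want=$2 got=$3
  if [ "$got" = "$want" ]; then
    printf 'PASS %s: %s\n' "$label" "$got"
  else
    printf 'FAIL %s: got %s, want %s\n' "$label" "$got" "$want"
    FAILS=$((FAILS + 1))
    return 1
  fi
}
```

Usage: `expect_status "unknown id" 404 "$(curl -s -o /dev/null -w '%{http_code}' "$B/api/v1/errors/error_doesnotexist0000" -H "$H")"`. At the end, check `[ "$FAILS" -eq 0 ]`.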

## Result

Report PASS only if: step 1 lands 2 groups in `errors_v1`; step 2's filters and
pagination narrow correctly; step 3 returns the detail; step 4's resolve/ignore/
unresolve flip status (and `filter[status]` follows); step 5's `filter[error]`
returns the paired run; step 6 records `resolvedBy` = the acting user via the
JWT exchange (null with a plain env key); and step 7 returns 404/401/401/403/200.
A red leg is a bug or a missing prereq: report the exact status and body, and file
a Linear issue. Don't tune around it.

## Notes / gotchas

- Seeded rows use a unique `$RUN` suffix per invocation, so reruns don't collide and each invocation's rows stay isolated under their own task identifier. They are local-dev test rows (90-day ClickHouse TTL); no cleanup required.
- After **adding** the route files, the classic Remix dev compiler may not register them until a dev-server restart (a stale manifest returns Remix's HTML 404 on the new paths). If `POST …/resolve` returns a 404 HTML page rather than 401/200, restart `pnpm run dev --filter webapp`.
- The rbac `act` extraction lives in `@trigger.dev/rbac` (a built dep). After editing it, run `pnpm run build --filter @trigger.dev/rbac` and restart the webapp so the attribution leg (step 6) reflects the change.
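
For the stale-manifest case in the second note, a quick check tells Remix's HTML 404 page apart from a JSON API response. Below is a sketch. `looks_like_html` is a hypothetical helper. It only sniffs the body for an HTML doctype or an `<html` tag.

```bash
# Hypothetical check: succeed if a response body looks like an HTML page
# (e.g. Remix's 404 for a route the dev server hasn't registered yet)
# rather than a JSON API response.
looks_like_html() {
  case "$1" in
    *'<!DOCTYPE'*|*'<!doctype'*|*'<html'*) return 0 ;;
    *) return 1 ;;
  esac
}
```

Usage: `looks_like_html "$(curl -s "$B/api/v1/errors" -H "$H")" && echo "got HTML; restart the webapp dev server"`.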
