Skip to content

feat: UX and operations improvements batch#76

Merged
TerrifiedBug merged 19 commits intomainfrom
feat/ux-and-ops-improvements
Mar 9, 2026
Merged

feat: UX and operations improvements batch#76
TerrifiedBug merged 19 commits intomainfrom
feat/ux-and-ops-improvements

Conversation

@TerrifiedBug
Copy link
Owner

Summary

  • Default team/environment preferences — Users can star their preferred team and per-team environment in the selector dropdowns. Admins can set a team-level default environment as fallback. Server-side persistence via new UserPreference model with localStorage caching for instant hydration.
  • SCIM audit entity type — SCIM user operations now logged under ScimUser entity type (was User). Combined "SCIM (All)" filter in audit log UI shows both ScimUser and ScimGroup activity.
  • Settings sidebar navigation — Monolithic settings page split into 10 sub-routes. Sidebar transforms to show grouped settings nav (System, Security, Organization, Operations) with animated slide transition. Role-based visibility.
  • Decoupled deploy approvals — Approval and deployment are now separate actions. Reviewers approve, anyone with deploy access deploys. New DEPLOYED status tracks who executed. Approved requests show in deploy dialog with "Deploy Now" and "Cancel" buttons.
  • Event-based alerting — 10 new alert metrics (deploy events, SCIM sync failures, backup failures, certificate expiring, node join/leave, new version available). Event rules fire on occurrence (no threshold). Uses existing notification channels. fireEventAlert helper wired into deploy, SCIM, backup, fleet, and version check services.

Schema changes

  • New UserPreference model (userId + key + value)
  • Team.defaultEnvironmentId field
  • DeployRequest.deployedById, deployedBy, deployedAt fields + DEPLOYED status
  • AlertMetric enum: 10 new event-based values
  • AlertRule.condition, threshold, durationSeconds now nullable

Test plan

  • Star a team in dropdown, refresh — persists across sessions
  • Star an environment per team, switch teams — each has its own starred env
  • Clear localStorage, refresh — server preference restores correctly
  • Admin: set team default env in Settings > Team, verify fallback works for users without a personal default
  • Audit log: filter by "SCIM (All)" — shows both ScimUser and ScimGroup events
  • Navigate to Settings — sidebar transforms with animation, back button returns to main nav
  • Settings sections load correctly at each sub-route (/settings/auth, /settings/fleet, etc.)
  • Role-based: team admin sees only Team + Service Accounts in settings sidebar
  • Deploy approval: editor requests, admin approves (no deploy), editor clicks "Deploy Now"
  • Deploy approval: cancel an approved request before deploying
  • Admin direct deploy still bypasses approval
  • Create an event-based alert rule (e.g. deploy_completed) — threshold fields hidden
  • Trigger a deploy — verify notification fires through configured channel
  • Create infrastructure alert rule — threshold fields still shown

Add 10 new event-based AlertMetric enum values (deploy_requested,
deploy_completed, deploy_rejected, deploy_cancelled,
new_version_available, scim_sync_failed, backup_failed,
certificate_expiring, node_joined, node_left) for alerts that fire
on occurrence rather than polling.

Make condition, threshold, and durationSeconds nullable on AlertRule
so event-based rules can omit threshold configuration. Add null
guards in the alert evaluator to skip event-based rules during
polling, and update the UI and API to handle nullable fields.
@github-actions github-actions bot added documentation Improvements or additions to documentation feature labels Mar 9, 2026
@greptile-apps
Copy link
Contributor

greptile-apps bot commented Mar 9, 2026

Greptile Summary

This PR delivers five significant features: user/team default environment preferences (with server-side persistence and localStorage caching), a split settings sidebar with animated navigation and role-based visibility, decoupled deploy approvals (approve and deploy as separate actions with a new DEPLOYED status), 10 new event-based alert metrics wired into the relevant services, and a ScimUser audit entity type with a combined "SCIM (All)" filter. The schema changes are clean and migrations are correct. Per-rule error isolation in event-alerts.ts and non-throwing deploy failure revert logic are correctly implemented.

Key issues found:

  • Deploy attribution bug (src/server/routers/deploy.ts line 407–412): executeApprovedRequest calls deployAgent with request.requestedById (the original requester) instead of ctx.session.user.id (the actual executor). This breaks the audit trail for the new decoupled approval feature — every deployed pipeline will be attributed to the person who requested the approval, not the person who executed the deploy.
  • Status revert missing on deletion race (src/server/routers/deploy.ts lines 396–402): If the DeployRequest record is deleted between the atomic DEPLOYED CAS update and the subsequent findUnique, the procedure throws NOT_FOUND without reverting the status, leaving the request permanently stuck in DEPLOYED state.

Confidence Score: 3/5

  • Two critical logic bugs in the new deploy approval workflow that break audit attribution and leave requests in invalid states if race conditions occur.
  • The deploy attribution bug (passing requester's ID instead of executor's) directly breaks the primary feature of the PR — the new "separate approve/deploy" workflow will silently misattribute every deployment to the original requester. The missing status revert on deletion is a data integrity issue that could strand approved requests in an invalid state. Both are in high-traffic code paths and will manifest immediately when the feature is used in production. The rest of the feature set (preferences, sidebar, alerting, SCIM) is well-implemented with correct isolation and error handling.
  • src/server/routers/deploy.ts — specifically the executeApprovedRequest procedure (lines 396–402, 407–412)

Last reviewed commit: 1eb09ca

Comment on lines +13 to +31
set: protectedProcedure
.input(z.object({ key: z.string().max(100), value: z.string().max(500) }))
.mutation(async ({ ctx, input }) => {
await prisma.userPreference.upsert({
where: {
userId_key: { userId: ctx.session.user!.id!, key: input.key },
},
create: { userId: ctx.session.user!.id!, key: input.key, value: input.value },
update: { value: input.value },
});
}),

delete: protectedProcedure
.input(z.object({ key: z.string().max(100) }))
.mutation(async ({ ctx, input }) => {
await prisma.userPreference.deleteMany({
where: { userId: ctx.session.user!.id!, key: input.key },
});
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mutations missing withAudit middleware

The set and delete mutations don't use the withAudit middleware, which is required on every mutating procedure in the codebase. This leaves preference changes unlogged in the audit trail. Compare with team.updateDefaultEnvironment (also added in this PR), which correctly includes the middleware.

Add .use(withAudit("userPreference.set", "UserPreference")) before the .mutation() call on the set procedure (line 13), and .use(withAudit("userPreference.delete", "UserPreference")) before the .mutation() call on the delete procedure (line 25).

Rule Used: ## Security & Cryptography Review Rules

When revi... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/routers/user-preference.ts
Line: 13-31

Comment:
**Mutations missing `withAudit` middleware**

The `set` and `delete` mutations don't use the `withAudit` middleware, which is required on every mutating procedure in the codebase. This leaves preference changes unlogged in the audit trail. Compare with `team.updateDefaultEnvironment` (also added in this PR), which correctly includes the middleware.

Add `.use(withAudit("userPreference.set", "UserPreference"))` before the `.mutation()` call on the `set` procedure (line 13), and `.use(withAudit("userPreference.delete", "UserPreference"))` before the `.mutation()` call on the `delete` procedure (line 25).

**Rule Used:** ## Security & Cryptography Review Rules

When revi... ([source](https://app.greptile.com/review/custom-context?memory=7cb20c56-ca6a-40aa-8660-7fa75e6e3db2))

How can I resolve this? If you propose a fix, please make it concise.

@TerrifiedBug
Copy link
Owner Author

@greptile review

@TerrifiedBug
Copy link
Owner Author

@greptile review

Comment on lines +13 to +23
set: protectedProcedure
.input(z.object({ key: z.string().max(100), value: z.string().max(500) }))
.mutation(async ({ ctx, input }) => {
await prisma.userPreference.upsert({
where: {
userId_key: { userId: ctx.session.user!.id!, key: input.key },
},
create: { userId: ctx.session.user!.id!, key: input.key, value: input.value },
update: { value: input.value },
});
}),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The set and delete mutations are missing the withAudit middleware. Every mutating procedure in this codebase must use audit middleware to log changes to the audit trail. Without it, preference changes are unlogged and untracked.

Add the middleware to both mutations by importing withAudit from the TRPC init file and wrapping each mutation with .use(withAudit(...)) before the .mutation() call, following the same pattern as team.updateDefaultEnvironment elsewhere in the codebase.

Rule Used: ## Code Style & Conventions

TypeScript Conven... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/routers/user-preference.ts
Line: 13-23

Comment:
The `set` and `delete` mutations are missing the `withAudit` middleware. Every mutating procedure in this codebase must use audit middleware to log changes to the audit trail. Without it, preference changes are unlogged and untracked.

Add the middleware to both mutations by importing `withAudit` from the TRPC init file and wrapping each mutation with `.use(withAudit(...))` before the `.mutation()` call, following the same pattern as `team.updateDefaultEnvironment` elsewhere in the codebase.

**Rule Used:** ## Code Style & Conventions

### TypeScript Conven... ([source](https://app.greptile.com/review/custom-context?memory=6ae51394-d0b6-4686-bc4c-1ad840c2e310))

How can I resolve this? If you propose a fix, please make it concise.

@TerrifiedBug TerrifiedBug merged commit 17eaa67 into main Mar 9, 2026
3 checks passed
@TerrifiedBug TerrifiedBug deleted the feat/ux-and-ops-improvements branch March 9, 2026 13:37
Comment on lines 407 to 412
@@ -372,8 +411,17 @@ export const deployRouter = router({
request.configYaml,
);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Deployment attributed to original requester instead of executor

deployAgent is called with request.requestedById ?? ctx.session.user.id — this uses the original requester's ID as the deployer instead of the actual executor's ID. The fallback masks the primary issue: when request.requestedById is set (which it always is for approved requests), it becomes the deployer. This means the PipelineVersion snapshot, audit log entries, and internal deploy records will all attribute the deployment to the person who requested it, not the person who clicked "Deploy Now".

The DeployRequest.deployedById is already correctly set to ctx.session.user.id on line 390, so the two records will disagree about who executed the deployment. The fix is to always pass the executor's ID:

Suggested change
const result = await deployAgent(
request.pipelineId,
ctx.session.user.id,
request.changelog,
request.configYaml,
);

Rule Used: ## Pipeline Editor & Deployment Review Guide

The ... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/routers/deploy.ts
Line: 407-412

Comment:
**Deployment attributed to original requester instead of executor**

`deployAgent` is called with `request.requestedById ?? ctx.session.user.id` — this uses the original requester's ID as the deployer instead of the actual executor's ID. The fallback masks the primary issue: when `request.requestedById` is set (which it always is for approved requests), it becomes the deployer. This means the `PipelineVersion` snapshot, audit log entries, and internal deploy records will all attribute the deployment to the person who *requested* it, not the person who clicked "Deploy Now".

The `DeployRequest.deployedById` is already correctly set to `ctx.session.user.id` on line 390, so the two records will disagree about who executed the deployment. The fix is to always pass the executor's ID:

```suggestion
        const result = await deployAgent(
          request.pipelineId,
          ctx.session.user.id,
          request.changelog,
          request.configYaml,
        );
```

**Rule Used:** ## Pipeline Editor & Deployment Review Guide

The ... ([source](https://app.greptile.com/review/custom-context?memory=99b9234e-e536-4295-b97d-d10499542ee1))

How can I resolve this? If you propose a fix, please make it concise.

Comment on lines +396 to +402
// Fetch the full request to get configYaml, pipelineId, changelog
const request = await prisma.deployRequest.findUnique({
where: { id: input.requestId },
});
if (!request) {
throw new TRPCError({ code: "NOT_FOUND", message: "Deploy request not found" });
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status stuck at DEPLOYED if record deleted between atomic update and fetch

The code atomically sets the request to DEPLOYED (line 388–391), then immediately tries to fetch the full record (line 397–401). If the record is deleted between these operations — possible through a concurrent admin operation — the findUnique returns null and the procedure throws NOT_FOUND without reverting the DEPLOYED status. The request is left permanently in DEPLOYED state with no recovery path.

The two revert blocks at lines 416–419 and 442–446 only cover the deployAgent execution path. Add a revert before re-throwing for the early-exit path:

Suggested change
// Fetch the full request to get configYaml, pipelineId, changelog
const request = await prisma.deployRequest.findUnique({
where: { id: input.requestId },
});
if (!request) {
throw new TRPCError({ code: "NOT_FOUND", message: "Deploy request not found" });
}
// Fetch the full request to get configYaml, pipelineId, changelog
const request = await prisma.deployRequest.findUnique({
where: { id: input.requestId },
});
if (!request) {
await prisma.deployRequest.updateMany({
where: { id: input.requestId, status: "DEPLOYED" },
data: { status: "APPROVED", deployedById: null, deployedAt: null },
});
throw new TRPCError({ code: "NOT_FOUND", message: "Deploy request not found" });
}

Rule Used: ## Pipeline Editor & Deployment Review Guide

The ... (source)

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/server/routers/deploy.ts
Line: 396-402

Comment:
**Status stuck at DEPLOYED if record deleted between atomic update and fetch**

The code atomically sets the request to `DEPLOYED` (line 388–391), then immediately tries to fetch the full record (line 397–401). If the record is deleted between these operations — possible through a concurrent admin operation — the `findUnique` returns `null` and the procedure throws `NOT_FOUND` without reverting the `DEPLOYED` status. The request is left permanently in `DEPLOYED` state with no recovery path.

The two revert blocks at lines 416–419 and 442–446 only cover the `deployAgent` execution path. Add a revert before re-throwing for the early-exit path:

```suggestion
      // Fetch the full request to get configYaml, pipelineId, changelog
      const request = await prisma.deployRequest.findUnique({
        where: { id: input.requestId },
      });
      if (!request) {
        await prisma.deployRequest.updateMany({
          where: { id: input.requestId, status: "DEPLOYED" },
          data: { status: "APPROVED", deployedById: null, deployedAt: null },
        });
        throw new TRPCError({ code: "NOT_FOUND", message: "Deploy request not found" });
      }
```

**Rule Used:** ## Pipeline Editor & Deployment Review Guide

The ... ([source](https://app.greptile.com/review/custom-context?memory=99b9234e-e536-4295-b97d-d10499542ee1))

How can I resolve this? If you propose a fix, please make it concise.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation feature

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant