feat(observability): add dashboard at /observability with agent + task views#185

Open
eipasteur wants to merge 21 commits into main from feature/observability-dashboard

@eipasteur
Contributor

Summary

Add an Observability Dashboard at /observability that gives operators a single view of agent activity, project health, and task progression across the platform.

Components (all new under frontend/src/components/observability/)

  • ObservabilityLayout — outer chrome with sidebar + content area
  • ObservabilityDashboard — top-level metrics + agent status overview
  • ObservabilitySidebar — left nav between Overview / Agents / Projects / Tasks
  • AgentStatusCards — live cards per agent (working, idle, errored)
  • DashboardMetrics — global counters (active sprints, pending questions, etc.)
  • MetricCard — reusable presentational card primitive
  • ProjectDetailView — drill-down for a single project
  • TaskBoard — kanban-style view of in-flight tasks

Routing

The /observability route in App.tsx now renders ObservabilityLayout. This replaces the legacy ObservabilityPage, which is removed.

Stacked on

Stacks on #184 (cache + sidebar) since the dashboard reuses useProjectsCache and the global sidebar pattern. Includes #182 (esbuild fix) transitively.

Merge order: #182 → #183 → #184 → this PR.

Validation

  • ✅ 153 unit tests pass
  • ✅ TypeScript clean
  • ✅ npm run build clean
  • ✅ Components reuse ui/* primitives (Card, Badge, etc.) — no UI duplication
  • ✅ Cache reuse: ObservabilityDashboard consumes the same useProjectsCache hook as the main app — no duplicate fetches when toggling between dashboards

Out of scope

  • Backend changes (this PR is presentation-only — consumes existing /api/projects, /api/sprints, /api/agents endpoints)
  • WebSocket live updates (deferred — current refresh interval matches the cache TTL)

eipasteur added 21 commits May 21, 2026 12:09
Replace sprint-centric sidebar with a global navigation layout that
shows all projects with their current agent status. The sidebar is now
always visible, providing quick access to projects, observability, and
backlog views.

- Add stale-while-revalidate cache for projects and sprint lists
- Show running/waiting agent indicators per project in sidebar
- Activity panel auto-opens only when inside a sprint
- Add Collapsible UI component (radix-ui)
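
For readers unfamiliar with the stale-while-revalidate pattern named in the first bullet, here is a minimal sketch of the idea. It is not the code in this PR: `SwrCache`, `ttlMs`, and the injectable `now` clock are hypothetical names chosen for illustration, and the real `useProjectsCache` hook may be structured quite differently.

```typescript
// Minimal stale-while-revalidate cache (illustrative only, not the PR's code).
type Entry<T> = { value: T; fetchedAt: number };

class SwrCache<T> {
  private entries = new Map<string, Entry<T>>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so the behaviour can be exercised without real timers.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Returns the cached value immediately when one exists. If the entry is
  // older than ttlMs, a background refresh is started and the stale value is
  // still returned to the caller.
  async get(key: string, fetcher: () => Promise<T>): Promise<T> {
    const hit = this.entries.get(key);
    if (hit === undefined) {
      return this.refresh(key, fetcher); // cold miss: the caller has to wait
    }
    if (this.now() - hit.fetchedAt > this.ttlMs) {
      // Stale: revalidate in the background and keep the old value on failure.
      void this.refresh(key, fetcher).catch(() => undefined);
    }
    return hit.value;
  }

  private async refresh(key: string, fetcher: () => Promise<T>): Promise<T> {
    const value = await fetcher();
    this.entries.set(key, { value, fetchedAt: this.now() });
    return value;
  }
}
```

The trade-off in this sketch is that a caller may briefly see data older than the TTL, in exchange for never blocking the UI on a refetch once a value has been loaded.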

Add a dedicated observability view for monitoring agent activity across
projects. Includes real-time status cards, task boards, metric summaries,
and per-project detail views with collapsible phase sections.

- Add MetricCard, DashboardMetrics, AgentStatusCards components
- Add TaskBoard for sprint task status overview
- Add ObservabilitySidebar for project navigation within the view
- Add ProjectDetailView with agent stream and phase breakdown
- Add ObservabilityLayout and ObservabilityDashboard pages
- Update barrel exports
- AppShell: ActivityPanel resolves projectId via useParams, no need to pass it down
- useProjectsCache: remove unused useState import

Clean up imports and destructured values that were unused after porting
from upstream:
- ProjectDetailView: drop unused Card sub-components, Separator, ScrollArea,
  sprintsService, realtimeService, Project, LastToolMap, PendingQuestionsMap,
  useEffect, lastTool param
- TaskBoard: drop unused ListChecks icon
- ObservabilityDashboard: drop unused DashboardMetrics, projects, selectedProjectId

… Backlog link

- Dashboard now consumes the shared cache instead of fetching projects
  directly. This removes a duplicate request at startup (Dashboard +
  AppSidebar were both fetching) and lets create/delete actions invalidate
  a single source of truth.
- Remove the Backlog navigation entry from AppSidebar: there is no /backlog
  route in this codebase, the button was dead.
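
One way a shared cache can avoid the duplicate startup request described above is to let concurrent callers share a single in-flight promise and to expose an explicit invalidation hook for create/delete actions. The sketch below illustrates that idea under assumptions: `SharedLoader`, `load`, and `invalidate` are hypothetical names, not the API of `useProjectsCache`.

```typescript
// Hypothetical shared loader: concurrent callers share one in-flight request,
// and invalidate() makes the next load() fetch again.
// This sketch ignores the race where invalidate() is called while a request
// is still in flight.
class SharedLoader<T> {
  private inFlight: Promise<T> | null = null;
  private value: T | undefined = undefined;
  private loaded = false;
  private fetcher: () => Promise<T>;

  constructor(fetcher: () => Promise<T>) {
    this.fetcher = fetcher;
  }

  load(): Promise<T> {
    if (this.loaded) {
      return Promise.resolve(this.value as T); // served from the shared copy
    }
    if (this.inFlight === null) {
      // First caller starts the request; later callers reuse the same promise.
      this.inFlight = this.fetcher().then(
        (v) => {
          this.value = v;
          this.loaded = true;
          this.inFlight = null;
          return v;
        },
        (err) => {
          this.inFlight = null; // allow a retry after a failed request
          throw err;
        },
      );
    }
    return this.inFlight;
  }

  // Called after a create/delete action so the next load() refetches.
  invalidate(): void {
    this.loaded = false;
    this.value = undefined;
  }
}
```

With this shape, two components that both call `load()` during startup trigger a single request, and a mutation only has to call `invalidate()` in one place.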

The new ObservabilityLayout renders the redesigned ObservabilityDashboard
and ProjectDetailView added in this branch. Update the App.tsx route to
point at it and drop the old ObservabilityPage now that nothing imports
it.

Refactor /project/:projectId to act as a true project landing page,
not a sprint-centric view. The page now:

- Reads project + sprints from useProjectsCache (shared with sidebar
  and dashboard) — single fetch path, no more duplicate requests when
  navigating between Dashboard and Project.
- Subscribes to sprint events via useSprintEvents on the latest sprint,
  so agent status changes appear in real time without manual refresh.
- Renders a Live badge while an agent is running or waiting on the
  latest sprint, plus per-sprint status icons in the iteration list.
- Splits sprints into 'Active' and 'Past' (collapsible) so the user
  always sees what is happening now first.
- Embeds Jan's IssueListPanel side-by-side with the iteration list, so
  GitHub issues drive sprint creation directly from this page.
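
The Live badge and the Active/Past split described above boil down to two small derivations over the sprint list. The following is a sketch under stated assumptions: `SprintSummary`, `AgentState`, and the `closed` flag are hypothetical, and the actual sprint type and the criterion for a "past" sprint in this codebase may differ.

```typescript
// Hypothetical shapes for illustration; the real sprint type may differ.
type AgentState = "running" | "waiting" | "idle";

interface SprintSummary {
  id: string;
  createdAt: string; // ISO 8601 UTC timestamp, e.g. "2026-05-21T12:09:00Z"
  closed: boolean; // assumption: a finished sprint is flagged as closed
  agentState: AgentState;
}

// Latest sprint by creation time. ISO 8601 UTC strings in one consistent
// format order chronologically when compared as plain strings.
function latestSprint(sprints: SprintSummary[]): SprintSummary | undefined {
  return sprints.reduce<SprintSummary | undefined>(
    (latest, s) => (latest === undefined || s.createdAt > latest.createdAt ? s : latest),
    undefined,
  );
}

// The badge is shown while an agent is running or waiting on the latest sprint.
function isLive(sprints: SprintSummary[]): boolean {
  const latest = latestSprint(sprints);
  return latest !== undefined && (latest.agentState === "running" || latest.agentState === "waiting");
}

// Partition into the two groups rendered by the iteration list.
function splitSprints(sprints: SprintSummary[]): { active: SprintSummary[]; past: SprintSummary[] } {
  return {
    active: sprints.filter((s) => !s.closed),
    past: sprints.filter((s) => s.closed),
  };
}
```

Keeping these as pure functions means they can be unit tested without rendering the page or subscribing to sprint events.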

…on button

- Wrap IssueListPanel under an 'Issues' uppercase header that mirrors
  the existing 'Iterations' header, so both columns line up at the
  same vertical position.
- Add a primary 'Start' button to the right of the Iterations title
  with a Dialog that asks for the iteration name and calls
  sprintsService.create. On success, refresh the cached sprint list.
- Drop the hard-coded mt-6 from IssueListPanel's outer Card; the
  parent layout now owns vertical spacing, so the component composes
  cleanly under any header.

…spacing

The two columns were drifting vertically because their headers had
different heights (the Issues h3 was its natural height, while the
Iterations row was a flex container sized by the Start button).

- Wrap both titles in 'flex items-center justify-between h-7' so they
  share the same fixed height as the Start button.
- Reduce the column space-y from 4 to 2 so the title sits closer to
  the panel underneath.

PR #176 migrated lambda/github from CommonJS (require('./shared/...')) to
ESM (import '../shared/response.js'), but the Terraform packaging copied
shared/ at the same level as index.js in the zip. At runtime,
'../shared/response.js' resolved to /var/shared/response.js (one level
above /var/task/), which does not exist, causing ERR_MODULE_NOT_FOUND
on every invocation.
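
The path arithmetic behind this failure can be reproduced with the standard `URL` class, since relative ESM specifiers are resolved against the URL of the importing module. The helper below is only an illustration: `resolveSpecifier` is a hypothetical name, and the `/repo` prefix stands in for wherever the source tree is checked out.

```typescript
// Resolve a relative specifier against the importing module's file URL,
// mirroring how relative ESM imports are resolved.
function resolveSpecifier(specifier: string, importerPath: string): string {
  const importerUrl = new URL(`file://${importerPath}`);
  return new URL(specifier, importerUrl).pathname;
}

// In the Lambda zip, index.js sits directly in /var/task/, so "../" climbs out of it.
const inLambda = resolveSpecifier("../shared/response.js", "/var/task/index.js");
// inLambda === "/var/shared/response.js"

// In the source tree, index.js sits one level deeper, so the same import works.
const inRepo = resolveSpecifier("../shared/response.js", "/repo/lambda/github/index.js");
// inRepo === "/repo/lambda/shared/response.js"
```

The same specifier is valid in the repository layout and invalid in the packaged layout, which is why unit tests kept passing while every invocation failed.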

Apply the same pattern as PR #180 (github-issues): build the lambda with
esbuild via the npm workspace and zip the .build output. esbuild inlines
the shared/ modules at build time, eliminating runtime path resolution.

- terraform/modules/api/lambda/main.tf:
  - Replace npm_requirements + prefix_in_zip='shared' with the build/:zip
    pattern, mirroring github_issues_lambda
  - Bump runtime to nodejs24.x for consistency with the build target
    (esbuild --target=node24, package.json already targets node24)

Tests (153 unit tests across 7 files) continue to pass since they import
'../index.js' which resolves to lambda/github/index.js, whose original
'../shared/response.js' import resolves correctly inside the source tree
at lambda/shared/response.js (the same path esbuild uses to bundle).

Closes the runtime regression introduced in #176.

Two layered runtime regressions blocked /api/github/* on staging:

1. PR #176 migrated lambda/github from CommonJS to ESM but the Terraform
   packaging used npm_requirements + prefix_in_zip='shared'. The ESM
   resolver dereferenced '../shared/response.js' to /var/shared/...
   (one level above /var/task/), which doesn't exist, throwing
   ERR_MODULE_NOT_FOUND on every cold start.

2. shared/git-token.js is CommonJS and does require('@aws-sdk/client-ssm')
   at the module top level. esbuild bundles a CJS shim around it, but
   the Node.js ESM runtime rejects the resulting dynamic require:
   'Dynamic require of @aws-sdk/client-ssm is not supported'.

Fix:
- terraform/modules/api/lambda/main.tf: switch github_lambda from
  npm_requirements to the build/:zip pattern (mirroring github_issues
  per PR #180), bumping runtime to nodejs24.x to match the esbuild
  --target=node24 the workspace already uses.
- lambda/github/index.js: inline resolveGitToken locally, mirroring the
  exact pattern adopted by lambda/github-issues. shared/response.js
  remains imported because esbuild can statically bundle it without
  hitting a dynamic require (it has no top-level require of an external
  package).

Validated end-to-end on staging:
- terraform apply succeeded (no drift)
- aws lambda invoke returns statusCode 200 with valid JSON body
  ({connected: false}) for /api/github/status
- 153 unit tests across the repo continue to pass

Closes the runtime regression introduced in #176.