ChrisECG/docpipeline

Document Pipeline Dashboard

A real-time document pipeline monitoring dashboard built as a technical challenge for a Frontend Engineer position.

The app lets operators track documents as they move through a multi-step processing pipeline (pending → extracting → validating → indexing → done / failed), take batch actions (approve, reject, reassign), and receive live notifications for critical events.

Live demo: docpipeline.sbs

Tech stack

  • Next.js 16 (App Router) with a custom tsx HTTP server
  • TanStack Query — data fetching and optimistic cache management
  • TanStack Table + TanStack Virtual — virtualized document list with server-side filtering/pagination
  • Socket.IO — real-time pipeline state updates pushed from the server
  • SSE (ReadableStream) — streaming progress for batch jobs
  • In-memory singleton store (no database); seeded from JSON files on startup
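A minimal sketch of how an in-memory singleton store with server-side filtering and pagination could look. The `Doc` shape, the `DocumentStore` class, the `getStore` helper, and the `__docStore` key are all hypothetical names chosen for illustration; the repository's actual store may be organized differently.

```typescript
// Hypothetical document shape; the real fields in the repository may differ.
type Status = "pending" | "extracting" | "validating" | "indexing" | "done" | "failed";

interface Doc {
  id: string;
  name: string;
  status: Status;
}

class DocumentStore {
  private docs = new Map<string, Doc>();

  constructor(seed: Doc[]) {
    for (const doc of seed) this.docs.set(doc.id, doc);
  }

  // Server-side filtering and pagination: filter first, then slice one page.
  query(status: Status | undefined, page: number, pageSize: number): { total: number; items: Doc[] } {
    const matches = [...this.docs.values()].filter(
      (d) => status === undefined || d.status === status,
    );
    const start = page * pageSize;
    return { total: matches.length, items: matches.slice(start, start + pageSize) };
  }
}

// Cache the instance on globalThis so repeated module evaluation (for example
// during dev hot reload) reuses the same store instead of reseeding it.
const holder = globalThis as typeof globalThis & { __docStore?: DocumentStore };

function getStore(seed: () => Doc[]): DocumentStore {
  if (!holder.__docStore) holder.__docStore = new DocumentStore(seed());
  return holder.__docStore;
}
```

Because the seed callback only runs when no store exists yet, a second call to `getStore` returns the already-seeded instance regardless of what it is passed.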

Documentation

| Document | Contents |
| --- | --- |
| Architecture | System diagram, data flow per ingestion path, layer table |
| Communication map | REST / WebSocket / SSE / polling: what uses each and why |
| Action queue design | Concurrency, retry/backoff, optimistic updates, rollback |
| ADR | Four architecture decisions: TanStack Query, in-memory store, Socket.IO, client fan-out |
| Wireframes | ASCII wireframes of the main panel, notification drawer, Drive flow |
| Assumptions | Every spec ambiguity and the interpretation taken |

Running for development

npm install
npm run dev

The app starts at http://localhost:3000.

npm run dev uses tsx server.ts (a custom HTTP server that wraps Next.js and attaches the WebSocket server on /api/ws). Turbopack is not used because the custom server is incompatible with it.

Running for production

npm run build
npm start

npm start also uses tsx server.ts with NODE_ENV=production.

Docker

# Build
docker build -t docpipeline .

# Run (Google Drive integration requires the env vars below)
docker run -p 3000:3000 \
  -e GOOGLE_CLIENT_ID=your_client_id \
  -e GOOGLE_CLIENT_SECRET=your_client_secret \
  -e GOOGLE_REDIRECT_URI=https://docpipeline.sbs/api/drive/callback \
  docpipeline

NEXT_PUBLIC_GOOGLE_API_KEY is a build-time variable: Next.js inlines it into the client bundle during next build, so passing it with -e at docker run has no effect. Pass it as a build argument if needed:

docker build \
  --build-arg NEXT_PUBLIC_GOOGLE_API_KEY=your_api_key \
  -t docpipeline .

All environment variables are optional. The app starts and runs without them; if the Google variables are missing, Drive import redirects to an auth error.

| Variable | Default | Description |
| --- | --- | --- |
| GOOGLE_CLIENT_ID | (none) | OAuth 2.0 client ID from Google Cloud Console |
| GOOGLE_CLIENT_SECRET | (none) | OAuth 2.0 client secret |
| GOOGLE_REDIRECT_URI | http://localhost:3000/api/drive/callback | OAuth callback URL |
| NEXT_PUBLIC_GOOGLE_API_KEY | (none) | Google Picker API key (build-time) |
| SEED_COUNT | 50000 | Documents seeded at startup |
| PORT | 3000 | HTTP port |
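The defaults in the table could be applied with a small helper like the one below. This is a sketch only: `AppConfig`, `intOr`, `readConfig`, and the `driveConfigured` flag are hypothetical names, and the repository may read its environment in a different way.

```typescript
interface AppConfig {
  port: number;
  seedCount: number;
  googleRedirectUri: string;
  driveConfigured: boolean; // hypothetical flag: true only when both OAuth vars are set
}

// Parse a positive integer, falling back to the default when unset or invalid.
function intOr(value: string | undefined, fallback: number): number {
  const n = Number.parseInt(value ?? "", 10);
  return Number.isInteger(n) && n > 0 ? n : fallback;
}

// Build the config from an environment-like object using the documented defaults.
function readConfig(env: Record<string, string | undefined>): AppConfig {
  return {
    port: intOr(env.PORT, 3000),
    seedCount: intOr(env.SEED_COUNT, 50000),
    googleRedirectUri: env.GOOGLE_REDIRECT_URI ?? "http://localhost:3000/api/drive/callback",
    driveConfigured: Boolean(env.GOOGLE_CLIENT_ID && env.GOOGLE_CLIENT_SECRET),
  };
}
```

In a Node process this would be called as `readConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.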

Environment variables (local dev)

Copy .env.example to .env and fill in the values before running the app.

Google Cloud setup: create a Web application OAuth 2.0 credential and add http://localhost:3000/api/drive/callback as an authorized redirect URI. Enable the Google Picker API in the API library for the same project.

The app runs without these variables — the Import from Drive button will redirect to the auth flow and show an error if they are missing.

How the pipeline works

Documents start in pending and auto-advance:

pending → extracting (3 s) → validating (3 s) — pauses here for operator action

From validating, an operator can approve (→ indexing → done) or reject (→ failed).

Each auto-step has an 8% chance of random failure. The probability is controlled by the FAIL_CHANCE constant in lib/pipeline.ts.
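The transitions described above can be sketched as a small state machine. This is an illustration under assumptions, not the contents of lib/pipeline.ts: `AUTO_NEXT`, `autoStep`, and `operatorAction` are hypothetical names, the random roll is passed in so the logic stays deterministic, and the indexing → done step after approval is left out because the text does not say whether the failure roll applies to it.

```typescript
type Status = "pending" | "extracting" | "validating" | "indexing" | "done" | "failed";

const FAIL_CHANCE = 0.08; // 8% per auto-step, as stated above

// Auto-advance table for the two steps before the operator pause (assumed layout).
const AUTO_NEXT: Partial<Record<Status, Status>> = {
  pending: "extracting",
  extracting: "validating",
};

// Advance one auto-step. `roll` is a number in [0, 1), e.g. from Math.random().
function autoStep(status: Status, roll: number, failChance: number = FAIL_CHANCE): Status {
  const next = AUTO_NEXT[status];
  if (next === undefined) return status; // validating pauses; done/failed are terminal here
  return roll < failChance ? "failed" : next;
}

// Operator decisions are only valid while the document waits in validating.
function operatorAction(status: Status, action: "approve" | "reject"): Status {
  if (status !== "validating") {
    throw new Error(`cannot ${action} a document in ${status}`);
  }
  return action === "approve" ? "indexing" : "failed";
}
```

A driver would call `autoStep(status, Math.random())` after each 3 s delay and stop once the status no longer changes.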

Testing DOCUMENT_FAILED notifications

Increase the failure chance (optional)

Open lib/pipeline.ts and change FAIL_CHANCE:

const FAIL_CHANCE = 0.8 // 80% — almost guaranteed to fail

Restart the dev server after changing this value.
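To estimate how often a freshly ingested document fails before it reaches the operator, the per-step chance can be compounded. The sketch below assumes two auto-steps before validating and independent rolls per step; neither detail is confirmed by the source, and `failureProbability` is a hypothetical helper.

```typescript
// Chance that at least one of `steps` independent auto-steps fails.
function failureProbability(failChance: number, steps: number): number {
  return 1 - Math.pow(1 - failChance, steps);
}

// Two assumed auto-steps: pending -> extracting and extracting -> validating.
const withDefault = failureProbability(0.08, 2); // about 0.15
const withRaised = failureProbability(0.8, 2); // about 0.96
```

Under these assumptions, roughly 15% of documents fail at the default setting, and about 96% fail with FAIL_CHANCE set to 0.8.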

Ingest a document via webhook

curl -X POST http://localhost:3000/api/webhooks/ingest \
  -H "Content-Type: application/json" \
  -d '{"name": "test-doc.pdf", "path": "/uploads/test-doc.pdf"}'

The pipeline starts automatically. If an auto-step fails, a red notification panel slides in from the right within ~3–6 seconds (depending on which step fails), showing "Pipeline failed" for the document.

Trigger a failure via manual rejection

  1. Open the dashboard and find any document in Validating status.
  2. Select it and click Reject in the action queue (or use the batch action toolbar).
  3. The notification panel opens immediately with a "Pipeline failed" entry.

Running tests

npm test

Vitest unit tests cover the action queue: happy path, exponential backoff retry, no-retry on 4xx, rollback after max retries, and concurrency cap.
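The retry behavior those tests describe can be sketched roughly as follows. This is not the repository's action queue: `HttpError`, `RetryOptions`, `backoffDelay`, and `runWithRetry` are hypothetical names, rolling back on a 4xx response is an assumption, and the concurrency cap is left out to keep the example short.

```typescript
// Hypothetical error type carrying an HTTP status; not taken from the repository.
class HttpError extends Error {
  status: number;
  constructor(status: number) {
    super(`HTTP ${status}`);
    this.status = status;
  }
}

interface RetryOptions {
  maxRetries: number;
  baseDelayMs: number;
  sleep: (ms: number) => Promise<void>; // injectable so tests need no real timers
}

// Exponential backoff: base, 2 * base, 4 * base, ...
function backoffDelay(attempt: number, baseDelayMs: number): number {
  return baseDelayMs * 2 ** attempt;
}

async function runWithRetry<T>(
  action: () => Promise<T>,
  rollback: () => void,
  opts: RetryOptions,
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await action();
    } catch (err) {
      const clientError = err instanceof HttpError && err.status >= 400 && err.status < 500;
      // 4xx errors are not retried; other errors are retried up to maxRetries times.
      if (clientError || attempt >= opts.maxRetries) {
        rollback(); // undo the optimistic update before surfacing the error
        throw err;
      }
      await opts.sleep(backoffDelay(attempt, opts.baseDelayMs));
    }
  }
}
```

In application code, `sleep` could be `(ms) => new Promise((resolve) => setTimeout(resolve, ms))`, while a unit test can pass a function that records the requested delays and resolves immediately.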
