Automated Change Detection for Student Information Systems
A system that scrapes a Student Information System (SIS) parent portal on a schedule, detects changes to assignment and grade data, including retroactive modifications, and surfaces those changes through a calendar-based diff UI. UI inspired by https://archive.org.
Built to solve a real-world problem: Student Information Systems are inherently ephemeral. Updates made retroactively by a teacher are not surfaced as a time series, leaving parents and students to detect changes through manually timestamped screenshots or spreadsheets. Git history provides the audit trail natively, as every snapshot is an immutable, timestamped commit.
The system is decomposed into four isolated runtime boundaries deployed across n8n, a container runtime, GitHub, and Vercel. Each has a single responsibility and no access to credentials it does not need.
┌──────────────────────────────────────────────┐
│ ORCHESTRATION │
│ n8n │
│ │
│ Scheduling · Dispatch · Conditional logic │
│ Credential management · Delivery routing │
│ │
│ (see Control Plane Workflow.png) │
└──┬──────────────┬───────────────────┬────────┘
│ │ │
POST /scrape │ GitHub API │ Email, │
HMAC │ (commit) │ Slack, etc. │
▼ ▼ ▼
┌────────────────────────┐ ┌────────────────────────┐ ┌────────────────────┐
│ COLLECTION + INDEX │ │ STORAGE │ │ MESSAGING │
│ Container Runtime │ │ GitHub │ │ n8n-native │
│ │ │ │ │ │
│ Headless browser │ │ Private repo │ │ Email │
│ scraping SIS portal │ │ (data snapshots only) │ │ Future channels │
│ │ │ │ │ │
│ -POST /scrape: │ │ -Code and data │ │ -n8n routes to │
│ stateless worker, │ │ repos separated │ │ channels based │
│ returns JSON │ │ -Git history = │ │ on change type │
│ -POST /rebuild-index: │ │ full audit log │ │ │
│ diffs snapshots, │ │ │ │ │
│ writes rolling index │ │ │ │ │
└────────────────────────┘ └────────┬───────────────┘ └────────────────────┘
│
GitHub API │ (read-only)
▼
┌──────────────────────┐
│ PRESENTATION │
│ Vercel │
│ │
│ -Calendar UI │
│ -Client-side diffs │
│ -SSR from storage │
│ -Voice agent widget │
└──────┬───────────────┘
│
LiveKit │ (voice/video)
▼
┌──────────────────────┐
│ VOICE AGENT │
│ LiveKit Cloud │
│ │
│ sally-schoolwork │
│ (separate repo) │
│ -Voice/text Q&A │
│ -Avatar (lip-sync) │
│ -Browser navigation │
│ via RPC │
└──────────────────────┘
n8n owns the data flow. The scraper is a stateless worker that returns JSON -- it knows nothing about where data goes or who gets notified. n8n decides what happens with the results: commit to GitHub, notify on changes, retry on failure, route alerts and status notifications via email, with additional messaging channels available as workflow branches.
This is a deliberate separation. The scraper's job is to collect. n8n's job is to orchestrate. Neither holds responsibility for the other's concerns.
Workflow: Schedule Trigger → POST /scrape to scraper → success/failure branch → split class snapshots → commit each to GitHub → update snapshot metadata → POST /rebuild-index (server-side diff + index write) → notify on error (scrape failure or index rebuild failure)
A Python + Playwright service behind FastAPI with two endpoints:
- POST /scrape: Stateless scraper. Logs into the SIS portal, extracts assignment tables, returns normalized JSON. Knows nothing about storage or delivery.
- POST /rebuild-index: Reads all snapshots from the data repo, diffs consecutive pairs, and writes a rolling index with accurate change counts. Called by n8n after each scrape, or manually for backfills.
The scraper implements a TableSource abstraction so additional SIS platforms can be added without changing the orchestration layer.
SISSource(TableSource) <- current implementation
FutureSource(TableSource) <- swap in without touching n8n
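A minimal sketch of what that abstraction might look like (method names and signatures here are illustrative, not the actual interface in `scraper/base.py`):

```python
from abc import ABC, abstractmethod


class TableSource(ABC):
    """Contract each SIS platform implements; the orchestration layer
    only ever talks to this interface."""

    @abstractmethod
    def login(self) -> None:
        """Authenticate against the portal (e.g. drive a headless browser)."""

    @abstractmethod
    def fetch_assignments(self, class_id: str) -> list[dict]:
        """Return normalized assignment rows for one class."""


class SISSource(TableSource):
    """Current implementation: Playwright against the SIS parent portal."""

    def login(self) -> None:
        ...  # headless-browser login flow

    def fetch_assignments(self, class_id: str) -> list[dict]:
        ...  # extract and normalize the assignment table
```

A new platform is then a new `TableSource` subclass plus a `config/sources.json` entry; n8n never changes.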
Authentication is HMAC-based: n8n signs each request with a shared webhook secret, which the scraper verifies before processing.
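In Python terms, that verification can be sketched with the standard `hmac` module (the exact header name and payload shape are assumptions; only the shared-secret HMAC scheme comes from the design above):

```python
import hashlib
import hmac


def sign(secret: str, body: bytes) -> str:
    """What n8n computes over the request body and sends alongside it
    (e.g. in an X-Signature header; the header name is illustrative)."""
    return hmac.new(secret.encode(), body, hashlib.sha256).hexdigest()


def verify(secret: str, body: bytes, signature: str) -> bool:
    """What the scraper checks before processing; compare_digest is a
    constant-time comparison, avoiding timing side channels."""
    expected = sign(secret, body)
    return hmac.compare_digest(expected, signature)
```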
Code and data live in separate repositories:
- table-mutation-tracker (this repo): Application code, public-ready
- table-mutation-data (private): Snapshot JSON and rolling index only
The data repo receives commits exclusively through n8n via the GitHub Contents API. Git history serves as a complete audit trail of every change detected.
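As a rough illustration of the call n8n makes per snapshot, a Contents API commit (`PUT /repos/{owner}/{repo}/contents/{path}`) can be built like this; the path and message below are made up, and in this system the real commits come from the n8n workflow, not application code:

```python
import base64
import json
import urllib.request

API = "https://api.github.com"


def build_commit_request(token: str, repo: str, path: str,
                         payload: dict, message: str) -> urllib.request.Request:
    """Build a GitHub Contents API request that creates `path` in `repo`.
    Updating an existing file additionally requires the current blob's
    `sha` in the body (omitted here for brevity)."""
    body = {
        "message": message,
        # The Contents API expects file content base64-encoded.
        "content": base64.b64encode(
            json.dumps(payload, indent=2).encode()
        ).decode(),
    }
    return urllib.request.Request(
        f"{API}/repos/{repo}/contents/{path}",
        data=json.dumps(body).encode(),
        method="PUT",
        headers={
            "Authorization": f"Bearer {token}",
            "Accept": "application/vnd.github+json",
        },
    )

# urllib.request.urlopen(build_commit_request(...)) would perform the commit.
```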
snapshots/
2026-03-10/
140532/
english_10/assignments.json
algebra_2/assignments.json
metadata.json
index/
rolling_index.json
A Next.js app on Vercel reads snapshots from the data repo via GitHub API. Change detection happens at two levels:
- Rolling index (pre-computed): After each scrape, n8n calls the scraper's /rebuild-index endpoint, which diffs all consecutive snapshot pairs, then writes change counts and a weighted GPA to the rolling index. The calendar view reads these counts to color-code days without fetching individual snapshots.
- Detail view (client-side): When a user clicks a day, the frontend fetches both snapshots and diffs them field-by-field, classifying changes as added, deleted, or modified.
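As an illustration of the pre-computed side, the weighted GPA might be a weighted mean over per-class grade points; the exact formula and the schema of the weights in config/sources.json are assumptions here:

```python
def weighted_gpa(grades: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-class grade points. `weights` mirrors the
    per-class GPA weights in config/sources.json (schema assumed);
    classes without an explicit weight default to 1.0."""
    total = sum(weights.get(c, 1.0) for c in grades)
    return sum(g * weights.get(c, 1.0) for c, g in grades.items()) / total
```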
This keeps the frontend fully decoupled from both the scraper and n8n. It reads from storage and nothing else.
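The added/deleted/modified classification can be sketched as follows (keying on an assignment title is an assumption for the sketch; the real engine is frontend/lib/diff.ts in TypeScript):

```python
def diff_assignments(old: list[dict], new: list[dict],
                     key: str = "title") -> dict[str, list]:
    """Classify changes between two consecutive snapshots into the
    added/deleted/modified buckets the UI displays."""
    old_by = {a[key]: a for a in old}
    new_by = {a[key]: a for a in new}
    return {
        "added": [a for k, a in new_by.items() if k not in old_by],
        "deleted": [a for k, a in old_by.items() if k not in new_by],
        # Present in both snapshots but with any field changed.
        "modified": [a for k, a in new_by.items()
                     if k in old_by and a != old_by[k]],
    }
```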
n8n handles all outbound communication. Error notifications are active today: email on scrape failure, and a global workspace error handler following n8n design patterns. The architecture supports adding channels (Slack, SMS, weekly digests) as n8n workflow branches without touching any application code.
table-mutation-tracker/
├── src/
│ ├── main.py # Scrape orchestration (called by webhook)
│ ├── webhook.py # FastAPI endpoints: /scrape, /rebuild-index, /health
│ ├── config.py # Config loader (sources.json + env vars)
│ ├── lib/
│ │ ├── snapshot_store.py # SnapshotStore ABC + diff logic
│ │ └── github_store.py # GitHub Contents API implementation
│ └── scraper/
│ ├── base.py # TableSource ABC, Assignment/ClassSnapshot models
│ ├── sis.py # SIS portal implementation
│ └── normalize.py # Date, score, grade normalization
├── frontend/
│ ├── app/ # Next.js App Router (calendar + day views)
│ │ ├── api/livekit-token/ # Server-side JWT generation for agent dispatch
│ │ ├── deleted/ # Deleted assignments history page
│ │ └── help/ # Agent capabilities page (navigated to via RPC)
│ ├── components/
│ │ ├── AgentWidget.tsx # Floating voice/video widget with connect/disconnect
│ │ ├── GradeBanner.tsx # Per-class grade cards + GPA display
│ │ ├── CalendarView, DiffTable, ClassTabs, ChangeLegend, etc.
│ ├── lib/
│ │ ├── diff.ts # Client-side assignment diffing engine
│ │ ├── snapshots.ts # GitHub API data fetching
│ │ └── types.ts # Shared TypeScript types
│ └── hooks/ # Local snapshot state management
├── config/
│ └── sources.json # SIS URLs, class definitions, GPA weights per class
├── scripts/
│ ├── generate_synthetic.py # Test data generator (--days N --clean, pushes to data repo)
│ └── rebuild_index.py # CLI to rebuild rolling index from existing snapshots
├── Dockerfile # Playwright + Chromium containerized scraper
├── docker-compose.yml # Caddy + n8n deployment (GCE)
└── Caddyfile # Reverse proxy + auto-TLS for n8n
| Variable | Description |
|---|---|
| `SIS_USERNAME` | SIS portal username |
| `SIS_PASSWORD` | SIS portal password |
| `WEBHOOK_SECRET` | Shared secret for n8n request verification |
| `GITHUB_TOKEN` | Fine-grained PAT with read/write access to the data repo (used by `/rebuild-index`) |
| `DATA_REPO` | GitHub data repo (`owner/repo` format, used by `/rebuild-index`) |
| Variable | Description |
|---|---|
| `GITHUB_TOKEN` | Fine-grained PAT with read access to the data repo |
| `DATA_REPO` | GitHub data repo (`owner/repo` format) |
| `DATA_PREFIX` | Namespace for data paths. Empty for production, `synthetic` for preview deployments |
| `BASIC_AUTH_CREDENTIALS` | Single `user:pass` for HTTP basic auth. Scope per environment in Vercel. Unset = no auth |
| `NEXT_PUBLIC_LIVEKIT_URL` | LiveKit Cloud WebSocket URL (for agent widget) |
| `LIVEKIT_URL` | LiveKit Cloud WebSocket URL (server-side token generation) |
| `LIVEKIT_API_KEY` | LiveKit API key (server-side token generation) |
| `LIVEKIT_API_SECRET` | LiveKit API secret (server-side token generation) |
| Variable | Description |
|---|---|
| `N8N_USER` | Basic auth username |
| `N8N_PASSWORD` | Basic auth password |
All remaining credentials (GitHub PAT, webhook secret, SIS credentials for passthrough) are stored in n8n's encrypted credential store.
- Docker
- A container runtime that can serve HTTP (Cloud Run, App Runner, Fly.io, or similar)
- A container registry accessible from your runtime
- GitHub fine-grained PAT with Contents read/write on the data repo
- n8n instance (self-hosted or cloud)
```shell
# Build for linux/amd64 (required for Playwright/Chromium)
docker buildx build --platform linux/amd64 -t <your-registry>/table-scraper:latest .

# Push to your registry and deploy to your container runtime
docker push <your-registry>/table-scraper:latest
```

For the n8n control plane:

```shell
cp .env.example .env
# Fill in N8N_USER, N8N_PASSWORD
docker compose up -d
```

The frontend is deployed via Vercel with auto-deploy from this repo. Set GITHUB_TOKEN, DATA_REPO, and DATA_PREFIX in Vercel environment settings.
A LiveKit-powered voice agent ("Sally Schoolwork") is embedded as a floating widget in the frontend. Users ask questions about grades, assignments, and changes by voice; the agent answers conversationally and auto-navigates the browser to relevant views via LiveKit RPC.
The agent backend lives in a separate repo (sally-schoolwork). This repo provides the frontend widget and RPC navigation handler. See LIVEKIT_AGENT.md for integration details and PROGRESS_LIVEKIT.md for current status.
Frontend components:
- `AgentWidget.tsx` — floating widget with video/audio rendering, persona selector, connect/disconnect
- `NavigationHandler` — registers `navigateTo` RPC method, calls `router.push()` on agent request
- `/api/livekit-token` — server-side JWT generation with `RoomAgentDispatch`
- `/help` — capabilities page the agent navigates to after onboarding
Environment variables (add to frontend/.env.local):
- `NEXT_PUBLIC_LIVEKIT_URL` — LiveKit Cloud WebSocket URL
- `LIVEKIT_URL`, `LIVEKIT_API_KEY`, `LIVEKIT_API_SECRET` — server-side token generation
MIT License. Copyright (c) 2026 David Lasley.
Built using AI-assisted development tooling while maintaining human ownership of architectural decisions, runtime separation, and governance design. AI accelerated implementation; system decomposition and operationalization patterns were deliberate and human-directed.
The focus throughout:
- n8n as orchestrator. It owns the data flow; workers are stateless.
- Runtime isolation over monolithic convenience. Each component scales, fails, and deploys independently.
- Credential isolation by boundary. No component holds secrets it does not need.
- Computed over stored. The rolling index pre-computes change counts for monthly calendar view performance, but detail-level diffs are always derived at read time from raw snapshots.