[Backend] Build High-Scale User Activity Tracking System (Relational DB, Analytics, Multi-Tenant, Production Ready) #6

@abhishek-nexgen-dev

Description

We are building a high-scale activity tracking system to capture all user and system events across the platform.

User actions occur
→ events captured
→ queued (async)
→ processed by workers
→ stored in relational DB
→ queried for analytics
→ exposed via APIs

🎯 Goals

System must be:

  • High throughput (100K–1M events/day)
  • Strongly structured (Relational DB)
  • Multi-tenant safe
  • Analytics-ready (fast queries)
  • Fault-tolerant
  • Production-grade

🧱 SYSTEM ARCHITECTURE


Tech Stack

Backend → Node.js + Express
Database → PostgreSQL (preferred)
Queue → Redis (BullMQ) or Kafka (future)
ORM → Prisma / Sequelize

High-Level Flow

App Event
→ Event Emitter (trackEvent)
→ Queue (Redis/Kafka)
→ Worker processes
→ Batch insert into DB
→ Analytics APIs query DB

🧠 CORE MODULES


1️⃣ EVENT TRACKING SYSTEM


Internal API

trackEvent({
  userId,
  communityId,
  eventType,
  entityType,
  entityId,
  metadata
})

Event Types

user.signup
user.login
community.created
member.created
member.activated
event.created
hackathon.created
webhook.triggered
github.push
github.pr.opened

Requirements

  • non-blocking (async)
  • lightweight
  • reusable across services
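
The internal API above can be sketched as follows. This is a minimal sketch only: a plain in-memory array stands in for the Redis/BullMQ queue so the call shape and the non-blocking requirement stay visible; the field names follow the spec above.

```javascript
// In-memory stand-in for the real queue (Redis/BullMQ in production).
const eventQueue = [];

function trackEvent({ userId, communityId, eventType, entityType, entityId, metadata }) {
  // Fail soft: tracking must never break the calling request path.
  if (!eventType) return false;
  // setImmediate keeps the enqueue off the caller's critical path (non-blocking).
  setImmediate(() => {
    eventQueue.push({
      userId,
      communityId,
      eventType,
      entityType,
      entityId,
      metadata: metadata ?? {},
      occurredAt: new Date().toISOString(),
    });
  });
  return true;
}
```

Because the function only schedules the enqueue and returns immediately, it stays lightweight enough to call from any service without awaiting it.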


2️⃣ RELATIONAL DATABASE DESIGN


Table: users

id UUID PRIMARY KEY
email TEXT UNIQUE
created_at TIMESTAMP

Table: communities

id UUID PRIMARY KEY
name TEXT
created_at TIMESTAMP

Table: activities (CORE)

id BIGSERIAL PRIMARY KEY
user_id UUID REFERENCES users(id)
community_id UUID REFERENCES communities(id)

event_type TEXT NOT NULL
entity_type TEXT
entity_id TEXT

metadata JSONB

ip_address TEXT
user_agent TEXT

created_at TIMESTAMP DEFAULT NOW()

Optional: partitions

PARTITION BY RANGE (created_at)
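
Putting the column list and the optional partitioning together, the core table could look like the sketch below. Note one real PostgreSQL constraint: on a partitioned table the primary key must include the partition key, so `created_at` joins the primary key. Partition names are illustrative.

```sql
CREATE TABLE activities (
  id            BIGSERIAL,
  user_id       UUID REFERENCES users(id),
  community_id  UUID REFERENCES communities(id),
  event_type    TEXT NOT NULL,
  entity_type   TEXT,
  entity_id     TEXT,
  metadata      JSONB,
  ip_address    TEXT,
  user_agent    TEXT,
  created_at    TIMESTAMP NOT NULL DEFAULT NOW(),
  PRIMARY KEY (id, created_at)
) PARTITION BY RANGE (created_at);

-- One partition per month (names are illustrative):
CREATE TABLE activities_2025_01 PARTITION OF activities
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
```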


3️⃣ INDEXING STRATEGY (CRITICAL)


Basic Indexes

CREATE INDEX idx_user ON activities(user_id);
CREATE INDEX idx_community ON activities(community_id);
CREATE INDEX idx_event_type ON activities(event_type);
CREATE INDEX idx_created_at ON activities(created_at);

Composite Indexes

CREATE INDEX idx_user_created ON activities(user_id, created_at);
CREATE INDEX idx_community_event ON activities(community_id, event_type);
CREATE INDEX idx_event_created ON activities(event_type, created_at);

JSON Index

CREATE INDEX idx_metadata_gin ON activities USING GIN (metadata);


4️⃣ EVENT INGESTION PIPELINE


Flow

Event occurs
→ push to queue
→ worker consumes
→ validate
→ batch insert into DB

Batch Insert

  • insert 100–1000 rows per batch
  • reduces DB load
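
The batching step can be sketched as below. `insertMany(rows)` is an assumed helper that wraps a single multi-row INSERT (e.g. Prisma's `createMany` or one parameterized statement); events accumulate in a buffer that flushes at `batchSize` or on demand.

```javascript
// Buffers events and writes them in batches of `batchSize`.
// `insertMany` is an assumed async function doing one multi-row INSERT.
function createBatchWriter(insertMany, batchSize = 100) {
  let buffer = [];

  async function add(event) {
    buffer.push(event);
    if (buffer.length >= batchSize) await flush();
  }

  async function flush() {
    if (buffer.length === 0) return 0;
    const rows = buffer;
    buffer = [];               // swap before awaiting so new events keep buffering
    await insertMany(rows);    // one round-trip for the whole batch
    return rows.length;
  }

  return { add, flush };
}
```

A worker would also call `flush()` on a timer so a slow trickle of events is not stuck below `batchSize` forever.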


5️⃣ SCALING STRATEGY


Horizontal Scaling

  • multiple workers
  • load-balanced API

DB Scaling

  • read replicas
  • partitioned tables

Queue Scaling

  • Redis cluster
  • Kafka (future upgrade)


6️⃣ ANALYTICS SYSTEM


API Endpoints

GET /api/v1/analytics/overview
GET /api/v1/analytics/community/:id
GET /api/v1/analytics/user/:id
GET /api/v1/analytics/events

Sample Queries


Daily Active Users

SELECT COUNT(DISTINCT user_id)
FROM activities
WHERE created_at >= NOW() - INTERVAL '1 day';

Event Count

SELECT event_type, COUNT(*)
FROM activities
GROUP BY event_type;


7️⃣ SECURITY 🔐


Data Safety

  • no sensitive data in metadata
  • sanitize all inputs

Abuse Protection

  • rate limit event ingestion
  • detect spam events
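
Ingestion rate limiting could be a per-tenant token bucket, sketched below. In production the counters would usually live in Redis so all API instances share them; this in-process version only illustrates the policy, and the parameter names are illustrative.

```javascript
// Per-key token bucket: each key (e.g. tenant id) gets `capacity` tokens,
// refilled at `refillPerSec`. Returns true if the event may be ingested.
function createRateLimiter({ capacity, refillPerSec }) {
  const buckets = new Map(); // key -> { tokens, last }
  return function allow(key, now = Date.now()) {
    const b = buckets.get(key) ?? { tokens: capacity, last: now };
    b.tokens = Math.min(capacity, b.tokens + ((now - b.last) / 1000) * refillPerSec);
    b.last = now;
    if (b.tokens < 1) {
      buckets.set(key, b);
      return false;            // over the limit: drop or defer the event
    }
    b.tokens -= 1;
    buckets.set(key, b);
    return true;
  };
}
```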

Access Control

  • analytics APIs protected
  • role-based access


8️⃣ IDEMPOTENCY


Problem

duplicate events (retry, network)


Solution

Add:

event_id (unique)

DB Constraint

UNIQUE(event_id)
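
With the `event_id` column and its unique constraint in place, the worker's insert can drop duplicates at the database level (PostgreSQL syntax; if `activities` is range-partitioned by `created_at`, the unique constraint must also include the partition key):

```sql
-- Retried deliveries with the same event_id are skipped, not duplicated.
INSERT INTO activities (event_id, user_id, community_id, event_type, metadata)
VALUES ($1, $2, $3, $4, $5)
ON CONFLICT (event_id) DO NOTHING;
```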


9️⃣ DATA RETENTION & ARCHIVING


Policy

Hot data → 3 months
Warm data → 6–12 months
Cold → archive (S3)
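
Assuming monthly range partitions on `activities` (partition names illustrative), retention becomes cheap partition maintenance rather than row-by-row DELETEs:

```sql
-- Detach the expired month, archive it (e.g. COPY / pg_dump to S3), then drop it.
ALTER TABLE activities DETACH PARTITION activities_2025_01;
-- ...archive the detached table here...
DROP TABLE activities_2025_01;
```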


🔟 OBSERVABILITY


Logging

  • structured logs (Pino)

Metrics

events/sec
queue size
DB latency
error rate

Alerts

  • DB slow queries
  • queue backlog
  • ingestion failures


1️⃣1️⃣ ERROR HANDLING


Strategy

  • retry failed jobs
  • dead letter queue

Failure Cases

DB down
queue failure
invalid payload
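
The retry-plus-dead-letter strategy can be sketched as below. `processor(event)` is an assumed handler that may throw (DB down, invalid payload, ...); after `maxAttempts` failures the event moves to a dead-letter list for inspection instead of being retried forever. BullMQ provides this behavior natively via job `attempts` and failed-job handling; this sketch only shows the logic.

```javascript
// Runs `processor(event)` up to maxAttempts times; on final failure the
// event is parked in `deadLetter` with the error for later inspection.
async function processWithRetry(processor, event, { maxAttempts = 3, deadLetter = [] } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      await processor(event);
      return 'ok';
    } catch (err) {
      if (attempt === maxAttempts) {
        deadLetter.push({ event, error: String(err) });
        return 'dead-letter';
      }
      // A real worker would back off here (e.g. exponential delay).
    }
  }
}
```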


1️⃣2️⃣ TESTING


Unit

  • event validator
  • metadata sanitizer

Integration

  • event → queue → DB

Load Testing

  • simulate 100K events


1️⃣3️⃣ EDGE CASES


duplicate events
high traffic spikes
DB failure
queue crash
invalid metadata


⚙️ PERFORMANCE OPTIMIZATION


  • batch inserts
  • connection pooling
  • prepared statements


🌍 ENVIRONMENT


DEV
STAGING
PROD


📦 FOLDER STRUCTURE


/events
/services
/workers
/repositories
/models
/utils


✅ ACCEPTANCE CRITERIA


✔ Events tracked across system
✔ Stored in relational DB
✔ Query time < 200ms
✔ Handles high traffic
✔ Partitioning implemented
✔ Analytics APIs working
✔ Secure & scalable


🔥 FINAL SUMMARY

This system behaves like the backend of Mixpanel or Google Analytics.
