We are building a high-scale activity tracking system to capture all user and system events across the platform.
User actions occur
→ events captured
→ queued (async)
→ processed by workers
→ stored in relational DB
→ queried for analytics
→ exposed via APIs
🎯 Goals
System must be:
- High throughput (100K–1M events/day)
- Strongly structured (Relational DB)
- Multi-tenant safe
- Analytics-ready (fast queries)
- Fault-tolerant
- Production-grade
🧱 SYSTEM ARCHITECTURE
Tech Stack
Backend → Node.js + Express
Database → PostgreSQL (preferred)
Queue → Redis (BullMQ) or Kafka (future)
ORM → Prisma / Sequelize
High-Level Flow
App Event
→ Event Emitter (trackEvent)
→ Queue (Redis/Kafka)
→ Worker processes
→ Batch insert into DB
→ Analytics APIs query DB
🧠 CORE MODULES
1️⃣ EVENT TRACKING SYSTEM
Internal API
trackEvent({
  userId,
  communityId,
  eventType,
  entityType,
  entityId,
  metadata
})
Event Types
user.signup
user.login
community.created
member.created
member.activated
event.created
hackathon.created
webhook.triggered
github.push
github.pr.opened
Requirements
- non-blocking (async)
- lightweight
- reusable across services
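A minimal sketch of the internal trackEvent API, assuming validation-then-enqueue semantics. The real version would push to Redis/BullMQ and return immediately; the `queue` array here is a hypothetical in-memory stand-in so the shape of the call is visible.

```javascript
// In production this would be a BullMQ queue; an array stands in for illustration.
const queue = [];

// Non-blocking by design: validate cheaply, enqueue, and never throw into
// the calling business code. Returns false instead of raising on bad input.
function trackEvent({ userId, communityId, eventType, entityType, entityId, metadata }) {
  if (!eventType || !userId) return false; // reject silently, don't break callers
  queue.push({
    userId,
    communityId: communityId ?? null,
    eventType,
    entityType: entityType ?? null,
    entityId: entityId ?? null,
    metadata: metadata ?? {},
    occurredAt: new Date().toISOString(),
  });
  return true;
}
```

Returning a boolean (instead of throwing) keeps the tracker safe to call from any service: a tracking failure should never fail the user-facing operation.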
2️⃣ RELATIONAL DATABASE DESIGN
Table: users
id UUID PRIMARY KEY
email TEXT UNIQUE
created_at TIMESTAMP
Table: communities
id UUID PRIMARY KEY
name TEXT
created_at TIMESTAMP
Table: activities (CORE)
id BIGSERIAL PRIMARY KEY
user_id UUID REFERENCES users(id)
community_id UUID REFERENCES communities(id)
event_type TEXT NOT NULL
entity_type TEXT
entity_id TEXT
metadata JSONB
ip_address TEXT
user_agent TEXT
created_at TIMESTAMP DEFAULT NOW()
Optional: partitions
PARTITION BY RANGE (created_at)
-- e.g. one child table per month keeps indexes small and makes retention drops cheap
CREATE TABLE activities_2025_01 PARTITION OF activities
  FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');
3️⃣ INDEXING STRATEGY (CRITICAL)
Basic Indexes
CREATE INDEX idx_user ON activities(user_id);
CREATE INDEX idx_community ON activities(community_id);
CREATE INDEX idx_event_type ON activities(event_type);
CREATE INDEX idx_created_at ON activities(created_at);
Composite Indexes
(user_id, created_at)
(community_id, event_type)
(event_type, created_at)
JSON Index
CREATE INDEX idx_metadata ON activities USING GIN (metadata);
4️⃣ EVENT INGESTION PIPELINE
Flow
Event occurs
→ push to queue
→ worker consumes
→ validate
→ batch insert into DB
Batch Insert
- insert 100–1000 rows per batch
- reduces DB load
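The batching step can be sketched as a small buffer that flushes once it is full. `insertRows` is a hypothetical stand-in for a real multi-row INSERT (via pg, Prisma, etc.); the batch size of 100 matches the range above.

```javascript
// Buffer events and flush them as one multi-row insert instead of N single inserts.
// insertRows is injected so the batcher stays decoupled from the DB driver.
function makeBatcher(insertRows, batchSize = 100) {
  let buffer = [];
  return {
    add(event) {
      buffer.push(event);
      if (buffer.length >= batchSize) this.flush(); // auto-flush when full
    },
    flush() {
      if (buffer.length === 0) return;
      insertRows(buffer); // one DB round-trip for the whole batch
      buffer = [];
    },
    pending: () => buffer.length,
  };
}
```

A production worker would also flush on a timer (so a slow trickle of events still lands within seconds) and on shutdown.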
5️⃣ SCALING STRATEGY
Horizontal Scaling
- multiple workers
- load-balanced API
DB Scaling
- read replicas
- partitioned tables
Queue Scaling
- Redis cluster
- Kafka (future upgrade)
6️⃣ ANALYTICS SYSTEM
API Endpoints
GET /api/v1/analytics/overview
GET /api/v1/analytics/community/:id
GET /api/v1/analytics/user/:id
GET /api/v1/analytics/events
Sample Queries
Daily Active Users
SELECT COUNT(DISTINCT user_id)
FROM activities
WHERE created_at >= NOW() - INTERVAL '1 day';
Event Count
SELECT event_type, COUNT(*)
FROM activities
GROUP BY event_type;
7️⃣ SECURITY 🔐
Data Safety
- no sensitive data in metadata
- sanitize all inputs
Abuse Protection
- rate limit event ingestion
- detect spam events
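The ingestion rate limit above can be sketched as a fixed-window counter keyed per tenant. The limit and window here are illustrative; a production version would keep the counters in Redis so all API instances share them.

```javascript
// Fixed-window rate limiter: allow at most `limit` events per `windowMs` per key
// (e.g. per communityId). In-memory Map is illustrative only.
function makeRateLimiter({ limit, windowMs }) {
  const windows = new Map(); // key -> { start, count }
  return function allow(key, now = Date.now()) {
    const w = windows.get(key);
    if (!w || now - w.start >= windowMs) {
      windows.set(key, { start: now, count: 1 }); // new window for this key
      return true;
    }
    w.count += 1;
    return w.count <= limit; // reject once the window's budget is spent
  };
}
```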
Access Control
- analytics APIs protected
- role-based access
8️⃣ IDEMPOTENCY
Problem
duplicate events (retries, redeliveries, network timeouts)
Solution
- add a client-generated event_id UUID column
- enforce UNIQUE (event_id) at the DB level
- insert with ON CONFLICT (event_id) DO NOTHING so retried events are dropped silently
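As a sketch, the worker can also deduplicate by event ID before inserting; the authoritative guard remains a UNIQUE(event_id) constraint with INSERT ... ON CONFLICT DO NOTHING, since the in-memory set below does not survive restarts or span multiple workers.

```javascript
// Worker-side dedupe guard: remembers event IDs it has already processed.
// Illustrative only — the DB constraint is the real idempotency boundary.
function makeDeduper() {
  const seen = new Set();
  return function isNew(eventId) {
    if (seen.has(eventId)) return false; // duplicate delivery, skip it
    seen.add(eventId);
    return true;
  };
}
```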
9️⃣ DATA RETENTION & ARCHIVING
Policy
Hot data → 3 months
Warm data → 6–12 months
Cold → archive (S3)
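The policy above can be expressed as a function that classifies a row's age into a tier; a scheduled job could use it to decide which partitions to detach and export. The 90/365-day cutoffs are an assumption matching the 3-month and 6–12-month bounds stated above.

```javascript
// Classify a row's age into a retention tier (thresholds are illustrative:
// 3 months hot, up to 12 months warm, older rows are archive candidates).
function retentionTier(createdAt, now = new Date()) {
  const days = (now - createdAt) / 86_400_000; // ms per day
  if (days <= 90) return "hot";
  if (days <= 365) return "warm";
  return "cold"; // export to S3, then delete/detach from Postgres
}
```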
🔟 OBSERVABILITY
Logging
Metrics
- events/sec
- queue size
- DB latency
- error rate
Alerts
- DB slow queries
- queue backlog
- ingestion failures
1️⃣1️⃣ ERROR HANDLING
Strategy
- retry failed jobs
- dead letter queue
Failure Cases
- DB down
- queue failure
- invalid payload
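The retry-then-dead-letter strategy can be sketched as below. `deadLetters` is a hypothetical stand-in for a real DLQ (e.g. a BullMQ failed-jobs queue), and a real worker would be async with exponential backoff between attempts.

```javascript
// Try a job up to maxAttempts times; on final failure, park it in the
// dead-letter queue instead of losing it. Synchronous for illustration.
function processWithRetry(job, handler, { maxAttempts = 3, deadLetters = [] } = {}) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return handler(job);
    } catch (err) {
      if (attempt === maxAttempts) {
        deadLetters.push({ job, error: String(err) }); // preserve for inspection/replay
        return null;
      }
      // a real worker would back off here (e.g. exponential delay) before retrying
    }
  }
}
```

Dead-lettered jobs cover all three failure cases above: transient DB/queue outages get retried, while permanently invalid payloads end up parked for manual inspection rather than retried forever.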
1️⃣2️⃣ TESTING
Unit
- event validator
- metadata sanitizer
Integration
Load Testing
1️⃣3️⃣ EDGE CASES
- duplicate events
- high traffic spikes
- DB failure
- queue crash
- invalid metadata
⚙️ PERFORMANCE OPTIMIZATION
- batch inserts
- connection pooling
- prepared statements
🌍 ENVIRONMENT
📦 FOLDER STRUCTURE
/events
/services
/workers
/repositories
/models
/utils
✅ ACCEPTANCE CRITERIA
✔ Events tracked across system
✔ Stored in relational DB
✔ Query time < 200ms
✔ Handles high traffic
✔ Partitioning implemented
✔ Analytics APIs working
✔ Secure & scalable
🔥 FINAL SUMMARY
This system behaves like the backend of an analytics product such as Mixpanel or Google Analytics.