Experiment Results

Post Inconsistency

Data Environment Settings 1

This test has 200 users send 2000 posts in total concurrently and then, immediately after the writes complete, uses the timeline service to retrieve the feed content under each strategy (pull, push, and hybrid).

Missing Rate

Strategy	Pull	Push	Hybrid
missing rate	38%	100%	80%
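
The missing rate can be read as the fraction of posts that were written but absent from the timeline response. A minimal sketch of that calculation, assuming posts are identified by string IDs; `MissingRate` and its parameters are hypothetical names, not taken from the test harness:

```go
package main

import "fmt"

// MissingRate returns the fraction of expected post IDs that are absent
// from the retrieved timeline, as a value in [0, 1].
// It returns 0 when nothing was expected.
func MissingRate(expected, retrieved []string) float64 {
	if len(expected) == 0 {
		return 0
	}
	// Index the retrieved IDs for constant-time membership checks.
	seen := make(map[string]struct{}, len(retrieved))
	for _, id := range retrieved {
		seen[id] = struct{}{}
	}
	missing := 0
	for _, id := range expected {
		if _, ok := seen[id]; !ok {
			missing++
		}
	}
	return float64(missing) / float64(len(expected))
}

func main() {
	expected := []string{"p1", "p2", "p3", "p4"}
	retrieved := []string{"p1", "p4", "old-post"}
	fmt.Printf("missing rate: %.0f%%\n", MissingRate(expected, retrieved)*100)
}
```

Entries in the response that were not part of the test (such as `old-post` here) do not affect the rate; only expected posts that fail to appear are counted.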

Missing Content

Pull

Push

Hybrid

Insights

In the pull-based model, the timeline read calls PostRepository.GetPostByUserID(), which is a DynamoDB query on the user_id-index GSI that fetches all of a user's posts. DynamoDB GSIs support only eventually consistent reads, so right after 2000 concurrent writes there is a propagation lag before those items become visible on the GSI. We also noticed gaps in the sequence of missing posts when retrieving the timeline. This suggests that we are merging whatever the index happened to return at that moment, and that the posts are missing from the index rather than from the database.

In the push-based model, the fan-out runs asynchronously through SNS/SQS, so the test hits /api/timeline before the background writes finish. When thousands of posts are created at high concurrency, the fan-out goroutines queue up and the SQS/DynamoDB writes can take seconds. The test checks the timeline window immediately and sees only older entries, so every new post is reported missing.

This pattern is also confirmed in Setting 2, where we tested with 50 regular-user posts (processed via the push strategy under the hybrid mode) and 150 celebrity-user posts (processed via the pull strategy) to simulate real-life usage. This setting showed a missing rate between those of the pull and push models, caused mainly by the regular-user posts processed via the push strategy.

To sum up, the pull strategy supports more real-time retrieval because data is written to the database immediately. In contrast, the push strategy may introduce additional latency, as it relies on asynchronous processing and can involve queued updates.
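
Because both the GSI propagation and the asynchronous fan-out settle after a delay, a test that reads once immediately after writing measures that delay rather than data loss. One way to separate the two is to poll until the expected posts appear or a deadline passes. A minimal sketch, assuming a caller-supplied `fetch` function that stands in for a call to /api/timeline; `WaitForPosts`, `fetch`, and the interval values are hypothetical:

```go
package main

import (
	"fmt"
	"time"
)

// WaitForPosts polls fetch until every ID in expected is present or the
// timeout elapses. It reports how long it waited and whether all posts
// were seen. fetch is a stand-in for a timeline read.
func WaitForPosts(fetch func() []string, expected []string, interval, timeout time.Duration) (time.Duration, bool) {
	start := time.Now()
	for {
		seen := make(map[string]struct{})
		for _, id := range fetch() {
			seen[id] = struct{}{}
		}
		all := true
		for _, id := range expected {
			if _, ok := seen[id]; !ok {
				all = false
				break
			}
		}
		if all {
			return time.Since(start), true
		}
		if time.Since(start) >= timeout {
			return time.Since(start), false
		}
		time.Sleep(interval)
	}
}

func main() {
	// Simulated timeline that exposes one more post on each read,
	// mimicking items becoming visible over time.
	all := []string{"p1", "p2", "p3"}
	calls := 0
	fetch := func() []string {
		if calls < len(all) {
			calls++
		}
		return all[:calls]
	}
	lag, ok := WaitForPosts(fetch, all, 10*time.Millisecond, time.Second)
	fmt.Println("all visible:", ok, "after", lag)
}
```

Posts that are still absent when the timeout expires are the ones worth investigating as real losses; the measured wait gives a rough estimate of the propagation lag.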

Timeline Retrieval Time

Data Environment Settings

Measures the timeline GET API response time for three users who follow 10, 100, and 1600 accounts respectively. Each followed user has 10 posts.
Hybrid follower count threshold: 20000

Response time

Strategy	10 following	100 following	1600 following
push	45ms	43ms	48ms
pull	50ms	200ms	3200ms
hybrid	52ms	130ms	2200ms
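
The pull numbers grow with the following count because each read has to gather and merge posts from every followed user. A simplified in-memory sketch of that read path, assuming a hypothetical `Post` type and a `postsByUser` map standing in for one repository query per followed user; it is not the service's actual code:

```go
package main

import (
	"fmt"
	"sort"
)

// Post is a hypothetical, simplified post record.
type Post struct {
	ID        string
	UserID    string
	CreatedAt int64 // Unix timestamp
}

// PullTimeline gathers posts from every followed user, sorts them newest
// first, and returns at most limit posts. The lookup per followed user
// models one query per following, so work grows with len(following).
func PullTimeline(postsByUser map[string][]Post, following []string, limit int) []Post {
	var merged []Post
	for _, uid := range following {
		merged = append(merged, postsByUser[uid]...)
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].CreatedAt > merged[j].CreatedAt
	})
	if limit >= 0 && len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	store := map[string][]Post{
		"alice": {
			{ID: "a1", UserID: "alice", CreatedAt: 100},
			{ID: "a2", UserID: "alice", CreatedAt: 300},
		},
		"bob": {
			{ID: "b1", UserID: "bob", CreatedAt: 200},
		},
	}
	for _, p := range PullTimeline(store, []string{"alice", "bob"}, 2) {
		fmt.Println(p.ID, p.CreatedAt)
	}
}
```

With 1600 followings at 10 posts each, a read in this model touches 16,000 posts before truncating, whereas the push model reads one precomputed timeline regardless of the following count.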

Database Storage

Data Environment Settings

User Base: 5,000 users
Test Scenarios: Push, Pull, and Hybrid fan-out strategies
Database: Amazon DynamoDB

Storage Metrics by Strategy

Strategy	Post Items	Post Storage (MB)	Timeline Items	Timeline Storage (MB)	Total Storage (MB)
Push	0	0.00	3,686,351	812.71	812.71
Pull	46,317	6.21	0	0.00	6.21
Hybrid	1,303	0.18	3,477,569	760.96	761.14

Storage Cost Metrics by Strategy

Strategy	Post Cost ($/mo)	Timeline Cost ($/mo)	Total Cost ($/mo)	Annual Cost ($)
Push	$0.0000	$0.1984	$0.1984	$2.38
Pull	$0.0015	$0.0000	$0.0015	$0.02
Hybrid	$0.0001	$0.1858	$0.1859	$2.23
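
The cost figures are consistent with a flat per-GB monthly storage rate applied to the sizes in the storage metrics table. A small sketch of that arithmetic; the rate of $0.25 per GB-month and the 1024 MB-per-GB conversion are assumptions inferred from the numbers, not values stated on this page, and `MonthlyStorageCost` is a hypothetical helper:

```go
package main

import "fmt"

// Assumed storage price in USD per GB-month (inferred, not from the source).
const assumedRatePerGBMonth = 0.25

// MonthlyStorageCost converts a size in MB to GB (1 GB = 1024 MB assumed)
// and multiplies by the assumed per-GB monthly rate.
func MonthlyStorageCost(sizeMB float64) float64 {
	return sizeMB / 1024 * assumedRatePerGBMonth
}

func main() {
	// Total storage per strategy in MB, taken from the storage metrics table.
	totals := []struct {
		name string
		mb   float64
	}{
		{"Push", 812.71},
		{"Pull", 6.21},
		{"Hybrid", 761.14},
	}
	for _, t := range totals {
		monthly := MonthlyStorageCost(t.mb)
		fmt.Printf("%-6s $%.4f/mo  $%.2f/yr\n", t.name, monthly, monthly*12)
	}
}
```

Under these assumptions the sketch reproduces the Push and Pull totals to four decimal places; small differences elsewhere come from rounding in the published MB values.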

Average Item Size

Strategy	Post Avg (bytes)	Timeline Avg (bytes)
Push	0	231.17
Pull	140.69	0
Hybrid	141.00	229.45

Storage Efficiency per User (5K users)

Metric	Push	Pull	Hybrid
MB per user	0.163	0.001	0.152
Timeline items per user	737.27	0	695.51
Post items per user	0	9.26	0.26
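
The per-user figures follow from dividing the totals in the storage metrics table by the 5,000-user base, and the average item sizes from dividing storage by item count. A sketch of both calculations, assuming 1 MB = 1,048,576 bytes (which reproduces the reported timeline averages); the function names are hypothetical:

```go
package main

import "fmt"

// Assumed binary megabytes; the source does not state which unit it uses.
const bytesPerMB = 1024 * 1024

// PerUser divides a total by the number of users; it returns 0 for no users.
func PerUser(total float64, users int) float64 {
	if users <= 0 {
		return 0
	}
	return total / float64(users)
}

// AvgItemBytes returns the mean item size in bytes for a table holding
// items entries in storageMB megabytes; it returns 0 for an empty table.
func AvgItemBytes(storageMB float64, items int) float64 {
	if items <= 0 {
		return 0
	}
	return storageMB * bytesPerMB / float64(items)
}

func main() {
	const users = 5000
	// Push strategy values from the storage metrics table.
	fmt.Printf("Push timeline items per user: %.2f\n", PerUser(3686351, users))
	fmt.Printf("Push MB per user: %.3f\n", PerUser(812.71, users))
	fmt.Printf("Push timeline avg bytes: %.2f\n", AvgItemBytes(812.71, 3686351))
}
```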

Throughput, AutoScale test

Test Configuration

Users: 1500 concurrent
Spawn rate: 20 users/second
Duration: 20 minutes
Services: Post (1024MB), Timeline, User, Social Graph, Web
Social Graph: 1500 users, ~140k relationships (power-law)

PUSH (SNS fan-out)

Requests: ~392,025 (Aggregated)
Failures: 0 (0.00%)
Throughput: ~326.8 req/s (Aggregated)
Read (timeline): Median ~51ms, P95 ~7.8s, P99 ~11s
Write (post): Median ~34ms
Behavior: Writes are asynchronous, timelines precomputed; high reliability and stable performance.

PULL (on-demand aggregation)

Requests: ~143,283 (Aggregated)
Failures: ~98,641 (≈68.8% overall; Read ~61.37%, Write ~98.84%)
Throughput: ~129.3 req/s (Aggregated)
Read (timeline): Median ~7.6s, high tail latencies; frequent timeouts under load
Write (post): Median ~110ms, but extremely high failure rate
Behavior: Reads aggregate multiple sources on demand; degrades sharply with high followings and concurrency.

HYBRID (adaptive threshold)

Requests: ~157,736 (Aggregated)
Failures: ~30,401 (mostly writes; ≈95.76% write failure)
Throughput: ~131.6 req/s (Aggregated)
Read (timeline): Median ~11s, no read failures
Write (post): Median ~4.7s; the high failure rate suggests the write path is falling back to PULL or the deployment is unstable
Behavior: Mixed results; current configuration likely routes many writes via PULL.

Why Performance Differs

PUSH:
- Precomputed timelines reduce read-time work; SNS fanout amortizes cost at write time.
- Low median read latency and zero failures; tail latency increases during bursts due to queue/backfill, but remains stable overall.
PULL:
- Read-time aggregation scales poorly with the number of followings; N+1 queries and cross-service fetches drive timeouts.
- Read latency grows with the number of followings and concurrency; frequent ALB/backend timeouts cause high failure rates, especially at scale. Write latency looks fine, but user experience is dominated by slow, failure-prone reads.
HYBRID:
- Intended to route high-follower users via PULL and others via PUSH.
- The high write failure rate suggests many writes still follow the PULL path (or fall back to it) and hit timeouts; reads show no failures but high median and tail latency.
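
The intended hybrid routing can be sketched as a threshold check on the author's follower count. This is an illustration of the idea, not the deployed logic; `ChooseStrategy`, the `Strategy` type, and the treatment of the exact-threshold case are assumptions, while 20000 is the threshold used in the timeline retrieval test:

```go
package main

import "fmt"

// Strategy names the fan-out path used for an author's posts.
type Strategy string

const (
	Push Strategy = "push"
	Pull Strategy = "pull"
)

// Threshold value from the timeline retrieval test settings.
const followerThreshold = 20000

// ChooseStrategy returns Pull for authors with at least threshold
// followers (their posts are merged at read time) and Push otherwise
// (their posts are fanned out to follower timelines at write time).
func ChooseStrategy(followerCount, threshold int) Strategy {
	if followerCount >= threshold {
		return Pull
	}
	return Push
}

func main() {
	for _, followers := range []int{150, 20000, 50000} {
		fmt.Printf("%d followers -> %s\n", followers, ChooseStrategy(followers, followerThreshold))
	}
}
```

With a 1500-user social graph, no author can reach a 20000-follower threshold, so a check like this would route every write through the push path; the threshold used in the throughput test is not stated here and would be worth confirming when diagnosing the hybrid write failures.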
