Experiment Results
Test: 200 users concurrently send 2000 posts in total. Immediately after the writes complete, the timeline service is used to retrieve the feed content under the push and pull strategies, respectively.
| algorithms | Pull | Push | Hybrid |
|---|---|---|---|
| missing rate | 38% | 100% | 80% |
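The missing rate in the table can be read as the share of just-written posts that the timeline response does not contain. The sketch below assumes that definition; `missingRate` and the sample IDs are hypothetical and not taken from the test harness.

```go
package main

import "fmt"

// missingRate returns the fraction of expected post IDs that are absent
// from the timeline response. The name and the exact definition are
// assumptions made for illustration.
func missingRate(expected, returned []string) float64 {
	if len(expected) == 0 {
		return 0
	}
	seen := make(map[string]struct{}, len(returned))
	for _, id := range returned {
		seen[id] = struct{}{}
	}
	missing := 0
	for _, id := range expected {
		if _, ok := seen[id]; !ok {
			missing++
		}
	}
	return float64(missing) / float64(len(expected))
}

func main() {
	expected := []string{"p1", "p2", "p3", "p4", "p5"}
	returned := []string{"p1", "p4", "p5"} // p2 and p3 are not visible yet
	fmt.Printf("missing rate: %.0f%%\n", missingRate(expected, returned)*100)
}
```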
- Pull
- Push
- Hybrid
In the pull-based model, the timeline read calls PostRepository.GetPostByUserID(), which is a DynamoDB query on the user_id-index GSI that fetches all of a user's posts. Reads from a DynamoDB GSI are eventually consistent, so right after 2000 concurrent writes there is a propagation lag before the new items become visible on the index. We also noticed gaps in the order of the returned posts when retrieving the timeline, which supports the same explanation: we are merging whatever the index happened to return at that moment, and the posts are missing from the response, not from the database.
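The read path described above can be sketched as a per-user query followed by a merge. This is a minimal in-memory sketch, not the repository code: `Post`, `postsByUser`, and `buildTimeline` are hypothetical names, and the query function stands in for the GSI lookup.

```go
package main

import (
	"fmt"
	"sort"
)

// Post is a simplified stand-in for a stored post item.
type Post struct {
	ID        string
	UserID    string
	CreatedAt int64 // Unix milliseconds
}

// postsByUser is a hypothetical stand-in for the per-user index query.
type postsByUser func(userID string) []Post

// buildTimeline queries every followed user, merges the results, and
// returns the newest posts first, capped at limit.
func buildTimeline(following []string, query postsByUser, limit int) []Post {
	var merged []Post
	for _, uid := range following {
		merged = append(merged, query(uid)...)
	}
	sort.Slice(merged, func(i, j int) bool {
		return merged[i].CreatedAt > merged[j].CreatedAt
	})
	if len(merged) > limit {
		merged = merged[:limit]
	}
	return merged
}

func main() {
	// Everything that has been written to the table.
	table := map[string][]Post{
		"alice": {{ID: "a1", UserID: "alice", CreatedAt: 100}, {ID: "a2", UserID: "alice", CreatedAt: 300}},
		"bob":   {{ID: "b1", UserID: "bob", CreatedAt: 200}},
	}
	// A lagging index that has not yet seen bob's post.
	staleIndex := func(userID string) []Post {
		if userID == "bob" {
			return nil
		}
		return table[userID]
	}
	for _, p := range buildTimeline([]string{"alice", "bob"}, staleIndex, 10) {
		fmt.Println(p.ID, p.CreatedAt)
	}
}
```

With the stale index, the merged timeline skips from `a2` to `a1` and leaves a hole where `b1` belongs, which is the kind of ordering gap a lagging index would produce.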
In the push-based model, fan-out runs asynchronously through SNS/SQS, so the test hits /api/timeline before the background writes finish. When thousands of posts are created at high concurrency, the fan-out goroutines queue up and the SQS/DynamoDB writes can take seconds. The test checks the timeline window immediately and sees only older entries, so every new post is reported missing.
This pattern is also confirmed in Setting 2, which simulates a real-life mix: 50 posts from regular users (processed via the push strategy under hybrid mode) and 150 posts from celebrity users (processed via the pull strategy). This setting showed a missing rate between those of the pull and push models, caused mainly by the regular users' posts processed via the push strategy.
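The hybrid routing described above can be sketched as a threshold check on the author's follower count. `chooseStrategy` is a hypothetical name, and treating the threshold as inclusive is an assumption; the value 20000 is the hybrid threshold listed for the latency test.

```go
package main

import "fmt"

// Strategy names the fan-out path chosen for a post.
type Strategy string

const (
	Push Strategy = "push"
	Pull Strategy = "pull"
)

// chooseStrategy routes authors at or above the follower threshold to the
// pull path and everyone else to the push path. Whether the comparison is
// inclusive is an assumption.
func chooseStrategy(followerCount, threshold int) Strategy {
	if followerCount >= threshold {
		return Pull
	}
	return Push
}

func main() {
	const threshold = 20000
	for _, followers := range []int{150, 19999, 20000, 1000000} {
		fmt.Printf("%d followers -> %s\n", followers, chooseStrategy(followers, threshold))
	}
}
```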
To sum up, the pull strategy supports more real-time retrieval because data is written to the database immediately. In contrast, the push strategy may introduce additional latency, as it relies on asynchronous processing and can involve queued updates.
- Tests the response time of the timeline GET API for 3 different users: one following 10 users, one following 100, and one following 1600. Each followed user has 10 posts.
- Hybrid follower count threshold: 20000
| algorithms | 10 following | 100 following | 1600 following |
|---|---|---|---|
| push | 45ms | 43ms | 48ms |
| pull | 50ms | 200ms | 3200ms |
| hybrid | 52ms | 130ms | 2200ms |
User Base: 5,000 users
Test Scenarios: Push, Pull, and Hybrid fan-out strategies
Database: Amazon DynamoDB
| Strategy | Post Items | Post Storage (MB) | Timeline Items | Timeline Storage (MB) | Total Storage (MB) |
|---|---|---|---|---|---|
| Push | 0 | 0.00 | 3,686,351 | 812.71 | 812.71 |
| Pull | 46,317 | 6.21 | 0 | 0.00 | 6.21 |
| Hybrid | 1,303 | 0.18 | 3,477,569 | 760.96 | 761.14 |
| Strategy | Post Cost ($/mo) | Timeline Cost ($/mo) | Total Cost ($/mo) | Annual Cost ($) |
|---|---|---|---|---|
| Push | $0.0000 | $0.1984 | $0.1984 | $2.38 |
| Pull | $0.0015 | $0.0000 | $0.0015 | $0.02 |
| Hybrid | $0.0001 | $0.1858 | $0.1859 | $2.23 |
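The monthly figures in the table above are consistent with a flat storage price of $0.25 per GB-month applied to the measured size, with 1 GB taken as 1024 MB. That price is an assumption inferred from the numbers, not something the experiment states, and `monthlyStorageCost` is a hypothetical helper. It is a sketch of the arithmetic only.

```go
package main

import "fmt"

// pricePerGBMonth is an assumed storage price in USD, inferred from the
// cost table and not stated in the experiment.
const pricePerGBMonth = 0.25

// monthlyStorageCost converts a size in MB to a monthly storage cost in USD.
func monthlyStorageCost(storageMB float64) float64 {
	return storageMB / 1024 * pricePerGBMonth
}

func main() {
	rows := []struct {
		name string
		mb   float64
	}{
		{"Push", 812.71},
		{"Pull", 6.21},
		{"Hybrid", 761.14},
	}
	for _, r := range rows {
		m := monthlyStorageCost(r.mb)
		fmt.Printf("%-6s $%.4f/mo  $%.2f/yr\n", r.name, m, m*12)
	}
}
```

For the Push row this gives roughly $0.1984 per month. Small differences in the last digit for other rows are expected, because the table sizes are already rounded.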
| Strategy | Post Avg (bytes) | Timeline Avg (bytes) |
|---|---|---|
| Push | 0 | 231.17 |
| Pull | 140.69 | 0 |
| Hybrid | 141.00 | 229.45 |
| Metric | Push | Pull | Hybrid |
|---|---|---|---|
| MB per user | 0.163 | 0.001 | 0.152 |
| Timeline items per user | 737.27 | 0 | 695.51 |
| Post items per user | 0 | 9.26 | 0.26 |
- Users: 1500 concurrent
- Spawn rate: 20 users/second
- Duration: 20 minutes
- Services: Post (1024MB), Timeline, User, Social Graph, Web
- Social Graph: 1500 users, ~140k relationships (power-law)
PUSH results:
- Requests: ~392,025 (Aggregated)
- Failures: 0 (0.00%)
- Throughput: ~326.8 req/s (Aggregated)
- Read (timeline): Median ~51ms, P95 ~7.8s, P99 ~11s
- Write (post): Median ~34ms
- Behavior: Writes are asynchronous, timelines precomputed; high reliability and stable performance.
PULL results:
- Requests: ~143,283 (Aggregated)
- Failures: ~98,641 (≈68.8% overall; Read ~61.37%, Write ~98.84%)
- Throughput: ~129.3 req/s (Aggregated)
- Read (timeline): Median ~7.6s, high tail latencies; frequent timeouts under load
- Write (post): Median ~110ms, but extremely high failure rate
- Behavior: Reads aggregate multiple sources on demand; degrades sharply with high followings and concurrency.
HYBRID results:
- Requests: ~157,736 (Aggregated)
- Failures: ~30,401 (mostly writes; ≈95.76% write failure)
- Throughput: ~131.6 req/s (Aggregated)
- Read (timeline): Median ~11s, no read failures
- Write (post): Median ~4.7s; the high failure rate suggests the write path is falling back to PULL or the deployment is unstable
- Behavior: Mixed results; current configuration likely routes many writes via PULL.
PUSH:
- Precomputed timelines reduce read-time work; SNS fanout amortizes cost at write time.
- Low median read latency and zero failures; tail latency increases during bursts due to queue/backfill, but remains stable overall.
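The push trade-off noted above (work at write time, cheap reads) can be sketched with an in-memory store. This is a simplification under stated assumptions: `TimelineStore`, `fanOut`, and `readTimeline` are hypothetical names, and the SNS/SQS hop is replaced by a direct loop.

```go
package main

import "fmt"

// TimelineStore maps a follower ID to that follower's precomputed timeline,
// newest post first. It stands in for the timeline table.
type TimelineStore map[string][]string

// fanOut writes one timeline entry per follower at post time. In the real
// system this step is asynchronous; here it is a plain loop.
func fanOut(store TimelineStore, followers []string, postID string) {
	for _, f := range followers {
		store[f] = append([]string{postID}, store[f]...)
	}
}

// readTimeline is a single lookup with no merge across followed users.
func readTimeline(store TimelineStore, userID string, limit int) []string {
	tl := store[userID]
	if len(tl) > limit {
		tl = tl[:limit]
	}
	return tl
}

func main() {
	store := TimelineStore{}
	fanOut(store, []string{"f1", "f2", "f3"}, "p1")
	fanOut(store, []string{"f1"}, "p2")
	fmt.Println(readTimeline(store, "f1", 10))
	fmt.Println(readTimeline(store, "f2", 10))
}
```

Each post produces one stored entry per follower, which is why the push strategy stores so many timeline items, while a read stays a single lookup regardless of how many users are followed.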
PULL:
- Read-time aggregation scales poorly with the number of followed users; N+1 queries and cross-service fetches drive timeouts.
- Read latency grows with the number of followings and concurrency; frequent ALB/backend timeouts cause high failure rates, especially at scale. Write latency looks fine, but user experience is dominated by slow, failure-prone reads.
HYBRID:
- Intended to route high-follower users via PULL and others via PUSH.
- High write failure rate indicates many writes still follow the PULL path (or fall back to it) and hit timeouts; reads show few failures but high median/tail latency