Design Twitter (X)
Like you should in an interview. Explained as simply as possible… but not simpler.
In this issue, I walk through, step by step, the exact thinking I’d narrate out loud in a system design interview. Clear, practical, and with trade-offs you can defend.
What you’ll learn in ~15 minutes
How I would scope the problem without missing important requirements (timeline, celebrity accounts, media handling).
Why hybrid fan-out strategies are essential at scale - and exactly when to use fan-out on write vs. fan-out on read based on follower count thresholds.
The trade-off between eventual consistency and performance that makes sub-100ms timeline loads possible at Twitter's scale.
How this issue is structured
I split the write-up into the same sections I’d narrate at a whiteboard. Free readers get the full walkthrough up to the deep-dive parts. Paid members get the 🔒 sections.
Initial Thoughts & Clarifying Questions
Functional Requirements
Non-Functional Requirements
Back-of-the-envelope Estimations (QPS, storage, bandwidth, cardinality math)
🔒 System Design (the architecture I’d draw, plus the Excalidraw link for it!)
🔒 Component Breakdown (why each piece exists + alternatives)
🔒 Trade-offs Made
🔒 Security & Privacy
🔒 Monitoring, Logging, and Alerting
Quick note: If you’ve been getting value from these and want the full deep dives, becoming a paid member helps me keep writing—and you’ll immediately unlock the 🔒 sections above, plus a few extras I lean on when I practice.
Members also get
12 Back-of-the-Envelope Calculations Every Engineer Should Know
My Excalidraw System Design Template — drop-in canvas you can copy and tweak.
My System Design Component Library
Let’s get to it!
Initial Thoughts & Clarifying Questions
To begin, I'd want to understand the exact scope and constraints of what we're building here. Let me think through the key questions I'd ask the interviewer to make sure we're aligned on requirements.
1. What's the expected scale of this system? My assumption is that we're designing for hundreds of millions of daily active users, similar to Twitter's actual scale. That means handling potentially 500M+ DAU with billions of tweets read daily. I'm assuming this because, in a system design interview, this question is usually testing whether I can design for massive scale.
2. What are the core features we need to support? From what I understand, we need the basics: creating tweets, following users, timeline generation, likes/retweets, and replies. I'm assuming we don't need to implement the full Twitter feature set like DMs, Spaces, or the ad network - that would be impossible in 45 minutes. I will focus on the core functionality.
3. What are the latency requirements for timeline loading? I'd expect sub-100ms latency for timeline loads since this is the primary user interaction. Users won't tolerate waiting several seconds for their feed to load. For tweet creation, we can probably tolerate slightly higher latency, maybe 200-300ms, since it's less frequent.
4. How fresh does the timeline need to be? My assumption is near real-time freshness - when someone I follow tweets, it should appear in my timeline within seconds, not minutes. This will heavily influence our timeline generation strategy.
5. What's the read/write ratio we're expecting? Based on typical social media patterns, I'm assuming something like 100:1 or even 1000:1 read-to-write ratio. Most users consume far more content than they create, which means we need to heavily optimize for reads.
6. How should we handle celebrity accounts with millions of followers? I'm assuming we have users like Elon Musk with 100M+ followers. Their tweets create a massive fan-out problem that needs special handling - I sketch the threshold logic right after these questions.
7. Do we need to support media attachments? I'll assume yes - images and videos are core to Twitter. This means we need to think about blob storage and CDN distribution from the start.
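Question 6 is the one that shapes the architecture the most, so here is a minimal sketch of the hybrid fan-out decision I'd describe. The threshold value and the helper functions are my own illustrative assumptions, not anything Twitter has published:

```python
# Hypothetical hybrid fan-out decision. The threshold and helper names are
# illustrative assumptions, not Twitter's actual values or APIs.

FANOUT_THRESHOLD = 10_000  # assumed cutoff for "celebrity" accounts


def on_tweet_created(author_id: int, tweet_id: int, follower_count: int) -> None:
    """Decide how a new tweet reaches follower timelines."""
    if follower_count <= FANOUT_THRESHOLD:
        # Fan-out on write: push the tweet id into each follower's
        # precomputed timeline cache at write time.
        for follower_id in get_follower_ids(author_id):
            push_to_timeline_cache(follower_id, tweet_id)
    else:
        # Fan-out on read: do almost nothing now. Followers' timeline reads
        # will pull this author's recent tweets and merge them at request time.
        index_celebrity_tweet(author_id, tweet_id)


def get_follower_ids(author_id: int) -> list[int]:
    ...  # placeholder: would query the social-graph service


def push_to_timeline_cache(follower_id: int, tweet_id: int) -> None:
    ...  # placeholder: would append to the follower's cached timeline


def index_celebrity_tweet(author_id: int, tweet_id: int) -> None:
    ...  # placeholder: would make the tweet findable for merge-at-read
```

The exact cutoff would be tuned from production data; the point is that the write path branches on follower count instead of treating every account the same.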
Functional Requirements
From what I understand, the core requirements are:
User account management - Users need to create accounts, log in, and manage their profiles. This is foundational because everything else ties back to user identity.
Tweet operations - Users must be able to create tweets (up to 280 characters), edit them, and delete them. This is our primary content creation mechanism.
Social graph management - Users need to follow/unfollow others. This is essential because it determines what appears in timelines and how information flows through the network.
Timeline generation - Users should see a feed of tweets from people they follow, sorted chronologically. This is probably the most technically complex requirement and where users spend most of their time (see the merge sketch after this list).
Engagement features - Likes, retweets, and replies. These are critical for user engagement and viral content spread. Without these, it's just a broadcasting platform, not a social network.
Search functionality - Users need to search for tweets, hashtags, and other users. This helps with content discovery beyond just the timeline.
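To make the timeline requirement concrete, here is a minimal in-memory sketch of merging per-author tweet lists into one reverse-chronological feed. In the real system these lists would come from caches and be paginated; the types and names here are assumptions for illustration:

```python
import heapq
from dataclasses import dataclass


@dataclass
class Tweet:
    tweet_id: int
    author_id: int
    created_at: float  # epoch seconds


def build_timeline(per_author_tweets: list[list[Tweet]], limit: int = 50) -> list[Tweet]:
    """Merge per-author, newest-first tweet lists into one chronological feed.

    Because each inner list is already sorted newest-first, a k-way heap merge
    yields the top `limit` tweets without sorting everything.
    """
    merged = heapq.merge(*per_author_tweets, key=lambda t: -t.created_at)
    return [tweet for _, tweet in zip(range(limit), merged)]


# Example: two followed authors, each list newest-first
feed = build_timeline([
    [Tweet(3, author_id=1, created_at=300.0), Tweet(1, author_id=1, created_at=100.0)],
    [Tweet(2, author_id=2, created_at=200.0)],
])
print([t.tweet_id for t in feed])  # [3, 2, 1]
```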
Non-Functional Requirements
I'd expect this system to handle some pretty intense requirements:
Scale & Performance: We'd likely need to support 500M+ daily active users, with peak traffic potentially 2-3x the average. I'm thinking we could see 1 million tweets created per minute during major events (like the Super Bowl or breaking news). The system needs to handle this without degrading.
Latency targets:
Timeline loads: < 100ms at P99
Tweet creation: < 200ms at P99
Search queries: < 150ms at P99
The reasoning here is that social media is highly interactive - users are constantly scrolling, refreshing, engaging. Any noticeable lag will damage user experience.
Availability: We need 99.99% uptime, which translates to roughly 52 minutes of downtime per year. Social media has become critical infrastructure for news, emergency communications, and public discourse. Being down during a major event would be catastrophic.
Consistency: I'd lean toward eventual consistency for most operations. It's okay if a tweet takes a few seconds to appear in all followers' timelines. However, we need strong consistency for critical operations like following/unfollowing, so users never see stale follow state or mismatched follower counts.
Data durability: Zero data loss for tweets. Once a tweet is acknowledged as created, it must never be lost. This means multi-region replication and robust backup strategies.
Back-of-the-envelope Estimations
Let's say we have 500M daily active users. If each user posts an average of 2 tweets per day (most users lurk; power users pull the average up), that gives us 1 billion tweets created daily.
Write QPS for tweets: 1 billion tweets / 86,400 seconds = ~11,500 tweets/second on average. Peak could be 3x the average = ~35,000 tweets/second.
Read QPS: If our read/write ratio is 1000:1, we're looking at ~11.5M reads/second on average and ~35M reads/second at peak.
Actually, let me reconsider that - it seems extremely high. Let's think in terms of timeline refreshes instead. If each user refreshes their timeline 20 times per day:
500M users × 20 refreshes = 10 billion timeline loads daily
10B / 86,400 = ~115,000 timeline loads/second on average
Peak: ~350,000 timeline loads/second
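The same traffic math as a small script you can tweak (the numbers mirror the assumptions above):

```python
# Back-of-the-envelope traffic math, using the assumptions above.
DAU = 500_000_000
TWEETS_PER_USER_PER_DAY = 2
TIMELINE_REFRESHES_PER_USER_PER_DAY = 20
SECONDS_PER_DAY = 86_400
PEAK_MULTIPLIER = 3

tweets_per_day = DAU * TWEETS_PER_USER_PER_DAY                       # 1 billion
write_qps_avg = tweets_per_day / SECONDS_PER_DAY                     # ~11,500
write_qps_peak = write_qps_avg * PEAK_MULTIPLIER                     # ~35,000

timeline_loads_per_day = DAU * TIMELINE_REFRESHES_PER_USER_PER_DAY   # 10 billion
read_qps_avg = timeline_loads_per_day / SECONDS_PER_DAY              # ~115,000
read_qps_peak = read_qps_avg * PEAK_MULTIPLIER                       # ~350,000

print(f"writes: {write_qps_avg:,.0f}/s avg, {write_qps_peak:,.0f}/s peak")
print(f"timeline loads: {read_qps_avg:,.0f}/s avg, {read_qps_peak:,.0f}/s peak")
```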
Storage requirements:
Each tweet: ~280 bytes of text + ~500 bytes of metadata = ~800 bytes
Media attachment: ~200KB on average (a mix of images and videos)
If 20% of tweets have media: 1B tweets × 0.2 × 200KB = 40TB of media daily
Text storage: 1B tweets × 800 bytes = 800GB daily. Over 3 years:
Text: ~1PB
Media: ~45PB
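And the storage math, using decimal units to match the figures above (the 3-year totals come out to roughly 0.9PB of text and 44PB of media, which I round to ~1PB and ~45PB):

```python
# Back-of-the-envelope storage math, decimal units, same assumptions as above.
TWEETS_PER_DAY = 1_000_000_000
BYTES_PER_TWEET = 800          # ~280 bytes of text + ~500 bytes of metadata
MEDIA_FRACTION = 0.20          # 20% of tweets carry media
MEDIA_BYTES_AVG = 200_000      # ~200KB average attachment
DAYS_3_YEARS = 3 * 365

GB, TB, PB = 10**9, 10**12, 10**15

text_per_day = TWEETS_PER_DAY * BYTES_PER_TWEET                     # 800 GB/day
media_per_day = TWEETS_PER_DAY * MEDIA_FRACTION * MEDIA_BYTES_AVG   # 40 TB/day

print(f"text:  {text_per_day / GB:.0f} GB/day, ~{text_per_day * DAYS_3_YEARS / PB:.1f} PB over 3 years")
print(f"media: {media_per_day / TB:.0f} TB/day, ~{media_per_day * DAYS_3_YEARS / PB:.0f} PB over 3 years")
```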
Bandwidth estimates:
Timeline API response: ~50KB (containing ~50 tweets)
350K requests/second × 50KB = 17.5GB/second at peak
That's 140Gbps - we'll definitely need multiple data centers and CDN distribution.
Memory requirements for caching:
If we cache the most recent 20% of a day's tweets (the hot data): 200M tweets × 800 bytes = 160GB for the tweet cache
For timeline caches (assuming we pre-compute timelines for active users): 100M active users × 50KB per timeline = 5TB of timeline cache
We'll need a distributed caching layer, probably hundreds of cache servers.
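Finally, the bandwidth and cache-sizing math in the same style:

```python
# Back-of-the-envelope bandwidth and cache sizing, same assumptions as above.
PEAK_TIMELINE_QPS = 350_000
TIMELINE_RESPONSE_BYTES = 50_000        # ~50KB per timeline response (~50 tweets)

HOT_TWEETS = 200_000_000                # ~20% of a day's 1B tweets
BYTES_PER_TWEET = 800
CACHED_TIMELINE_USERS = 100_000_000     # active users with a precomputed timeline
TIMELINE_CACHE_BYTES_PER_USER = 50_000  # ~50KB cached timeline per user

peak_egress_bytes = PEAK_TIMELINE_QPS * TIMELINE_RESPONSE_BYTES      # 17.5 GB/s
peak_egress_gbps = peak_egress_bytes * 8 / 10**9                     # 140 Gbps

tweet_cache_bytes = HOT_TWEETS * BYTES_PER_TWEET                     # 160 GB
timeline_cache_bytes = CACHED_TIMELINE_USERS * TIMELINE_CACHE_BYTES_PER_USER  # 5 TB

print(f"peak egress: {peak_egress_bytes / 10**9:.1f} GB/s ({peak_egress_gbps:.0f} Gbps)")
print(f"tweet cache: {tweet_cache_bytes / 10**9:.0f} GB, timeline cache: {timeline_cache_bytes / 10**12:.0f} TB")
```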
🔒 System Design
I'd start with a simple design and then elaborate on each component. Let me draw this out conceptually...