System Design, but Simple

Design Facebook's News Feed

Like you should in an interview. Explained as simply as possible… but not simpler.

Sep 01, 2025

In this issue, I walk through the exact thinking I’d narrate out loud in a system design interview, step by step: clear, practical, and with trade-offs you can defend.

What you’ll learn in ~15 minutes

  • How I would scope the problem without missing important requirements.

  • Fan-Out Patterns: push, pull, and hybrid models for feed generation

  • Hot Key/Hot Shard Problems: how viral content can overwhelm a system, and ways to distribute the load

  • Global Secondary Indexes (GSI): database design patterns for supporting multiple query types efficiently

How this issue is structured
I split the write-up into the same sections I’d narrate at a whiteboard. Free readers get the full walkthrough up to the deep-dive parts. Paid members get the 🔒 sections.

  • Initial Thoughts & Clarifying Questions

  • Functional Requirements

  • Non-Functional Requirements

  • Back-of-the-envelope Estimations (QPS, storage, bandwidth, cardinality math)

  • 🔒 System Design (the architecture I’d draw and the Excalidraw link for it!)

  • 🔒 Component Breakdown (why each piece exists + alternatives)

  • 🔒 Trade-offs Made

  • 🔒 Security & Privacy

  • 🔒 Monitoring, Logging, and Alerting

  • 🔒 Handling the Fan-Out Problem

Quick note: If you’ve been getting value from these and want the full deep dives, becoming a paid member helps me keep writing—and you’ll immediately unlock the 🔒 sections above, plus a few extras I lean on when I practice.

Members also get

  • 12 Back-of-the-Envelope Calculations Every Engineer Should Know

  • My Excalidraw System Design Template — drop-in canvas you can copy and tweak.

  • My System Design Component Library

Let’s get to it!


Initial Thoughts & Clarifying Questions

To begin, I'd want to understand the exact scope and constraints we're working with here. The newsfeed problem can go in many different directions, so let me ask some clarifying questions:

1. Are we building a Facebook-style bidirectional friendship system or a Twitter-style unidirectional follow system? Based on the context, I'm assuming we're building a unidirectional follow system where users can follow others without requiring mutual acceptance. This simplifies our data model and is closer to modern social platforms.

2. What type of content are we supporting in posts? I'll assume we're starting with text-based posts for now. Supporting multimedia would add complexity around CDN distribution and storage that we can address later if needed.

3. Are we implementing any feed ranking algorithms, or should this be purely chronological? For this design, I'm assuming we want a chronological feed ordered by post creation time. ML-based ranking systems would require a completely different architecture focused on feature extraction and model serving.

4. What's our target scale: how many users and posts per day are we expecting? Let me assume we're designing for a large-scale system with around 2 billion users globally. This will drive many of our architectural decisions.

5. Do we need real-time feed updates, or is some delay acceptable? I'm assuming that eventual consistency is acceptable - when someone posts, it doesn't need to appear instantly in all followers' feeds, but should appear within a reasonable timeframe, say 1-2 minutes.

6. Should we support feed pagination for infinite scroll? Yes, I'll assume we need cursor-based pagination to support mobile app infinite scroll experiences.

7. Are there any specific latency requirements for feed loading? I'll target sub-500ms response times for feed requests to ensure a responsive user experience.

Functional Requirements

From what I understand, the core requirements are:

  1. Create Posts: Users must be able to publish text-based posts to their timeline

  2. Follow Users: Users can follow other users unidirectionally (no mutual acceptance required)

  3. View Personalized Feed: Users can view a chronological feed of posts from accounts they follow

  4. Paginate Feed: Support infinite scroll through historical posts with cursor-based pagination

I'm keeping this focused on these core features. Additional functionality like likes, comments, or privacy settings would be great extensions, but I want to nail the fundamental newsfeed mechanics first. In a real interview, I'd mark those as "below the line" features we could discuss if time permits.
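
To make the four requirements concrete, here is a minimal in-memory sketch of them, assuming a pull model where the feed is merged at read time. `FeedStore`, `Post`, and the `(created_at, post_id)` cursor shape are hypothetical names for illustration, not the design from the locked sections. It also assumes each author's posts arrive in non-decreasing timestamp order.

```python
import heapq
from collections import defaultdict
from dataclasses import dataclass, field
from itertools import islice


@dataclass(frozen=True, order=True)
class Post:
    created_at: int                        # epoch seconds, primary sort key
    post_id: int                           # tie-breaker within the same second
    author_id: int = field(compare=False)
    text: str = field(compare=False)


class FeedStore:
    """Hypothetical in-memory stand-in for the post and follow stores."""

    def __init__(self):
        self.following = defaultdict(set)         # follower_id -> followed ids
        self.posts_by_author = defaultdict(list)  # author_id -> posts, oldest first
        self._next_id = 1

    def follow(self, follower_id, followed_id):
        # Unidirectional: no acceptance step, no reverse edge.
        self.following[follower_id].add(followed_id)

    def create_post(self, author_id, text, created_at):
        post = Post(created_at, self._next_id, author_id, text)
        self._next_id += 1
        self.posts_by_author[author_id].append(post)
        return post

    def get_feed(self, user_id, limit=25, cursor=None):
        """Return (page, next_cursor), newest first.

        cursor is the (created_at, post_id) of the last post already seen.
        """
        timelines = [reversed(self.posts_by_author[a])
                     for a in self.following[user_id]]
        merged = heapq.merge(*timelines, reverse=True)  # k-way merge, newest first
        if cursor is not None:
            # Linear skip for simplicity; a real store would seek to the cursor.
            merged = (p for p in merged if (p.created_at, p.post_id) < cursor)
        page = list(islice(merged, limit))
        full = len(page) == limit
        next_cursor = (page[-1].created_at, page[-1].post_id) if full else None
        return page, next_cursor


store = FeedStore()
store.follow(1, 2)
store.follow(1, 3)
store.create_post(2, "a", created_at=10)
store.create_post(3, "b", created_at=20)
store.create_post(2, "c", created_at=30)
first_page, cursor = store.get_feed(1, limit=2)
second_page, end_cursor = store.get_feed(1, limit=2, cursor=cursor)
```

The cursor is a position in the sort order rather than an offset, so new posts arriving at the top do not shift later pages. One simplification: a page that happens to be exactly full still returns a cursor, so the client may make one extra request that comes back empty.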

Non-Functional Requirements

I'd expect this system to handle several key non-functional requirements:

Consistency Model: We can leverage eventual consistency here. Users don't expect posts to appear instantaneously in their feed, but they should appear quickly. Let's target within 1 minute for 99% of posts, which fits inside the 1-2 minute window assumed in the clarifying questions.

Latency: For a responsive user experience, I'd aim for sub-500ms latency for both posting new content and loading feeds. Users get impatient with slow social media experiences.

Scale: Given we're targeting Facebook-scale, I'm assuming around 2 billion registered users globally. This drives our need for horizontal scaling and distributed systems approaches.

Availability: Social media platforms need high availability - I'd target 99.9% uptime. Users expect the service to always be accessible.

Read-Heavy Workload: This will be an extremely read-heavy system. I'd estimate a 100:1 or even 1000:1 read-to-write ratio, as users consume far more content than they create.
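
To put the 99.9% availability target in concrete terms, a quick sketch can convert an uptime percentage into an allowed-downtime budget. The function name and the 365-day year are my own assumptions for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # assumption: non-leap year


def downtime_budget_hours(availability_pct, period_hours=HOURS_PER_YEAR):
    """Hours of allowed downtime in the period for a given uptime percentage."""
    return (1 - availability_pct / 100) * period_hours


three_nines = downtime_budget_hours(99.9)    # about 8.76 hours per year
four_nines = downtime_budget_hours(99.99)    # about 0.88 hours per year
print(f"99.9%  -> {three_nines:.2f} h/year")
print(f"99.99% -> {four_nines:.2f} h/year")
```

Each extra nine cuts the budget by 10x, which is worth having in mind when an interviewer asks what the target actually costs.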

Back-of-the-envelope Estimations

Let me work through some capacity planning numbers to size this system properly.

Starting with our user base: Let's say we have 2 billion registered users, with about 500 million daily active users. Of those DAU, maybe 50 million are actively posting each day.

Write Load Calculations:

  • 50M daily posters

  • Average 2 posts per active poster per day

  • Total: 100M posts per day

  • Posts per second: 100M / (24 * 3600) ≈ 1,200 posts/second

  • Peak load (assuming 3x average): ~3,600 posts/second
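
The write-load arithmetic above can be checked with a few lines. This is just a sketch of the same calculation; the helper name `per_second` is made up for illustration.

```python
SECONDS_PER_DAY = 24 * 3600


def per_second(events_per_day, peak_factor=1.0):
    """Convert a daily event count into a per-second rate."""
    return events_per_day * peak_factor / SECONDS_PER_DAY


daily_posters = 50_000_000
posts_per_poster = 2
posts_per_day = daily_posters * posts_per_poster          # 100M

avg_write_qps = per_second(posts_per_day)                  # ~1,157, rounded to ~1,200 above
peak_write_qps = per_second(posts_per_day, peak_factor=3)  # ~3,472, rounded to ~3,600 above
print(round(avg_write_qps), round(peak_write_qps))
```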

Read Load Calculations: Each user might check their feed 10 times per day on average:

  • 500M DAU × 10 feed requests = 5B feed requests per day

  • Feed requests per second: 5B / (24 * 3600) ≈ 58,000 requests/second

  • Peak load (assuming 3x average): ~175,000 feed requests/second
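
As a sanity check on the read side, the sketch below redoes the calculation and compares it with the write volume. It assumes the same 3x peak factor used for writes and borrows the 25-posts-per-feed figure from the bandwidth estimate.

```python
SECONDS_PER_DAY = 86_400

dau = 500_000_000
feed_requests_per_user = 10
posts_per_feed = 25                  # from the bandwidth estimate
posts_written_per_day = 100_000_000  # from the write-load estimate
peak_factor = 3                      # assumption: same multiplier as for writes

feed_requests_per_day = dau * feed_requests_per_user   # 5B
avg_read_qps = feed_requests_per_day / SECONDS_PER_DAY  # ~57,870
peak_read_qps = avg_read_qps * peak_factor              # ~173,611

# Read-to-write ratio, counted two ways.
request_ratio = feed_requests_per_day / posts_written_per_day  # feed requests per post written
post_read_ratio = request_ratio * posts_per_feed               # posts read per post written
print(round(avg_read_qps), round(peak_read_qps), request_ratio, post_read_ratio)
```

Counted per request, the ratio comes out at 50:1. Counted per post served, it is 1250:1. Those two figures bracket the 100:1 to 1000:1 range quoted in the non-functional requirements, so it helps to say which one you mean.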

Storage Requirements:

  • Average post size: ~500 bytes (including metadata)

  • Daily posts: 100M × 500 bytes = 50GB per day

  • Annual storage: 50GB × 365 ≈ 18TB per year

  • With 5-year retention: ~90TB for post content
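
Here is the same storage math as a small sketch, using decimal units (1 GB = 10^9 bytes). The `replication_factor` parameter and the 3x example are assumptions I've added to show how quickly the raw figure grows once copies are counted. They are not part of the estimate above.

```python
POSTS_PER_DAY = 100_000_000
AVG_POST_BYTES = 500      # including metadata
RETENTION_YEARS = 5


def post_storage_bytes(days, replication_factor=1):
    """Raw bytes needed to keep `days` worth of posts."""
    return POSTS_PER_DAY * AVG_POST_BYTES * days * replication_factor


daily_gb = post_storage_bytes(1) / 1e9                          # 50 GB
annual_tb = post_storage_bytes(365) / 1e12                      # ~18 TB
retained_tb = post_storage_bytes(365 * RETENTION_YEARS) / 1e12  # ~91 TB

# Hypothetical: three copies of every post.
replicated_tb = post_storage_bytes(365 * RETENTION_YEARS, replication_factor=3) / 1e12
print(daily_gb, annual_tb, retained_tb, replicated_tb)
```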

Follow Relationships:

  • Average follows per user: Let's say 200

  • Total follow relationships: 2B users × 200 = 400B relationships

  • Storage per relationship: ~20 bytes (follower_id, followed_id, timestamp)

  • Total follow storage: 400B × 20 bytes = 8TB
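
One way to see where ~20 bytes per relationship could come from is to pack a follow edge into a fixed binary layout. The layout below (two unsigned 64-bit IDs plus an unsigned 32-bit epoch-seconds timestamp) is an assumption for illustration. A real database row would carry extra overhead for indexes and headers.

```python
import struct

# Hypothetical layout: follower_id (u64), followed_id (u64), followed_at (u32),
# little-endian with no padding. A u32 epoch-seconds field overflows in 2106.
FOLLOW_EDGE = struct.Struct("<QQI")


def pack_follow(follower_id, followed_id, followed_at):
    return FOLLOW_EDGE.pack(follower_id, followed_id, followed_at)


def unpack_follow(raw):
    return FOLLOW_EDGE.unpack(raw)


USERS = 2_000_000_000
AVG_FOLLOWS = 200

edge = pack_follow(42, 7, 1_756_684_800)
total_edges = USERS * AVG_FOLLOWS             # 400B relationships
total_tb = total_edges * FOLLOW_EDGE.size / 1e12
print(len(edge), total_tb)
```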

Feed Precomputation Storage: If we precompute feeds for faster access:

  • Store latest 200 posts per user feed

  • 2B users × 200 posts × 8 bytes (post_id) = 3.2TB
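
A capped, precomputed feed can be sketched as a bounded list of post IDs per user. This is an illustrative in-memory version with made-up names (`feeds`, `push_to_feed`). In practice the same shape would more likely live in a cache or key-value store.

```python
from collections import defaultdict, deque

FEED_LENGTH = 200     # latest posts kept per user
POST_ID_BYTES = 8
USERS = 2_000_000_000

# user_id -> newest-first post IDs, capped at FEED_LENGTH entries
feeds = defaultdict(lambda: deque(maxlen=FEED_LENGTH))


def push_to_feed(user_id, post_id):
    # appendleft on a full bounded deque discards the oldest ID on the right
    feeds[user_id].appendleft(post_id)


for post_id in range(250):
    push_to_feed(1, post_id)

total_tb = USERS * FEED_LENGTH * POST_ID_BYTES / 1e12
print(len(feeds[1]), feeds[1][0], feeds[1][-1], total_tb)
```

Storing only IDs keeps the precomputed feed small. The trade-off is that every read still has to fetch the post bodies in a second step.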

Bandwidth:

  • Average feed contains 25 posts

  • Each post + metadata ≈ 1KB when serialized

  • Feed response size: 25 posts × 1KB = 25KB

  • Peak feed bandwidth: 175K requests/sec × 25KB = ~4.4 GB/second
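
The bandwidth figure works out as follows. This sketch uses decimal units (1 KB = 1,000 bytes) and the rounded 175K peak from the read-load estimate. The conversion to Gbit/s is an extra step I added because network links are usually quoted in bits.

```python
POSTS_PER_FEED = 25
BYTES_PER_POST = 1_000        # ~1KB serialized, decimal units
PEAK_FEED_QPS = 175_000       # rounded peak from the read-load estimate

feed_bytes = POSTS_PER_FEED * BYTES_PER_POST      # 25 KB per response
peak_bytes_per_s = PEAK_FEED_QPS * feed_bytes
peak_gb_per_s = peak_bytes_per_s / 1e9            # ~4.4 GB/s
peak_gbit_per_s = peak_gb_per_s * 8               # 35 Gbit/s
print(feed_bytes, peak_gb_per_s, peak_gbit_per_s)
```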

These numbers tell me we definitely need a distributed, horizontally scalable architecture with caching layers.

🔒 System Design

I'd start with a simple design and then elaborate on the scaling challenges. Let me sketch out the high-level architecture:

This post is for paid subscribers

© 2025 Stephane Moreau