System Design, but Simple

Design Instagram

Like you should in an interview. Explained as simply as possible… but not simpler.

Stephane Moreau
Oct 06, 2025

In this issue, I walk through, out loud and step by step, the exact thinking I’d use in a system design interview. It’s clear, practical, and includes trade-offs you can defend.

What you’ll learn in ~15 minutes

  • How I would scope the problem without missing important requirements (media types and sizes, scale, consistency).

  • Why the “obvious” feed generation approach fails at scale and how Instagram’s hybrid fanout strategy solves the celebrity problem while keeping 99% of users happy.

  • The media upload architecture that handles 4GB videos by never touching your application servers.

How this issue is structured
I split the write-up into the same sections I’d narrate at a whiteboard. Free readers get the full walkthrough up to the deep-dive parts. Paid members get the 🔒 sections.

  • Initial Thoughts & Clarifying Questions

  • Functional Requirements

  • Non-Functional Requirements

  • Back-of-the-envelope Estimations

  • 🔒 System Design (the architecture I’d draw and the excalidraw link for it!)

  • 🔒 Component Breakdown (why each piece exists + alternatives)

  • 🔒 Trade-offs Made

  • 🔒 Security & Privacy

  • 🔒 Monitoring, Logging, and Alerting

  • 🔒 Final Thoughts

Quick note: If you’ve been getting value from these and want the full deep dives, becoming a paid member helps me keep writing—and you’ll immediately unlock the 🔒 sections above, plus a few extras I lean on when I practice.

Members also get

  • 12 Back-of-the-Envelope Calculations Every Engineer Should Know

  • My Excalidraw System Design Template — drop-in canvas you can copy and tweak.

  • My System Design Component Library

Let’s get to it!


Initial Thoughts & Clarifying Questions

To begin, I’d want to understand the scope and scale we’re targeting before diving into the technical design. Let me ask some key questions:

What’s our scale target? I’m assuming we’re looking at around 500M daily active users with roughly 100M posts created per day. This gives us a sense of the read-heavy nature of the system.

What types of media are we supporting? I’ll assume we need to handle both photos (up to 8MB) and videos (up to 4GB).

What’s our latency expectation? For a social media platform, I’d expect feed loading to be under 500ms end-to-end, and media should render almost instantly once requested.

Are we building a global service? I’ll assume yes, which means we need to think about geographic distribution and CDN placement.

What’s our consistency model? For social media, I’m assuming eventual consistency is acceptable - it’s fine if a new post takes up to 2 minutes to appear in all followers’ feeds.

How many people does an average user follow? This is crucial for feed generation strategy. I’ll assume most users follow 100-500 accounts, but we need to handle edge cases of users following 10,000+ accounts.

What’s our read-to-write ratio? Social platforms are heavily read-skewed. I’d estimate each user views 100+ posts for every 1 post they create, which will heavily influence our caching and data distribution strategies.

Functional Requirements

From what I understand, the core requirements are:

Post Creation: Users must be able to upload photos and videos with captions. This needs to handle large files efficiently and provide immediate feedback to users about upload progress.

Social Graph Management: Users need to follow and unfollow other users. This creates a unidirectional relationship that drives our entire feed generation strategy.
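To make the "unidirectional relationship" concrete, here is a minimal in-memory sketch. The `SocialGraph` class and its method names are hypothetical, chosen for illustration; a real system would back this with a sharded store rather than Python dicts.

```python
from __future__ import annotations

from collections import defaultdict


class SocialGraph:
    """In-memory follow graph; edges are directed (follower -> followee)."""

    def __init__(self) -> None:
        # Two indexes so both "who do I follow?" and "who follows me?" are cheap.
        self._following: dict[str, set[str]] = defaultdict(set)
        self._followers: dict[str, set[str]] = defaultdict(set)

    def follow(self, follower: str, followee: str) -> None:
        if follower == followee:
            raise ValueError("a user cannot follow themselves")
        self._following[follower].add(followee)
        self._followers[followee].add(follower)

    def unfollow(self, follower: str, followee: str) -> None:
        # discard() makes unfollow idempotent: no error if the edge is absent.
        self._following[follower].discard(followee)
        self._followers[followee].discard(follower)

    def following(self, user: str) -> set[str]:
        return set(self._following.get(user, ()))

    def followers(self, user: str) -> set[str]:
        return set(self._followers.get(user, ()))


graph = SocialGraph()
graph.follow("alice", "bob")  # alice follows bob; bob does not follow alice
```

Keeping both directions indexed matters later: feed reads need "who does this user follow?", while anything that pushes a new post outward needs "who follows this author?".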

Feed Generation: Users should see a chronological timeline of posts from accounts they follow. This is the most technically challenging requirement given our scale.
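As a baseline, the simplest way to satisfy this requirement is to merge the followed accounts' posts at read time. The sketch below assumes each author's posts are already stored newest-first; `Post`, `media_url`, and `build_feed` are hypothetical names. It only illustrates the naive read-time merge, not a design that would hold up at the scale discussed here.

```python
import heapq
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Post:
    author: str
    created_at: int  # Unix timestamp in seconds
    media_url: str   # hypothetical reference to the stored media


def build_feed(posts_by_author, following, limit=20):
    """Merge each followed author's newest-first posts into one newest-first feed."""
    timelines = (posts_by_author.get(author, []) for author in following)
    # heapq.merge is lazy, so we only pull as many posts as the page needs.
    merged = heapq.merge(*timelines, key=lambda p: p.created_at, reverse=True)
    return list(islice(merged, limit))


posts_by_author = {
    "bob": [Post("bob", 300, "b2.jpg"), Post("bob", 100, "b1.jpg")],
    "carol": [Post("carol", 200, "c1.jpg")],
}
feed = build_feed(posts_by_author, ["bob", "carol"], limit=2)
```

The cost of this approach grows with the number of accounts a user follows, which is why the 10,000+ follow edge case from the clarifying questions matters.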

These three requirements form the foundation - everything else we might add (likes, comments, stories) would build on this core architecture.

Non-Functional Requirements

I’d expect this system to handle several critical non-functional requirements:

High Availability: We need 99.9% uptime minimum. Users expect Instagram to always be available, and downtime directly impacts user engagement and revenue.
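It helps to translate the 99.9% target into a downtime budget. A quick sketch, assuming a 365-day year (the function name is made up for illustration):

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours, ignoring leap years


def downtime_budget_hours(availability: float) -> float:
    """Hours per year the service may be down while still meeting the target."""
    return (1 - availability) * HOURS_PER_YEAR


three_nines = downtime_budget_hours(0.999)    # about 8.76 hours per year
four_nines = downtime_budget_hours(0.9999)    # about 0.88 hours per year
```

So "three nines" still allows almost nine hours of downtime a year, which is why I call it a minimum.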

Low Latency: Feed requests should complete under 500ms, and media should start rendering immediately. Users will abandon the app if it feels slow.

Massive Scale: Supporting 500M daily active users means handling potentially 100,000+ requests per second during peak hours. Our architecture needs to scale horizontally.

Global Distribution: Users worldwide should have similar performance, requiring strategic CDN placement and potentially regional data centers.

Eventual Consistency: It’s acceptable for new posts to take 1-2 minutes to appear in all followers’ feeds, allowing us to optimize for performance over strict consistency.

Back-of-the-envelope Estimations

Let’s work through the numbers to understand what we’re dealing with:

User Activity: With 500M daily active users, if each user checks their feed 10 times per day, that’s 5 billion feed requests daily. If we conservatively assume all of that traffic lands in an 8-hour peak window, that’s 5B ÷ 28,800 seconds ≈ 175,000 feed requests per second (a flat 24-hour average would be closer to 58,000).

Content Creation: 100M posts per day means about 1,200 posts per second on average, with peaks potentially 3-5x higher during major events or peak social hours.
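The request-rate arithmetic above can be reproduced in a few lines. This is a sketch using only the assumptions already stated (500M DAU, 10 feed checks per user, an 8-hour peak window, 100M posts per day); the variable names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60

DAU = 500_000_000
FEED_CHECKS_PER_USER = 10
PEAK_WINDOW_HOURS = 8          # assumption: all feed traffic lands in this window
POSTS_PER_DAY = 100_000_000

feed_requests_per_day = DAU * FEED_CHECKS_PER_USER                    # 5 billion
peak_feed_qps = feed_requests_per_day / (PEAK_WINDOW_HOURS * 3600)    # ~173,611
avg_feed_qps = feed_requests_per_day / SECONDS_PER_DAY                # ~57,870

avg_post_qps = POSTS_PER_DAY / SECONDS_PER_DAY                        # ~1,157
peak_post_qps_range = (3 * avg_post_qps, 5 * avg_post_qps)            # 3-5x bursts
```

Rounding 173,611 up to 175,000 and 1,157 up to 1,200 keeps the mental math easy without changing any design decision.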

Storage Requirements:

  • If average media size is 2MB per post: 100M posts × 2MB = 200TB of new media daily

  • Metadata per post (user ID, timestamp, caption, media reference): roughly 1KB × 100M = 100GB daily

  • Over 10 years: 200TB × 3,650 days ≈ 730PB of media storage needed (call it ~750PB with a little headroom)
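The storage bullets can be checked with a short calculation. This sketch assumes decimal units (1TB = 10^12 bytes) and ignores replication and multiple media renditions, neither of which is included in the figures above.

```python
KB, MB, GB, TB, PB = 10**3, 10**6, 10**9, 10**12, 10**15  # decimal units

POSTS_PER_DAY = 100_000_000
AVG_MEDIA_BYTES = 2 * MB
METADATA_BYTES = 1 * KB
RETENTION_DAYS = 365 * 10

daily_media_tb = POSTS_PER_DAY * AVG_MEDIA_BYTES / TB        # 200 TB per day
daily_metadata_gb = POSTS_PER_DAY * METADATA_BYTES / GB      # 100 GB per day
ten_year_media_pb = POSTS_PER_DAY * AVG_MEDIA_BYTES * RETENTION_DAYS / PB  # 730 PB
```

Media outweighs metadata by a factor of about 2,000, which suggests the two will want very different storage systems.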

Read/Write Patterns:

  • Read QPS: ~175,000 feed requests/second + media requests (potentially 10x higher)

  • Write QPS: ~1,200 posts/second + follow operations

  • This confirms we’re dealing with a heavily read-skewed system

Memory Requirements: For feed caching, if we cache the last 100 posts for each active user’s feed (assuming 1KB per post summary), that’s 500M users × 100 posts × 1KB = 50TB of feed cache. This is manageable with distributed caching.
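To sanity-check "manageable", here is the same arithmetic plus a rough node count. The 64GB of usable memory per cache node is purely an assumption for illustration, not a figure from this write-up.

```python
KB, GB, TB = 10**3, 10**9, 10**12  # decimal units

ACTIVE_USERS = 500_000_000
POSTS_CACHED_PER_USER = 100
BYTES_PER_POST_SUMMARY = 1 * KB
USABLE_BYTES_PER_NODE = 64 * GB  # hypothetical cache node size

feed_cache_bytes = ACTIVE_USERS * POSTS_CACHED_PER_USER * BYTES_PER_POST_SUMMARY
feed_cache_tb = feed_cache_bytes / TB  # 50 TB

# Ceiling division: a partially filled node still has to exist.
nodes_needed = -(-feed_cache_bytes // USABLE_BYTES_PER_NODE)
```

Under that assumption the cache fits on several hundred nodes before any replication, which is a large but ordinary distributed cache cluster.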

Bandwidth: During peak viewing, if 100M users each pull media averaging 2MB within the same hour, that’s 200TB per hour, or roughly 55GB/s (about 450Gbps) sustained. This absolutely requires CDN distribution.
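Converting the bandwidth figure into a per-second rate makes it easier to compare against network capacity. A sketch, assuming the 100M media fetches are spread evenly across one hour:

```python
MB, GB, TB = 10**6, 10**9, 10**12  # decimal units

CONCURRENT_VIEWERS = 100_000_000
AVG_MEDIA_BYTES = 2 * MB
WINDOW_SECONDS = 3600  # assumption: fetches spread evenly over one hour

total_bytes = CONCURRENT_VIEWERS * AVG_MEDIA_BYTES   # 200 TB
total_tb = total_bytes / TB

gigabytes_per_second = total_bytes / WINDOW_SECONDS / GB   # ~55.6 GB/s
gigabits_per_second = gigabytes_per_second * 8             # ~444 Gbps
```

If the same fetches arrived within a few minutes instead of an hour, the required rate would scale up proportionally, so the CDN conclusion only gets stronger.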

🔒 System Design

This is the design I am thinking of:

This post is for paid subscribers
