Design Dropbox/Google Drive
Like you should in an interview. Explained as simply as possible… but not simpler.
In this issue, I walk through, out loud and step by step, the exact thinking I’d use in a system design interview: clear, practical, and with trade-offs you can defend.
What you’ll learn in ~15 minutes
How I would scope the problem without missing important requirements (sharing permissions, version history, conflict handling).
The permission inheritance problem that kills performance - and the elegant stream processing solution that makes it O(1)
Why Dropbox chunks files - plus the upload trick that saves millions in bandwidth costs
The conflict resolution strategy that prevents data loss - when two people edit the same file simultaneously
How this issue is structured
I split the write-up into the same sections I’d narrate at a whiteboard. Free readers get the full walkthrough up to the deep-dive parts. Paid members get the 🔒 sections.
Initial Thoughts & Clarifying Questions
Functional Requirements
Non-Functional Requirements
Back-of-the-envelope Estimations (QPS, storage, bandwidth, cardinality math)
🔒 System Design (the architecture I’d draw and the excalidraw link for it!)
🔒 Component Breakdown (why each piece exists + alternatives)
🔒 Trade-offs Made
🔒 Security & Privacy
🔒 Monitoring, Logging, and Alerting
🔒 Final Thoughts
Quick note: If you’ve been getting value from these and want the full deep dives, becoming a paid member helps me keep writing—and you’ll immediately unlock the 🔒 sections above, plus a few extras I lean on when I practice.
Members also get
12 Back-of-the-Envelope Calculations Every Engineer Should Know
My Excalidraw System Design Template — drop-in canvas you can copy and tweak.
My System Design Component Library
Let’s get to it!
Initial Thoughts & Clarifying Questions
To begin, I'd want to understand the specific requirements and constraints we're working with. Let me ask a few clarifying questions to make sure I'm designing the right system:
1. What's the expected scale in terms of users and files? Based on the context, I'm assuming we're looking at something like Dropbox's scale - around 1 billion file operations daily, which is roughly 12,000 operations per second. I'd estimate we're supporting millions of active users with potentially billions of files stored.
2. What types of files are we supporting? I'm assuming we need to handle any binary file type - documents, images, videos, executables - essentially treating everything as large byte arrays. This means we can't make assumptions about file structure for conflict resolution like we might with text files.
3. How many users can share a single file? I'll assume a limit of 500 users per file for real-time sync, similar to Dropbox's model. Beyond that, we'd provide download links rather than proactive syncing.
4. What's our consistency model for file conflicts? I'm assuming we need to handle concurrent edits by creating separate conflict versions rather than trying to merge them automatically, since we're dealing with arbitrary binary files.
5. How much version history do we need to maintain? I'll assume 30 days of file history, which seems reasonable for most use cases while balancing storage costs.
6. What are the performance expectations? I'd expect sub-second upload/download times for small files (around 1MB or less), with clear progress indicators for larger files. File sync notifications should reach users within a few seconds of changes.
7. What's our geographic distribution requirement? I'm assuming global distribution with users expecting reasonable performance regardless of location, which means we'll need CDN support and potentially regional data centers.
Functional Requirements
From what I understand, the core requirements are:
File Upload and Storage - Users must be able to upload files of any type and size (within reasonable limits)
File Sharing and Permissions - Users can share files with others and control read/write permissions at both file and folder levels
Real-time Synchronisation - Changes to shared files sync automatically to all authorised users' devices
Version History - The system maintains 30 days of file versions for rollback capabilities
Conflict Resolution - When users edit files concurrently, the system creates separate conflict versions rather than attempting automatic merging (a minimal sketch of this follows the list)
Folder Organisation - Users can organise files in folders with inherited permissions
Cross-platform Client Support - Desktop, mobile, and web clients that maintain local file caches
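To make that conflict-resolution requirement concrete, here's a minimal sketch of the "keep both versions" approach. The names and fields are mine, not a real Dropbox API: the idea is simply that every upload states which version it was based on, and if the server has moved on since then, the late write becomes a clearly named sibling file instead of overwriting anything.

```python
from dataclasses import dataclass, field

@dataclass
class FileRecord:
    name: str
    version: int                 # monotonically increasing server-side version
    content_id: str              # pointer to the stored blob / chunk manifest
    conflict_copies: list = field(default_factory=list)

def apply_upload(file: FileRecord, base_version: int,
                 new_content_id: str, editor: str) -> FileRecord:
    """Apply an upload that the client made on top of `base_version`."""
    if base_version == file.version:
        # No concurrent edit: fast-forward the canonical file.
        file.version += 1
        file.content_id = new_content_id
        return file

    # Concurrent edit detected: we never try to merge arbitrary binary
    # content, so the late writer gets a clearly named conflict copy.
    conflict = FileRecord(
        name=f"{file.name} (conflict copy - {editor})",
        version=1,
        content_id=new_content_id,
    )
    file.conflict_copies.append(conflict)
    return conflict

# Two clients both start from version 3 of the same file:
doc = FileRecord(name="report.docx", version=3, content_id="blob-a")
apply_upload(doc, base_version=3, new_content_id="blob-b", editor="alice")  # wins, doc is now v4
apply_upload(doc, base_version=3, new_content_id="blob-c", editor="bob")    # becomes a conflict copy
```

The important property is that no writer ever silently loses data; surfacing the conflict to a human is the only safe default when the content is an opaque binary blob.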
Non-Functional Requirements
I'd expect this system to handle some serious scale and performance requirements:
Scale: 1 billion daily file operations (12K ops/sec), roughly 200 terabytes of new data daily, millions of concurrent users globally
Performance: Sub-100ms for metadata operations, sub-second for small file uploads (<1MB), reasonable progress for large files with chunked uploads/downloads
Availability: 99.9% uptime minimum - people rely on this for work, so downtime is costly
Consistency: Strong consistency for file metadata and permissions to prevent security issues; eventual consistency is acceptable for file content delivery
Storage: Efficient deduplication since many users likely store similar files (a chunking sketch follows this list), with cost-effective cold storage for older versions
Security: Enterprise-grade encryption, fine-grained access controls, audit logging for compliance
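Since chunking and deduplication come up again in the estimations below, here's a rough sketch of how I picture the client side working. The 4MB chunk size and SHA-256 hashing are assumptions on my part, not Dropbox's published internals: the point is that the server stores chunks keyed by content hash, so identical chunks are stored once, and editing part of a large file only re-uploads the chunks that actually changed.

```python
import hashlib

CHUNK_SIZE = 4 * 1024 * 1024  # 4MB fixed-size chunks (an assumed, not official, number)

def chunk_file(path: str):
    """Yield (sha256_hex, bytes) pairs for each fixed-size chunk of a file."""
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            yield hashlib.sha256(chunk).hexdigest(), chunk

def upload(path: str, known_chunks: set, object_store: dict) -> list:
    """Upload only the chunks the server hasn't seen; return the file's chunk manifest."""
    manifest = []
    for digest, data in chunk_file(path):
        if digest not in known_chunks:       # dedup: skip chunks that already exist
            object_store[digest] = data      # stand-in for a PUT to object storage
            known_chunks.add(digest)
        manifest.append(digest)              # the metadata service stores this ordered list
    return manifest
```

The manifest (an ordered list of chunk hashes) is what the metadata service versions; the chunk blobs themselves are immutable.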
Back-of-the-envelope Estimations
Let's say we have 100 million daily active users, with each user performing about 10 file operations per day on average. That gives us our 1 billion daily operations, or roughly 12,000 operations per second on average (peaks will be higher, but this keeps the math simple).
Read/Write Ratio: I'd estimate this is heavily read-skewed, maybe 80% reads to 20% writes. File downloads and sync checks happen much more frequently than uploads.
Storage Requirements:
With 1 billion daily operations and assuming 20% are actual uploads (200 million) at an average file size of 1MB, that's roughly 200 terabytes of new raw data per day
However, with chunking and deduplication, I'd estimate we actually store maybe 60% of that raw data
So roughly 120 terabytes of new storage daily, or about 44 petabytes annually
With 30-day version history, we're looking at maintaining around 3.6 petabytes of version data in the active window
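Quick sanity check on the storage math, using the same assumptions (200 million uploads a day at 1MB average, roughly 60% of raw bytes actually stored after chunking and dedup):

```python
uploads_per_day = 200_000_000            # 20% of 1 billion daily operations
avg_file_size_mb = 1

raw_new_tb_per_day = uploads_per_day * avg_file_size_mb / 1_000_000   # ~200 TB/day raw
stored_tb_per_day = raw_new_tb_per_day * 0.6                          # ~120 TB/day after dedup
stored_pb_per_year = stored_tb_per_day * 365 / 1_000                  # ~44 PB/year
version_window_pb = stored_tb_per_day * 30 / 1_000                    # ~3.6 PB for 30 days of history

print(raw_new_tb_per_day, stored_tb_per_day,
      round(stored_pb_per_year), round(version_window_pb, 1))
# 200.0 120.0 44 3.6
```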
QPS:
Average read QPS: ~9,600 (metadata lookups, file downloads, sync checks)
Average write QPS: ~2,400 (file uploads, metadata updates)
Database queries will be higher due to chunk operations - maybe 5x these numbers
Bandwidth:
Assuming an average file size of 1MB, a 20% CDN cache hit rate, and (as an upper bound) treating every read as a full file download
Download bandwidth: ~7.7 GB/s (roughly 61 Gbps) of cache-miss traffic back to the origin
Upload bandwidth: ~2.4 GB/s (roughly 19 Gbps) into object storage
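And the bandwidth arithmetic spelled out, mostly to keep bytes and bits straight (the 20% CDN hit rate and "every read is a download" are deliberately pessimistic assumptions):

```python
avg_file_mb = 1
read_qps, write_qps = 9_600, 2_400
cdn_hit_rate = 0.2                       # assumed: only 1 in 5 downloads served from CDN cache

origin_download_gb_s = read_qps * (1 - cdn_hit_rate) * avg_file_mb / 1_000   # ~7.7 GB/s
upload_gb_s = write_qps * avg_file_mb / 1_000                                # ~2.4 GB/s

print(round(origin_download_gb_s * 8), round(upload_gb_s * 8))   # ~61 Gbps down, ~19 Gbps up
```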
Memory Requirements:
Metadata caching: ~100GB per application server (file metadata, permissions, chunk mappings)
CDN caching: Potentially petabytes distributed globally
Database memory: ~1TB per shard for frequently accessed metadata
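One way to gut-check that ~100GB metadata cache figure; both numbers here are assumptions I'm making for the estimate, not measured values:

```python
bytes_per_entry = 500                  # assumed: file metadata + permissions + chunk-manifest pointer
hot_entries_per_server = 200_000_000   # assumed hot working set of files per application server

cache_gb = bytes_per_entry * hot_entries_per_server / 1_000_000_000
print(cache_gb)   # 100.0 -> consistent with the ~100GB per-server estimate
```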
🔒 System Design
Here’s the system design I am thinking of: