Auto scaling automatically adjusts the number of running instances or the amount of resources based on current demand.
In cloud-based and distributed systems, workloads can vary significantly. Auto scaling ensures you have enough resources during high load and saves costs during low usage, without manual intervention.
Use auto scaling to maintain performance and availability while optimizing resource usage and cost in dynamic environments like web apps, APIs, or batch jobs.
You need to know
Scaling can be horizontal or vertical: Horizontal adds/removes instances (e.g., servers); vertical changes the size of a single instance (e.g., more CPU/memory).
Scaling is trigger-based: Scaling actions fire when defined metrics like CPU usage, request rate, or custom application metrics cross a threshold.
Cooldown and threshold settings matter: A cooldown that is too short causes flapping (scaling out and back in too frequently), while settings that are too conservative make the system slow to respond to load spikes.
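To make the threshold and cooldown ideas concrete, here is a minimal sketch of a horizontal scaling decision loop in Python. It is not how any particular cloud provider implements it. The class names, the 70%/30% CPU thresholds, the 300-second cooldown, and the one-instance step size are all assumptions picked for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ScalingPolicy:
    # All values below are illustrative assumptions, not provider defaults.
    scale_out_above: float = 70.0  # CPU % above which one instance is added
    scale_in_below: float = 30.0   # CPU % below which one instance is removed
    cooldown_s: float = 300.0      # minimum seconds between two scaling actions
    min_instances: int = 1
    max_instances: int = 10


class AutoScaler:
    """Hypothetical horizontal scaler: adds or removes one instance at a time."""

    def __init__(self, policy: ScalingPolicy, instances: int = 1) -> None:
        self.policy = policy
        self.instances = instances
        self.last_action_at: Optional[float] = None

    def decide(self, cpu_percent: float, now: float) -> int:
        """Return the instance count after evaluating one metric sample."""
        p = self.policy
        # During the cooldown window, ignore the metric to avoid flapping.
        if self.last_action_at is not None and now - self.last_action_at < p.cooldown_s:
            return self.instances

        target = self.instances
        if cpu_percent > p.scale_out_above:
            target = min(self.instances + 1, p.max_instances)
        elif cpu_percent < p.scale_in_below:
            target = max(self.instances - 1, p.min_instances)

        # Only an actual change in instance count starts a new cooldown.
        if target != self.instances:
            self.instances = target
            self.last_action_at = now
        return self.instances


scaler = AutoScaler(ScalingPolicy(), instances=2)
history = [
    scaler.decide(85, now=0),     # above threshold: scale out
    scaler.decide(90, now=60),    # still hot, but inside the cooldown
    scaler.decide(90, now=400),   # cooldown over: scale out again
    scaler.decide(50, now=800),   # between thresholds: no change
    scaler.decide(10, now=1200),  # below threshold: scale in
]
print(history)
```

The gap between the two thresholds is deliberate: if scale-out and scale-in used the same value, a metric hovering around it would trigger an action on almost every sample once the cooldown expired.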
Popular technologies
AWS Auto Scaling - Scales EC2 instances, ECS services, and other resources such as DynamoDB tables based on load.
Kubernetes HPA (Horizontal Pod Autoscaler) - Scales the number of pods based on CPU, memory, or custom metrics.
Google Cloud Autoscaler - Automatically scales VM instances in managed instance groups.
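As an example of how one of these tools turns a metric into a replica count, the Kubernetes documentation describes the HPA's core calculation as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The Python sketch below mirrors only that formula. The function name, parameter names, and the min/max bounds are assumptions for illustration, and the real controller also handles details such as pod readiness and missing metrics.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    tolerance: float = 0.1,   # documented HPA default tolerance
    min_replicas: int = 1,    # assumed bounds, normally set on the HPA object
    max_replicas: int = 10,
) -> int:
    """Simplified version of the HPA replica calculation."""
    ratio = current_metric / target_metric
    # Close enough to the target: leave the replica count alone.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp the result to the configured bounds.
    return max(min_replicas, min(desired, max_replicas))


print(desired_replicas(4, current_metric=90, target_metric=60))  # scale out
print(desired_replicas(4, current_metric=63, target_metric=60))  # within tolerance
print(desired_replicas(4, current_metric=30, target_metric=60))  # scale in
```

Because the result is proportional to how far the metric is from its target, a large spike can add several pods in one step instead of one at a time.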