Scale From Zero To Millions Of Users

Single server setup

Request flow

DNS: Maps domain name to IP address via a third-party service.
IP Address: Client receives the IP (e.g., 15.125.23.214).
HTTP Request: Client sends an HTTP request to the server.
Server Response: Server returns HTML or JSON for rendering.

Horizontal scaling

Horizontal scaling (“scale-out”) adds servers for better scalability and resilience, unlike vertical scaling, which has hardware limits and lacks failover.

Load balancer

In direct connections, if the web server goes offline or is overloaded, users can’t access the site or experience slowdowns. A load balancer solves this by distributing traffic across multiple servers.

Now users connect to the load balancer’s public IP, making web servers unreachable directly. For security, private IPs are used for server-to-server communication within the network, isolated from the internet. The load balancer uses these private IPs to route traffic to the web servers.

Database replication

Quoted from Wikipedia: “Database replication can be used in many database management systems, usually with a master/slave relationship between the original (master) and the copies (slaves)”.

A master database handles write operations (insert, delete, update), while slave databases handle read operations. Since applications often need more reads than writes, there are usually more slave databases than master databases.

Cache

The cache tier is a fast, temporary data store that enhances system performance, reduces database workload, and can be scaled independently.

When to Use: Use cache for frequently read, infrequently modified data. Cache isn’t suitable for persistent storage.
Expiration Policy: Set appropriate expiration to avoid stale data or frequent reloads.
Consistency: Ensure data consistency between the cache and the data store, which can be challenging.
Failure Mitigation: Avoid single points of failure by using multiple cache servers and overprovisioning memory.
Eviction Policy: Implement eviction policies like LRU, LFU, or FIFO when the cache is full.

Content delivery network (CDN)

A CDN (Content Delivery Network) is a group of geographically distributed servers that cache and deliver static content such as images, videos, CSS, and JavaScript files.

At a high level, a CDN works by delivering static content from the server closest to the user, improving load times. The further users are from a CDN server, the slower the website loads.

Stateless web tier

To scale the web tier horizontally, move state (e.g., user session data) out of the web tier. Best practice is to store session data in persistent storage like a relational database or NoSQL. This allows all web servers in the cluster to access the state data, creating a stateless web tier.

A stateful server retains client data (state) between requests, while a stateless server does not store any state information between client interactions.

By moving session data out of the web tier and into a shared persistent data store (e.g., relational database, Memcached/Redis, NoSQL), we enable better scaling. NoSQL is preferred for easy scalability. After the state data is removed out of web servers, auto-scaling of the web tier is easily achieved by adding or removing servers based on traffic load.

Data centers

To achieve a multi-data center setup, several technical challenges must be addressed:

Traffic Redirection: Tools like GeoDNS direct traffic to the nearest data center based on user location.
Data Synchronization: Replicating data across multiple data centers helps ensure availability.
Test and Deployment: Testing at different locations and using automated deployment tools ensures consistent services across data centers.

Database scaling

Horizontal scaling

Horizontal scaling, or sharding, involves adding more servers, where each shard has the same schema but holds unique data. The critical factor in sharding is selecting the sharding key (or partition key). This key determines data distribution, and it is crucial to choose one that ensures even distribution across shards.

Resharding Data: Necessary when a shard becomes full due to rapid growth or uneven data distribution. This involves updating the sharding function and moving data. Consistent hashing is a common solution.
Celebrity Problem: Also known as the hotspot key problem, where excessive access to a shard causes overload. A possible solution is allocating a shard for each high-traffic entity or further partitioning.
Join and De-normalization: Sharding makes join operations across shards difficult. A common workaround is database de-normalization to enable single-table queries.

Conclusion

To scale our system to support millions of users, we implement the following strategies:

Keep the web tier stateless
Build redundancy at every tier
Cache data extensively
Support multiple data centers
Host static assets in a CDN
Scale the data tier using sharding
Split tiers into individual services
Monitor the system and leverage automation tools

References

https://bytebytego.com/courses/system-design-interview/scale-from-zero-to-millions-of-users

ChatGPT 4

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

71. SDI - Scale.md

71. SDI - Scale.md

Scale From Zero To Millions Of Users

Single server setup

Horizontal scaling

Load balancer

Database replication

Cache

Content delivery network (CDN)

Stateless web tier

Data centers

Database scaling

Horizontal scaling

Conclusion

References

Files

71. SDI - Scale.md

Latest commit

History

71. SDI - Scale.md

File metadata and controls

Scale From Zero To Millions Of Users

Single server setup

Horizontal scaling

Load balancer

Database replication

Cache

Content delivery network (CDN)

Stateless web tier

Data centers

Database scaling

Horizontal scaling

Conclusion

References