Request flow
- DNS: Maps domain name to IP address via a third-party service.
- IP Address: Client receives the IP (e.g., 15.125.23.214).
- HTTP Request: Client sends an HTTP request to the server.
- Server Response: Server returns HTML or JSON for rendering.
Horizontal scaling (“scale-out”) adds servers for better scalability and resilience, unlike vertical scaling, which has hardware limits and lacks failover.
In direct connections, if the web server goes offline or is overloaded, users can’t access the site or experience slowdowns. A load balancer solves this by distributing traffic across multiple servers.
Now users connect to the load balancer’s public IP, making web servers unreachable directly. For security, private IPs are used for server-to-server communication within the network, isolated from the internet. The load balancer uses these private IPs to route traffic to the web servers.
Quoted from Wikipedia: “Database replication can be used in many database management systems, usually with a master/slave relationship between the original (master) and the copies (slaves)”.
A master database handles write operations (insert, delete, update), while slave databases handle read operations. Since applications often need more reads than writes, there are usually more slave databases than master databases.
The cache tier is a fast, temporary data store that enhances system performance, reduces database workload, and can be scaled independently.
- When to Use: Use cache for frequently read, infrequently modified data. Cache isn’t suitable for persistent storage.
- Expiration Policy: Set appropriate expiration to avoid stale data or frequent reloads.
- Consistency: Ensure data consistency between the cache and the data store, which can be challenging.
- Failure Mitigation: Avoid single points of failure by using multiple cache servers and overprovisioning memory.
- Eviction Policy: Implement eviction policies like LRU, LFU, or FIFO when the cache is full.
A CDN (Content Delivery Network) is a group of geographically distributed servers that cache and deliver static content such as images, videos, CSS, and JavaScript files.
At a high level, a CDN works by delivering static content from the server closest to the user, improving load times. The further users are from a CDN server, the slower the website loads.
To scale the web tier horizontally, move state (e.g., user session data) out of the web tier. Best practice is to store session data in persistent storage like a relational database or NoSQL. This allows all web servers in the cluster to access the state data, creating a stateless web tier.
A stateful server retains client data (state) between requests, while a stateless server does not store any state information between client interactions.
By moving session data out of the web tier and into a shared persistent data store (e.g., relational database, Memcached/Redis, NoSQL), we enable better scaling. NoSQL is preferred for easy scalability. After the state data is removed out of web servers, auto-scaling of the web tier is easily achieved by adding or removing servers based on traffic load.
To achieve a multi-data center setup, several technical challenges must be addressed:
- Traffic Redirection: Tools like GeoDNS direct traffic to the nearest data center based on user location.
- Data Synchronization: Replicating data across multiple data centers helps ensure availability.
- Test and Deployment: Testing at different locations and using automated deployment tools ensures consistent services across data centers.
Horizontal scaling, or sharding, involves adding more servers, where each shard has the same schema but holds unique data. The critical factor in sharding is selecting the sharding key (or partition key). This key determines data distribution, and it is crucial to choose one that ensures even distribution across shards.
- Resharding Data: Necessary when a shard becomes full due to rapid growth or uneven data distribution. This involves updating the sharding function and moving data. Consistent hashing is a common solution.
- Celebrity Problem: Also known as the hotspot key problem, where excessive access to a shard causes overload. A possible solution is allocating a shard for each high-traffic entity or further partitioning.
- Join and De-normalization: Sharding makes join operations across shards difficult. A common workaround is database de-normalization to enable single-table queries.
To scale our system to support millions of users, we implement the following strategies:
- Keep the web tier stateless
- Build redundancy at every tier
- Cache data extensively
- Support multiple data centers
- Host static assets in a CDN
- Scale the data tier using sharding
- Split tiers into individual services
- Monitor the system and leverage automation tools
https://bytebytego.com/courses/system-design-interview/scale-from-zero-to-millions-of-users
ChatGPT 4