Media content today is increasingly streamed video, and this trend will only grow as consumer internet speeds and video quality improve.
The video delivery market is segmented into two parts: live and on-demand video streaming. On-demand video is content that has been produced and transcoded in its entirety prior to delivery, and it may be watched or rewatched at any point in time. Movies and television programs are good examples of on-demand video streams. Live video is streamed directly from a source, transcoded and delivered to consumers in real time. A good example of live video streaming is a sporting event.
The segments, or chunks, of the video are created by an encoding process. Video encoding is the process of slicing and compressing raw video into segments via a codec that can be decoded by players. Encoders can produce multiple versions of each segment at differing bitrates. The encoder also produces a manifest file that provides metadata about the video, supported bitrates, encodings, and where to find segments.
Streaming video clients will first download a manifest file that describes video metadata, including data about segments available for download. For on-demand video, the manifest contains data about all segments; for live video, the manifest is regularly updated by the live encoder and downloaded by the player.
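As a concrete example, an HLS master playlist lists each available rendition with its bitrate and resolution, pointing the player at a per-rendition playlist of segments (the values and paths below are illustrative, not from any specific service):

```
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=800000,RESOLUTION=640x360
360p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
```

The player picks one of these variant playlists, then downloads the segment URLs it lists.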
A video streaming player will typically require three buffered segments before starting playback. After playback starts, the player continues to buffer ahead so playback is not interrupted while waiting for a segment download. Players let the user select a specific quality setting or use an adaptive bitrate (ABR) algorithm, which automatically selects the highest bitrate that can be buffered ahead for smooth streaming, given the client’s bandwidth and latency to the server.
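A minimal throughput-based ABR rule can be sketched as follows. This is an illustrative assumption, not any particular player's algorithm; real ABR logic also weighs buffer occupancy and bandwidth variance.

```python
def select_bitrate(available_kbps, measured_bandwidth_kbps, safety=0.8):
    """Pick the highest rendition bitrate the connection can sustain.

    available_kbps: list of encoded rendition bitrates (kbps)
    measured_bandwidth_kbps: recent throughput estimate (kbps)
    safety: fraction of bandwidth the video is allowed to consume,
            leaving headroom so the buffer can stay ahead of playback
    """
    budget = measured_bandwidth_kbps * safety
    candidates = [b for b in available_kbps if b <= budget]
    # Fall back to the lowest rendition if even that exceeds the budget.
    return max(candidates) if candidates else min(available_kbps)

print(select_bitrate([300, 800, 1500, 3000, 6000], 2500))  # 1500
```

With a 2,500 kbps estimate and a 0.8 safety factor, the budget is 2,000 kbps, so the 1,500 kbps rendition wins over 3,000 kbps even though the raw bandwidth could briefly sustain it.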
To provide end users with the best streaming experience, video is typically delivered through a CDN.
A content delivery network, or content distribution network (CDN), is a geographically distributed network of proxy servers and their data centers.
CDNs provide many locations that act as content caches. The locations (called points of presence, or PoPs) are distributed around the globe to be close to end users, to keep latency down. The PoPs cache recently requested content; they also fetch content that is not cached locally and store it for a specified period.
Client players are directed to retrieve video from a CDN.
The client request uses geographic or latency-based DNS routing to direct the client to the nearest PoP. The server to which the request is directed will either have the content cached or not. Depending on how the CDN operates, the request may be directed to other servers. Once the content is located, it will be cached by servers along the request’s path to improve performance the next time the asset is requested. Finally, the content is delivered to the client.
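The PoP’s cache-or-fetch behavior can be sketched as a TTL cache (illustrative only; real CDNs add cache hierarchies, eviction policies, and origin shielding):

```python
import time

class PopCache:
    """Serve content from the local cache when fresh; otherwise fetch
    from the origin and cache it for a fixed period (TTL)."""

    def __init__(self, fetch_from_origin, ttl_seconds=60):
        self.fetch = fetch_from_origin   # callable: key -> content
        self.ttl = ttl_seconds
        self.store = {}                  # key -> (content, expires_at)

    def get(self, key):
        entry = self.store.get(key)
        if entry and entry[1] > time.time():
            return entry[0], "HIT"       # fresh local copy
        content = self.fetch(key)        # miss or expired: go to origin
        self.store[key] = (content, time.time() + self.ttl)
        return content, "MISS"

origin_calls = []
cache = PopCache(lambda k: origin_calls.append(k) or f"bytes-of-{k}")

print(cache.get("seg1.ts"))  # ('bytes-of-seg1.ts', 'MISS')
print(cache.get("seg1.ts"))  # ('bytes-of-seg1.ts', 'HIT')
```

The second request for the same segment is served locally without touching the origin, which is the latency win the PoP exists to provide.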
Currently, the leading video streaming protocols, Apple HTTP Live Streaming (HLS) and MPEG Dynamic Adaptive Streaming over HTTP (MPEG-DASH), follow a similar delivery premise: small segments of the video are downloaded via **HTTP(S)**. The use of HTTP(S) lets us take advantage of HTTP/2 and content delivery networks for optimization.
The notes below are from https://www.educative.io/courses/grokking-the-system-design-interview/xV26VjZ7yMl .
Requirements | Functional: |
Capacity Estimation | … |
System APIs | uploadVideo(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents): String
searchVideo(api_dev_key, search_query, user_location, maximum_videos_to_return, page_token): JSON
streamVideo(api_dev_key, video_id, offset, codec, resolution): Stream
*The client should send the codec and resolution in the API call to support play/pause across multiple devices. Imagine you are watching a video in your TV’s Netflix app, pause it, and resume it in your phone’s Netflix app. The two devices have different resolutions and use different codecs, so the server needs both values to serve the right stream from the saved offset. |
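A hypothetical client-side sketch of the cross-device resume scenario (the helper name and parameter values are assumptions, not part of the course API):

```python
def build_stream_request(api_dev_key, video_id, offset, codec, resolution):
    """Assemble the streamVideo parameters a client would send when
    resuming playback on a new device: same video_id and offset as the
    old device, but this device's own codec and resolution."""
    return {
        "api_dev_key": api_dev_key,
        "video_id": video_id,
        "offset": offset,          # seconds already watched on the TV
        "codec": codec,            # this device's codec, e.g. "h264"
        "resolution": resolution,  # this device's display resolution
    }

# TV paused at 1342 s; the phone resumes with its own codec/resolution.
params = build_stream_request("dev-key", "vid-42", offset=1342,
                              codec="h264", resolution="1280x720")
print(params["offset"])  # 1342
```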
High Level Design | Components: |
Database Schema | … |
Detailed Component Design | Videos can be stored in a distributed file storage system like HDFS.
Heavy read:
Supports resuming an upload when the connection is lost. Video encoding: newly uploaded videos are stored on a server, and a task is added to the processing queue to encode the video into multiple formats. Once all encoding is complete, the uploader is notified and the video is made available for viewing/sharing. |
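The upload-then-encode pipeline above can be sketched as a task queue (a minimal in-process illustration; the rendition list, naming, and notification callback are assumptions, and a real system would run workers on separate machines):

```python
import queue

FORMATS = ["240p", "480p", "720p", "1080p"]  # assumed rendition set

def handle_upload(task_queue, video_id):
    # Raw video is stored; an encoding task is enqueued for workers.
    task_queue.put(video_id)

def encoding_worker(task_queue, library, notify):
    # Drain the queue: encode each video into all formats, then notify
    # the uploader so the video can be made available for view/share.
    while not task_queue.empty():
        video_id = task_queue.get()
        library[video_id] = [f"{video_id}.{fmt}" for fmt in FORMATS]
        notify(video_id)

tasks, library, notifications = queue.Queue(), {}, []
handle_upload(tasks, "vid-42")
encoding_worker(tasks, library, notifications.append)
print(library["vid-42"])
# ['vid-42.240p', 'vid-42.480p', 'vid-42.720p', 'vid-42.1080p']
```

Decoupling upload from encoding through the queue is what lets the upload path stay fast while the CPU-heavy transcoding happens asynchronously.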
https://learning.oreilly.com/library/view/optimize-video-streaming/9781098111649/
https://en.wikipedia.org/wiki/Content_delivery_network
https://bitmovin.com/what-is-transcoding/
https://www.educative.io/courses/grokking-the-system-design-interview/xV26VjZ7yMl