Twitter Use Case
This is an informal description of how a Twitter-like service would operate on Swarm.
Twitter is an interesting use case because it requires that disparate data be pulled together and processed quickly and at scale. It should allow a smart clustering algorithm to demonstrate its value in a way that conventional approaches to arranging data would not, at least not without considerable design effort.
- HTTP request inbound: An HTTP request is received by a load balancer such as AWS ELB, which redirects it to a Swarm node. While any Swarm node could handle an HTTP request, it may be beneficial to limit handling to a subset of the swarm to facilitate specialization. We will label this node [A]. Note that the HTTP request and response objects are wrapped in Swarm references.
- Node [A] looks up the user. This requires accessing user-specific data stored on another node, which we'll label [B], so the continuation jumps to that node. [B] validates the user's credentials and then retrieves the list of users this user is following.
- We create a CopyOnWriteArrayList called tweets that will store the recent tweets of the users this user is following.
- We then employ a Swarm-enabled parallel foreach on the following list, mapping each followed user to a list of that user's recent tweets.
- This parallel foreach spawns a separate thread for each followed user. Each thread attempts to retrieve its user's information and tweets, so its continuation will jump to whichever of the machines [C1]-[Cn] holds that user's data.
- Note: a foreach writing to a previously created list isn't the most elegant approach; a map followed by a reduce would be better and have largely the same effect, but the foreach is easier to describe.
- Each thread then attempts to add the tweets it has collected to the tweets list, which causes all of the threads to return to node [B].
- The parallel foreach will now return, and the tweets will be rendered into a suitable HTML page.
- We then attempt to write the HTML page to the HTTPResponse, which causes the continuation to jump back to [A], where the page is returned to the user.
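The fan-out and fan-in in the steps above can be sketched in plain Java. This is a local simulation only: an ExecutorService stands in for Swarm's continuation jumps to nodes [C1]-[Cn], and TWEETS_BY_USER is hypothetical sample data rather than any actual Swarm API.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class TimelineSketch {

    // Parallel foreach over the following list. In Swarm, each task's
    // continuation would jump to the node [Ci] holding that user's tweets;
    // here a local thread pool stands in for those jumps.
    static List<String> collectTweets(List<String> following,
                                      Map<String, List<String>> tweetStore)
            throws InterruptedException {
        // Shared list written to concurrently by the per-user threads.
        List<String> tweets = new CopyOnWriteArrayList<>();
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, following.size()));
        for (String user : following) {
            // Adding to 'tweets' is what pulls each thread back to node [B].
            pool.submit(() -> tweets.addAll(tweetStore.getOrDefault(user, List.of())));
        }
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return tweets;
    }

    public static void main(String[] args) throws InterruptedException {
        // Hypothetical per-user tweet data, as if stored on nodes [C1]-[Cn].
        Map<String, List<String>> store = Map.of(
                "alice", List.of("alice: hello"),
                "bob", List.of("bob: swarm is neat"));
        List<String> tweets = collectTweets(List.of("alice", "bob"), store);

        // Back on [B]: render the collected tweets into a minimal HTML page.
        StringBuilder html = new StringBuilder("<ul>");
        for (String t : tweets) html.append("<li>").append(t).append("</li>");
        html.append("</ul>");
        System.out.println(html);
    }
}
```

CopyOnWriteArrayList makes the concurrent addAll calls safe without explicit locking, which is why the walkthrough uses it for the shared tweets list.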
It is important to consider what the data clustering algorithm will do over time here. It should transparently rearrange the data so that the data of users who tend to follow each other, or who are followed by the same people, migrates to the same nodes in the Swarm. Achieving something similar with a conventional datastore would require significant manual effort and understanding of the problem, but with Swarm it is entirely automatic.
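To make the idea concrete, here is a toy greedy heuristic in the same spirit: count how often pairs of users' data are touched by the same request, then place each user on the node of its highest-affinity already-placed neighbour. This is purely illustrative and is not Swarm's actual clustering algorithm; the access log and node count are invented for the example.

```java
import java.util.*;

public class AffinitySketch {

    // requests: each set is the group of users whose data one request touched
    // (e.g. a follower plus everyone on their following list).
    static Map<String, Integer> placeUsers(List<Set<String>> requests, int nodes) {
        // Count pairwise co-access: affinity[a][b] = times a and b were
        // touched by the same request.
        Map<String, Map<String, Integer>> affinity = new HashMap<>();
        for (Set<String> req : requests)
            for (String a : req)
                for (String b : req)
                    if (!a.equals(b))
                        affinity.computeIfAbsent(a, k -> new HashMap<>())
                                .merge(b, 1, Integer::sum);

        // Greedy placement: co-locate with the best already-placed
        // neighbour, otherwise fall back to round-robin.
        Map<String, Integer> placement = new HashMap<>();
        int next = 0;
        for (Set<String> req : requests) {
            for (String u : req) {
                if (placement.containsKey(u)) continue;
                Integer node = affinity.getOrDefault(u, Map.of()).entrySet().stream()
                        .filter(e -> placement.containsKey(e.getKey()))
                        .max(Map.Entry.comparingByValue())
                        .map(e -> placement.get(e.getKey()))
                        .orElse(null);
                placement.put(u, node != null ? node : next++ % nodes);
            }
        }
        return placement;
    }

    public static void main(String[] args) {
        // Hypothetical access log: alice and bob are repeatedly read
        // together, carol and dave separately.
        List<Set<String>> requests = List.of(
                Set.of("alice", "bob"), Set.of("alice", "bob"), Set.of("carol", "dave"));
        System.out.println(placeUsers(requests, 2));
    }
}
```

Under this heuristic, alice and bob end up on one node and carol and dave on another, so the per-user threads in the timeline fan-out would each need to visit only a single machine.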