This repository has been archived by the owner on Jul 18, 2020. It is now read-only.

Note: Servers may be down cuz I'm really poor.

Q/A

Q: What's this?

A: Hashtagsbattle is a web app that displays real-time analytics based on Twitter, such as hourly trending hashtags, daily hashtags, and worldwide activity. Inspired by the awesome One Million Tweet Map. (NB: this is a demo.)

Q: How does it work?

A:

  • First off, there's a tweet listener built with Tweepy which retrieves tweets sent back by Twitter's API. It does some basic cleaning and filtering before publishing them to a Pub/Sub topic, which is basically a global-scale messaging buffer/bus. The listener runs on a Google Compute Engine instance, as it is fairly cheap and doesn't require auto-scaling.

  • Then, there's a little Express server using SocketIO, running on App Engine. It exposes an endpoint that receives Pub/Sub push messages and emits events through a web socket. It uses the Supercluster library to cluster points server-side, which reduces network and client-side rendering delay.

  • The heart of my project is the Apache Beam stream processing pipeline running on the Cloud Dataflow runner. This pipeline consumes events from the source Pub/Sub topic and applies some data transformations (grouping, counting, filtering, batching...) before sending the pre-aggregated output to another Pub/Sub topic. I'm playing with some windows and triggers to achieve fairly low latency.

  • Finally, the output Pub/Sub topic triggers Cloud Functions instances, which do some computation on the data before saving it to Firestore.
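To make the flow above a bit more concrete, here is a minimal plain-Python sketch of the same steps: clean a tweet, wrap it in a Pub/Sub-style push envelope (where `message.data` is base64-encoded), then group tweets into fixed windows and count hashtags. This is not the actual project code and it does not use Tweepy, Apache Beam or the Pub/Sub client. The simplified tweet shape (`timestamp` in epoch seconds, `hashtags` as a list), the function names and the one-hour window are all assumptions made for illustration.

```python
import base64
import json
from collections import Counter, defaultdict

WINDOW_SECONDS = 3600  # assumed hourly window, for illustration only


def clean_tweet(raw):
    """Keep only the fields needed downstream; drop tweets without hashtags."""
    tags = [t.lower() for t in raw.get("hashtags", []) if t]
    if not tags:
        return None
    return {"timestamp": raw["timestamp"], "hashtags": tags}


def encode_push_envelope(tweet):
    """Wrap a tweet like a Pub/Sub push request body (data is base64)."""
    data = base64.b64encode(json.dumps(tweet).encode("utf-8")).decode("ascii")
    return {"message": {"data": data}}


def decode_push_envelope(envelope):
    """Recover the tweet from a Pub/Sub-style push envelope."""
    return json.loads(base64.b64decode(envelope["message"]["data"]))


def top_hashtags_per_window(tweets, top_n=3):
    """Assign each tweet to a fixed window and count hashtags per window."""
    windows = defaultdict(Counter)
    for tweet in tweets:
        window_start = tweet["timestamp"] - tweet["timestamp"] % WINDOW_SECONDS
        windows[window_start].update(tweet["hashtags"])
    return {
        start: counts.most_common(top_n)
        for start, counts in sorted(windows.items())
    }


def run_demo():
    """Push a few hypothetical tweets through every step of the sketch."""
    raw_tweets = [
        {"timestamp": 10, "hashtags": ["Python", "GCP"]},
        {"timestamp": 20, "hashtags": ["python"]},
        {"timestamp": 30, "hashtags": []},  # filtered out: no hashtags
        {"timestamp": 3700, "hashtags": ["Beam"]},
    ]
    cleaned = [t for t in map(clean_tweet, raw_tweets) if t is not None]
    envelopes = [encode_push_envelope(t) for t in cleaned]
    received = [decode_push_envelope(e) for e in envelopes]
    return top_hashtags_per_window(received)


if __name__ == "__main__":
    print(run_demo())
```

In the real pipeline, the windowing and counting would be handled by Beam's windows and triggers on Dataflow rather than by a dictionary in memory; the sketch only shows the shape of the aggregation.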

The web app is built with Stencil and deployed to Firebase Hosting.

As you can see, this is fully managed by Google Cloud Platform.

GCP implementation

TODO:

  • Use Pub/Sub push method instead of pull (lower latency)
  • UI
  • Implement the ML layer
  • A lot of things ...

Installation

Work in progress.

The application is made of four components. Almost every component is Dockerized and has its own CI/CD pipeline using Cloud Build.