AlvaroMarquesAndrade edited this page Jun 3, 2020 · 9 revisions

Welcome to the Butterfree wiki!

This repository aims to be a set of tools that make ETLs easier to build. Butterfree is used to load data into a Feature Store, from which it can be served to your machine learning algorithms.

What is going on here

This repository holds all the scripts that query data from the necessary services (databases, data lake, S3, etc.), transform it into feature store "domains" (for example, a House entity or a Contract entity), and then upload the result to both a historical feature store and an online feature store.

The scripts use Python and Apache Spark.

The repository also contains Airflow DAGs for managing the execution of these ETLs.
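
The query, transform, and upload flow described in this section can be sketched in plain Python. This is only an illustration of the shape of an ETL, not Butterfree's API: `InMemoryStore`, `run_etl`, `extract_houses`, `to_house_entity`, and the field names are all hypothetical, and in-memory lists stand in for the real sources and feature stores. The real scripts run on Spark.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


@dataclass
class InMemoryStore:
    """Hypothetical stand-in for a feature store target."""

    rows: List[Row] = field(default_factory=list)

    def write(self, rows: Iterable[Row]) -> None:
        self.rows.extend(rows)


def run_etl(
    extract: Callable[[], Iterable[Row]],
    transform: Callable[[Row], Row],
    targets: Iterable[InMemoryStore],
) -> List[Row]:
    """Query the source, build entity rows, and write them to every target."""
    features = [transform(row) for row in extract()]
    for target in targets:
        target.write(features)
    return features


def extract_houses() -> List[Row]:
    # Assumption: made-up rows standing in for a database or data lake query.
    return [
        {"id": 1, "area_m2": 50, "rent": 2000},
        {"id": 2, "area_m2": 80, "rent": 2400},
    ]


def to_house_entity(row: Row) -> Row:
    # Assumption: an illustrative derived feature for a "house" entity.
    return {"house_id": row["id"], "rent_per_m2": row["rent"] / row["area_m2"]}


historical_store = InMemoryStore()
online_store = InMemoryStore()
house_features = run_etl(
    extract_houses, to_house_entity, [historical_store, online_store]
)
```

Keeping extraction, transformation, and loading as separate callables makes it possible to write the same entity to both stores from a single run, which mirrors the "upload to both" step described above.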

ML Services Architecture

Historical Feature Store

An S3 bucket with the entities used in ML products, such as house and contract.

Online Feature Store

A Cassandra database that stores the most recent version of every house.

Spark Stream

It runs as a Databricks job (Feature Store "house" Streaming).

It listens to the Kafka HouseBusinessEvents topic for house updates and saves each updated house to the Online Feature Store.

The HouseBusinessEvents topic is produced by Mainstreamer.
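
The behaviour of the streaming job, keeping only the latest version of each house in the Online Feature Store, can be sketched as an upsert keyed by house. This is a minimal plain-Python illustration, not the actual Spark job: the event fields (`house_id`, `updated_at`, `rent`), the `apply_house_event` helper, and the rule of ignoring events older than the stored version are all assumptions, and a dictionary stands in for Cassandra.

```python
from typing import Dict

HouseEvent = Dict[str, object]


def apply_house_event(
    online_store: Dict[object, HouseEvent], event: HouseEvent
) -> bool:
    """Upsert a house event; return True if the stored version changed."""
    house_id = event["house_id"]
    current = online_store.get(house_id)
    # Assumption: an event that is not newer than the stored version is skipped.
    if current is not None and current["updated_at"] >= event["updated_at"]:
        return False
    online_store[house_id] = event
    return True


# Assumption: made-up events standing in for HouseBusinessEvents messages.
events = [
    {"house_id": 1, "updated_at": 10, "rent": 2000},
    {"house_id": 2, "updated_at": 11, "rent": 3500},
    {"house_id": 1, "updated_at": 12, "rent": 2100},
    {"house_id": 1, "updated_at": 9, "rent": 1900},  # arrives late, is skipped
]

online_houses: Dict[object, HouseEvent] = {}
applied = [apply_house_event(online_houses, event) for event in events]
```

After the loop, `online_houses` holds one row per house, and house 1 keeps the version with `updated_at` 12 even though an older event arrived afterwards.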

We currently have the following jobs:

  • Feature Store "house" Streaming -> Saves house updates (consumed from Kafka) to the Online Feature Store.
  • Feature Store "contract" Historical -> Creates the contract entity in the Historical Feature Store.
  • Feature Store "house" Historical -> Creates the house entity in the Historical Feature Store.

Creating new ETLs

Here in the feature-store repository, we keep the ETLs responsible for gathering data from databases, the data lake, and other sources, and for uploading that data to both the Historical Feature Store and the Online Feature Store. When you create or update ETLs, you will most likely run into problems that other people have faced before. With this in mind, we created a wiki page with examples of how to build new ETLs: https://github.com/quintoandar/wonka/wiki/Creating-ETLs