Dozer is a real-time data movement tool that leverages CDC from various sources and moves data into various sinks.

getdozer/dozer

Latest commit: 5244a7d · Sep 1, 2023

Connect any data source, combine them in real-time and instantly get low-latency data APIs.
⚡ All with just a simple configuration! ⚡️


Overview

Dozer makes it easy to build low-latency data APIs (gRPC and REST) from any data source. Data is transformed on the fly using Dozer's reactive SQL engine and stored in a high-performance cache to offer the best possible experience. Dozer is useful for quickly building data products.

Architecture

Quick Start

Follow the instructions below to install Dozer on your machine and run a quick sample using the NY Taxi Dataset.

Installation

MacOS Monterey (12) and above

brew tap getdozer/dozer && brew install dozer

Ubuntu 20.04 and above

# amd64
curl -sLO https://github.com/getdozer/dozer/releases/latest/download/dozer-linux-amd64.deb && sudo dpkg -i dozer-linux-amd64.deb

# aarch64
curl -sLO https://github.com/getdozer/dozer/releases/latest/download/dozer-linux-aarch64.deb && sudo dpkg -i dozer-linux-aarch64.deb

Dozer requires protobuf-compiler; installation instructions can be found in the additional steps.

Build from source

cargo install --path dozer-cli --locked

Run it

Download sample configuration and data

Create a new empty directory and run the commands below. This will download a sample configuration file and a sample NY Taxi Dataset file.

curl -o dozer-config.yaml https://raw.githubusercontent.com/getdozer/dozer-samples/main/connectors/local-storage/dozer-config.yaml
curl --create-dirs -o data/trips/fhvhv_tripdata_2022-01.parquet https://d37ci6vzurychx.cloudfront.net/trip-data/fhvhv_tripdata_2022-01.parquet

Run Dozer binary

dozer -c dozer-config.yaml

Dozer will start processing the data and populating the cache. You can follow the progress of the execution in the console.

Query the APIs

Once some data is loaded, you can query the cache using gRPC or REST.

# gRPC
grpcurl -d '{"query": "{\"$limit\": 1}"}' -plaintext localhost:50051 dozer.generated.trips_cache.TripsCaches/query

# REST
curl -X POST  http://localhost:8080/trips/query --header 'Content-Type: application/json' --data-raw '{"$limit":3}'
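
The REST call above can also be issued from a script. The following is a minimal sketch using only the Python standard library, assuming the endpoint `http://localhost:8080/trips/query` and the `$limit` query key from the curl example. The helper names `build_query_request` and `query_trips` are hypothetical and not part of Dozer, and the shape of the response is not assumed here.

```python
import json
import urllib.request

# Assumption: same host, port and path as the curl example above.
DEFAULT_URL = "http://localhost:8080/trips/query"


def build_query_request(limit, url=DEFAULT_URL):
    """Build a POST request carrying a {"$limit": N} query as JSON."""
    body = json.dumps({"$limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def query_trips(limit=3, url=DEFAULT_URL, timeout=10):
    """Send the query and return the decoded JSON response.

    Requires a running Dozer instance with some data already loaded.
    """
    request = build_query_request(limit, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With Dozer running, `query_trips(limit=3)` sends the same request as the curl command. Building the request in a separate function makes it possible to inspect the payload without a live server.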

Alternatively, you can use Postman to discover gRPC endpoints through gRPC reflection

Read more about Dozer here. And remember to star 🌟 our repo to support us!

Client Libraries

| Library | Language | License |
| --- | --- | --- |
| dozer-python | Dozer Client library for Python | Apache-2.0 |
| dozer-js | Dozer Client library for JavaScript | Apache-2.0 |
| dozer-react | Dozer Client library for React with easy to use hooks | Apache-2.0 |

Python

from pydozer.api import ApiClient
api_client = ApiClient('trips')
api_client.query()

JavaScript

import { ApiClient } from "@dozerjs/dozer";

const flightsClient = new ApiClient('flights');
flightsClient.count().then(count => {
    console.log(count);
});

React

import { useCount } from "@dozerjs/dozer-react";
const AirportComponent = () => {
    const [count] = useCount('trips');
    // Return the JSX so the component actually renders the count
    return <div> Trips: {count} </div>;
}

Samples

Check out Dozer's samples repository for more comprehensive examples and use case scenarios.

| Type | Sample | Notes |
| --- | --- | --- |
| Connectors | Postgres | Load data using Postgres CDC |
| | Local Storage | Load data from local files |
| | AWS S3 | Load data from AWS S3 bucket |
| | Ethereum | Load data from Ethereum |
| | Kafka | Load data from Kafka stream |
| | MySQL | Load data using MySQL CDC |
| | Snowflake (Coming soon) | Load data using Snowflake table streams |
| SQL | Using JOINs | Dozer APIs over multiple sources using JOIN |
| | Using Aggregations | How to aggregate using Dozer |
| | Using Window Functions | Use Hop and Tumble Windows |
| Use Cases | Flight Microservices | Build APIs over multiple microservices |
| | Scaling Ecommerce | Profile and benchmark Dozer using an ecommerce data set |
| | Use Dozer to Instrument (Coming soon) | Combine log data to get real-time insights |
| | Real Time Model Scoring (Coming soon) | Deploy trained models to get real-time insights as APIs |
| Client Libraries | Dozer React Starter | Instantly start building real-time views using Dozer and React |
| | Ingest Polars/Pandas Dataframes | Instantly ingest Polars/Pandas dataframes using Arrow format and deploy APIs |
| Authorization | Dozer Authorization | How to apply JWT Auth on Dozer |

Connectors

Refer to the full list of connectors and example configurations here.

| Connector | Status | Type | Schema Mapping | Frequency | Implemented Via |
| --- | --- | --- | --- | --- | --- |
| Postgres | Available ✅ | Relational | Source | Real Time | Direct |
| Snowflake | Available ✅ | Data Warehouse | Source | Polling | Direct |
| Local Files (CSV, Parquet) | Available ✅ | Object Storage | Source | Polling | Data Fusion |
| Delta Lake | Alpha | Data Warehouse | Source | Polling | Direct |
| AWS S3 (CSV, Parquet) | Alpha | Object Storage | Source | Polling | Data Fusion |
| Google Cloud Storage (CSV, Parquet) | Alpha | Object Storage | Source | Polling | Data Fusion |
| Ethereum | Available ✅ | Blockchain | Logs/Contract ABI | Real Time | Direct |
| Kafka Stream | Available ✅ | | Schema Registry | Real Time | Debezium |
| MySQL | Available ✅ | Relational | Source | Real Time | Direct |
| Google Sheets | In Roadmap | Applications | Source | | |
| Excel | In Roadmap | Applications | Source | | |
| Airtable | In Roadmap | Applications | Source | | |
Pipeline Log Reader Bindings

| Library | Language | License |
| --- | --- | --- |
| dozer-log-python | Python binding for reading Dozer logs | Apache-2.0 |
| dozer-log-js | Node.js binding for reading Dozer logs | Apache-2.0 |

Python

We support CPython >= 3.10 on Windows, MacOS and Linux, on both amd and arm architectures.

import asyncio
import pydozer_log

async def main():
    reader = await pydozer_log.LogReader.new('.dozer', 'trips')
    print(await reader.next_op())

asyncio.run(main())
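
Reading more than one operation amounts to awaiting `next_op()` in a loop. The sketch below shows that pattern under stated assumptions: `read_ops` is a hypothetical helper, and `FakeReader` is a stand-in that only mimics the `next_op()` coroutine so the loop can run without a Dozer instance. How the real reader behaves when no further operations are available is not covered here.

```python
import asyncio


async def read_ops(reader, count):
    """Collect `count` operations by awaiting reader.next_op() repeatedly."""
    ops = []
    for _ in range(count):
        ops.append(await reader.next_op())
    return ops


class FakeReader:
    """Hypothetical stand-in for a log reader, used only to exercise read_ops."""

    def __init__(self, ops):
        self._ops = iter(ops)

    async def next_op(self):
        # Hand back the next canned operation.
        return next(self._ops)


# Demo with the stand-in reader and placeholder operations.
demo_ops = asyncio.run(read_ops(FakeReader(["op-1", "op-2", "op-3"]), 2))
print(demo_ops)
```

With Dozer running, the object returned by `pydozer_log.LogReader.new('.dozer', 'trips')` would be passed to `read_ops` from inside an async function in place of `FakeReader`.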

JavaScript

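const dozer_log = require('@dozerjs/log');

// CommonJS modules have no top-level await, so run inside an async function
(async () => {
    const runtime = dozer_log.Runtime();
    const reader = await runtime.create_reader('.dozer', 'trips');
    console.log(await reader.next_op());
})();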

Releases

We typically release Dozer every 2 weeks, and each release is available on our releases page. Currently, we publish binaries for Ubuntu 20.04, Apple (Intel) and Apple (Silicon).

Please visit our issues section if you are having any trouble running the project.

Contributing

Please refer to Contributing for more details.