From 2f6627ccd1c8a90f808e4fdb52886da0edee994d Mon Sep 17 00:00:00 2001 From: exAspArk Date: Sat, 3 Feb 2024 18:26:30 -0500 Subject: [PATCH] Update docs --- README.md | 89 +++++++++++++++++++++++-- docs/docs/postgresql/source-database.md | 12 ++++ docs/docs/self-hosting.md | 28 ++++++++ docs/sidebars.ts | 1 + 4 files changed, 125 insertions(+), 5 deletions(-) create mode 100644 docs/docs/self-hosting.md diff --git a/README.md b/README.md index 6b6756e..96abe96 100644 --- a/README.md +++ b/README.md @@ -22,26 +22,105 @@ # Bemi -## Running locally +Bemi automatically tracks database changes, ensuring 100% reliability and a comprehensive understanding of every change. It does this by connecting to PostgreSQL's [Write-Ahead Log](https://www.postgresql.org/docs/current/wal-intro.html) (WAL) and implementing the [Change Data Capture](https://en.wikipedia.org/wiki/Change_data_capture) (CDC) pattern. Designed with simplicity and non-invasiveness in mind, Bemi operates in the background and doesn't require any alterations to your existing database tables. + +## Contents + +- [Highlights](#highlights) +- [Use cases](#use-cases) +- [Quickstart](#quickstart) +- [Architecture](#architecture) +- [Testing](#testing) +- [License](#license) + +## Highlights + +- Automatic and 100% reliable database change tracking +- High performance without affecting runtime execution +- Easy to use without changing table structures +- Time-travel querying and the ability to easily filter changes +- Optional application-specific context by using [ORM packages](https://docs.bemi.io/#supported-nodejs-orms) + +## Use cases + +There's a wide range of use cases that Bemi is built for! The tech was initially built as a compliance engineering system for fintech that supported $15B worth of assets under management, but has since been extracted into a general-purpose utility. Some use cases include: + +- **Audit Trails:** Use logs for compliance purposes or surface them to customer support and external customers. 
+- **Time Travel:** Retrieve historical data without implementing event sourcing. +- **Troubleshooting:** Identify the root cause of application issues. +- **Change Reversion:** Revert changes made by a user or roll back all data changes within an API request. +- **Distributed Tracing:** Track changes across distributed systems. +- **Testing:** Roll back or roll forward to different application test states. +- **Analyzing Trends:** Gain insights into historical trends and changes for informed decision-making. + +## Quickstart + +### System dependencies + +* [Node.js](https://github.com/nodejs/node) +* [NATS server](https://github.com/nats-io/nats-server) + +You can install these system dependencies manually or use [Devbox](https://github.com/jetpack-io/devbox), which uses [Nix Packages](https://github.com/NixOS/nixpkgs) to provide isolated shells without containerization. + +You also need a PostgreSQL database whose data changes you want to track. Make sure that running `SHOW wal_level;` on your database returns `logical`. 
Otherwise, you need to run the following SQL command and restart your PostgreSQL server: + +```sql +ALTER SYSTEM SET wal_level = logical; +``` + +### Installation + +After installing all system dependencies, install all project dependencies with Node.js: + +```sh +make worker-setup && cd worker && npm install +``` + +Alternatively, you can use Devbox and run a single command that will also install Node.js with pnpm and NATS server: ```sh make worker-install +``` + +### Data change tracking + +Set environment variables specifying connection settings for a PostgreSQL database you want to track: +```sh export DB_HOST=127.0.0.1 DB_PORT=5432 DB_NAME=postgres DB_USER=postgres DB_PASSWORD=postgres -make worker-up ``` -## Testing +Run a worker as a single process with directly installed Node.js: ```sh -make core-install -make core-test +cd worker && npx concurrently -- "npm:up:*" + +# Alternatively, with Devbox +make worker-up ``` +Now try making some database changes like: + +```sql +UPDATE _bemi_migrations SET executed_at = NOW() WHERE id = 1; +``` + +This will add a new record to the `changes` table within the same database after a few seconds. + +If you also want to store application-specific context (e.g., user ID or API endpoint) with these low-level database changes, check out our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms). + ## Architecture ![Bemi Worker Architecture](docs/static/img/worker.png) +Bemi consists of three main parts: + +1. [Debezium](https://github.com/debezium/debezium), a very flexible tool for implementing Change Data Capture that is written in Java. It is used by many companies that need to implement ETL, such as [Airbyte](https://github.com/airbytehq/airbyte) and [Materialize](https://github.com/MaterializeInc/materialize). We rely on it to connect to the PostgreSQL replication log, perform logical decoding, and send raw data to a data sink. +2. 
[NATS JetStream](https://github.com/nats-io/nats-server), a cloud-native messaging system written in Go. Debezium was historically designed to send data to Kafka, but it can also be reconfigured to send data to NATS JetStream. NATS JetStream is much more lightweight and easier to manage while being very performant and having over 45 clients for different programming languages. +3. Bemi Worker, a process responsible for stitching data changes with app context sent via our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms) and storing data changes. It is written in TypeScript and uses the `core` that we rely on for our [Bemi](https://bemi.io/) cloud platform. + +The described architecture and the `worker` code in this repository are a simplified version that can be easily run without much overhead. If you want to self-host it in a production environment, see our [self-hosting docs](https://docs.bemi.io/self-hosting). + ## License Distributed under the terms of the [SSPL-1.0 License](/LICENSE). If you need to modify and distribute the code, please release it to contribute back to the open-source community. diff --git a/docs/docs/postgresql/source-database.md b/docs/docs/postgresql/source-database.md index c021ed0..64bf3f0 100644 --- a/docs/docs/postgresql/source-database.md +++ b/docs/docs/postgresql/source-database.md @@ -73,6 +73,14 @@ SHOW wal_level; Note that changing `wal_level` in PostgreSQL requires a restart. Changing from `replica` to `logical` won't break replication. It will just increase the WAL volume (disk space and network traffic if there are replicas). +### Changing WAL level in a self-managed PostgreSQL + +Run the following SQL command and restart your database: + +```sql +ALTER SYSTEM SET wal_level = logical; +``` + ### Changing WAL level on AWS RDS At a high level, these are the steps necessary to update `wal_level` from `replica` to `logical` @@ -127,3 +135,7 @@ echo 'ssh-ed25519 AAAAC3Nz...' 
>> ~/.ssh/authorized_keys ``` If you need a public SSH Key before you know the SSH host address, just specify any address and later reach out to us to update it. + +## Static IPs + +If you restrict access to your databases by IP addresses, [contact us](mailto:hi@bemi.io). We will share our static IP addresses, which you can add to an allowlist, so we can connect to your Source PostgreSQL database. diff --git a/docs/docs/self-hosting.md b/docs/docs/self-hosting.md new file mode 100644 index 0000000..b2c599a --- /dev/null +++ b/docs/docs/self-hosting.md @@ -0,0 +1,28 @@ +# Self-Hosting + +BemiHQ/bemi +
+
+ +![Bemi Worker Architecture](/img/worker.png) + +Bemi consists of three main parts: + +1. [Debezium](https://github.com/debezium/debezium), a very flexible tool for implementing Change Data Capture that is written in Java. It is used by many companies that need to implement ETL, such as [Airbyte](https://github.com/airbytehq/airbyte) and [Materialize](https://github.com/MaterializeInc/materialize). We rely on it to connect to the PostgreSQL replication log, perform logical decoding, and send raw data to a data sink. +2. [NATS JetStream](https://github.com/nats-io/nats-server), a cloud-native messaging system written in Go. Debezium was historically designed to send data to Kafka, but it can also be reconfigured to send data to NATS JetStream. NATS JetStream is much more lightweight and easier to manage while being very performant and having over 45 clients for different programming languages. +3. Bemi Worker, a process responsible for stitching data changes with app context sent via our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms) and storing data changes. It is written in TypeScript and uses the [`core`](https://github.com/BemiHQ/bemi) that we rely on for our [Bemi](https://bemi.io/) cloud platform. 
+ +If you want to self-host our solution in a production environment, please [contact us](mailto:hi@bemi.io), and we'll be happy to provide you with a Docker image and assist with setting it up in exchange for your feedback :) + +## Self-Hosting vs Bemi Cloud + +| | Self-Hosting | Bemi Cloud | +| ------------------------------ | ------------- | ----------- | +| Automatic data change tracking | ✅ | ✅ | +| PostgreSQL support | ✅ | ✅ | +| Automatic table partitioning | ❌ | ✅ | +| Automatic data retention | ❌ | ✅ | +| Auto-scaling and redundancy | ❌ | ✅ | +| Control plane and monitoring | ❌ | ✅ | +| Support | ❌ | ✅ | +| Bemi Search UI (coming soon) | ❌ | ✅ | diff --git a/docs/sidebars.ts b/docs/sidebars.ts index 8dd9462..e63231c 100644 --- a/docs/sidebars.ts +++ b/docs/sidebars.ts @@ -31,6 +31,7 @@ const sidebars: SidebarsConfig = { 'orms/typeorm', ], }, + 'self-hosting', ], };
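The stitching step that the docs in this patch attribute to Bemi Worker (combining low-level data changes with app context sent via the ORM packages) can be illustrated with a small TypeScript sketch. This is not the worker's actual code: the `DataChange` and `AppContext` shapes, the `stitch` function, and the idea of matching on a transaction ID are all assumptions made for illustration.

```typescript
// Hypothetical message shapes; the real worker's types are not shown in this patch.
interface DataChange {
  transactionId: number;
  table: string;
  operation: "CREATE" | "UPDATE" | "DELETE";
  after: Record<string, unknown> | null;
}

interface AppContext {
  transactionId: number;
  context: Record<string, unknown>;
}

interface StitchedChange extends DataChange {
  context: Record<string, unknown>;
}

// Attach app context to every change made in the same transaction (assumed join key).
// Changes without matching context get an empty context object.
function stitch(changes: DataChange[], contexts: AppContext[]): StitchedChange[] {
  const contextByTransaction = new Map<number, Record<string, unknown>>();
  for (const { transactionId, context } of contexts) {
    contextByTransaction.set(transactionId, context);
  }
  return changes.map((change) => ({
    ...change,
    context: contextByTransaction.get(change.transactionId) ?? {},
  }));
}

// Usage: one change made inside an API request, one made directly in SQL.
const stitched = stitch(
  [
    { transactionId: 1, table: "todos", operation: "UPDATE", after: { id: 7, done: true } },
    { transactionId: 2, table: "todos", operation: "DELETE", after: null },
  ],
  [{ transactionId: 1, context: { userId: 42, endpoint: "/todos/7" } }],
);
console.log(JSON.stringify(stitched, null, 2));
```

Building the lookup map first keeps the join linear in the number of messages, and falling back to an empty context mirrors the README's point that app context is optional on top of plain database change tracking.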