Update docs
exAspArk committed Feb 4, 2024
1 parent 6c05b6f commit 2f6627c
Showing 4 changed files with 125 additions and 5 deletions.
89 changes: 84 additions & 5 deletions README.md
@@ -22,26 +22,105 @@

# Bemi

## Running locally
Bemi automatically tracks database changes, ensuring 100% reliability and a comprehensive understanding of every change. It does this by connecting to PostgreSQL's [Write-Ahead Log](https://www.postgresql.org/docs/current/wal-intro.html) (WAL) and implementing the [Change Data Capture](https://en.wikipedia.org/wiki/Change_data_capture) (CDC) pattern. Designed with simplicity and non-invasiveness in mind, Bemi operates in the background and doesn't require any alterations to your existing database tables.

## Contents

- [Highlights](#highlights)
- [Use cases](#use-cases)
- [Quickstart](#quickstart)
- [Architecture](#architecture)
- [Testing](#testing)
- [License](#license)

## Highlights

- Automatic and 100% reliable database change tracking
- High performance without affecting runtime execution
- Easy to use without changing table structures
- Time travel querying and ability to easily filter changes
- Optional application-specific context by using [ORM packages](https://docs.bemi.io/#supported-nodejs-orms)

## Use cases

There's a wide range of use cases that Bemi is built for! The tech was initially built as a compliance engineering system for fintech that supported $15B worth of assets under management, but has since been extracted into a general-purpose utility. Some use cases include:

- **Audit Trails:** Use logs for compliance purposes or surface them to customer support and external customers.
- **Time Travel:** Retrieve historical data without implementing event sourcing.
- **Troubleshooting:** Identify the root cause of application issues.
- **Change Reversion:** Revert changes made by a user or roll back all data changes within an API request.
- **Distributed Tracing:** Track changes across distributed systems.
- **Testing:** Roll back or roll forward to different application test states.
- **Analyzing Trends:** Gain insights into historical trends and changes for informed decision-making.

## Quickstart

### System dependencies

* [Node.js](https://github.com/nodejs/node)
* [NATS server](https://github.com/nats-io/nats-server)

You can install these system dependencies manually or use [Devbox](https://github.com/jetpack-io/devbox), which uses [Nix Packages](https://github.com/NixOS/nixpkgs) to provide isolated shells without containerization.

You also need a PostgreSQL database whose data changes you want to track. Make sure that running `SHOW wal_level;` on it returns `logical`. Otherwise, run the following SQL command and restart your PostgreSQL server:

```sql
ALTER SYSTEM SET wal_level = logical;
```
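The check above can be expressed as a small decision helper. This is only a sketch: `walLevelFix` is a hypothetical name and not part of Bemi. It takes the value reported by `SHOW wal_level;` (PostgreSQL accepts `minimal`, `replica`, and `logical`) and returns the statement to run, if any.

```typescript
// The values PostgreSQL accepts for the wal_level setting.
type WalLevel = "minimal" | "replica" | "logical";

// Hypothetical helper (not part of Bemi): returns the SQL statement to run,
// followed by a server restart, or null when logical decoding is already on.
function walLevelFix(current: WalLevel): string | null {
  if (current === "logical") {
    return null; // nothing to change
  }
  // Both "minimal" and "replica" need the same change.
  return "ALTER SYSTEM SET wal_level = logical;";
}
```

A setup script could call `walLevelFix` with the queried value and print the statement together with a reminder that the change only takes effect after a restart.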

### Installation

After installing all system dependencies, install all project dependencies with Node.js:

```sh
make worker-setup && cd worker && npm install
```

Alternatively, you can use Devbox and run a single command that also installs Node.js with pnpm and NATS server:

```sh
make worker-install
```

### Data change tracking

Set environment variables specifying connection settings for a PostgreSQL database you want to track:

```sh
export DB_HOST=127.0.0.1 DB_PORT=5432 DB_NAME=postgres DB_USER=postgres DB_PASSWORD=postgres
make worker-up
```
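If you script this step, it can help to validate the connection settings before starting anything. The sketch below is an illustration only: `readDbConfig`, `DbConfig`, and `Env` are hypothetical names, and how the worker itself reads these variables is not shown here. It checks that all five `DB_*` variables are present and that the port is a valid number.

```typescript
// Connection settings corresponding to the DB_* variables exported above.
interface DbConfig {
  host: string;
  port: number;
  database: string;
  user: string;
  password: string;
}

// A plain map of environment variables, e.g. process.env in Node.js.
type Env = Record<string, string | undefined>;

// Hypothetical helper (not part of Bemi): fails fast with a clear message
// when a variable is missing or the port is not a valid number.
function readDbConfig(env: Env): DbConfig {
  const required = ["DB_HOST", "DB_PORT", "DB_NAME", "DB_USER", "DB_PASSWORD"];
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }

  const port = Number(env["DB_PORT"]);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new Error(`DB_PORT must be a valid port number, got "${env["DB_PORT"]}"`);
  }

  return {
    host: env["DB_HOST"] as string,
    port,
    database: env["DB_NAME"] as string,
    user: env["DB_USER"] as string,
    password: env["DB_PASSWORD"] as string,
  };
}
```

In Node.js you would call it as `readDbConfig(process.env)`. Taking the environment as a parameter keeps the function easy to test.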

## Testing
Run a worker as a single process with directly installed Node.js:

```sh
make core-install
make core-test
cd worker && npx concurrently -- "npm:up:*"

# Alternatively, with Devbox
make worker-up
```

Now try making some database changes like:

```sql
UPDATE _bemi_migrations SET executed_at = NOW() WHERE id = 1;
```

After a few seconds, this will add a new record to the `changes` table within the same database.

If you want to also store application-specific context (e.g., user ID, API endpoint, etc.) with these low-level database changes, check out our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms).

## Architecture

![Bemi Worker Architecture](docs/static/img/worker.png)

Bemi consists of three main parts:

1. [Debezium](https://github.com/debezium/debezium), a very flexible tool for implementing Change Data Capture that is written in Java. It is used by many companies that need to implement ETL, such as [Airbyte](https://github.com/airbytehq/airbyte) and [Materialize](https://github.com/MaterializeInc/materialize). We rely on it to connect to the PostgreSQL replication log, perform logical decoding, and send raw data to a data sink.
2. [NATS JetStream](https://github.com/nats-io/nats-server), a cloud-native messaging system written in Go. Debezium was historically designed to send data to Kafka, but it can also be reconfigured to send data to NATS JetStream. NATS JetStream is much more lightweight and easier to manage while being very performant and having over 45 clients for different programming languages.
3. Bemi Worker, a process responsible for stitching data changes together with the app context sent via our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms) and storing the data changes. It is written in TypeScript and uses the `core` that we rely on for our [Bemi](https://bemi.io/) cloud platform.

The described architecture and the `worker` code in this repository are a simplified version that can be easily run without much overhead. If you want to self-host it in a production environment, see our [self-hosting docs](https://docs.bemi.io/self-hosting).
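
To make the "stitching" step more concrete, here is a minimal in-memory sketch. It is not Bemi's implementation. The message shapes, the field names, and the idea of matching on a shared transaction identifier are all assumptions made for illustration.

```typescript
// Hypothetical shapes: the real Bemi message formats are not described here.
interface RawChange {
  transactionId: number; // assumed shared key between changes and context
  table: string;
  operation: "CREATE" | "UPDATE" | "DELETE";
  values: Record<string, unknown>;
}

interface AppContext {
  transactionId: number;
  context: Record<string, unknown>; // e.g. user ID, API endpoint
}

interface StitchedChange extends RawChange {
  context: Record<string, unknown> | null;
}

// Pairs every raw change with the context recorded for the same transaction.
// Changes without a matching context keep a null context.
function stitch(changes: RawChange[], contexts: AppContext[]): StitchedChange[] {
  const byTransaction = new Map<number, Record<string, unknown>>();
  for (const { transactionId, context } of contexts) {
    byTransaction.set(transactionId, context);
  }
  return changes.map((change) => ({
    ...change,
    context: byTransaction.get(change.transactionId) ?? null,
  }));
}
```

A change made outside the application, such as a manual `UPDATE` in `psql`, would have no context message and would be stored with a `null` context in this sketch.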

## License

Distributed under the terms of the [SSPL-1.0 License](/LICENSE). If you need to modify and distribute the code, please release it to contribute back to the open-source community.
12 changes: 12 additions & 0 deletions docs/docs/postgresql/source-database.md
@@ -73,6 +73,14 @@ SHOW wal_level;
Note that changing `wal_level` in PostgreSQL requires a restart. Changing from `replica` to `logical` won't break replication.
It will just increase the WAL volume (disk space and network traffic if there are replicas).

### Changing WAL level in a self-managed PostgreSQL

Run the following SQL command and restart your database:

```sql
ALTER SYSTEM SET wal_level = logical;
```

### Changing WAL level on AWS RDS

At a high level, these are the steps necessary to update `wal_level` from `replica` to `logical`
@@ -127,3 +135,7 @@ echo 'ssh-ed25519 AAAAC3Nz...' >> ~/.ssh/authorized_keys
```

If you need a public SSH Key before you know the SSH host address, just specify any address and later reach out to us to update it.

## Static IPs

If you restrict access to your databases by IP addresses, [contact us](mailto:[email protected]). We will share our static IP addresses, which you can add to an allowlist, so we can connect to your Source PostgreSQL database.
28 changes: 28 additions & 0 deletions docs/docs/self-hosting.md
@@ -0,0 +1,28 @@
# Self-Hosting

<a class="github-button" href="https://github.com/BemiHQ/bemi" data-size="large" data-show-count="true" aria-label="Star BemiHQ/bemi on GitHub">BemiHQ/bemi</a>
<br />
<br />

![Bemi Worker Architecture](/img/worker.png)

Bemi consists of three main parts:

1. [Debezium](https://github.com/debezium/debezium), a very flexible tool for implementing Change Data Capture that is written in Java. It is used by many companies that need to implement ETL, such as [Airbyte](https://github.com/airbytehq/airbyte) and [Materialize](https://github.com/MaterializeInc/materialize). We rely on it to connect to the PostgreSQL replication log, perform logical decoding, and send raw data to a data sink.
2. [NATS JetStream](https://github.com/nats-io/nats-server), a cloud-native messaging system written in Go. Debezium was historically designed to send data to Kafka, but it can also be reconfigured to send data to NATS JetStream. NATS JetStream is much more lightweight and easier to manage while being very performant and having over 45 clients for different programming languages.
3. Bemi Worker, a process responsible for stitching data changes together with the app context sent via our open-source [ORM packages](https://docs.bemi.io/#supported-nodejs-orms) and storing the data changes. It is written in TypeScript and uses the [`core`](https://github.com/BemiHQ/bemi) that we rely on for our [Bemi](https://bemi.io/) cloud platform.

If you want to self-host our solution in a production environment, please [contact us](mailto:[email protected]), and we'll be happy to provide you with a Docker image and assist with setting it up in exchange for your feedback :)

## Self-Hosting vs Bemi Cloud

| | Self-Hosting | Bemi Cloud |
| ------------------------------ | ------------- | ----------- |
| Automatic data change tracking |||
| PostgreSQL support |||
| Automatic table partitioning |||
| Automatic data retention |||
| Auto-scaling and redundancy |||
| Control plane and monitoring |||
| Support |||
| Bemi Search UI (coming soon) |||
1 change: 1 addition & 0 deletions docs/sidebars.ts
@@ -31,6 +31,7 @@ const sidebars: SidebarsConfig = {
'orms/typeorm',
],
},
'self-hosting',
],
};
