
replication poc #406

Open · wants to merge 10 commits into base: main
Conversation

@ludydoo (Contributor) commented May 13, 2022

This PR is a proof of concept for the offline/reconciliation DB engine that would power core.

Features

  • Headless DB Engine (Implemented with SQLite)
  • Reconciler
  • Server
  • HTTP Client

The requirements for this engine are that it:

  • must work with clients that are offline for a very long time
  • should work on the client or the server (peer-to-peer topology)
  • should offer a simple reconciliation mechanism
  • should be very robust and compatible with different runtimes and environments

Folders

  • api contains mostly public types and public methods that are meant to be exposed.
  • client contains the http client (for now). Could add grpc, websockets, etc.
  • engine contains the internal logic for the DB engine
  • handler contains the http handlers
  • test contains test utils
  • utils contains utilities, such as a UUID generator

The interface is very simple and relatively self-explanatory:

type ReadInterface interface {
	// GetRecord gets a single record from the database.
	GetRecord(ctx context.Context, request GetRecordRequest) (Record, error)
	// GetChanges gets a change stream for a table
	GetChanges(ctx context.Context, request GetChangesRequest) (Changes, error)
}

type WriteInterface interface {
	// PutRecord puts a single record inside the database.
	PutRecord(ctx context.Context, request PutRecordRequest) (Record, error)
	// CreateTable creates a new table in the database.
	CreateTable(ctx context.Context, table Table) (Table, error)
}

type Engine interface {
	ReadInterface
	WriteInterface
}

Records

The records have this structure

// Record represents a record in a database
type Record struct {
	ID               string     `json:"id"`
	Table            string     `json:"table"`
	Revision         Revision   `json:"revision"`
	PreviousRevision Revision   `json:"-"`
	Attributes       Attributes `json:"attributes"`
}

Record Serialization

Records have a special serialization mechanism, inspired by DynamoDB. Every value is encoded as a string, which prevents floating-point approximation from changing the bytes being hashed and thereby breaking the hashing.

{
  "id" : "my-record-id",
  "table": "my-table",
  "revision": "1-96fc52d8fbf5d2adc6d139cb5b2ea099",
  "attributes": {
    "my-field-1" : { "string": "my-string-value" },
    "my-field-2" : { "int": "1234" }
  } 
}
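A sketch of that encoding step, under the assumption that the type tags mirror the sample above ("string", "int") plus hypothetical "float" and "bool" tags the PR does not show:

```go
package main

import (
	"fmt"
	"strconv"
)

// encodeValue converts a Go value into a DynamoDB-style typed-string
// attribute, so the hasher never sees a raw floating-point number.
// The "float" and "bool" tags are assumptions for illustration; the PR
// only shows "string" and "int".
func encodeValue(v interface{}) map[string]string {
	switch t := v.(type) {
	case string:
		return map[string]string{"string": t}
	case int:
		return map[string]string{"int": strconv.Itoa(t)}
	case float64:
		// 'g' format with precision -1 round-trips the float exactly.
		return map[string]string{"float": strconv.FormatFloat(t, 'g', -1, 64)}
	case bool:
		return map[string]string{"bool": strconv.FormatBool(t)}
	default:
		return map[string]string{"string": fmt.Sprintf("%v", t)}
	}
}

func main() {
	fmt.Println(encodeValue("my-string-value")) // map[string:my-string-value]
	fmt.Println(encodeValue(1234))              // map[int:1234]
}
```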

Table structure

Each new table produces two underlying tables

  • <table>
  • <table>_history

<table> holds the reconciled version of each record, while <table>_history stores all of its revisions.
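A hypothetical sketch of the DDL each new table could produce, one statement per underlying table. The column names are assumptions inferred from the Record struct above; the PR does not show its actual schema.

```go
package main

import "fmt"

// ddlFor returns sketch CREATE TABLE statements for a logical table:
// one for the reconciled records and one <table>_history table keyed by
// (id, revision) that keeps every revision. Columns are assumptions
// based on the Record struct, not the PR's real schema.
func ddlFor(table string) []string {
	return []string{
		fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %q (
  id TEXT PRIMARY KEY,
  revision TEXT NOT NULL,
  attributes TEXT NOT NULL
);`, table),
		fmt.Sprintf(`CREATE TABLE IF NOT EXISTS %q (
  id TEXT NOT NULL,
  revision TEXT NOT NULL,
  previous_revision TEXT,
  attributes TEXT NOT NULL,
  PRIMARY KEY (id, revision)
);`, table+"_history"),
	}
}

func main() {
	for _, stmt := range ddlFor("my-table") {
		fmt.Println(stmt)
	}
}
```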

@rational-terraforming-golem (Contributor)
[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ludydoo

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ludydoo (Contributor Author) commented May 13, 2022

/test pull-core-backend-test

@ludydoo force-pushed the offline-db-poc branch 4 times, most recently from a485411 to aa20127 on May 15, 2022 at 14:35
@sonarqubecloud

Kudos, SonarCloud Quality Gate passed!

0 Bugs · 0 Vulnerabilities · 0 Security Hotspots · 5 Code Smells

No coverage information · 0.0% duplication

}

export type PutRecordRequest = {
isReplication?: boolean;
Member:

I'm a bit confused about the terms revision/replication. Do you use them interchangeably, or is there a difference in meaning?

Member:

also, what problem was solved by adding this? why do we need to differentiate between original and copy? is this just a shortcut to not have to check for revisions every time?

Contributor Author:

It's mostly because, once we enable offline mode, multiple people might edit the same record. When that happens, it's unclear whose version of the record is "the good one". To "solve" this, we store all the edits as different "revisions". The engine then chooses one of the revisions as "the good one", but we can still show the user that there was a conflict, and the user can resolve it by choosing the "right one" manually.

It's also needed for replication/synchronization between different devices. Basically, the devices would pull the revisions of the records that they don't know about. And since the process of "choosing the good one" is deterministic, all users will always elect the same revision all the time.
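That deterministic election could be sketched as follows. The tie-break rule (highest revision number wins, ties broken by comparing hashes lexicographically) is an assumption for illustration; the PR only states that the choice must be deterministic so every peer elects the same winner.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// electRevision picks a winner among conflicting "<num>-<hash>"
// revisions of the same record: highest revision number first, ties
// broken lexicographically by hash. The tie-break is a hypothetical
// rule; any rule works as long as it is deterministic everywhere.
func electRevision(revisions []string) string {
	best, bestHash := "", ""
	bestNum := -1
	for _, r := range revisions {
		numStr, hash, ok := strings.Cut(r, "-")
		if !ok {
			continue // skip malformed revisions
		}
		num, err := strconv.Atoi(numStr)
		if err != nil {
			continue
		}
		if num > bestNum || (num == bestNum && hash > bestHash) {
			best, bestNum, bestHash = r, num, hash
		}
	}
	return best
}

func main() {
	// Every peer holding these three revisions elects the same one.
	fmt.Println(electRevision([]string{"1-aaa", "2-bbb", "2-ccc"})) // 2-ccc
}
```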

Contributor Author:

In this case isReplication is exclusively used by the replicator. The engine assumes that the revision being put is known by another engine.

}
hash := str[len(str)-32:]
// the hash can only contain hex characters
for _, c := range hash {
Member:

couldn't we do this with a regex?
specifically the hex check, but also the length checks?

Contributor Author:

We could for sure, but this is probably way more performant than compiling and executing a regex! That adds up if there is a lot of traffic. Also, I tried to make this package depend on as few libraries as possible.
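Completing the partial snippet above into a self-contained validator shows what the hand-rolled check looks like. The exact rules (decimal `<num>`, a dash, then exactly 32 lowercase hex characters) are inferred from the snippet and the sample revision string, so treat them as assumptions.

```go
package main

import "fmt"

// validRevision checks a "<num>-<32 hex chars>" revision string without
// a regex. Lowercase-only hex is an assumption inferred from the sample
// revision "1-96fc52d8fbf5d2adc6d139cb5b2ea099".
func validRevision(str string) bool {
	if len(str) < 34 { // at least one digit, a dash, and a 32-char hash
		return false
	}
	hash := str[len(str)-32:]
	// the hash can only contain hex characters
	for _, c := range hash {
		if !(c >= '0' && c <= '9' || c >= 'a' && c <= 'f') {
			return false
		}
	}
	if str[len(str)-33] != '-' {
		return false
	}
	// everything before the dash must be a decimal revision number
	for _, c := range str[:len(str)-33] {
		if c < '0' || c > '9' {
			return false
		}
	}
	return true
}

func main() {
	fmt.Println(validRevision("1-96fc52d8fbf5d2adc6d139cb5b2ea099")) // true
}
```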

GetTable(ctx context.Context, request api.GetTableRequest) (api.Table, error)
GetChanges(ctx context.Context, request api.GetChangesRequest) (api.Changes, error)
PutRecord(ctx context.Context, request api.PutRecordRequest) (api.Record, error)
}
Member:

it seems like there's no method to create a new record? am I missing something, or is there a trick?

Contributor Author:

PutRecord ;)

Member:

ah, the id in the request irritated me

Contributor Author:

Yeah, we have to specify an id when we create/update a record!

// A record is a collection of fields.
// A record has a unique id.
// A field is a named value.
// A value is a string, a number, or a boolean.
Member:

no date/time fields?

Contributor Author:

Not yet! This was just a POC...

// The <num> is an incrementing number.
// The <hash> is a hash of the record.
//
// Two records might have the same number but different hashes.
Member:

if the hash is of the original record, how can two revisions of the same record have a different hash?

Member:

or, if the hash is of the new version of the record, I don't understand yet how the revisions are linked to their original record

Contributor Author (@ludydoo, May 18, 2022):

let's say

  • you and I check out a record with revision 1-aaa
  • We go offline
  • We both update the record offline
  • We go online again

I create a new revision with hash 2-nnn and you create 2-mmm. Both records started from 1-aaa, and both are at the "2nd" revision, but they have different hashes because their contents differ.

Contributor Author:

If we somehow updated the same record 1-aaa with the same contents on both of our sides, then we would compute the same hash, so it would be the same revision (both you and I would get 2-ggg).
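One way that content-addressed behaviour could fall out is `<previousNum+1>-<hash of canonical content>`. MD5 is an assumption here (the sample revisions are 32 hex characters, which matches an MD5 digest); the PR names neither the hash function nor the canonical encoding, and the latter matters for determinism.

```go
package main

import (
	"crypto/md5"
	"encoding/hex"
	"fmt"
)

// nextRevision derives a "<num>-<hash>" revision from the previous
// revision number and the canonically serialized record content.
// MD5 and this construction are assumptions for illustration, not the
// PR's confirmed implementation.
func nextRevision(prevNum int, canonicalContent string) string {
	sum := md5.Sum([]byte(canonicalContent))
	return fmt.Sprintf("%d-%s", prevNum+1, hex.EncodeToString(sum[:]))
}

func main() {
	// Two offline clients making the identical edit to revision 1 of a
	// record derive the identical new revision.
	a := nextRevision(1, `{"foo":"baz"}`)
	b := nextRevision(1, `{"foo":"baz"}`)
	fmt.Println(a == b) // true
}
```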

Contributor Author:

like

# we check out this record
{"id":"some_id", "revision":"1-aaa", "attributes": {"foo":"bar"}}
# we go offline

# you update your record to have {"foo":"baz"}, so your store contains
{"id":"some_id", "revision":"1-aaa", "attributes": {"foo":"bar"}}
{"id":"some_id", "revision":"2-bbb", "attributes": {"foo":"baz"}}

# i update my record to have {"foo":"snip"}, so my store contains
{"id":"some_id", "revision":"1-aaa", "attributes": {"foo":"bar"}}
{"id":"some_id", "revision":"2-ccc", "attributes": {"foo":"snip"}}

# we go online
# we reconcile with the server, so the server contains
{"id":"some_id", "revision":"1-aaa", "attributes": {"foo":"bar"}}
{"id":"some_id", "revision":"2-bbb", "attributes": {"foo":"baz"}}
{"id":"some_id", "revision":"2-ccc", "attributes": {"foo":"snip"}}

Member:

Thanks for the example. The concept with the alphabetical order was quite clear; I just didn't realize that the revision is a property on the record. I thought the 2-mmm was supposed to somehow lead to the record itself, so the hash changing confused me. But if the revision is a property on the record, the hash is not relevant for the link.

@neb42 (Member) commented May 25, 2022

My understanding of the roadmap is that we had decided to do a centralised/online only version of core. The majority of this POC seems to be focussed on the concepts of revisions, p2p, and reconciliation.

Our goal at the moment is to design our data model and based on this POC I'm not sure how to progress with that goal.

I guess I'm not too sure of the aim of this PR. Are you suggesting we should build something along these lines for the architecture refactor, or is this more to demonstrate your ideas for work further in the future?

I do appreciate the point of having a single entry point, an engine that can be put into any use case, but I think it doesn't depend on the p2p/reconciliation. I also think the interface of the engine is a bit overloaded and could be rethought.

I'd also like to revisit the pros/cons of using a revision based approach vs a crdt system.

@neb42 (Member) commented May 25, 2022

Something like this would separate the concerns of the engine a bit better

package engine

import (
	"context"

	"github.com/nrc-no/core/pkg/server/core-db/types"
)

type TableReader interface {
	Get(ctx context.Context, tableID string) (*types.Table, error)
	List(ctx context.Context) ([]types.Table, error)
}

type TableWriter interface {
	Upsert(ctx context.Context, table types.Table) (*types.Table, error)
	Delete(ctx context.Context, tableID string) error
}

type RecordReader interface {
	Get(ctx context.Context, tableId string, recordID string) (*types.Record, error)
	List(ctx context.Context) ([]types.Record, error)
}

type RecordWriter interface {
	Upsert(ctx context.Context, record types.Record) (*types.Record, error)
	Delete(ctx context.Context, tableID string, recordID string) error
}

type ReaderEngine struct {
	Table  TableReader
	Record RecordReader
}

type WriterEngine struct {
	Table  TableWriter
	Record RecordWriter
}

type Engine struct {
	Reader ReaderEngine
	Writer WriterEngine
}

func main() {
	tableReader := NewPostgresTableReader()
	tableWriter := NewPostgresTableWriter()
	recordReader := NewPostgresRecordReader()
	recordWriter := NewPostgresRecordWriter()

	engine := Engine{
		Reader: ReaderEngine{
			Table:  tableReader,
			Record: recordReader,
		},
		Writer: WriterEngine{
			Table:  tableWriter,
			Record: recordWriter,
		},
	}

	engine.Writer.Table.Upsert(...)
}

@neb42 (Member) commented Jun 13, 2022

I was thinking about this a bit more and have realised that you're showing how we could do this without actually having a data model stored in the db.

The issues that come to mind with this approach:

  • Strong lock-in with SQL. Even across SQL flavours there can be feature disparity (SQLite doesn't have everything Postgres has)
  • Difficult to query table metadata. We would need to parse a DESCRIBE query or use the metadata stored in sql and parse it into our data structures.
  • How would we do access control here? We could use schemas in postgres and different database files in sqlite, but this limits us to a single project level of access.
  • We lose custom metadata about tables, like who created it.

Seeing as we already need to define a data model for the API, storing it isn't that much extra and gives us some benefits.
