-
Notifications
You must be signed in to change notification settings - Fork 3
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
replication poc #406
base: main
Are you sure you want to change the base?
replication poc #406
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ludydoo The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/test pull-core-backend-test |
a485411
to
aa20127
Compare
Kudos, SonarCloud Quality Gate passed! 0 Bugs No Coverage information |
} | ||
|
||
export type PutRecordRequest = { | ||
isReplication?: boolean; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm a bit confused about the terms revision/replication? do you use them interchangeably or is there a difference in meaning?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also, what problem was solved by adding this? why do we need to differentiate between original and copy? is this just a shortcut to not have to check for revisions every time?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's mostly due to (when we enable offline), that multiple people might edit the same records. When that happens, it's unclear whose version of the record is "the good one". To "solve" this, we store all the edits as different "revisions". Then, the engine chooses one of the revisions as being the "good one", but we can still present to the user that there was a conflict, and the user could resolve the conflict by choosing the "right one" manually.
It's also needed for replication/synchronization between different devices. Basically, the devices would pull the revisions of the records that they don't know about. And since the process of "choosing the good one" is deterministic, all users will always elect the same revision all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case isReplication
is exclusively used by the replicator. The engine assumes that the revision being put
is known by another engine
.
} | ||
hash := str[len(str)-32:] | ||
// the hash can only contain hex characters | ||
for _, c := range hash { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
couldn't we do this with a regex?
specifially the hex check, but also the length checks?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could for sure, but this is probably way more performant than compiling + executing a regex! That will add up if there is a lot of traffic. Also I tried to make this package depend on as little libraries as possible.
GetTable(ctx context.Context, request api.GetTableRequest) (api.Table, error) | ||
GetChanges(ctx context.Context, request api.GetChangesRequest) (api.Changes, error) | ||
PutRecord(ctx context.Context, request api.PutRecordRequest) (api.Record, error) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems like there's no method to create a new record? am I missing something, or is there a trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
PutRecord
;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, the id in the request irritated me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we have to specify an id when we create/update a record !
// A record is a collection of fields. | ||
// A record has a unique id. | ||
// A field is a named value. | ||
// A value is a string, a number, or a boolean. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no date/time fields?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not yet! This was just for POC...
// The <num> is an incrementing number. | ||
// The <hash> is a hash of the record. | ||
// | ||
// Two records might have the same number but different hashes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the hash is of the original record, how can two revisions of the same record have a different hash?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
or, if the hash is of the new version of the record, I don't understand yet how the revisions are linked to their original record
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's say
- you and I checkout a record with revision
1-aaa
- We go offline
- We both update the record offline
- We go online again
I created a new revision with hash 2-nnn
and you 2-mmm
. Both records started at 1-aaa
, and they are both at the "2nd" revision, but they have different hashes because their content is different.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would for some reason update the same record 1-aaa
with the same contents on both of our sides, then we would compute the same hash, so that would be the same revision. (both you and i would get like 2-ggg
and 2-ggg
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
like
# we checkout this record
{"id":"some_id", "revision":"1-aaa", "attributes: {"foo":"bar"}}
# we go offline
# you update your record to have {"foo":"baz"}, so your store contains
{"id":"some_id", "revision":"1-aaa", "attributes: {"foo":"bar"}}
{"id":"some_id", "revision":"2-bbb", "attributes: {"foo":"baz"}}
# i update my record to have {"foo":"snip"} so my store contains
{"id":"some_id", "revision":"1-aaa", "attributes: {"foo":"bar"}}
{"id":"some_id", "revision":"2-ccc", "attributes: {"foo":"snip"}}
# we go online
# we reconcile with the server, so the server contains
{"id":"some_id", "revision":"1-aaa", "attributes: {"foo":"bar"}}
{"id":"some_id", "revision":"2-bbb", "attributes: {"foo":"baz"}}
{"id":"some_id", "revision":"2-ccc", "attributes: {"foo":"snip"}}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for the example. the concept with the alphabetical order was quite clear, I just didn't realize that the revision is a property on the record. I thought the 2-mmm was supposed to somehow lead to the record itself, so the hash changing confused me. but if the revision is a property on the record, the hash is not relevant for the link.
My understanding of the roadmap is that we had decided to do a centralised/online only version of core. The majority of this POC seems to be focussed on the concepts of revisions, p2p, and reconciliation. Our goal at the moment is to design our data model and based on this POC I'm not sure how to progress with that goal. I guess I'm not too sure of your aim of this pr. Are you suggesting we should be building something along these lines for the architecture refactor or is this more to demonstrate your ideas for work in the further future? I do appreciate the point of having a single entry point of an I'd also like to revisit the pros/cons of using a revision based approach vs a crdt system. |
Something like this would separate the concerns of the engine a bit better
|
I was thinking about this a bit more and have realised that you're showing how we could do this without actually having a data model stored in the db. The issues that come to mind with this approach:
Seen as we already need to define a data model for the api, then storing this isn't that much extra and gives us some benefits. |
This PR is a proof of concept for the offline/reconciliation db engine that would power core
Features
The requirements of this engine are that
Folders
api
contains mostly public types and public methods that are meant to be exposed.client
contains the http client (for now). Could add grpc, websockets, etc.engine
contains the internal logic for the DB enginehandler
contains the http handlerstest
contains test utilsutils
contains utils, such as auuid
generatorThe interface is very simple and relatively self-explanatory
Records
The records have this structure
Record Serialization
Records have a special serialization mechanism, inspired from DynamoDB. It basically encodes any data type into a string to prevent any floating point approximation, which would undoubtedly break the hashing.
Table structure
Each new table gives birth to two tables
<table>
<table>_history
Where
<table>
holds the reconciled version of the record and where<table>_history
stores the different versions of the record.