Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

🗃 Add _create metadata to snapshots to avoid getCommittedOpVersion() #659

Merged
merged 1 commit into from
May 28, 2024

Conversation

alecgibson
Copy link
Collaborator

Supersedes #657

The DB.getCommitedOpVersion() function is only used in one place: in a corner case that differentiates between two "blind", versionless (created without first fetching) create op conflict cases:

  • a versionless create op that conflicts with another create op
  • a versionless create op that has been re-submitted because the connection was closed before the ack was received

The first of these cases should result in an error, and the second is non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special DB.getCommittedOpVersion() function, which - given an op with a src and seq combination - will return null if the op hasn't been committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a duplicate, and we can safely ignore it.

This approach is problematic, because:

This change proposes that we actually don't need this function in most cases, and implements an alternative approach to differentiate between the above cases.

When creating an snapshot, we store the src, seq and v of the op that was used to create it.

If a conflicting create is received, we can then compare directly with this metadata, without needing to fetch anything extra from the database.

This should work in the majority of cases, but won't work if the metadata is missing, which could happen if:

  • the snapshot is "old": it was created before this change
  • the driver doesn't support metadata (eg Postgres)

In the case where metadata is unavailable, we fall back to the existing method, using getCommittedOpVersion(). However, if drivers support metadata, this should happen sufficiently infrequently that consumers could potentially remove the src/seq/v index with no noticeable performance penalty.

@@ -290,6 +299,20 @@ SubmitRequest.prototype._shouldSaveMilestoneSnapshot = function(snapshot) {
return this.saveMilestoneSnapshot;
};

SubmitRequest.prototype._fetchCreateOpVersion = function(callback) {
var create = this.snapshot.m._create;
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that this approach has one pitfall: it opens up a new corner case where
an op may be incorrectly rejected instead of acked:

  1. Client A submits Create 1
  2. Create 1 committed
  3. Client A disconnects before receiving the ack
  4. Client B submits and commits Delete, then Create 2
  5. Client A reconnects, and resubmits Create 1
  6. Create 1 doesn't match the current snapshot metadata for Create 2, so it's rejected with DOC_ALREADY_CREATED instead of the non-fatal OP_ALREADY_SUBMITTED

I don't know how much we care about this? It definitely is a bug that didn't used to exist, but I'm not sure it's important enough to keep leaning on getCommittedOpVersion() all the time?

In my personal opinion, if two clients are creating/deleting/recreating the same Doc in such a close span of time, errors are bound to happen anyway, since creates/deletes don't have great transformation behaviour. And if this is an expected use case, consumers should already have robust error-handling in place for this sort of conflict.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion, this should be a vanishingly small edge case in practice, and Client A reconnecting may run into other errors while reconnecting anyways that would require it to re-initialize.

Base automatically changed from create-collide-tests to master May 23, 2024 07:14
@alecgibson alecgibson force-pushed the src-seq-create-meta branch from 9f08977 to 7ba7f0b Compare May 23, 2024 07:15
@alecgibson alecgibson marked this pull request as ready for review May 23, 2024 07:15
@alecgibson alecgibson force-pushed the src-seq-create-meta branch from 7ba7f0b to efabb26 Compare May 23, 2024 07:20
@coveralls
Copy link

coveralls commented May 23, 2024

Coverage Status

coverage: 97.445% (+0.006%) from 97.439%
when pulling 5e5f1b4 on src-seq-create-meta
into f4effa3 on master.

Comment on lines 224 to 225
if (fields) fields.$submit = false;
getSnapshot.call(this, collection, id, fields, snapshotOptions, callback);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit worried $submit has or in the future will have other side effects, so it'd be safer to manually strip the metadata from the resulting snapshot

@@ -290,6 +299,20 @@ SubmitRequest.prototype._shouldSaveMilestoneSnapshot = function(snapshot) {
return this.saveMilestoneSnapshot;
};

SubmitRequest.prototype._fetchCreateOpVersion = function(callback) {
var create = this.snapshot.m._create;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per discussion, this should be a vanishingly small edge case in practice, and Client A reconnecting may run into other errors while reconnecting anyways that would require it to re-initialize.

Supersedes #657

The `DB.getCommitedOpVersion()` function is only used [in one place][1]:
in a corner case that differentiates between two "blind", versionless
(created without first fetching) create op conflict cases:

 - a versionless create op that conflicts with another create op
 - a versionless create op that has been re-submitted because the
   connection was closed before the ack was received

The first of these cases should result in an error, and the second is
non-fatal and the error can be swallowed.

At the moment, this differentiation is made using the special
`DB.getCommittedOpVersion()` function, which - given an op with a
`src` and `seq` combination - will return `null` if the op hasn't been
committed, or its version number if it has.

If the op has a committed version number, we know that the submit is a
duplicate, and we can safely ignore it.

This approach is problematic, because:

 - it [needs a whole op index][2] just to handle this niche corner case
 - the default behaviour [fetches **all** ops][3] unless the driver
   overrides this behaviour

This change proposes that we actually don't need this function in most
cases, and implements an alternative approach to differentiate between
the above cases.

When creating an snapshot, we store the `src`, `seq` and `v` of the op
that was used to create it.

If a conflicting `create` is received, we can then compare directly with
this metadata, without needing to fetch anything extra from the
database.

This should work in the majority of cases, but won't work if the
metadata is missing, which could happen if:

 - the snapshot is "old": it was created before this change
 - the driver doesn't support metadata (eg [Postgres][4])

In the case where metadata is unavailable, we fall back to the existing
method, using `getCommittedOpVersion()`. However, if drivers support
metadata, this should happen sufficiently infrequently that consumers
could potentially remove the `src`/`seq`/`v` index with no noticeable
performance penalty.

[1]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/submit-request.js#L112
[2]: share/sharedb-mongo#94
[3]: https://github.com/share/sharedb/blob/7b20313ded4c302b9416cc6c7821694a7fa491b8/lib/db/index.js#L69
[4]: https://github.com/share/sharedb-postgres/blob/499702acd478645bcc249fa50ba6fc066d257d04/index.js#L140
@alecgibson alecgibson force-pushed the src-seq-create-meta branch from efabb26 to 5e5f1b4 Compare May 28, 2024 16:20
@alecgibson alecgibson merged commit eff464d into master May 28, 2024
7 checks passed
@alecgibson alecgibson deleted the src-seq-create-meta branch May 28, 2024 16:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants