Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for fetching a particular version of a snapshot #220

Merged
merged 28 commits into from
Aug 13, 2018
Merged

Add support for fetching a particular version of a snapshot #220

merged 28 commits into from
Aug 13, 2018

Conversation

alecgibson
Copy link
Collaborator

This change follows on from discussion in this issue. Its primary
aim is to allow clients to fetch an historical version of a document
(a snapshot), by providing either:

  • a version number, or
  • a desired Date

The entry-point for this feature is added to the Connection class,
deliberately separate from the Doc, because Doc is concerned with
"live" document actions such as subscribing and submitting ops,
whereas fetching an historical version of a document should not be
associated with these ideas.

The feature is called with:

connection.getSnapshot(collection, id, version, callback): void;

The details of the interface are detailed in the README, and in the
code documentation.

This change includes support for projections, and use of the
readSnapshots middleware. It also hooks into Connection's
hasPending method.

Performance optimisations are deemed out-of-scope for this change (see
the issue for more details).

Note that this change also adds a development dependency on lolex
which is used for mocking the time.

@coveralls
Copy link

coveralls commented Jun 29, 2018

Coverage Status

Coverage increased (+0.2%) to 95.39% when pulling e80fe46 on alecgibson:get-snapshot into 08f3339 on share:master.

@alecgibson
Copy link
Collaborator Author

Things that may need discussion:

  1. Function signature
  2. Treatment of deleted documents
  3. Anything I've missed

Function signature

In the original issue, we had discussed the following signature:

Connection.prototype.getSnapshot(collection: string, id: string, { version: number, timestamp: Date }, (Error, Snapshot) => void): void;

However, in this change I have implemented the slightly different:

Connection.prototype.getSnapshot(collection: string, id: string, version: (number|Date), (Error, Snapshot) => void): void;

This is because version should really be either/or - you should never be able to have both a version number defined and a Date. Also, if the idea is to ease further types of argument, then dealing with the possible combinations of arguments grows prohibitive and ugly.

Treatment of deleted documents

For now, I have added a deleted flag to the returned snapshot, which indicates if the requested snapshot was deleted. However, I was unsure of how this should act. Other possibilities include:

  • if the last op of a doc is a deletion (ie the doc is currently deleted), then return a deleted document error on any requested version
  • if the requested op of a doc is a deletion, then return a deleted document error

Anything I've missed

I'm very new to this codebase, so I've doubtlessly missed something. I've tried to implement everything I could see (including projections, middleware and hooking into Connection.hasPending), but please let me know if I've missed anything.

README.md Outdated
Collection name of the snapshot
* `id` _(String)_
ID of the snapshot
* `version` _(number | Date)_
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How do you make a difference between a number that represent a version and one that represent a date ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you don't support a date passed has a number. Ok, from my understanding of the ticket this was supposed to be implemented but it's not an issue for me 👍

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I don't think there's any way we could do that sensibly without having two separate methods (which was my original proposal).

There's no real way to tell the difference between a really, really, really high version number and a really, really, really old diff.


it('applies the projection to a snapshot', function (done) {
backend.connect().getSnapshot('bookTitles', 'don-quixote', 2, function (error, snapshot) {
if (error) done(error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may call twice done

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch

beforeEach(function (done) {
var doc = backend.connect().get('books', 'catch-22');
doc.create({ title: 'Catch 22' }, function (error) {
if (error) done(error);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you may call twice done

Copy link
Contributor

@gkubisa gkubisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alecgibson thanks for implementing this feature. It looks great in general although I found a number of small issues which will need to be resolved before I can merge this PR. Thanks again 👍

README.md Outdated
id: string; // ID of the snapshot
version: number; // version number of the snapshot
timestamp: number; // the UNIX timestamp of the snapshot
deleted: boolean; // true if the returned version is a deleted snapshot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we replace deleted with type for consistency with Doc? If a document exists at a particular version, type would be populated as for Doc, otherwise it would be null.

Re #218 (comment)

Also, regarding deleted documents, what's the expected behaviour? If a document is currently deleted (ie its final op is a deletion op), then do we tell the client that the document is deleted, regardless of version they try to fetch?

In my opinion the last recorded operation should not affect a snapshot in this way because a snapshot is a read-only view of the document at a specific version, while the last recorded operation changes over time, eg deleting and re-creating a document.

lib/agent.js Outdated
@@ -300,6 +300,8 @@ Agent.prototype._handleMessage = function(request, callback) {
var op = this._createOp(request);
if (!op) return callback({code: 4000, message: 'Invalid op message'});
return this._submit(request.c, request.d, op, callback);
case 'sv':
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sv - snapshot version? What do you think about sf - snapshot fetch? It's probably more consistent with the other actions.

Maybe eventually we could also update f -> df, s -> ds and u -> du (d for document).

lib/backend.js Outdated
if (op.create) {
type = types.map[op.create.type];
if (!type) return callback({ code: 4008, message: 'Unknown type' });
snapshot = op.create.data;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be snapshot = type.create(op.create.data);

lib/backend.js Outdated
}
}

if (!snapshot && !deleted) return callback({ code: 4015, message: 'Document does not exist' });
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think snapshot needs to be verified... the create and apply functions should guarantee that it's always correct for the given type. Additionally, in this context we're applying operations which were applied successfully before, which gives us even more confidence in the correctness of the result. Furthermore, null, 0 and false are all valid snapshots for ot-json.

lib/backend.js Outdated
op = ops[index];

if (typeof timestamp === 'number' && op.m.ts > timestamp) {
op = ops[index - 1];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If index === 0, op will become undefined, which will cause a TypeError further down in this function. I'd actually prefer tracking version and timestamp as local variables (just like type, snapshot, etc) and limit the usage of op to within the loop.

* @param collection - the collection name of the snapshot
* @param id - the ID of the snapshot
* @param version - the version number, or Date of the snapshot to fetch. If an exact version or Date match is not made,
* then the next lowest version is returned. ie if a document has 6 versions, asking for v7 will return v6. If ops
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

"then the next lowest version is returned" - should it be "then the next lower version is returned"?

* id: string; // ID of the snapshot
* version: number; // version number of the snapshot
* timestamp: number; // the UNIX timestamp of the snapshot
* deleted: boolean; // true if the returned version is a deleted snapshot
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replace deleted with type, as in README.md.

this.version = version;
} else if (version instanceof Date) {
this.timestamp = version.getTime();
} else if (!version) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would version == null be more appropriate?

SnapshotRequest.prototype._onConnectionStateChanged = function () {
if (this.connection.canSend && !this.sent) {
this.send();
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If !this.connection.canSend, then probably set this.sent to false, so that the request would be resent after re-connection.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It starts as false, and is only changed to true at the end of send().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about the following situation:

  1. Snapshot request is sent
  2. Connection is dropped
  3. Connection is restored

_onConnectionStateChanged will not re-send the request because this.sent is true. The server will not send a response to the new WebSocket connection because the request was not sent on that new WebSocket connection and the old WebSocket connection has been closed.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha cool. Still trying to piece together all the data flows around this library! Some of it's pretty confusing.

* }
*
*/
Connection.prototype.getSnapshot = function(collection, id, version, callback) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could version be an optional param?

@alecgibson
Copy link
Collaborator Author

@dcharbonnier & @gkubisa I've addressed all of your comments. Could you please re-review?

Copy link
Contributor

@curran curran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me!

Really looking forward to this feature landing.

Thank you @alecgibson for your work on this.

README.md Outdated
@@ -212,6 +212,29 @@ changes. Returns a [`ShareDB.Query`](#class-sharedbquery) instance.
* `options.*`
All other options are passed through to the database adapter.

`connection.getSnapshot(collection, id, version, callback): void;`
Get a read-only snapshot of a document at the requested version.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice documentation.

lib/backend.js Outdated
var options = { metadata: true };
var timestampIsNumber = typeof timestamp === 'number';

this._getOps(agent, index, id, 0, version, options, function (error, ops) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like if you specify a Date rather than a version number, all the ops are loaded into memory, but if you specify a version number, only the ops until that version are loaded into memory, is that right?

If this is the case, it may be worth mentioning the overhead of passing in a Date in the docs.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that's exactly right. As far as I'm aware, with the current definition for getOps, this is unavoidable, right? I'll update the docs.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, with the current definition for getOps, it appears to be unavoidable.

Although, conceivably, one could optimize that case by passing the date into the database driver, so the database query itself could specify "less than this date" (as the timestamps are there on the ops and queryable). That feels like a rabbit hole though, and would require changing the database driver API, and then implementing the change in the actual drivers. Maybe something to think about for the future, once this first iteration lands.

Thanks for updating the docs!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In fact, it looks like all ops are always loaded into memory, see https://github.com/share/sharedb-mongo/blob/master/index.js#L362, so I'd say we don't need a specific warning about getting a snapshot for a Date. https://github.com/share/sharedb-mongo will need to be optimized but that would be a separate piece of work.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fine - will reset

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gkubisa Oh wow, that's surprising! I would have thought sharedb-mongo would query against the version. Good catch.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(As a side-note, is anyone currently maintaining sharedb-mongo? I've got a PR sitting around there, but it looks like nobody's touched anything in over a year?)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alecgibson This fork is active https://github.com/teamwork/sharedb-mongo/ . I'm using that fork currently as my dependency, instead of the original https://github.com/share/sharedb-mongo.

I opened a new issue just now Optimization opportunity: use "to" in getOps query share/sharedb-mongo#61

Copy link
Contributor

@gkubisa gkubisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for fixing the previous bunch of issues. I still discovered a few more but then that should be it. 👍

lib/backend.js Outdated
}
}

type = type ? { name: type.name, uri: type.uri} : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For consistency with the other messages and to limit the amount of data sent over the network, I think we should send only type.uri in the response, without the Object wrapper. type.uri will then need to be resolved into the full type object (with name, uri and all the functions) on the client side.

lib/backend.js Outdated
var type;
var snapshot;
var fetchedTimestamp;
var fetchedVersion;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could the variables be initialized?

var type = null;
var snapshot = undefined;
var fetchedTimestamp = timestamp;
var fetchedVersion = 0;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we should initialise fetchedTimestamp and fetchedVersion to these values. If the version can't be found, then we'll erroneously tell the consumer that we fetched things with these values, when we didn't.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Also the linter calls me out for initialising to undefined, which I think is a fair thing for it to complain about)

Copy link
Contributor

@gkubisa gkubisa Jul 11, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • var type = null; - I suggested it only to simplify the implicit type of the variable null | OtType, rather than undefined | null | OtType. Not a big deal either way.
  • var snapshot = undefined; - just to be explicit. I don't mind - it can be left unassigned.
  • var fetchedVersion = 0; - I think the version should be 0 for nonexistent snapshots for consistency with Doc and https://github.com/share/sharedb#data-model, especially this part of the docs: "ShareDB implicitly has a record for every document you can access.". I think it's also a good practice to stick to a single data type for each variable whenever possible. In the case of version I think it is possible and quite natural to have version === 0 for nonexistent snapshots - the same as Doc.version for nonexistent docs.
  • var fetchedTimestamp = timestamp; - similar to fetchedVersion - keep the data type the same (number). The user would know that the snapshot does not exist by checking that version === 0. Actually, it might be better to do var fetchedTimestamp = 0;, in case timestamp is not specified. Additionally, timestamp would be meaningless for nonexistent snapshots anyway.

Copy link
Collaborator Author

@alecgibson alecgibson Jul 12, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is possible and quite natural to have version === 0 for nonexistent snapshots

This feels weird to me - like how you were saying with fetching a version higher than the current version. It's possible to call this method and get two snapshots claiming to be version === 0, but with different data. But if that's how we already do it with Doc, then fine.

(Same thing for snapshots.)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I get it now... op version X in the database means that the op can be applied to a doc at version X. After the op is applied, the doc is at version X + 1. I think Backend.prototype._getSnapshot will need to be updated to reflect that, then we should both be happy. :-)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Riiiight. So creation is v1?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The initial create is applied to a doc at version 0, so after it's applied, the doc is v1.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great. Pushed a change that matches this logic. I'm now much happier about v0 being a non-existent doc.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, thanks @alecgibson 👍

lib/backend.js Outdated
for (var index = 0; index < ops.length; index++) {
var op = ops[index];
fetchedTimestamp = op.m.ts;
fetchedVersion = op.v;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd move fetchedTimestamp and fetchedVersion assignments below if (timestampIsNumber && fetchedTimestamp > timestamp), so that there would be no need to access previousOp and re-assign fetchedTimestamp and fetchedVersion, if we reach the operation past the requested timestamp.

Connection.prototype._handleSnapshot = function (error, message) {
var snapshotRequest = this.snapshotRequests[message.id];
if (!snapshotRequest) return;
snapshotRequest._handleResponse(error, message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete this.snapshotRequests[message.id];

is missing here - it'll prevent a memory leak.

version: message.version,
data: message.data,
timestamp: message.timestamp,
type: message.type
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please resolve the type's URI to a full type object.

backend.connect().getSnapshot('books', 'don-quixote');
});

it('errors if the version is -1', function (done) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

returns an empty snapshot when trying to fetch a snapshot before the document existed

I think this is great. 👍

For consistency, shouldn't we also return an empty snapshot, if asked for a version before the document existed (negative version)?

@alecgibson
Copy link
Collaborator Author

@gkubisa I've addressed your comments (apart from the variable declaration - discussed above)

@gkubisa
Copy link
Contributor

gkubisa commented Jul 11, 2018

I've been thinking about requesting snapshots for future timestamps and versions past the latest version, and I'm not sure, if the current approach is right... Currently we'd just return the latest available snapshot, however, that snapshot could be different on subsequent requests as more operations are recorded. In my opinion this is at odds with the idea of immutable snapshots. Could we return an error instead?

I would still keep the current behaviour (return the latest snapshot), if the version param of getSnapshot is omitted or null. In terms of implementation, this case could be trivially optimized be just loading the latest snapshot from the database, instead of applying all operations in order.

@dcharbonnier
Copy link
Contributor

I would even remove the date possibility, this create confusion, there should be an other api to request
{snapshotId, date}[] but there is no need to implement it on this PR, the snapshoot id is enough for this one IMO

@gkubisa
Copy link
Contributor

gkubisa commented Jul 11, 2018

I fully agree on this, @dcharbonnier. I'd be happy to drop the Date, if @alecgibson is ok with that. Alternatively we could keep the Date, until we have a suitable client API, which would make supporting Date in getSnapshot unnecessary.

@alecgibson
Copy link
Collaborator Author

I would even remove the date possibility

This change is completely useless to me if we don't have timestamps. I'm happy to break this apart into two separate methods, like getSnapshotByVersion and getSnapshotByTimestamp. I personally prefer this sort of approach, because it helps to alleviate aforementioned confusion.

requesting snapshots for future timestamps and versions past the latest version

I realise this is odd. It's certainly not idempotent. However, I would fully expect to be able to request a snapshot for the current time, and get a valid snapshot back (even if an op hasn't been recorded in some time).

How about this behaviour?

  • if the version is higher than the current document version, return an Invalid version error
  • if the timestamp is before Date.now(), existing behaviour (return a snapshot if a relevant timestamp is found, or else return current version)
  • if the timestamp is after Date.now(), return an Invalid version error

@gkubisa
Copy link
Contributor

gkubisa commented Jul 11, 2018

This change is completely useless to me if we don't have timestamps.

That's what I thought. :-)

I'm happy to break this apart into two separate methods, like getSnapshotByVersion and getSnapshotByTimestamp.

I'd be ok with two separate methods too but could the one which takes in version be called just getSnapshot? It's because getting a snapshot by version is the most natural way to get a snapshot and it's unlikely to change, whereas we might come up with a nicer API in the future, which would allow retrieving snapshots by timestamp and possibly render getSnapshotByTimestamp obsolete.

I would fully expect to be able to request a snapshot for the current time, and get a valid snapshot back (even if an op hasn't been recorded in some time).

Omitting the version/timestamp or setting it to null would always give you the latest/current snapshot.

How about this behaviour? (...)

Yes, that makes sense. 👍

One thing to keep in mind though is that the client and server time might differ, so that if the client's clock is ahead of the server's clock, the server might return an error, if the client requests the snapshot for its current time.

@alecgibson
Copy link
Collaborator Author

@gkubisa I've split it into two methods, but kept the internals basically the same. It also errors when asking for a snapshot after the current time or version. "Current time" is determined by the server.

@gkubisa
Copy link
Contributor

gkubisa commented Jul 11, 2018

Great, thanks @alecgibson !

From my side, the only remaining issue is https://github.com/share/sharedb/pull/220/files#r201677449.

Hopefully this PR will be merged soon. 👍

};

/**
* Get a read-only snapshot at a given version or time
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove timestamp references from the getSnapshot docs.

this.sent = false;
}

SnapshotRequest.prototype.hasPending = function () {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ready and hasPending are not needed because all requests in Connection.snapshotRequests are pending.

@alecgibson
Copy link
Collaborator Author

@gkubisa I've removed the hasPending method, and updated the JSDocs. I'm still not entirely sure about an "empty" snapshot having version 0, because the creation op is also version 0, which is confusing to me, but if you're happy then let's go with it.

Copy link
Contributor

@gkubisa gkubisa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@alecgibson thanks for bearing with me and making all the changes! In my opinion this feature is complete but I'd like to give @nateps and @ericyhwang a chance to comment on it before merging.

In particular I'd like to confirm that it's ok to:

  • have the Connection.getSnapshotByTime method, as there might be a better way to achieve the same result - get a snapshot given a timestamp,
  • have Connection.getSnapshot and Connection.getSnapshotByTime accept a callback parameter. An alternative would be to return a Snapshot synchronously, make Snapshot inherit from EventEmitter and emit load and error events. I think that would be more in line with Doc and Query.

@alecgibson
Copy link
Collaborator Author

thanks for bearing with me and making all the changes!

No worries. Sorry if I seem impatient at times! It's challenging juggling this with my job, and I'm just eager to get this feature in.

there might be a better way to achieve the same result - get a snapshot given a timestamp

I'm all ears. Especially if there's a more efficient way of doing it!

An alternative would be to return a Snapshot synchronously

In principle I'm very okay with anything that removes callbacks, but I have these reservations:

  • producing a snapshot on a very large set of ops can take a very long time (10s of seconds, as I mentioned in my PR on sharedb-mongo), and I don't think we should block the client for anywhere near that length of time
  • it runs against the general approach taken elsewhere in the library, and also against the general JavaScript asynchronous spirit, which feels weird to me. Using a callback is an important signal to the consumer that this is an operation that could take a while, and they should be ready to deal with that (with whatever shiny spinners/loaders/apologetic messages/gifs/etc. they feel they need)

@gkubisa
Copy link
Contributor

gkubisa commented Jul 12, 2018

Sorry, I meant returning the Snapshot object synchronously in a pending state. The data would still be loaded asynchronously, populated into the Snapshot and then either load or error event would be emitted.

@alecgibson
Copy link
Collaborator Author

Aha I see. Yes, that would be more in keeping with Doc and Query. The main reason I didn't do that was to reinforce the idea that the snapshot in the callback is a dumb, read-only blob of data that has no functions or way of interacting with it.

@gkubisa
Copy link
Contributor

gkubisa commented Jul 12, 2018

I would slightly prefer returning a Snapshot, mostly for consistency and extensibility. Getting a snapshot by time could then be exposed on Snapshot, rather than on the Connection. I'd love to hear @nateps and @ericyhwang 's opinions on this too.

});

describe('a document with some simple versions a day apart', function () {
var emptySnapshot = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be nice to update the variable names for clarity:

  • emptySnapshot -> v0
  • v0 -> v1
  • v1 -> v2
  • v2 -> v3

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch! Fixed.

lib/backend.js Outdated
Backend.prototype._getSnapshot = function (agent, index, id, version, timestamp, callback) {
version = (typeof version === 'number' && isFinite(version)) ? Math.max(0, version) : null;
var options = { metadata: true };
var timestampIsNumber = typeof timestamp === 'number';
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are we covered when we send NaN ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would have been covered by the if block, because 1 > NaN === false, but I've added an explicit check, too.

Alec Gibson added 17 commits August 2, 2018 13:23
This change makes some tweaks to the `getSnapshot` feature after code
review:

  - pass `type` around as just the URI, and resolve to the full object
    client-side
  - small tweak to variable declaration logic in `Backend._getSnapshot`
  - prevent memory leak of snapshot requests
  - update response for `-1` version fetch
This change splits `getSnapshot` apart into:

  - `getSnapshot` fetches by version number
  - `getSnapshotAtTime` fetches by timestamp

It also updates the logic to error if requesting a version after the
current version, or a timestamp after `Date.now()`, since either of
these requests would not be idempotent.
To stay consistent with the `Doc` class, this change updates the results
of `getSnapshot` to always have `version: 0` and `timestamp: 0` if no
snapshot was found (rather than having them `undefined`).
All `SnapshotRequest`s are pending, so we don't need these methods.
Instead, to check for pending, we just check if a `SnapshotRequest` is
queued.
The version attached to an operation in the database is the version
before the operation has been applied to (or the version to which the
operation can be applied).

That means that when applying op v1, the document is v1 before the op
has been applied, and then v2 after the op has been applied.

This change updates the `getSnapshot` method to return snapshots with
versions consistent with this (ie the snapshots are now 1-based instead
of 0-based).
This change adds a very simple `Snapshot` class, which currently has
no methods attached to it. Its primary use at the moment is to contain
the response to a snapshot request, and for use in `MemoryDB` in place
of `MemorySnapshot`.

As part of this change, the structure of the snapshot request return is
slightly modified to fit this class, including renaming of `version` to
`v`, and moving of `timestamp` to `m.ts`.
This change makes a couple of review markups. The biggest is the
removal of the ability to fetch a snapshot at a given time. It has been
agreed that this would be a useful API, but we need to think about the
ramifications of exposing this API before doing so (eg adding more
indexes on database adapters for lookup by time, etc.).

In order to move the core functionality forward - that is the ability to
fetch a snapshot by version - this change removes the ability to fetch
by time, and it is left (for now) to the consumer to look up the version
they need themselves.

Alongside this change, `getSnapshot` has also been renamed to
`fetchSnapshot` to imply that we touch the database layer, and the
`Snapshot` class has had its `collections` property removed to be
consistent with database adapters, whilst also making `m` a property
that is always present.
This change adds some review markups:

  - use `nf` instead of `sf` for "snapshot fetch" message type, to not
    confuse with "subscribe"
  - remove the internal `_getOps` method that was added for fetching
    metadata, which is no longer needed
  - add a `util.isInteger` method
  - rename `connection.snapshotRequests` to `_snapshotRequests` to
    signify that it's internal and the API could change
  - make sure that snapshot requests emit an event that is captured in
    `connection.whenNothingPending`
  - make snapshot requests use an incremental ID instead of a random
    one to be consistent with `Query`
  - make `SnapshotRequest` callbacks mandatory
  - make `SnapshotRequest` return a string type instead of the full
    object
@alecgibson
Copy link
Collaborator Author

@nateps @ericyhwang I've made the changes we discussed in the meeting yesterday (see commit message for details).

@adityachugh
Copy link

Great work guys, this looks extremely useful! 😊Any idea on when this may be merged?

lib/backend.js Outdated
type: snapshot.type
};

backend.trigger(backend.MIDDLEWARE_ACTIONS.readSnapshots, agent, request, function (error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should call:

var projection = ....;
backend._sanitizeSnapshots(agent, projection, collection, snapshots, cb)

lib/backend.js Outdated
return callback({ code: 4024, message: 'Requested version exceeds latest snapshot version' });
}

callback(null, {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use the Snapshot constructor here

lib/backend.js Outdated
@@ -580,6 +582,68 @@ Backend.prototype.getChannels = function(collection, id) {
];
};

Backend.prototype.fetchSnapshot = function(agent, index, id, version, callback) {
var backend = this;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add timing test around whole function. var start = Date.now() here and emit timing event in callback from sanitizeSnapshot() below, similar to backend.fetch()

lib/backend.js Outdated
};

Backend.prototype._fetchSnapshot = function (agent, index, id, version, callback) {
this.getOps(agent, index, id, 0, version, function (error, ops) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

call a this._getOps() method that does not call _sanitizeOps internally, thus not projecting nor emitting an op middleware action

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

NB: I'm directly using backend.db.getOps, because backend._getOps would have basically had the same signature and just been a wrapper.

@@ -523,12 +536,16 @@ Connection.prototype.createSubscribeQuery = function(collection, q, options, cal
Connection.prototype.hasPending = function() {
return !!(
this._firstDoc(hasPending) ||
this._firstQuery(hasPending)
this._firstQuery(hasPending) ||
this._firstSnapshotRequest(exists)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove the exists method, since we always call with this and it can simplify the _fristSnapshotRequest method below, which is an internal method

@@ -226,6 +231,9 @@ Connection.prototype.handleMessage = function(message) {
case 'bu':
return this._handleBulkMessage(message, '_handleUnsubscribe');

case 'nf':
return this._handleSnapshot(err, message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's call this _handleSnapshotFetch consistent with the message

};

SnapshotRequest.prototype._handleResponse = function (error, message) {
if (error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's emit ready here first before error checking, same as query so that we make sure we eventually call back to whenNothingPending on the connection in all cases

this.connection = connection;
this.id = id;
this.collection = collection;
this.version = util.isInteger(version) ? Math.max(0, version) : null;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's throw an error for negative versions rather than silently fixing it

test/db.js Outdated
{type: 'json0', id: '1', v: 1, data: {foo: 2, bar: 1}},
{type: 'json0', id: '2', v: 1, data: {foo: 1, bar: 2}},
{type: 'json0', id: '3', v: 1, data: {foo: 2, bar: 2}}
{ type: 'json0', id: '0', v: 1, data: {foo: 1, bar: 1}, m: null },
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's keep no spacing inside objects for consistency

This change:

  - bypasses `_sanitizeOps` and instead uses `_sanitizeSnapshot`
  - adds timing to `Backend.fetchSnapshot`
  - throws for negative versions
@alecgibson
Copy link
Collaborator Author

@ericyhwang / @nateps I've addressed the issues we spoke about in yesterday's meeting. Could you please re-review?

Copy link
Contributor

@ericyhwang ericyhwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nate asked me to take a look at the last round of changes and to take care of the rest if it looked good.

They LGTM, so I'm going to go ahead with merge+publish! 🎉

Thanks for the contribution and bearing with us through the process. With what we've learned, it should make future changes faster.

@ericyhwang ericyhwang merged commit 8705abd into share:master Aug 13, 2018
@ericyhwang
Copy link
Contributor

This has been published as [email protected]

@alecgibson alecgibson deleted the get-snapshot branch August 13, 2018 19:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants