Losing updates when using afterTimestamp and afterId #95

Open
nathansalter opened this issue Apr 1, 2020 · 2 comments

Comments

@nathansalter

Hello,

I started to look through the documentation and noticed a slight problem with the afterTimestamp/afterId method of paging through the feed. It is fine for pages entirely in the past, but once a page reaches the current timestamp, items can be lost from the subsequent pages. Consider the following updates, written in this sequence:

| Operation | ID | Timestamp  |
|-----------|----|------------|
| Create    | 1  | 1585747490 |
| Create    | 2  | 1585747491 |
| Update    | 5  | 1585747492 |
| Update    | 4  | 1585747492 |
| Create    | 7  | 1585747493 |

Now, if a client requests a page after the update to ID 5 but before the update to ID 4, it will get this page:

| Operation | ID | Timestamp  |
|-----------|----|------------|
| Create    | 1  | 1585747490 |
| Create    | 2  | 1585747491 |
| Update    | 5  | 1585747492 |

However, requesting the next page with afterTimestamp=1585747492 and afterId=5 will produce this page:

| Operation | ID | Timestamp  |
|-----------|----|------------|
| Create    | 7  | 1585747493 |

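Because items are sorted by timestamp and then by ID, the update to ID 4 (timestamp 1585747492) sorts before ID 5 at the same timestamp. It therefore falls before the afterTimestamp/afterId cursor and cannot appear in the page that starts after ID 5, so the update is lost.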
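The sequence above can be reproduced with a short Python sketch. It assumes a hypothetical `next_page` function that sorts items by (modified, id) and returns only those strictly after the (afterTimestamp, afterId) cursor; the function and class names are illustrative and not taken from any particular implementation.

```python
from typing import List, NamedTuple, Optional


class Item(NamedTuple):
    id: int
    modified: int  # Unix timestamp in whole seconds, as in the tables above


def next_page(items: List[Item], after_timestamp: Optional[int] = None,
              after_id: Optional[int] = None, limit: int = 100) -> List[Item]:
    """Hypothetical feed endpoint: sort by (modified, id) and return the
    items strictly after the (afterTimestamp, afterId) cursor."""
    ordered = sorted(items, key=lambda it: (it.modified, it.id))
    if after_timestamp is not None:
        cursor = (after_timestamp, after_id)
        ordered = [it for it in ordered if (it.modified, it.id) > cursor]
    return ordered[:limit]


# Writes in the order they actually happen (ID 5 is written before ID 4).
writes = [Item(1, 1585747490), Item(2, 1585747491), Item(5, 1585747492),
          Item(4, 1585747492), Item(7, 1585747493)]

# The client reads the first page when only the first three writes exist.
first_page = next_page(writes[:3])
last = first_page[-1]

# Later, all five writes exist and the client resumes from its cursor.
second_page = next_page(writes, last.modified, last.id)

seen = {it.id for it in first_page + second_page}
print([it.id for it in first_page])             # [1, 2, 5]
print([it.id for it in second_page])            # [7]
print(sorted({it.id for it in writes} - seen))  # [4], the lost update
```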

I'm not sure how this problem could be fixed in the specification, except by removing the requirement that items MUST be sorted by id and instead allowing them to be sorted in the order in which the updates actually occur. We've mitigated the issue slightly by using microseconds rather than seconds in the timestamp, but the problem could still occur on high-traffic websites.

Does anyone have any better suggestions?

@nickevansuk
Contributor

nickevansuk commented Apr 1, 2020

Great find - and somewhat related to the race condition challenge.

As you say, using a timestamp field with a high degree of precision certainly mitigates it, but to fully resolve the issue you've identified it is also necessary to filter out items whose "modified" value is later than 2 seconds in the past, delaying their appearance in the feed. As the guidance linked above mentions, many systems still exhibit small variances in the timestamps they provide, which this delay also covers.

This delay ensures that items with a given modified value are only read once it is no longer possible for further items to be assigned that same modified value, which should solve the above?
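A minimal sketch of the suggested delay, reusing the timestamps from the example. The `written_at` field, the sub-second write times and the function names are hypothetical; the sketch assumes every write becomes visible within `DELAY_SECONDS` of its modified value, which is the property the delay relies on.

```python
from typing import List, NamedTuple, Optional

DELAY_SECONDS = 2  # the "2 seconds in the past" suggested above


class Write(NamedTuple):
    id: int
    modified: int      # timestamp stored on the item (whole seconds)
    written_at: float  # hypothetical wall-clock time the write became visible


def next_page(writes: List[Write], now: float,
              after_timestamp: Optional[int] = None,
              after_id: Optional[int] = None, limit: int = 100) -> List[Write]:
    """Hypothetical feed endpoint that hides items modified within the last
    DELAY_SECONDS, so a modified value is only served once it is 'closed'."""
    visible = [w for w in writes
               if w.written_at <= now and w.modified <= now - DELAY_SECONDS]
    ordered = sorted(visible, key=lambda w: (w.modified, w.id))
    if after_timestamp is not None:
        cursor = (after_timestamp, after_id)
        ordered = [w for w in ordered if (w.modified, w.id) > cursor]
    return ordered[:limit]


# Same sequence as the example: ID 5 and ID 4 share a timestamp but are
# written at different moments within that second.
writes = [Write(1, 1585747490, 1585747490.1), Write(2, 1585747491, 1585747491.1),
          Write(5, 1585747492, 1585747492.3), Write(4, 1585747492, 1585747492.7),
          Write(7, 1585747493, 1585747493.1)]

received: List[Write] = []
cursor = (None, None)
# First poll happens between the writes of ID 5 and ID 4; second poll is later.
for now in (1585747492.5, 1585747496.0):
    page = next_page(writes, now, *cursor)
    received += page
    if page:
        cursor = (page[-1].modified, page[-1].id)

print([w.id for w in received])  # [1, 2, 4, 5, 7]
```

The first poll only returns ID 1, because timestamps 1585747491 and 1585747492 are still inside the two-second window; the second poll returns IDs 2, 4, 5 and 7, so the update to ID 4 is no longer lost.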

As you've pointed out, this issue exists in all cases, so the delay should be implemented as standard practice, and not only where transactions are involved.

What do you think?

@nathansalter
Author

Delaying items' appearance in the feed definitely prevents this issue, as long as the database uses a consistent time source. If it doesn't, you'd have to use change numbers instead anyway, so that's not a concern.
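For comparison, a sketch of paging by change number rather than by timestamp. It assumes a hypothetical store that stamps each write with the next value of a single counter, in the order the writes become visible; if numbers were allocated before concurrent transactions commit, a similar race could reappear. All names here are illustrative.

```python
from itertools import count
from typing import Dict, List, NamedTuple


class Item(NamedTuple):
    id: int
    change_number: int  # hypothetical: assigned in the order writes become visible


class Feed:
    """Hypothetical store that stamps every write with the next change number."""

    def __init__(self) -> None:
        self._counter = count(1)
        self._items: Dict[int, Item] = {}

    def write(self, item_id: int) -> None:
        # Overwriting keeps only the latest version of each item.
        self._items[item_id] = Item(item_id, next(self._counter))

    def next_page(self, after_change_number: int = 0, limit: int = 100) -> List[Item]:
        # Order by change number alone; the cursor is a single value.
        ordered = sorted(self._items.values(), key=lambda it: it.change_number)
        return [it for it in ordered if it.change_number > after_change_number][:limit]


feed = Feed()
for item_id in (1, 2, 5):
    feed.write(item_id)
first_page = feed.next_page()

for item_id in (4, 7):
    feed.write(item_id)
second_page = feed.next_page(first_page[-1].change_number)

print([it.id for it in first_page])   # [1, 2, 5]
print([it.id for it in second_page])  # [4, 7]
```

Because the cursor is a single increasing value, the update to ID 4 receives a higher change number than ID 5 and appears on the second page even though its item ID is lower.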

I think recommending higher-precision timestamps and only exposing items modified at least a few seconds in the past should stop this from being a problem. Great idea!
