-
Notifications
You must be signed in to change notification settings - Fork 88
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Tdl 15459 update bookmarking for transactions #141
base: crest-work
Are you sure you want to change the base?
Tdl 15459 update bookmarking for transactions #141
Conversation
tests/test_bookmarks_updated.py
Outdated
self.assertGreaterEqual(replication_key_value, simulated_bookmark_value, msg="Second sync records do not respect the previous bookmark") | ||
# verify the 2nd sync bookmark value is the max replication key value for a given stream | ||
self.assertLessEqual(replication_key_value, second_bookmark_value_utc, msg="Second sync bookmark was set incorrectly, a record with a greater replication key value was synced") | ||
# The `transactions` stream is a child of th `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# The `transactions` stream is a child of th `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark. | |
# The `transactions` stream is a child of the `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
tests/base.py
Outdated
# `transactions` is child stream of `orders` stream which is incremental. | ||
# We are writing a separate bookmark for the child stream in which we are storing | ||
# the bookmark based on the parent's replication key. | ||
# But, we are not using any fields from the child record for it. | ||
# That's why the `transactions` stream does not have replication_key but still it is incremental. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# `transactions` is child stream of `orders` stream which is incremental. | |
# We are writing a separate bookmark for the child stream in which we are storing | |
# the bookmark based on the parent's replication key. | |
# But, we are not using any fields from the child record for it. | |
# That's why the `transactions` stream does not have replication_key but still it is incremental. | |
# `transactions` is child stream of `orders` stream which is incremental. | |
# We are writing a separate bookmark for the child stream in which we are storing | |
# the bookmark based on the parent's replication key. | |
# But, we are not using any fields from the child record for it. | |
# That's why the `transactions` stream does not have replication_key but still it is incremental. |
Also, format this comment at every place in the code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let us walk through the code on Monday
…into TDL-15459-update-bookmarking-for-transactions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can generalise hardcoded tuple ('transactions') used at multiple location in tests. I found total 9 such instances in different TCs.
tests/test_start_date.py
Outdated
if target_value: | ||
if target_value and stream not in ('transactions'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Look like the indentation level is off here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the indentation
incremental_streams = {key for key, value in self.expected_replication_method().items() | ||
if value == self.INCREMENTAL and key in testable_streams} | ||
if value == self.INCREMENTAL and key in testable_streams and key not in ('transactions')} | ||
|
||
# Our test data sets for Shopify do not have any abandoned_checkouts |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The abandoned_checkouts stream should remain under test. It is incorrectly removed on crest master. See base.py init
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
# verify the 2nd sync replication key value is greater or equal to the 1st sync bookmarks | ||
self.assertGreaterEqual(replication_key_value, simulated_bookmark_value, msg="Second sync records do not respect the previous bookmark") | ||
# verify the 2nd sync bookmark value is the max replication key value for a given stream | ||
self.assertLessEqual(replication_key_value, second_bookmark_value_utc, msg="Second sync bookmark was set incorrectly, a record with a greater replication key value was synced") | ||
|
||
# verify that we get less data in the 2nd sync | ||
# collects has all the records with the same value of replication key, so we are removing from this assertion |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the manipulated state should be altered so this does not happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, for the transactions stream, we had 2 bookmarks earlier i.e. transactions and transaction_orders(which stores the bookmark of the parent i.e. orders bookmark). However, as this card suggested removing the filtering of the transactions based on the transactions bookmark, we have now removed the transaction_orders completely. Hence this assertion of checking the replication key value against the bookmark value would now actually check the transaction_orders bookmark value against the replication value of the transactions. Thus, we have skipped this assertion for the stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I was not clear, I meant for the collections
stream not the transactions
. Can more data be generated, and the state injected for that stream be moved so that not less records are coming through on the second sync?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have generated more data for the collections
stream and updated the simulated state. Hence we don't have to skip it anymore!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
tests/test_bookmarks_updated.py
Outdated
self.assertLess(second_sync_count, first_sync_count, | ||
msg="Second sync does not have less records, bookmark usage not verified") | ||
|
||
# verify that we get atleast 1 record in the second sync | ||
if stream not in ('collects'): | ||
if stream not in ('collects', 'transactions'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If there is no data replicated for these streams on the 2nd sync, then I don't think there is much benefit to having them in this test. They should be marked as not under test and removed from expected_streams
. The expected_streams
or some variable like that should drive the whole test including table selection, assertions, etc. That way adding/removing streams to the test in the future is just a matter of updating a set or list in this test file. It should be done for each test not in base.py.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, we had data for transactions, hence added back the stream for that particular assertion
Description of change
created_at
QA steps
Risks
Rollback steps