Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tdl 15459 update bookmarking for transactions #141

Open
wants to merge 21 commits into
base: crest-work
Choose a base branch
from

Conversation

namrata270998
Copy link
Contributor

@namrata270998 namrata270998 commented Mar 16, 2022

Description of change

  • Updated the bookmarking for transactions stream to not filter based on created_at

QA steps

  • Verify that the tap returns all the transactions for updated orders

Risks

Rollback steps

  • revert this branch

tap_shopify/__init__.py Show resolved Hide resolved
self.assertGreaterEqual(replication_key_value, simulated_bookmark_value, msg="Second sync records do not respect the previous bookmark")
# verify the 2nd sync bookmark value is the max replication key value for a given stream
self.assertLessEqual(replication_key_value, second_bookmark_value_utc, msg="Second sync bookmark was set incorrectly, a record with a greater replication key value was synced")
# The `transactions` stream is a child of th `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# The `transactions` stream is a child of th `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark.
# The `transactions` stream is a child of the `orders` stream. Hence the bookmark for transactions is solely dependent on the value of bookmark in 'transaction_orders' which stores the parent record's bookmark.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

tests/base.py Outdated
Comment on lines 247 to 251
# `transactions` is child stream of `orders` stream which is incremental.
# We are writing a separate bookmark for the child stream in which we are storing
# the bookmark based on the parent's replication key.
# But, we are not using any fields from the child record for it.
# That's why the `transactions` stream does not have replication_key but still it is incremental.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# `transactions` is child stream of `orders` stream which is incremental.
# We are writing a separate bookmark for the child stream in which we are storing
# the bookmark based on the parent's replication key.
# But, we are not using any fields from the child record for it.
# That's why the `transactions` stream does not have replication_key but still it is incremental.
# `transactions` is child stream of `orders` stream which is incremental.
# We are writing a separate bookmark for the child stream in which we are storing
# the bookmark based on the parent's replication key.
# But, we are not using any fields from the child record for it.
# That's why the `transactions` stream does not have replication_key but still it is incremental.

Also, format this comment at every place in the code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated

Copy link
Contributor

@KrisPersonal KrisPersonal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let us walk through the code on Monday

…into TDL-15459-update-bookmarking-for-transactions
Copy link
Contributor

@RushiT0122 RushiT0122 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can generalise hardcoded tuple ('transactions') used at multiple location in tests. I found total 9 such instances in different TCs.

tests/base.py Outdated Show resolved Hide resolved
if target_value:
if target_value and stream not in ('transactions'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Look like the indentation level is off here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the indentation

incremental_streams = {key for key, value in self.expected_replication_method().items()
if value == self.INCREMENTAL and key in testable_streams}
if value == self.INCREMENTAL and key in testable_streams and key not in ('transactions')}

# Our test data sets for Shopify do not have any abandoned_checkouts
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abandoned_checkouts stream should remain under test. It is incorrectly removed on crest master. See base.py init

Copy link
Contributor Author

@namrata270998 namrata270998 Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abandoned checkouts are added back in another PR in this commit which is not merged yet.

# verify the 2nd sync replication key value is greater or equal to the 1st sync bookmarks
self.assertGreaterEqual(replication_key_value, simulated_bookmark_value, msg="Second sync records do not respect the previous bookmark")
# verify the 2nd sync bookmark value is the max replication key value for a given stream
self.assertLessEqual(replication_key_value, second_bookmark_value_utc, msg="Second sync bookmark was set incorrectly, a record with a greater replication key value was synced")

# verify that we get less data in the 2nd sync
# collects has all the records with the same value of replication key, so we are removing from this assertion
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the manipulated state should be altered so this does not happen.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, for the transactions stream, we had 2 bookmarks earlier i.e. transactions and transaction_orders(which stores the bookmark of the parent i.e. orders bookmark). However, as this card suggested removing the filtering of the transactions based on the transactions bookmark, we have now removed the transaction_orders completely. Hence this assertion of checking the replication key value against the bookmark value would now actually check the transaction_orders bookmark value against the replication value of the transactions. Thus, we have skipped this assertion for the stream.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah I was not clear, I meant for the collections stream not the transactions. Can more data be generated, and the state injected for that stream be moved so that not less records are coming through on the second sync?

Copy link
Contributor Author

@namrata270998 namrata270998 May 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have generated more data for the collections stream and updated the simulated state. Hence we don't have to skip it anymore!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

self.assertLess(second_sync_count, first_sync_count,
msg="Second sync does not have less records, bookmark usage not verified")

# verify that we get atleast 1 record in the second sync
if stream not in ('collects'):
if stream not in ('collects', 'transactions'):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If there is no data replicated for these streams on the 2nd sync, then I don't think there is much benefit to having them in this test. They should be marked as not under test and removed from expected_streams. The expected_streams or some variable like that should drive the whole test including table selection, assertions, etc. That way adding/removing streams to the test in the future is just a matter of updating a set or list in this test file. It should be done for each test not in base.py.

Copy link
Contributor Author

@namrata270998 namrata270998 Apr 22, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My bad, we had data for transactions, hence added back the stream for that particular assertion

@namrata270998 namrata270998 requested a review from kspeer825 April 22, 2022 12:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants