Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize paginated getChildren #8

Merged
merged 7 commits into from
Dec 15, 2020

Conversation

huizhilu
Copy link

@huizhilu huizhilu commented Nov 16, 2020

Description

Resolves #7

Changes for the PR:

  1. On zk server, use a formulate to pre-calculate the children's packet length when adding the children to the returning list, so the children packet is ensured within the jute.maxbuffer limit. Formula: totalLength = (CHILD_EXTRA_BYTES + childLength) * numChildren + BASE_BYTES. As an example and tested in zk 3.6.x, the formula is like: total packet length = (4 + childNameLength) * numChildren + 100
  2. Offer an option to return all children in the paginated getChildren API: when maxReturned == MAX_INT, minCzxid == -1. The pagination requests will be handled in the getChildren API, until there is no more page of children left.
  3. Optimize the sorting logic for paginated getChildren in DataTree: only when pagination is needed, children are sorted. 99% of the case for returning all children, the children could be returned in the 1st page. No need to sort them to avoid performance downgrade.
  4. Returning a list of child name string for the API, instead of a list of PathWithStat.
  5. Add tests to cover additional logic.

Run Test

zookeeper-server

[INFO] Apache ZooKeeper - Server .......................... SUCCESS [20:05 min]

[INFO] Results:
[INFO]
[WARNING] Tests run: 3011, Failures: 0, Errors: 0, Skipped: 3
[INFO]
[INFO] ------------------------------------------------------------------------
[INFO] BUILD SUCCESS
[INFO] ------------------------------------------------------------------------
[INFO] Total time:  13:09 min
[INFO] Finished at: 2020-12-14T19:28:25-08:00
[INFO] ------------------------------------------------------------------------

@huizhilu huizhilu marked this pull request as ready for review November 18, 2020 09:13
@huizhilu huizhilu requested a review from junkaixue November 19, 2020 01:59
@huizhilu huizhilu force-pushed the pagination branch 3 times, most recently from 5b799eb to 7fc3e33 Compare November 19, 2020 05:05
Copy link

@junkaixue junkaixue left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed first part. Will review the later ones.

@huizhilu huizhilu force-pushed the pagination branch 7 times, most recently from 1aa85ac to 4ac2e1a Compare November 20, 2020 17:57
@huizhilu huizhilu force-pushed the pagination branch 3 times, most recently from 37adb17 to 60d5e51 Compare November 20, 2020 23:48
@huizhilu huizhilu requested a review from junkaixue November 20, 2020 23:57
Copy link

@jiajunwang jiajunwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no test. And I think some of the business logic has not been completed yet. Please comment if I missed anything.

Please add enough tests to cover the new logic.

Copy link
Author

@huizhilu huizhilu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tried to split the PR into smaller ones. Since the logic on client and server in the original patch is not independent, I can not split the changes and make the tests pass. So I decided to put all the changes in one PR that we could review the logic.

Copy link

@jiajunwang jiajunwang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems that I click Approve by mistake...

Copy link

@kaisun2000 kaisun2000 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall we add a todo to have a test for the the case that I raised as a followup, namely not clearing the hash set having may miss some data?

@huizhilu
Copy link
Author

Shall we add a todo to have a test for the the case that I raised as a followup, namely not clearing the hash set having may miss some data?

If we want to add a unit test, we have to mock the required states for the API to simulate the read/write race condition. It'll take some time for me to complete the test. Since the logic is carefully reviewed and manually tested, I will merge the code first and create an issue (#9) for adding the unit test later.

@huizhilu huizhilu merged commit 642bb52 into linkedin:li-dev/3.6.2-1-pagination Dec 15, 2020
junkaixue pushed a commit that referenced this pull request Dec 21, 2020
The optimizes the paginated getChildren API to satisfy use cases better. The changes include:
-  On zk server, uses a formulate to pre-calculate the children's packet length when adding the children to the returning list, so the children packet is ensured within the jute.maxbuffer limit.
- Adds an API getAllChildrenPaginated to fetch all children by leveraging the paginated getChildren API.
- Optimize the sorting logic for paginated getChildren in DataTree: only when pagination is needed, children are sorted. 99% of the case for returning all children, the children could be returned in the 1st page. No need to sort them to avoid performance downgrade.
- Returning a list of child name string for the API, instead of a list of PathWithStat
- Add more tests to cover the new logic
narendly pushed a commit that referenced this pull request Nov 23, 2021
The optimizes the paginated getChildren API to satisfy use cases better. The changes include:
-  On zk server, uses a formulate to pre-calculate the children's packet length when adding the children to the returning list, so the children packet is ensured within the jute.maxbuffer limit.
- Adds an API getAllChildrenPaginated to fetch all children by leveraging the paginated getChildren API.
- Optimize the sorting logic for paginated getChildren in DataTree: only when pagination is needed, children are sorted. 99% of the case for returning all children, the children could be returned in the 1st page. No need to sort them to avoid performance downgrade.
- Returning a list of child name string for the API, instead of a list of PathWithStat
- Add more tests to cover the new logic
rahulrane50 pushed a commit to linkedin/zkbridge that referenced this pull request Apr 25, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants