prototype of bulk import v2 distributed file examination #4898
base: 2.1
Commits on Sep 17, 2024
-
prototype of bulk import v2 distributed file examination
This is a prototype of a few new APIs that allow distributing the examination of files for bulk import. For a given bulk import directory with N files, this would support a use case like the following.

1. For each file, a task is spun up on a remote server that calls the new LoadPlan.compute() API to determine which tablets the file overlaps. Then the new LoadPlan.toJson() method is called to serialize the load plan and send it to a central place.
2. All the load plans from the remote servers are deserialized by calling the new LoadPlan.fromJson() method and merged into a single load plan that is used to do the bulk import.

Another use case these new APIs could support is running this new code in the map reduce job that generates bulk import data.

1. Each reducer, after it produces an rfile, could call the new LoadPlan.compute(), then call LoadPlan.toJson() and save the result to a file. So after the map reduce job completes, each rfile would have a corresponding file with a load plan for that rfile.
2. Another process that runs after the map reduce job can load all the load plans from files and merge them using the new LoadPlan.fromJson() method. Then the merged LoadPlan can be used to do the bulk import.

BulkNewIT.testComputeLoadPlan() simulates this map reduce use case by going through the same steps in code that a map reduce job would. This tests the new APIs and shows what using them would look like.

Both of these use cases avoid doing the analysis of all files on the single machine doing the bulk import. Bulk import v1 had distributed analysis and would ask random tservers to do the file analysis. This could cause unexpected load on those tservers. Bulk v1 would also interleave analyzing files and adding them to tablets. This could lead to odd situations where files are imported to some tablets and then analysis fails, leaving the file partially imported. Bulk v2 does all analysis before any files are added to tablets, however it lacks this distributed analysis capability.
This is an initial attempt to offer that functionality in bulk v2.
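
The compute, serialize, merge flow described above can be sketched in a few lines of standalone Java. This is only an illustration under stated assumptions and does not call Accumulo: the class `LoadPlanSketch`, the `Destination` record, and the methods `compute`, `serialize`, `deserialize`, and `merge` are all hypothetical stand-ins for LoadPlan.compute(), LoadPlan.toJson(), and LoadPlan.fromJson(). It also assumes each file is summarized by its first and last row, that rows are non-empty strings without tabs or newlines, and it uses a tab-separated encoding instead of JSON to stay dependency-free.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical model of distributed load plan computation; not the Accumulo API.
final class LoadPlanSketch {

  // One tablet a file must be loaded into, covering rows in (prevEndRow, endRow].
  // A null bound means the range is unbounded on that side.
  record Destination(String file, String prevEndRow, String endRow) {}

  // Step 1 (runs per file, possibly on a remote worker): find every tablet
  // whose range overlaps the file's [firstRow, lastRow] row range.
  static List<Destination> compute(String file, String firstRow, String lastRow,
      SortedSet<String> splits) {
    List<String> tabletEnds = new ArrayList<>(splits);
    tabletEnds.add(null); // the last tablet has no end row
    List<Destination> plan = new ArrayList<>();
    String prev = null;
    for (String end : tabletEnds) {
      boolean startsBeforeTabletEnds = end == null || firstRow.compareTo(end) <= 0;
      boolean endsAfterTabletStarts = prev == null || lastRow.compareTo(prev) > 0;
      if (startsBeforeTabletEnds && endsAfterTabletStarts) {
        plan.add(new Destination(file, prev, end));
      }
      prev = end;
    }
    return plan;
  }

  // Stand-in for serializing a plan: one "file<TAB>prevEndRow<TAB>endRow" line
  // per destination, with an empty field meaning an unbounded (null) row.
  static String serialize(List<Destination> plan) {
    StringBuilder sb = new StringBuilder();
    for (Destination d : plan) {
      sb.append(d.file()).append('\t')
          .append(d.prevEndRow() == null ? "" : d.prevEndRow()).append('\t')
          .append(d.endRow() == null ? "" : d.endRow()).append('\n');
    }
    return sb.toString();
  }

  // Stand-in for deserializing a plan produced by serialize().
  static List<Destination> deserialize(String text) {
    List<Destination> plan = new ArrayList<>();
    for (String line : text.split("\n")) {
      if (line.isEmpty()) {
        continue;
      }
      String[] parts = line.split("\t", -1); // -1 keeps trailing empty fields
      plan.add(new Destination(parts[0],
          parts[1].isEmpty() ? null : parts[1],
          parts[2].isEmpty() ? null : parts[2]));
    }
    return plan;
  }

  // Step 2 (runs in one central place): combine all per-file plans into one.
  static List<Destination> merge(List<String> serializedPlans) {
    List<Destination> merged = new ArrayList<>();
    for (String s : serializedPlans) {
      merged.addAll(deserialize(s));
    }
    return merged;
  }

  public static void main(String[] args) {
    SortedSet<String> splits = new TreeSet<>(List.of("g", "m", "t"));
    // Each of these could be produced by a different worker or reducer.
    String plan1 = serialize(compute("f1.rf", "a", "h", splits));
    String plan2 = serialize(compute("f2.rf", "n", "z", splits));
    for (Destination d : merge(List.of(plan1, plan2))) {
      System.out.println(d);
    }
  }
}
```

The point of the design is that only the small serialized plans travel to the process that performs the import, so the file examination cost is spread across the workers that already hold or produced the files.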
Commit aa593ad
-
Commit 407358a
-
Commit fd70d34
Commits on Sep 18, 2024
-
Commit 058328a
-
Commit c8c5f21
-
Commit 9ef0bcf
Commits on Sep 19, 2024
-
Commit 368b2a4
Commits on Sep 26, 2024
-
Commit e228b68
Commits on Sep 27, 2024
-
Commit 9328277
-
Commit 3190d19
-
Commit 174b4e0
-
Commit f82d111
Commits on Sep 30, 2024
-
Revert "update pom for including sha in version"
This reverts commit 174b4e0.
Commit 9c7dc66
-
Commit 4285753
-
Commit ec0febb
-
Commit 97e4684
Commits on Oct 1, 2024
-
Update core/src/main/java/org/apache/accumulo/core/data/LoadPlan.java
Co-authored-by: Daniel Roberts <[email protected]>
Commit aabe2d8
-
Update core/src/test/java/org/apache/accumulo/core/data/LoadPlanTest.java
Co-authored-by: Daniel Roberts <[email protected]>
Commit 2003eae
-
Update core/src/test/java/org/apache/accumulo/core/data/LoadPlanTest.java
Co-authored-by: Daniel Roberts <[email protected]>
Commit 667f12e
-
Update core/src/test/java/org/apache/accumulo/core/data/LoadPlanTest.java
Co-authored-by: Daniel Roberts <[email protected]>
Commit a5ead55
-
Commit 926dec7
Commits on Oct 30, 2024
-
Commit 235945b