-
Notifications
You must be signed in to change notification settings - Fork 74
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
introduce common operations directory for OSB workloads #514
Conversation
Had some discussions offline with @OVI3D0. Michael proposed that instead of creating a single shared_operations file that contains the common setup steps in each workload, we should go forward with a modular approach. However, although most workloads have the same chronological steps, some workloads are missing some steps. Adding comments here for visibility:
Some workloads like eventdata or so are missing one of these steps (like a refresh after force merge or refresh after indexing) but let's go with your idea of streamlining them to be the same. Refreshes should not impact or change the behavior of the workload. I think one of the workloads has like a put-settings operation before it deletes the index. You're probably aware but just be sure to keep that operation. To add to your idea, what do you think of doing this?
|
80c8492
to
f7d5cfa
Compare
common_operations/force_merge.json
Outdated
{ | ||
"operation": { | ||
"operation-type": "force-merge", | ||
"request-timeout": 7200 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@OVI3D0 Could you add these params to allow users to manage segments?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, didn't see this. Added now👍
After some more discussions offline with @IanHoang, we came up with a new configuration for the common operations. create-index typically takes an with this latest revision, the default index_settings value has been renamed to a new default_index_settings value; and after a change to the create_index operation, if no index_settings value is passed, OSB should fall back to the default_index_settings value. After testing this and running a curl for my cluster settings, I can see the default settings are still being passed through. This will allow us to reduce these common operations down to something like this:
In the case of the geonames workload, this will reduce the file size by half! |
{ | ||
"operation": { | ||
"operation-type": "create-index", | ||
"settings": {{index_settings | default(default_index_settings | default({})) | tojson}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious, is default({}))
necessary if the workload has nothing specified?
{% with default_index_settings={} %}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found that when I removed that final default({}) I was getting jinja2 templating errors, even if I specified default_settings={} in the geonames workload
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's necessary because when default_index_settings
is not specified, the {}
is needed for the JSON to be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left two comments but the rest looks good to me
Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: Michael Oviedo <[email protected]>
Signed-off-by: Michael Oviedo <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will really improve the workloads, thanks for working on it @OVI3D0. Will this model be applied to all the workloads in a separate check-in? I notice that only geonames
is enhanced here.
{ | ||
"operation": { | ||
"operation-type": "create-index", | ||
"settings": {{index_settings | default(default_index_settings | default({})) | tojson}} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's necessary because when default_index_settings
is not specified, the {}
is needed for the JSON to be valid.
common_operations/force_merge.json
Outdated
{ | ||
"operation": { | ||
"operation-type": "force-merge", | ||
"request-timeout": 7200{%- if max_num_segments is defined %}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you feel request_timeout
should be parameterized here, so someone who wants to override it can do it? It's unlikely, but you never know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm of the opinion that there's no harm in parameterizing most of the workload values. I added a param for request-timeout in the latest revision. Let me know what you think!
common_operations/index_append.json
Outdated
@@ -0,0 +1,10 @@ | |||
{ | |||
"operation": "index-append", | |||
"warmup-time-period": 120, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar suggestion as the force merge request timeout.. If you do add parameters, the README's will need to be updated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar to the request-timeout comment, I added a param for this as well.
Yeah I will change the other workloads in separate PR's, I wanted to include geonames here as an example of the changes I plan to apply. |
Signed-off-by: Michael Oviedo <[email protected]>
{ | ||
"operation": { | ||
"operation-type": "force-merge", | ||
"request-timeout": {{ request_timeout | default(60) | tojson }}{%- if max_num_segments is defined %}, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be better to make this explicit, since it applies only to the force-merge operation. So calling it something like force_merge_request_timeout
will make it clear it does not apply to all operations in general.
I'd do this in the next set of changes, just so this PR can go in as is.
* restructure common operations folder based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * refactor common ops changes based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * rework common ops configuration Signed-off-by: Michael Oviedo <[email protected]> * add missing params and update geonames workload Signed-off-by: Michael Oviedo <[email protected]> * parameterize request-timeout and warmup-time-period Signed-off-by: Michael Oviedo <[email protected]> --------- Signed-off-by: Michael Oviedo <[email protected]> (cherry picked from commit 5119393) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * refactor common ops changes based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * rework common ops configuration Signed-off-by: Michael Oviedo <[email protected]> * add missing params and update geonames workload Signed-off-by: Michael Oviedo <[email protected]> * parameterize request-timeout and warmup-time-period Signed-off-by: Michael Oviedo <[email protected]> --------- Signed-off-by: Michael Oviedo <[email protected]> (cherry picked from commit 5119393) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * refactor common ops changes based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * rework common ops configuration Signed-off-by: Michael Oviedo <[email protected]> * add missing params and update geonames workload Signed-off-by: Michael Oviedo <[email protected]> * parameterize request-timeout and warmup-time-period Signed-off-by: Michael Oviedo <[email protected]> --------- Signed-off-by: Michael Oviedo <[email protected]> (cherry picked from commit 5119393) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * refactor common ops changes based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * rework common ops configuration Signed-off-by: Michael Oviedo <[email protected]> * add missing params and update geonames workload Signed-off-by: Michael Oviedo <[email protected]> * parameterize request-timeout and warmup-time-period Signed-off-by: Michael Oviedo <[email protected]> --------- Signed-off-by: Michael Oviedo <[email protected]> (cherry picked from commit 5119393) Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian * refactor common ops changes based on discussion with ian * rework common ops configuration * add missing params and update geonames workload * parameterize request-timeout and warmup-time-period --------- (cherry picked from commit 5119393) Signed-off-by: Michael Oviedo <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian * refactor common ops changes based on discussion with ian * rework common ops configuration * add missing params and update geonames workload * parameterize request-timeout and warmup-time-period --------- (cherry picked from commit 5119393) Signed-off-by: Michael Oviedo <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian * refactor common ops changes based on discussion with ian * rework common ops configuration * add missing params and update geonames workload * parameterize request-timeout and warmup-time-period --------- (cherry picked from commit 5119393) Signed-off-by: Michael Oviedo <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
* restructure common operations folder based on discussion with ian * refactor common ops changes based on discussion with ian * rework common ops configuration * add missing params and update geonames workload * parameterize request-timeout and warmup-time-period --------- (cherry picked from commit 5119393) Signed-off-by: Michael Oviedo <[email protected]> Signed-off-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com> Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…roject#514) * restructure common operations folder based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * refactor common ops changes based on discussion with ian Signed-off-by: Michael Oviedo <[email protected]> * rework common ops configuration Signed-off-by: Michael Oviedo <[email protected]> * add missing params and update geonames workload Signed-off-by: Michael Oviedo <[email protected]> * parameterize request-timeout and warmup-time-period Signed-off-by: Michael Oviedo <[email protected]> --------- Signed-off-by: Michael Oviedo <[email protected]>
Description
This change consolidates all of the common operations in OSB workloads into one shared directory. Now, rather than repeating these operations multiple times in each workload, they can simply all be pulled in with one line:
{{ benchmark.collect(parts="../../shared_operations/common_workload_setup.json") }},
Issues Resolved
#489
Testing
Tested by importing the common ops file into other workloads and executing tests
Backport to Branches:
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.