Allow a dynamic target index in ISM Rollup #61

Closed
adityaj1107 opened this issue Jun 3, 2021 · 13 comments

@adityaj1107
Contributor

Issue by mark-meyer
Wednesday Mar 31, 2021 at 00:46 GMT
Originally opened as opendistro-for-elasticsearch/index-management#428


I would like to apply a policy to daily log indices that performs a rollup in one of the states before deleting the index. However, it seems the target_index requires a fixed string, and there is no option to interpolate something like {{ctx.index}}. This makes it impossible to apply this policy to more than one index unless you are okay with the rollup overwriting the previous one, which you probably aren't.

Describe the solution you'd like
I would like to specify the target_index field of an ISM rollup dynamically, based on the name (or some other metadata) of the index being rolled up, such as "target_index": "rollup_{{ctx.index}}".

adityaj1107 added the enhancement (New request) label on Jun 3, 2021
@adityaj1107
Contributor Author

Comment by dbbaughe
Wednesday Mar 31, 2021 at 00:51 GMT


I believe @thalurur just did something similar for snapshot name too.

That being said, correct me if I'm wrong, Ravi, but if @mark-meyer applies this policy to daily log indices, they will not overwrite each other even if they roll up into the same target index. We do support multiple rollups in the same index. Each index managed by the policy gets its own rollup job, created when the rollup action runs, and that job uses its own rollup ID as a namespace in the target index to handle conflicts. The rollup ID appears to consist of a hash of the job configuration and the source index, which will be unique per daily log index.
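To make the namespacing idea concrete, here is a minimal Python sketch. It assumes, purely for illustration, that a job ID is a truncated SHA-256 hash of the serialized job configuration plus the source index, and that each rolled-up document ID is prefixed with that job ID; the plugin's real ID scheme may differ. The helpers `rollup_job_id` and `namespaced_doc_id` are hypothetical.

```python
import hashlib
import json


def rollup_job_id(job_config: dict, source_index: str) -> str:
    """Hypothetical job ID: a hash of the canonical job config plus the source index."""
    payload = json.dumps(job_config, sort_keys=True) + "|" + source_index
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def namespaced_doc_id(job_id: str, bucket_key: str) -> str:
    """Prefix a rolled-up document ID with its job ID so jobs cannot collide."""
    return f"{job_id}#{bucket_key}"


# One policy, hence one rollup config, applied to two daily log indices.
config = {"target_index": "rollup_logs", "dimensions": ["ts", "message.keyword"]}
job_a = rollup_job_id(config, "logs-2021.03.30")
job_b = rollup_job_id(config, "logs-2021.03.31")

# Different source indices give different job IDs, so the same time bucket
# becomes two distinct documents in the shared target index.
print(job_a != job_b)
print(namespaced_doc_id(job_a, "2021-03-30T00") != namespaced_doc_id(job_b, "2021-03-30T00"))
```

Because the hash is deterministic, re-running the same job against the same source index yields the same namespace, while a new daily index always gets a fresh one.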

Edit: Do we delete these temporary rollup jobs after they finish, @thalurur?

@adityaj1107
Contributor Author

Comment by thalurur
Monday Apr 05, 2021 at 22:30 GMT


At the moment, the ISM action does not delete the jobs after they finish. The rest of the behavior described by Drew is accurate: the same policy will create multiple jobs (one per source index the policy is executed on), and all of these jobs write data to the same target index. Data from each job cannot be touched by other jobs, because their namespace includes the rollup job ID.

The rollup implementation currently has some constraints:

  1. If multiple rollup jobs have written data to the same target index, data from only one job is picked when searching the rollup index; there is currently no way to make the search cover all the jobs.
  2. If a search request spans multiple rollup indices, the search fails.

Given the above, the only way to roll up data for daily log indices so that the rolled-up data can be searched together is to create one job that writes to one target index.

What happens with the current policy:
In this case, a new rollup job is created for each new daily log index, but all the jobs write to the same target index.
Although the documents from different rollup jobs inside the target index are shielded from overwriting, fetching documents from this index is limited by constraint 1: during a search on the target index, a single job is picked from the multiple jobs, which prevents ever fetching all the data, or the data of the job the user wants, from the target index.
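A toy Python model of constraint 1 may help. It assumes, for illustration only, that each document in the shared target index carries its rollup job ID and that a search restricts itself to one job picked from those present (here simply the first one seen); how the plugin actually picks the job is not described in this thread, and `RollupDoc` and `search_rollup_index` are hypothetical names.

```python
from collections import namedtuple

# Each rolled-up document remembers which rollup job produced it.
RollupDoc = namedtuple("RollupDoc", ["job_id", "bucket", "msg_size_sum"])

# Three daily jobs created by the same policy, all writing to one target index.
shared_target_index = [
    RollupDoc("job-day1", "2021-03-29T00", 10),
    RollupDoc("job-day2", "2021-03-30T00", 7),
    RollupDoc("job-day3", "2021-03-31T00", 5),
]


def search_rollup_index(docs):
    """Toy search: only documents from a single picked job are considered."""
    picked_job = docs[0].job_id  # stand-in for the plugin's job selection
    return [d for d in docs if d.job_id == picked_job]


hits = search_rollup_index(shared_target_index)
total = sum(d.msg_size_sum for d in hits)
print(total)  # 10, although the index holds 22 across all three jobs
```

No matter which job the search picks, two thirds of the stored buckets stay invisible, which is why the current policy cannot answer queries over the full time range.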

What happens with the proposed policy:
If we update the rollup ISM policy to take a scripted target index, then a new rollup job is created for each new daily log index, writing to a new target index.
In this case, since only one job writes to each target index, there is no ambiguity during search about which job is picked, and users can choose which index to search. However, if users want to search across several or all target indices, that is not possible.
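The proposed variant trades constraint 1 for constraint 2. The following toy Python model assumes one rollup index per daily source index and a search that rejects requests touching more than one rollup index; `MultiRollupSearchError` and `search` are hypothetical stand-ins, not the plugin's real error or API.

```python
# Toy model of constraint 2: one rollup index per daily source index
# (the scripted-target variant), and a search may touch at most one of them.
ROLLUP_INDICES = {"rollup_logs-2021.03.30", "rollup_logs-2021.03.31"}


class MultiRollupSearchError(Exception):
    """Hypothetical error standing in for the plugin's rejection."""


def search(target_indices):
    """Return the rollup indices searched, or fail if there is more than one."""
    rollups = sorted(i for i in target_indices if i in ROLLUP_INDICES)
    if len(rollups) > 1:
        raise MultiRollupSearchError(f"cannot search multiple rollup indices: {rollups}")
    return rollups


print(search(["rollup_logs-2021.03.31"]))  # ['rollup_logs-2021.03.31']
try:
    search(["rollup_logs-2021.03.30", "rollup_logs-2021.03.31"])
except MultiRollupSearchError as err:
    print("rejected:", err)
```

Any single day is searchable, but a query over a week of rollups is rejected outright.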

We have a few options if we need to support either case:

  • Adjust constraint 2 (multiple rollup indices cannot be searched together). Relaxing this constraint and making the ISM rollup policy target index scriptable, as requested, will make it possible to search across all rollup jobs created by the ISM policy, as long as each job wrote to a different index.
  • Adjust constraint 1: make it possible to search the data of multiple rollup jobs in the same target index. This makes the current policy work.
  • A combination of both: make it possible to search multiple rollup jobs in the same target index, and allow multiple rollup indices to be searched in the same search request.

@mark-meyer A couple of questions:

  1. Is there a use case where you would need to search, together, all the data rolled up from different source indices through the ISM policy? (I am assuming you do, since the rollup definition in the policy is the same for all jobs; the data is just spread out temporally.)
  2. If 1 is yes, do you need the data to be rolled up into different indices, or does a single index work as well?

@adityaj1107
Contributor Author

Comment by garlicsauce
Wednesday Apr 28, 2021 at 11:00 GMT


+1

I was trying to set up a rollup policy with a target index containing a date math expression, "target_index": "<test-{now/d}>". The index test-2021.04.28 was created; however, the rollup job fails with an error saying:

Status
Failed: Failed to update mappings of target index [<test-{now/d}>] with rollup job

Without this capability, I guess I need to create a script that runs every day and creates the rollup jobs itself, which is a much uglier solution. To sum up, it would be nice to be able to set up a rollup job/rollup policy with a dynamic target index.
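The error message suggests the index was created under the resolved name (test-2021.04.28) while the mapping update still used the literal expression <test-{now/d}>. As a rough illustration of the resolution step the job would need, here is a minimal Python sketch. It handles only the simple `<prefix{now/d}suffix>` form with the default `yyyy.MM.dd` format in UTC; offsets, custom formats, and time zones from the full date math syntax are deliberately not supported, and `resolve_now_d` is a hypothetical helper.

```python
import re
from datetime import datetime, timezone

# Matches only <prefix{now/d}suffix>; full date math (now-1d, custom formats,
# time zones) is intentionally out of scope for this sketch.
_DATE_MATH = re.compile(r"^<(?P<prefix>[^{}]*)\{now/d\}(?P<suffix>[^{}]*)>$")


def resolve_now_d(expression: str, now: datetime) -> str:
    """Resolve a simple {now/d} index name; other names are returned unchanged."""
    match = _DATE_MATH.match(expression)
    if match is None:
        return expression  # not a supported date math name: use as-is
    day = now.astimezone(timezone.utc).strftime("%Y.%m.%d")
    return f"{match['prefix']}{day}{match['suffix']}"


now = datetime(2021, 4, 28, 11, 0, tzinfo=timezone.utc)
print(resolve_now_d("<test-{now/d}>", now))  # test-2021.04.28
print(resolve_now_d("test-static", now))     # test-static
```

Using the resolved name consistently for both index creation and the mapping update is the point; where exactly that happens inside the plugin is not covered here.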

thalurur pushed a commit to thalurur/open-index-management that referenced this issue Oct 22, 2021
opensearch-project#61)

* Adds VisualCreatePolicy page, missing backend routes/configs, updates all creation paths to show new modal, updates rates, etc.

Signed-off-by: Drew Baugher

* Updates cypress tests

Signed-off-by: Drew Baugher

* Fixes cypress test

Signed-off-by: Drew Baugher

* Fixes duplicate action type, filters retry/timeout keys, and fixes transition condition default value when switching

Signed-off-by: Drew Baugher
@dbbaughe
Contributor

Additional comments:

  • Be able to use an alias as the rollup target
  • Be able to have date math in the rollup target index name so it dynamically changes each day
  • Allow for dynamic index naming based on the index being rolled up (e.g., if done via an ISM action on a rolled-over index, against a source index of logs-90-000422, a config like %{indexname}-rollup would create logs-90-000422-rollup)
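The third bullet's naming scheme can be sketched as a simple placeholder substitution. This is a minimal Python illustration assuming a `%{name}` placeholder syntax exactly as written in the example above; `render_target` is a hypothetical helper, and the placeholder set a real implementation would expose is not specified in this thread.

```python
import re

# Matches %{name} placeholders, as in the %{indexname}-rollup example.
_PLACEHOLDER = re.compile(r"%\{(\w+)\}")


def render_target(pattern: str, **variables: str) -> str:
    """Substitute %{name} placeholders; unknown names raise KeyError."""
    def substitute(match):
        name = match.group(1)
        if name not in variables:
            raise KeyError(f"unknown placeholder: {name}")
        return variables[name]

    return _PLACEHOLDER.sub(substitute, pattern)


print(render_target("%{indexname}-rollup", indexname="logs-90-000422"))
# logs-90-000422-rollup
```

Failing loudly on an unknown placeholder avoids silently creating a target index whose name still contains a literal `%{...}`.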

The main issue is that the target of the rollup is currently static, which means that once the target index grows too large, you need to manually stop the rollup job and start a new one pointing to a new index. We need to be able to have a single continuous rollup job that writes to a target that can change (by using an alias whose backing indices are rolled over, maybe data streams, dynamically configured index names, etc.).
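The alias approach can be pictured with a toy Python model: one continuous writer always targets the alias, and the alias's write index rolls over once it holds too many documents. In practice ISM, not the rollup job, would perform the rollover based on its conditions; this sketch folds both roles into one hypothetical `RolloverAlias` class for brevity, and `max_docs` is an illustrative stand-in for a rollover condition.

```python
class RolloverAlias:
    """Toy alias: writes go to the current backing index, which rolls over by size."""

    def __init__(self, name: str, max_docs: int):
        self.name = name
        self.max_docs = max_docs
        self.generation = 1
        self.indices = {self._index_name(): []}

    def _index_name(self) -> str:
        return f"{self.name}-{self.generation:06d}"

    @property
    def write_index(self) -> str:
        return self._index_name()

    def write(self, doc: dict) -> None:
        # Roll over to a new backing index when the current one is "too large".
        if len(self.indices[self.write_index]) >= self.max_docs:
            self.generation += 1
            self.indices[self.write_index] = []
        self.indices[self.write_index].append(doc)


alias = RolloverAlias("rollup-logs", max_docs=2)
for bucket in range(5):  # one continuous job keeps writing to the same alias
    alias.write({"bucket": bucket})
print(sorted(alias.indices))
# ['rollup-logs-000001', 'rollup-logs-000002', 'rollup-logs-000003']
```

The writer never changes its target name; only the alias's backing index does, which is what removes the need to stop and recreate the job.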

@setiah

setiah commented Jul 26, 2022

To sum up, the ask is:

  1. Allow the target index name in a rollup job to be dynamic, so it is possible to roll up data into multiple time-based indices instead of one index (as it is today).
  2. As a follow-up to 1, when the data is available in multiple target indices, a user should be able to run a search across all rolled-up indices. Currently, search is only supported on one rollup index at a time, since the rollup data goes into just one target index. Tracking issue: Allow rollup searches to multiple rollup indices if they have the same rollup #321

@mark-meyer @garlicsauce does this address your use case?

@petardz
Contributor

petardz commented Jul 27, 2022

@setiah

A few questions regarding your first comment:

  1. When creating a rollup job, are we going to store the resolved target_index value?
  2. Do we need to support anything other than "target_index": "rollup_{{ctx.source_index}}", such as date math expressions?
  3. If the answer to question 1 is no (i.e., we resolve it during the job run), what do we return in GetRollup(s)Action: the resolved value or the scripted one?

@petardz
Contributor

petardz commented Aug 4, 2022

@setiah Here is an example using the scripted target_index field:

  1. Index template
PUT _index_template/ism_rollover
{
  "index_patterns": ["log*"],
  "template": {
   "settings": {
    "plugins.index_state_management.rollover_alias": "log"
   }
 }
}
  2. ISM policy
PUT _plugins/_ism/policies/rollover_with_rollup_policy
{
  "policy": {
    "description": "Example rollover policy.",
    "default_state": "rollover",
    "states": [
      {
        "name": "rollover",
        "actions": [
          {
            "rollover": {
              "min_doc_count": 1
            }
          }
        ],
        "transitions": [
          {
            "state_name": "rp"
          }
        ]
      },
      {
        "name": "rp",
        "actions": [
          {
            "rollup": {
              "ism_rollup": {
                "target_index": "rollup_{{ctx.source_index}}",
                "description": "Example rollup job",
                "page_size": 200,
                "dimensions": [
                  {
                    "date_histogram": {
                      "source_field": "ts",
                      "fixed_interval": "60m",
                      "timezone": "America/Los_Angeles"
                    }
                  },
                  {
                    "terms": {
                      "source_field": "message.keyword"
                    }
                  }
                ],
                "metrics": [
                  {
                    "source_field": "msg_size",
                    "metrics": [
                      {
                        "sum": {}
                      }
                    ]
                  }
                ]
              }
            }
          }
        ],
        "transitions": []
      }
    ],
    "ism_template": {
      "index_patterns": [
        "log*"
      ],
      "priority": 100
    }
  }
}
  3. Create index
PUT log-000001
{
  "aliases": {
    "log": {
      "is_write_index": true
    }
  }
}
  4. Insert some docs
POST log/_doc?refresh=true
{
  "ts" : "2022-08-26T09:28:48+00:00",
  "message": "aaa1234",
  "msg_size": 10
}

Notice that rollover_with_rollup_policy has been attached to log-000001. The rollover action is executed first, then the rollup action. When the rollup action executes, the rollup job creates the rollup_log-000001 index.
log-000002 is created by the rollover, and rollover_with_rollup_policy will be attached to it.
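The substitution in "rollup_{{ctx.source_index}}" can be approximated with a few lines of Python. This is a minimal sketch, not the plugin's Mustache engine: it only replaces `{{dotted.path}}` variables from a nested dict, ignores Mustache sections and HTML escaping, and assumes a context shaped like `{"ctx": {"source_index": ...}}` to mirror the example; `render` is a hypothetical helper.

```python
import re

# {{ dotted.path }} variables only; no sections, partials, or escaping.
_MUSTACHE_VAR = re.compile(r"\{\{\s*([\w.]+)\s*\}\}")


def render(template: str, context: dict) -> str:
    """Replace {{dotted.path}} variables with values looked up in a nested dict."""
    def lookup(match):
        value = context
        for key in match.group(1).split("."):
            value = value[key]  # KeyError if the path does not exist
        return str(value)

    return _MUSTACHE_VAR.sub(lookup, template)


ctx = {"ctx": {"source_index": "log-000001"}}
print(render("rollup_{{ctx.source_index}}", ctx))  # rollup_log-000001
```

When log-000002 later reaches the rollup state, the same template would render to rollup_log-000002, giving one target index per source index.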

@downsrob
Contributor

downsrob commented Aug 6, 2022

The scripted target index is complete for Mustache templates, but it still needs support for aliases.

I see two clear options for aliases:

  1. Add a flag, set when creating a rollup, indicating that the target index is an alias. The alias can then be created at rollup initialization time, and we can create a template to make sure all backing indices always have the correct index mappings. A user can then add a rollover policy for this alias, and the work is done.
  2. Add support for an alias by having the user create an alias, create an empty backing index with no mappings and set it as the write index, and then assign the alias as the target index in the rollup policy. When we write to the alias on the rollup side, we check that the index has empty mappings, then update the mappings and make the backing index a rollup index. When the alias points to a new index, we would do the same, so it would be important that the user sets up the alias without mappings.
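The check in option 2 can be sketched as follows. This minimal Python model assumes "empty mappings" means no field properties, and it uses a hypothetical `is_rollup_index` flag to remember that an index was already converted; how the plugin actually marks rollup indices may differ, and `BackingIndex` and `ensure_rollup_target` are illustrative names.

```python
from dataclasses import dataclass, field


@dataclass
class BackingIndex:
    name: str
    mappings: dict = field(default_factory=dict)
    # Hypothetical marker; the plugin's real way of flagging rollup indices may differ.
    is_rollup_index: bool = False


def ensure_rollup_target(index: BackingIndex, rollup_mappings: dict) -> BackingIndex:
    """Option 2 check: only an empty backing index may become a rollup index."""
    if index.is_rollup_index:
        return index  # already converted on an earlier write through the alias
    if index.mappings.get("properties"):
        raise ValueError(f"{index.name} already has mappings; expected an empty backing index")
    index.mappings = dict(rollup_mappings)  # apply the rollup job's mappings
    index.is_rollup_index = True
    return index


rollup_mappings = {"properties": {"ts": {"type": "date"}}}  # placeholder mappings
fresh = ensure_rollup_target(BackingIndex("log_rollup-000002"), rollup_mappings)
print(fresh.is_rollup_index)  # True

try:
    ensure_rollup_target(
        BackingIndex("log-000001", {"properties": {"message": {"type": "text"}}}),
        rollup_mappings,
    )
except ValueError as err:
    print(err)
```

The early return keeps repeated writes to the same backing index cheap, while a misconfigured alias pointing at a regular index fails instead of being silently converted.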

@setiah thoughts?

@petardz
Contributor

petardz commented Aug 24, 2022

@setiah any thoughts on this?
The first solution makes the most sense to me; it is the most user-friendly. We would add another boolean field to the Rollup struct, say "is_target_index_alias". This would also require Dashboards work to include this field in the rollup wizard/create page.

@JathinSanghvi

JathinSanghvi commented Aug 25, 2022

@petardz, @downsrob: can the rollup target_index be mapped to an index alias, and can we have an ISM policy for that target_index? This approach would be more powerful and dynamic. I tried the following:

  1. Create an index template for log*.
  2. Create an ISM policy for log* that has rollover, warm_migration, and delete stages.
  3. Add docs to the log index, which creates the log-000001 index; the ISM policy is applied and performs rollover based on its conditions.
  4. Create an index template for rollup-log*.
  5. Create a different ISM policy for rollup-log* that can have any stages, independent of the log* ISM policy.
  6. Create a continuous rollup job with log* as the source index and "rollup-log" as the target_index. rollup-log is an alias that should point to the current write index for the rollup-log* index pattern.

When I tried this on AWS OpenSearch v7.9, the rollup job failed with this error:
"Failed: Failed to create target index [rollup-debug]"

Maybe this idea should be a separate issue, unrelated to this one, but I stumbled on this GitHub issue while troubleshooting the above error and felt this would be a cleaner approach.

@petardz
Contributor

petardz commented Aug 25, 2022

@JathinSanghvi We don't currently have alias capability in target_index, but your example can be set up through the "scripted target_index". Check out this comment with an example: link

Regarding aliases, if we support them in target_index, the flow would be very similar to the "scripted target_index" variant:

  1. Set up the source index template and rollover policy.
  2. Create a continuous rollup job with the alias in target_index and the target_index_is_alias: true parameter:
PUT _plugins/_rollup/jobs/example
{
  "rollup": {
    "source_index": "log*",
    "target_index": "log_rollup",
    "target_index_is_alias": true,
    "continuous": true,
    ...
    "dimensions": [
       ...
    ],
    "metrics": [
       ...
    ]
  }
}

During job creation, if the alias doesn't exist, we would create the backing index and the alias.

  3. Set up a rollover policy for the log_rollup alias.
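The "create the backing index and alias if missing" part of the flow above can be sketched as a function that returns the requests to send. This is a minimal Python illustration: the request body reuses the same create-index-with-alias shape as the log-000001 example earlier in this thread, while the function name `bootstrap_alias_requests` and the first backing index name `<alias>-000001` are hypothetical choices (the numeric suffix gives rollover a number to increment). Applying the rollup job's mappings to that index is left out.

```python
def bootstrap_alias_requests(alias: str, existing_aliases: set) -> list:
    """Return the (method, path, body) requests needed before the job first writes."""
    if alias in existing_aliases:
        return []  # alias already set up; nothing to create
    # Hypothetical first backing index name, mirroring log-000001 above.
    first_index = f"{alias}-000001"
    body = {"aliases": {alias: {"is_write_index": True}}}
    return [("PUT", f"/{first_index}", body)]


for method, path, body in bootstrap_alias_requests("log_rollup", existing_aliases=set()):
    print(method, path, body)
# PUT /log_rollup-000001 {'aliases': {'log_rollup': {'is_write_index': True}}}
```

Returning the requests instead of sending them keeps the decision logic easy to test; an HTTP client would then issue each one.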

The difference between the alias variant and the "scripted" variant is that with an alias you can separately control when the rollup index rolls over. In the scripted variant, the rollup index is tied to the current source_index, and when source_index rolls over, a new rollup index is created.
Regarding searching, both variants would work the same.
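The contrast between the two variants comes down to which name receives the rolled-up data for each source index generation. A toy Python comparison, assuming the names used in this thread (rollup_{{ctx.source_index}} for the scripted variant, the log_rollup alias for the alias variant); both functions are hypothetical.

```python
def scripted_target(source_index: str) -> str:
    """Scripted variant: the rollup index name is derived from the source index."""
    return f"rollup_{source_index}"


def alias_target(source_index: str, alias: str = "log_rollup") -> str:
    """Alias variant: always the alias; source_index is ignored on purpose."""
    return alias


sources = ["log-000001", "log-000002"]  # the source alias rolled over once
print([scripted_target(s) for s in sources])  # ['rollup_log-000001', 'rollup_log-000002']
print({alias_target(s) for s in sources})     # {'log_rollup'}
```

In the scripted variant, every source rollover creates a new rollup index; in the alias variant, the job keeps writing to log_rollup, and that alias's own rollover policy decides when a new backing index appears.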

@JathinSanghvi

I feel there won't be additional work required for the new variant; support for an alias on the target index should be enough. So whoever picks up the alias work should keep it flexible, so that both variants can exist side by side. What do you think?

@MahendraAkkina

9 participants