[Doc] Logstash data streams integration #966

Closed
ppf2 opened this issue Sep 1, 2020 · 6 comments
@ppf2
Member

ppf2 commented Sep 1, 2020

Data streams, a convenient, scalable way to ingest, search, and manage continuously generated time series data, were released in Elasticsearch 7.9.

While this feature is currently available in the default distribution of Elasticsearch, Logstash has not yet adopted it in its time-series indexing implementation.

The following walks you through implementing data streams integration with Logstash.

This recipe lets you work around the well-known limitation of using dynamic variables with ILM+rollover in Logstash until tighter out-of-the-box integration between Logstash and data streams is available.

Disclaimer: Keep in mind that Elasticsearch data streams only support the create action today. If a document with the specified _id already exists, the indexing operation will fail (by design).
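As an illustration of the create-only behavior (assuming a data stream named my-data-stream-app1 already exists; the stream name and document body here are made up):

```
# Rejected: data streams do not accept plain index requests with an explicit _id
PUT my-data-stream-app1/_doc/1
{ "@timestamp": "2020-09-01T00:00:00Z", "message": "hello" }

# Accepted: the _create endpoint uses op_type create, which data streams require
PUT my-data-stream-app1/_create/1
{ "@timestamp": "2020-09-01T00:00:00Z", "message": "hello" }
```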

Step 1: Create the desired ILM policy in Elasticsearch (you can use either the API or Kibana UI):

# This is an arbitrary ILM policy that performs a rollover and delete
PUT _ilm/policy/my-30g-30d-ilm-policy
{
  "policy": {
    "phases": {
      "hot": {
        "actions": {
          "rollover": {
            "max_size": "30G"
          }
        }
      },
      "delete": {
        "min_age": "30d",
        "actions": {
          "delete": {}
        }
      }
    }
  }
}
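After creating the policy, you can confirm it was stored (the policy name matches the example above):

```
# Returns the stored policy definition, including its phases and actions
GET _ilm/policy/my-30g-30d-ilm-policy
```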

Step 2: Create an index template using v2 templates (you can use either the API or Kibana UI). "v2 templates" refer to the new _index_template implementation in Elasticsearch.

# You can customize other index settings/mappings as part of the index template
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "my-30g-30d-ilm-policy"
    }
  }
}
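Once the template is in place, the first document indexed into a matching name auto-creates the data stream and its initial backing index; you can then inspect it with the data stream API (the stream name below is illustrative):

```
# First write auto-creates the data stream via the matching index template
POST my-data-stream-test/_doc/
{ "@timestamp": "2020-09-01T00:00:00Z", "message": "hello" }

# Shows the data stream, its generation, and its backing indices
GET _data_stream/my-data-stream-test
```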

(Optional) You can also create multiple index templates for each "type" of index/app if desired, e.g.,

PUT /_index_template/my-data-stream-app1-template
{
  "index_patterns": [ "my-data-stream-app1*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.number_of_shards": 1,
      "index.refresh_interval": "30s",
      "index.lifecycle.name": "my-30g-30d-ilm-policy"
    }
  }
}

PUT /_index_template/my-data-stream-app2-template
{
  "index_patterns": [ "my-data-stream-app2*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.number_of_shards": 3,
      "index.refresh_interval": "15s",
      "index.lifecycle.name": "my-30g-15d-ilm-policy"
    }
  }
}

(Optional) If you are running a hot-warm architecture, make sure to include the index.routing.allocation.require setting in the index templates so that new data stream indices are placed on the hot tier by default. The following is an example for the hot-warm deployment template on Elastic Cloud.

# Elastic Cloud uses the node attribute "data" (by default) to define 
# the tiers in a hot-warm deployment template. When "data" is set to "hot", 
# it will allocate all `my-data-stream*` indices only to the hot tier in the deployment.
# If you are not running on Elastic Cloud, your node attribute/attribute value 
# will likely be different. 

# DO NOT simply copy and paste the example below without customization
# UNLESS you know the `data:hot` node attribute is properly set up in your environment :)
PUT /_index_template/my-data-stream-template
{
  "index_patterns": [ "my-data-stream*" ],
  "data_stream": { },
  "priority": 200,
  "template": {
    "settings": {
      "index.lifecycle.name": "my-30g-30d-ilm-policy",
      "index" : {
        "routing" : {
          "allocation" : {
            "require" : {
              "data" : "hot"  
            }
          }
        }
      }
    }
  }
}
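Before relying on the allocation setting, you can verify that the node attribute actually exists in your cluster:

```
# Lists each node's custom attributes; on Elastic Cloud hot-warm deployments,
# hot nodes should show attr=data with value=hot
GET _cat/nodeattrs?v
```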

Step 3: Configure Logstash Elasticsearch output

The example below assumes the field %{app_name} is already defined and populated on each event upstream of the output.

output {
  elasticsearch {
    hosts => ["https://<es_host>:<es_port>"]
    user => "elastic"
    password => "password"
    index => "my-data-stream-%{app_name}"
    # To prevent the output from interfering with the data stream setup,
    # ILM integration is explicitly disabled
    ilm_enabled => false
    # Data streams only support the create action
    action => "create"
  }
}
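To try this out without real traffic, a hypothetical test pipeline could populate app_name with a generator input (the field value here is illustrative; combine it with the elasticsearch output above):

```
input {
  generator {
    # Emits a single test event with app_name set,
    # so the output resolves to my-data-stream-app1
    count => 1
    message => "hello"
    add_field => { "app_name" => "app1" }
  }
}
```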

When Logstash substitutes the field reference %{app_name} with its value from each event, the resulting index name matches the index template defined in Step 2. As a result, the underlying data stream for each "application type" is created automatically.

Example of the resulting backing indices (with rollover) of the data streams created for each "application type":

health status index
green  open   .ds-my-data-stream-app1-000001
green  open   .ds-my-data-stream-app1-000002
green  open   .ds-my-data-stream-app2-000001
green  open   .ds-my-data-stream-app2-000002
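A listing like the one above can be produced with the cat indices API against the backing index pattern (column selection via the h parameter is optional):

```
# Backing indices of data streams are prefixed with .ds-
GET _cat/indices/.ds-my-data-stream-*?v&h=health,status,index
```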
@andreykaipov

This is really awesome! Thank you for the walkthrough!

Is the goal to eventually support variable interpolation in the ilm_rollover_alias option via this approach, or will using the index option like in your example become the recommended approach to work around #858? I can see it being an issue as it might not be apparent to end users that the underlying indices behind an interpolated ilm_rollover_alias are actually data streams and will only support create actions (at the moment).

It's probably still too early to tell, but I figured I'd ask to gauge if it's worth implementing the data streams approach for now on our end.

@karenzone
Contributor

Yes! Good stuff, @ppf2. Adding this to work in queue. Thanks for taking time to share this info with other users.

@ppf2
Member Author

ppf2 commented Sep 2, 2020

Is the goal to eventually support variable interpolation in the ilm_rollover_alias option via this approach, or will using the index option like in your example become the recommended approach to workaround #858?

I think the long term plan is for the output to have actual data stream settings (so that we don't have to deal with the unintuitive setup here: turning ILM off, setting the action option to create, etc.). This will certainly require a code change, so I will have the LS devs comment here :)

@karenzone
Contributor

karenzone commented Sep 2, 2020

@colinsurprenant FYI: Calling your attention to this issue as it relates to our docs work on datastreams.

@colinsurprenant
Contributor

Great stuff @ppf2 - FYI we are currently working on the design and implementation strategy for a new data streams output plugin which will be essentially a stripped down version of the current elasticsearch output, see elastic/logstash#12178. Please let us know if you have any feedback/comments etc!

@kares
Contributor

kares commented Jul 26, 2021

going to close this issue as elastic/logstash#12178 got shipped, let us know if there's anything more we need to do (e.g. in the docs)
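For reference, the work shipped in elastic/logstash#12178 added native data stream settings to the elasticsearch output, roughly along these lines (option names per the logstash-output-elasticsearch plugin documentation; exact defaults may vary by version):

```
output {
  elasticsearch {
    hosts => ["https://<es_host>:<es_port>"]
    # Route events to a data stream instead of a plain index;
    # the stream name resolves to <type>-<dataset>-<namespace>
    data_stream => "true"
    data_stream_type => "logs"
    data_stream_dataset => "app1"
    data_stream_namespace => "default"
  }
}
```

With these options, the manual ilm_enabled => false and action => "create" workaround from the walkthrough above is no longer needed.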

@kares kares closed this as completed Jul 26, 2021