Logstash Integration with Elasticsearch Data Streams #12178
@acchen97 Two notes on the above issue:
@ruflin thanks for your notes. I've reconciled the former in the original issue. For the latter, I think we can stick with it.
@mostlyjason FYI.
will there be authentication?
@enotspe I would think so; I don't see why we would differ from the authentication strategies of the current elasticsearch output.
@colinsurprenant We have validation in place for the `data_stream.*` fields.
@ph @acchen97 @jen-huang So should we be looking into this Future Considerations item right away?
And the question is more about whether to make this a configurable default behaviour, i.e. should we allow the user to disable it for documents that are not from Agent and do not contain these fields?
And as a follow-up question, if the user sets
Those fields have validation on the agent side too to ensure safety with ES index name constraints. Following the discussion in elastic/kibana#75846, we implemented 20/100/100 byte length restrictions for the type, dataset, and namespace strings, respectively.
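The constraints described above (the 20/100/100 byte limits and the no-hyphen rule from the settings spec below) can be sketched as a small validation helper. This is an illustrative sketch only, not the actual Agent or Logstash validation code:

```python
# Illustrative sketch of the data stream name constraints discussed in
# this thread; not the actual plugin or Agent validation code.

# Byte length limits per part, per the elastic/kibana#75846 discussion.
MAX_BYTES = {"type": 20, "dataset": 100, "namespace": 100}

def validate_part(name: str, value: str) -> bool:
    """Return True if `value` is acceptable for the given data stream part."""
    if not value:
        return False
    if "-" in value:  # hyphens would break the {type}-{dataset}-{namespace} format
        return False
    if len(value.encode("utf-8")) > MAX_BYTES[name]:
        return False
    return True

def data_stream_name(type_="logs", dataset="generic", namespace="default"):
    """Compose a data stream name, raising on invalid parts."""
    for part, value in (("type", type_), ("dataset", dataset), ("namespace", namespace)):
        if not validate_part(part, value):
            raise ValueError(f"invalid data_stream.{part}: {value!r}")
    return f"{type_}-{dataset}-{namespace}"
```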
I'm still a bit hesitant on tackling this in the first version. This would only apply to data that is not sent from Agent, and I'm not sure yet how adding this would impact those use cases. Also, it's not clear to me if and how these
It is important that we add these fields. The new indexing strategy requires these fields to be there. It is expected that all dashboards / visualizations we build, and hopefully also the ones from the community, will filter on these. It will make the queries, and with them the dashboards, much faster. If the fields are not in line with the indexing strategy, things will break apart.
As we are closing in on the release of the logstash data streams output plugin
@acchen97 For a non-agent use case: we have a multi-tenant strategy where each tenant has its own index, such as datalake-tenant1 and datalake-tenant2. We use Logstash to feed data and set the index to the correct tenant. Under the new indexing strategy and this plugin, can we support this model: logs-tenant-dataset, where tenant = the ECS field organization.id?
@Karrade7 @acchen97 Good point. In the current model this is not directly covered, but we could also provide string interpolation for the data stream name settings.
@colinsurprenant I think string interpolation and flexibility in general will be important here.
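If string interpolation were supported in the data stream name settings as discussed above, per-tenant routing could look roughly like this. This is a hypothetical sketch: interpolation support here was not confirmed in this thread, and the `[organization][id]` field name is an assumption from the question above:

```
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    data_stream => "true"
    # Hypothetical: interpolate the tenant into the dataset,
    # producing e.g. logs-tenant1-default
    data_stream_dataset => "%{[organization][id]}"
  }
}
```

Note that the interpolated value would still need to satisfy the no-hyphen and length constraints on the dataset part.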
@Karrade7 I am not sure I understand your concern correctly; there are 2 things at play here:
Are you saying that when not using
@Karrade7 hyphens are indeed not allowed in the `dataset` and `namespace` values.
@colinsurprenant I believe the
Why should "type" be limited only to logs and metrics? Currently we use similar naming, but with the following options: logs, metrics, monitors (typically up/down monitors, events from heartbeat, ...), and data (real application data, not logs).
@vbohata this is a good question, and ultimately nothing will really prevent someone from having a "custom" Data Streams type.
I'm a bit confused, is this plugin already in 7.10? After I saw a presentation from @ruflin about data streams, I started digging into how to integrate metricbeat and filebeat with data streams. Here is what I'm adding to the Beat config:
And on the Logstash side I'm doing this:
In Kibana, to enable all the templates and dashboards, I just add a fake agent and everything is created. It seems to work only with metricbeat and filebeat; only some fields sometimes raise shard errors:
This may be because I'm not setting all the fields correctly? With auditbeat, data streams seem not to be working.
@cdino Nice work! You don't need to create a fake Agent, if you go to
@cdino What you discovered is: if you know what you are doing, you don't need the new plugin ;-) 👏 Curious to hear what errors you got on the auditbeat side.
@ruflin Thanks! Yes, I will avoid using it in production for now :) but I really like this approach; it will help us a lot in the future.
Is the plugin released or not? I could not find a repo or any details. I wanted to check the roadmap for it.
@sc7565 this feature has not been released yet. It is on the near-term roadmap.
I was thinking about it yesterday: what happens if there's a
We're certainly planning a version check on ES >= 7.9 to see if data-streams are available. |
Discussions are ongoing with the ES team on what kind of primitives exist (such as "require_alias=true") or could be created so that data producers can ensure that they're writing to data streams and not wrongly creating indices where aliases should be. We could create a cache of "already seen index names", but this will never be truly accurate, as we could have a cache miss, confirm an alias exists, and then someone deletes the alias and template afterwards, causing Logstash to create an index instead of an alias. And of course checking without a cache, per document, is not performant.
a technical semi-blocker for data-stream support (because of data streams, the plugin needs to check the ES version):
(initial) specification elastic/logstash#12178 Co-authored-by: Ry Biesemeyer <[email protected]> Co-authored-by: Karen Metts <[email protected]>
LS 7.13.0 is on track to ship data-stream support using its logstash-output-elasticsearch plugin.
As 7.13.0 is out, we're fine to close this issue.
In fact, Logstash 7.10 already writes to data streams using an index template with the described data_stream {} section. What are the negative aspects of using this approach? Thanks
Overview
This is an overview of the Logstash integration with Elasticsearch data streams. The integration will be added as a feature to the existing Elasticsearch output plugin. This will include new data stream options that will be recommended for indexing any time series datasets (logs, metrics, etc.) into Elasticsearch. The existing options will continue to be used for non-time series use cases. This feature will be available on both the default and OSS Logstash distributions.
Indexing Strategy
The data streams integration will adopt the new indexing strategy under the `{type}-{dataset}-{namespace}` format, leveraging the composable templates bundled in Elasticsearch starting in 7.9. The default data stream name used will be `logs-generic-default`. This default enables users to easily correlate data with other data sources (e.g. with `logs-*` and `logs-generic-*`) in Elasticsearch. Given the new indexing strategy, the `type`, `dataset`, and `namespace` of the data stream name can all be configured separately.

As Logstash will not be fully ECS compliant until 8.0, there are caveats we need to document (or provide bootstrap checks) for users to avoid ECS conflicts.
Example Configuration

Basic default configuration

Minimal settings to get started in Logstash 7.x. Events with the `data_stream.*` fields will automatically get routed to the appropriate data streams. Defaults to `logs-generic-logstash` if the fields are missing.

Customize data stream name
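The original configuration snippets were not captured in this extract; minimal sketches of what they could look like, based on the settings described below (hosts and name values are placeholders):

```
# Basic default configuration
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    data_stream => "true"
  }
}

# Customize data stream name
output {
  elasticsearch {
    hosts => ["https://localhost:9200"]
    data_stream => "true"
    data_stream_type => "metrics"
    data_stream_dataset => "foo"
    data_stream_namespace => "bar"
  }
}
```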
Configuration Settings

These are the net new data stream specific settings that will be added to the Elasticsearch output plugin:

- `data_stream` (string, optional) - defines whether data will be indexed into an Elasticsearch data stream. The `data_stream_*` settings will only be used if this setting is enabled. This setting supports the values `true`, `false`, and `auto`. Defaults to `false` in Logstash 7.x and `auto` starting in Logstash 8.0. More details on the `auto` behavior can be found in this issue.
- `data_stream_timestamp` (timestamp, required) - the timestamp used for the data stream. Defaults to `@timestamp`.
- `data_stream_type` (string, optional) - the data stream type used to construct the data stream at index time. Only `logs` or `metrics` is allowed. This field does not support hyphens (-). Defaults to `logs`.
- `data_stream_dataset` (string, optional) - the data stream dataset used to construct the data stream at index time. This field does not support hyphens (-). Defaults to `generic`.
- `data_stream_namespace` (string, optional) - the data stream namespace used to construct the data stream at index time. This field does not support hyphens (-). Defaults to `default`.
- `data_stream_auto_routing` (boolean, optional) - automatically routes events by deriving the data stream name using specific event fields with the `%{data_stream.type}-%{data_stream.dataset}-%{data_stream.namespace}` format. If enabled, the `data_stream.*` event fields will take precedence over the `data_stream_type`, `data_stream_dataset`, and `data_stream_namespace` settings, but will fall back to them if any of the fields are missing from the event. Defaults to `true`.
- `data_stream_sync_fields` (boolean, optional) - automatically syncs the `data_stream.*` event fields if they are missing from the event. This ensures the `data_stream.*` fields match the data stream name that events are indexed to. The field syncing behavior between this setting and the `data_stream_auto_routing` setting can be found in this issue. Defaults to `true`.
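The precedence and fallback behavior of `data_stream_auto_routing` described above can be sketched as follows. This is illustrative only, not plugin code:

```python
# Sketch of the auto-routing behavior described above: data_stream.*
# event fields take precedence, and the plugin settings fill in any
# missing part. Not the actual logstash-output-elasticsearch code.

DEFAULTS = {"type": "logs", "dataset": "generic", "namespace": "default"}

def route(event: dict, settings: dict = DEFAULTS, auto_routing: bool = True) -> str:
    """Derive the target data stream name for an event."""
    # With auto routing disabled, event fields are ignored entirely.
    ds = event.get("data_stream", {}) if auto_routing else {}
    # Each part falls back to the configured setting when missing.
    parts = {k: ds.get(k, settings[k]) for k in ("type", "dataset", "namespace")}
    return "{type}-{dataset}-{namespace}".format(**parts)
```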
Elastic Agent Compatibility
Logstash often acts as an intermediary for receiving data from other systems like the Elastic Agent and Kafka. For these use cases, Logstash will by default use the `data_stream.type`, `data_stream.dataset`, and `data_stream.namespace` event fields to derive the data stream name. This allows events from the Elastic Agent to automatically be routed to the appropriate Elasticsearch data stream when using Logstash in between. This feature can be disabled by configuring the `data_stream_auto_routing` setting to `false`.

Format: `%{data_stream.type}-%{data_stream.dataset}-%{data_stream.namespace}`

Events received from the Elastic Agent should generally have all the `data_stream.*` fields populated. In the case where any of these fields are missing, the `data_stream_sync_fields` setting will be used to sync these fields prior to indexing.

Limitations
The primary limitation of data streams is the inability to perform updates to documents. Logstash users have historically used the existing Elasticsearch output plugin's capabilities to conduct document updates and achieve exactly-once delivery semantics.
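This limitation follows from data streams accepting only the `create` action when writing. A sketch of a valid bulk request against the default data stream (the document body is illustrative):

```
POST _bulk
{ "create": { "_index": "logs-generic-default" } }
{ "@timestamp": "2021-01-01T00:00:00Z", "message": "hello" }
```

An `update` (or plain `index`) action targeting a data stream is rejected by Elasticsearch, so update-based exactly-once patterns built on document IDs in the classic index workflow do not carry over.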
Future Considerations

`logs-generic-default` is the default data stream for generic data from Logstash and the Elastic Agent. If users express feedback that it's difficult to identify Logstash-sourced data in the shared data stream, we could consider adding a `from-logstash` tag to the `tags` ECS base field for events coming from Logstash.