Add new config so that you can specify which data to index #26

Open
wants to merge 2 commits into base: master

bobbyrenwick

We've decided that we don't need the raw `timer` data when we also have the
`timerData` data. This change keeps the default of indexing everything but lets
users of the backend specify which types they want to index.

One thing users need to be careful of: if they choose to override the setting,
they need to use any custom names they've given each data type.
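
To illustrate the behaviour described above, here is a minimal sketch in Python. It is not the backend's actual code: the `index_types` parameter, the `select_types` helper, and the type names in the sample data are all hypothetical, chosen only to show the default of indexing everything and the caveat about custom names.

```python
def select_types(flushed, index_types=None):
    """Return only the flushed document groups whose type should be indexed.

    flushed     -- dict mapping a type name to its list of documents
    index_types -- optional iterable of type names to keep (hypothetical
                   setting); None keeps every type
    """
    if index_types is None:
        # Default behaviour: index every type, as before the change.
        return dict(flushed)
    wanted = set(index_types)
    return {name: docs for name, docs in flushed.items() if name in wanted}


# Hypothetical flush payload using assumed default type names.
flushed = {
    "counter": [{"ns": "web", "val": 3}],
    "timer": [{"ns": "web", "val": 12}, {"ns": "web", "val": 30}],
    "timerData": [{"ns": "web", "count": 2, "upper": 30}],
}

# No override: everything is indexed.
all_types = select_types(flushed)

# Override: drop the raw timer documents, keep the aggregates.
without_raw = select_types(flushed, ["counter", "timerData"])

# Caveat: if the aggregate type was renamed (here to "timer_stats"), an
# override that still lists "timerData" silently drops the renamed type.
renamed = {"counter": flushed["counter"], "timer_stats": flushed["timerData"]}
missed = select_types(renamed, ["counter", "timerData"])
```

In this sketch `missed` contains only the `counter` documents, which is why an override has to list the custom names rather than the defaults.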
@markkimsal
Owner

Can you describe the scenario in which you're sending data to statsd that you don't want flushed to ES? I'm a bit confused by this pull request.

@bobbyrenwick
Author

With these changes we save a huge amount of disk space. Without them we were indexing around 60 GB a day, but with the changes we now index around 4 GB a day.

When you're using Elasticsearch as a backend for statsd, you lose the ability that Graphite and Whisper give you to set lower-resolution time intervals for older data. By having a flush interval of 10 seconds on statsd, and not indexing the raw timer data, we effectively have a lower-resolution time interval, and we like the trade-off between disk space, resolution and query performance.
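
As a rough illustration of why dropping the raw timer documents lowers the resolution without losing the summary, the sketch below collapses all timer samples from one flush interval into a single aggregate document. The `summarize_timers` helper and its field names are assumptions made for this example, and the percentile uses the simple nearest-rank method, which is not necessarily how statsd computes it.

```python
def summarize_timers(samples_ms, percentile=90):
    """Collapse one flush interval's raw timer samples into one summary dict."""
    values = sorted(samples_ms)
    count = len(values)
    if count == 0:
        return {"count": 0}
    # Nearest-rank percentile using integer ceiling division.
    rank = max(1, (percentile * count + 99) // 100)
    return {
        "count": count,
        "lower": values[0],
        "upper": values[-1],
        "mean": sum(values) / count,
        "upper_%d" % percentile: values[rank - 1],
    }


# Ten raw samples (milliseconds) received during one 10-second flush interval.
raw_samples = [12, 5, 30, 7, 9, 11, 8, 6, 10, 40]

# Indexing raw data stores ten documents; indexing the aggregate stores one.
summary = summarize_timers(raw_samples)
```

Here ten raw documents become one summary document per metric per flush interval, so the saving grows with the number of samples received in each interval.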

@kaibra

kaibra commented May 10, 2017

+1
We have the same problem. The aggregated data already has everything we need, so there is no need to write every single document into ES. Percentile queries in Grafana in particular can get very slow if they are based on the single docs instead of the ones with aggregated stats.

3 participants