Aggregating data pre-store #21

Open
predakanga opened this issue Aug 19, 2013 · 2 comments

Comments

@predakanga

Hey there,

I've recently implemented kairos on one of my websites as a replacement for RRD, storing to four different series for up to 60,000 users every 15 minutes.

So far the performance has been admirable, but the required storage blows up very quickly when you're looking at long periods (mine are simple but long-term; the longest period retains data for a full year).

In RRD, this is solved by aggregating the data before it's stored, instead of doing the aggregation at read time.
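
To put rough numbers on the difference, here is a back-of-the-envelope sketch. It assumes (hypothetically) that each of the four series receives one value per user every 15 minutes, that read-time aggregation keeps every raw value for the full year, and that pre-aggregation keeps only an average and a count per bucket. The hourly bucket size and the function names are arbitrary choices for illustration.

```python
# Hypothetical workload read from the description above: 4 series, each
# receiving one value per user every 15 minutes, retained for 365 days.
SAMPLES_PER_DAY = 24 * 60 // 15  # 96 inserts per user per day


def raw_values_retained(series=4, users=60_000, per_day=SAMPLES_PER_DAY, days=365):
    """Values kept when every insert is stored and averaged at read time."""
    return series * users * per_day * days


def aggregated_fields_retained(series=4, buckets_per_day=24, days=365, fields=2):
    """Hash fields kept when each bucket stores only an average and a count.

    The result does not depend on how many users write into each bucket.
    """
    return series * buckets_per_day * days * fields


if __name__ == "__main__":
    print(f"{raw_values_retained():,}")         # 8,409,600,000
    print(f"{aggregated_fields_retained():,}")  # 70,080
```

The aggregated figure assumes each series is averaged across all users. Keeping a separate average per user multiplies it by the number of users, so the saving then depends mainly on how coarse the buckets are.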

I'm planning to implement this in my own copy of kairos, but wanted to know whether you guys would be open to having it as a feature, as it may be contrary to your architecture or just plain confusing to users.

To give you an idea of the general approach, I've outlined how this would work for the Redis backend only:

This would be implemented by storing the aggregate average as a hash per period, with two values: the current average, and the number of items that have been stored for that period.
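
A minimal in-memory sketch of that layout, with a Python dict standing in for one Redis hash per period. `PreAggregatedSeries`, `period_key` and the `step` parameter are hypothetical names, not part of kairos.

```python
from collections import defaultdict


def period_key(timestamp: float, step: int) -> int:
    """Start (in seconds) of the period that `timestamp` falls into."""
    return int(timestamp // step) * step


class PreAggregatedSeries:
    """Stand-in for one hash per period holding only `avg` and `count`."""

    def __init__(self, step: int):
        self.step = step
        self.buckets = defaultdict(lambda: {"avg": 0.0, "count": 0})

    def insert(self, timestamp: float, value: float) -> None:
        bucket = self.buckets[period_key(timestamp, self.step)]
        count = bucket["count"]
        # Cumulative moving average: fold the new value into the stored mean,
        # dividing by the incremented count, then bump the count.
        bucket["avg"] = (bucket["avg"] * count + value) / (count + 1)
        bucket["count"] = count + 1
```

For example, with `step=3600`, inserting 10, 20 and 30 at timestamps 0, 900 and 1800 leaves the hour starting at 0 holding `{"avg": 20.0, "count": 3}`; the individual values are never stored.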

The actual storage would be implemented as a Lua script that sets avg = ((avg * count) + newvalue) / (count + 1) and count = count + 1. The Lua script would be loaded with "SCRIPT LOAD" when the RedisBackend is constructed and executed using "EVALSHA" in an appropriate _type_set.
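
One way the Redis side might look, assuming the redis-py client, whose `script_load` and `evalsha` methods wrap SCRIPT LOAD and EVALSHA. The script text, `CmaWriter`, `script_sha1` and the `avg`/`count` field names are hypothetical. This variant lets HINCRBY maintain the count and divides by the incremented count when updating the average.

```python
import hashlib

# Hypothetical script: KEYS[1] is the per-period hash, ARGV[1] the new value.
CMA_LUA = """
local count = redis.call('HINCRBY', KEYS[1], 'count', 1)
local avg = tonumber(redis.call('HGET', KEYS[1], 'avg') or '0')
local new_avg = (avg * (count - 1) + tonumber(ARGV[1])) / count
redis.call('HSET', KEYS[1], 'avg', string.format('%.17g', new_avg))
-- Redis truncates Lua numbers to integer replies, so return a string.
return string.format('%.17g', new_avg)
"""


def script_sha1(script: str) -> str:
    """SCRIPT LOAD replies with the SHA1 hex digest of the script body."""
    return hashlib.sha1(script.encode("utf-8")).hexdigest()


class CmaWriter:
    """Loads the script once, then runs it with EVALSHA for every insert.

    `client` is assumed to be a redis-py client (e.g. redis.Redis()).
    """

    def __init__(self, client):
        self.client = client
        self.sha = client.script_load(CMA_LUA)

    def insert(self, key: str, value: float) -> float:
        reply = self.client.evalsha(self.sha, 1, key, value)
        if isinstance(reply, bytes):  # redis-py returns bytes unless decoding
            reply = reply.decode("utf-8")
        return float(reply)
```

If Redis forgets the script (for example after a restart), EVALSHA fails with NOSCRIPT and the caller has to reload it; redis-py's `register_script` wraps the same EVALSHA call and handles that reload automatically.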

@awestendorf
Member

I do have plans for min, max and average series types, which I think address your problem. Have you tried the histogram store? I use it in many cases where a list of values is too much.
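
A rough illustration of why a histogram can be much smaller than a list, assuming (as I understand kairos's histogram type) that each interval keeps a count per distinct value. This is a standalone sketch, not kairos's code; `HistogramBuckets` is a hypothetical name. Storage grows with the number of distinct values rather than the number of inserts, and an average can still be recovered at read time.

```python
from collections import Counter, defaultdict


class HistogramBuckets:
    """Per-period counts of distinct values (illustrative only)."""

    def __init__(self, step: int):
        self.step = step
        self.buckets = defaultdict(Counter)

    def insert(self, timestamp: float, value) -> None:
        period = int(timestamp // self.step) * self.step
        self.buckets[period][value] += 1

    def mean(self, period: int):
        """Average of every value inserted into `period`, or None if empty."""
        counts = self.buckets.get(period)
        if not counts:
            return None
        total = sum(counts.values())
        return sum(v * n for v, n in counts.items()) / total
```

Inserting 999 values drawn from {1, 2, 3} into one period leaves only three counters. The same property makes it less helpful for values that rarely repeat, where each insert adds a new entry.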

You can also try mongo storage so that the data can be persisted to disk and not require as much RAM.

@predakanga
Author

Now that I think about it, histograms could do the trick for some of my cases, but there is still one particular case where I do need a cumulative moving average.

If you have your own plans, I'm happy to save a bit of effort on my part and just implement it as a hack for now, without worrying so much about the architecture.
