aux-db: cookbook? #18

dneise · 2016-11-02T07:21:45Z

I think this database is awesome, since one does not have to connect to isdc or newdaq to get access to the data. However, I think there are some common, non-trivial use cases that people solve again and again whenever they use the data in this DB.

I would like to come together with users of this DB and identify these, then think about a way of reusing solutions to common problems.

As an example, I discuss one thing I do very often with aux-data and find extremely annoying and quite lengthy. The other I don't discuss at all, but I find it annoying as well.

correlating heterogeneously sampled data, and
getting aux-data only for times when the telescope was taking physics data.

TL;DR:

I often struggle with this and similar problems, and I hope this shows not how stupid I am but how complex the problem is. I guess you might also struggle with similar tasks sometimes. I think solving these problems well needs some background knowledge that the aux-db does not provide, e.g. whether a value is measured or just logged. I think we can help each other with our experience and should therefore somehow come together.

I think one might not be able to provide actual code that solves these problems for everybody, but code fragments and best practices would certainly help. I am thinking of an enlarged README, or a wiki, or a cookbook ... not sure. A library of functions or classes, which people might either use or just steal some code from, would also be cool.

How do you feel about this? Are there any users of this aux-db interested in an exchange?

Correlations

Aux data from different services (even when coming from the same server) is generally sampled heterogeneously, although exceptions might exist. Even simple questions run into this, for example:

How was the trigger rate when looking in different directions?
How was the SiPM current consumption for different trigger thresholds?

Solutions I have seen so far include:

Ignore it.
Resample with a new fixed frequency, say 1/s.
Resample one time series according to the other's timestamps.
Resample both time series according to the union of both timestamps.
Assign samples a common timestamp when their temporal distance does not exceed a problem-specific boundary.
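The last four approaches in the list above can be sketched with pandas. This is only a rough illustration, not code from the aux-db: the two series, their names (`rate`, `zenith`), the values and the 500 ms tolerance are all made up, and forward filling is just one possible fill method.

```python
import pandas as pd

# Hypothetical data: two services sampled at different, irregular times.
rate = pd.Series(
    [60.0, 62.0, 61.0, 65.0],
    index=pd.to_datetime([
        "2016-11-01 22:00:00.2",
        "2016-11-01 22:00:01.1",
        "2016-11-01 22:00:02.3",
        "2016-11-01 22:00:04.0",
    ]),
    name="trigger_rate",
).rename_axis("time")
zenith = pd.Series(
    [10.0, 10.5, 11.0],
    index=pd.to_datetime([
        "2016-11-01 22:00:00.0",
        "2016-11-01 22:00:02.0",
        "2016-11-01 22:00:04.4",
    ]),
    name="zenith",
).rename_axis("time")

one_second = pd.Timedelta(seconds=1)

# Resample both onto a fixed 1 s grid, carrying the last value forward.
grid = pd.concat(
    [rate.resample(one_second).last().ffill(),
     zenith.resample(one_second).last().ffill()],
    axis=1,
)

# Resample one series according to the other's timestamps.
zenith_at_rate = zenith.reindex(rate.index, method="ffill")

# Resample both series according to the union of both timestamps.
union = rate.index.union(zenith.index)
both = pd.DataFrame(
    {"trigger_rate": rate.reindex(union), "zenith": zenith.reindex(union)}
).ffill()

# Pair samples only if they are close enough in time.
# The 500 ms tolerance is exactly the kind of problem-specific magic number.
matched = pd.merge_asof(
    rate.reset_index(),
    zenith.reset_index(),
    on="time",
    direction="nearest",
    tolerance=pd.Timedelta(milliseconds=500),
)
```

Even on this tiny example the trade-offs show up: the union variant grows from 4 and 3 samples to 7 rows, while the tolerance variant leaves one trigger rate sample without a partner.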

All of these approaches have their weaknesses. Some increase the amount of data tremendously, some reduce the amount of data, which might hide problems, some are computationally expensive, and some contain magic or problem-specific numbers and so cannot be generalized. So while it would be nice to solve this problem already on the injection side of the database, so that any data requested for the same time span can be trivially correlated, I think it's not possible.

Logged vs. Measured Data

When correlating inhomogeneously sampled time series, one usually thinks about resampling, which needs a method. (Linear) interpolation is often used for its simplicity. However, in our aux DB there are both logged and measured values.

In order to save space, the DataLogger writes new entries into the aux-files only if they differ from the previous entries. So a server measuring something with fairly low resolution and outputting an integer value might update its value every second, but the aux-file, and thus the aux-db, only contains this value whenever it changed. So linear interpolation is maybe not advisable in this case.

Logged values are also put into the aux db, i.e. values we know precisely, since we set them as opposed to measuring them. Usually (though I cannot prove this) servers update their aux-services as soon as they change a value they control. So e.g. the trigger threshold is set by the FtmCtrl: whenever the threshold is changed, the service is updated and the value is written to the aux-db. Linearly interpolating the trigger threshold for times when there was no entry in the aux-db would therefore be insane.

So up to now, it sounds like one should always resample by using the last (or left) value one finds, right? Well, not really.
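To make the difference concrete, here is a small pandas sketch comparing "hold the last value" with linear interpolation for a set value such as the trigger threshold. The timestamps and threshold values are invented for illustration and do not come from the aux-db.

```python
import pandas as pd

# Hypothetical threshold log: one entry per change of the set value.
threshold = pd.Series(
    [300.0, 500.0],
    index=pd.to_datetime(["2016-11-01 22:00:00", "2016-11-01 22:10:00"]),
)

# A time between the two entries at which we want to know the threshold.
query = pd.to_datetime(["2016-11-01 22:05:00"])
union = threshold.index.union(query)

# Hold the last known value: gives 300.0, the value that was actually set.
held = threshold.reindex(union).ffill().loc[query]

# Linear interpolation in time: gives 400.0, a threshold that was never set.
linear = threshold.reindex(union).interpolate(method="time").loc[query]
```

For a value that is set rather than measured, only the held value describes a state the system was ever in.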

When thinking about resampling measured values, e.g. the measured current, one finds a problem. Assume you want to correlate the current and the direction we are looking at. In order to really see all the data there is, you decide to use the method "resample both time series according to the union of both timestamps". In the end both time series are probably twice as long, but they are guaranteed to have the same timestamps and to still contain all of their original values (plus a few more). For resampling you choose the method "ffill" or "left". Now, the union of the timestamps will generally contain timestamps that are very close together. Assume you not only want to learn how the current behaves depending on the direction, but also how the current changes with time depending on the direction. So you quickly calculate the derivative and find out you are screwed.
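The post does not spell out what goes wrong, so the following is only one plausible reading, sketched with pandas on invented numbers: a current that rises steadily by 1 unit per second, and pointing timestamps that happen to land 10 ms before each current sample.

```python
import pandas as pd

# Hypothetical measured current, rising by 1 unit per second.
current = pd.Series(
    [10.0, 11.0, 12.0],
    index=pd.to_datetime([
        "2016-11-01 22:00:00.00",
        "2016-11-01 22:00:01.00",
        "2016-11-01 22:00:02.00",
    ]),
)

# Hypothetical timestamps of the pointing service, just before current samples.
pointing_times = pd.to_datetime([
    "2016-11-01 22:00:00.99",
    "2016-11-01 22:00:01.99",
])

# Union of timestamps, filled with the last known current value.
union = current.index.union(pointing_times)
filled = current.reindex(union).ffill()

# Finite-difference derivative on the filled series, in units per second.
dt = filled.index.to_series().diff().dt.total_seconds()
derivative = filled.diff() / dt

# For comparison: the same derivative on the original samples only.
dt_orig = current.index.to_series().diff().dt.total_seconds()
derivative_orig = current.diff() / dt_orig
```

On the original samples the derivative is about 1 per second everywhere. On the filled series it alternates between 0 (across the held values) and about 100 per second (across the 10 ms gaps), so the step shape introduced by the fill method dominates the result.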

dneise added help wanted question labels Nov 2, 2016
