Concepts illustrated
Let's take a simple example of calibration data, say "target position," which is defined by three coordinates x, y, z, each represented by a floating-point number.
Using C++ as an example language, if a user asks for "target position":
auto data = calibration->GetCalib("/target/position");
the calibration database should provide the appropriate data in the current context so it can be used:
if (data["z"] > 30) ...
"The current context" is the key phrase here: the values of the target position can differ from run to run, and values may change with time, e.g., if a more precise calibration is performed. Also, a user may want to use a personal version of the data for various reasons.
The picture above illustrates features of CCDB that involve control of the context:
- The data returned depends on run number.
- A history mechanism: by default CCDB honors the last assignment of data to a particular run, but one can always recover assignments made in the past.
- Variations (equivalent to "branches" in version control systems): users have the ability to create and work with alternative versions of the data, varying the run assignments and/or the data itself.
Data is identified by its namepath. The namepath string is unique across all detector systems. A forward slash (/) is used to build a hierarchical namepath.
For example:
/target/position
/FDC/driftvelocity/timewalk_parameters
/FDC/base_time_offset
This allows implementors of individual detector systems to specify a hierarchy with as much or as little depth as needed, appropriate to the physical structure of their device.
Namepath format: Allowed symbols are a-z, A-Z, 0-9, '_' and '-'. Spaces or other special symbols are not allowed in the namepath (this simplifies console management and database validation of namepath objects).
/My-path/to/data_01 # OK
/Some...thing/is wrong here! # ERROR: illegal symbols '...', '!' and spaces.
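The format rule above can be checked with a short regular expression. This is only a sketch: `is_valid_namepath` is a hypothetical helper, not part of the CCDB API, and it assumes every namepath starts with a forward slash, as all the examples in this text do.

```python
import re

# Hypothetical helper, not part of the CCDB API.
# Each segment may contain only a-z, A-Z, 0-9, '_' and '-';
# segments are separated by '/', and a leading '/' is assumed.
_NAMEPATH_RE = re.compile(r"(/[A-Za-z0-9_-]+)+")


def is_valid_namepath(namepath: str) -> bool:
    """Return True if the namepath uses only the allowed symbols."""
    return _NAMEPATH_RE.fullmatch(namepath) is not None


print(is_valid_namepath("/My-path/to/data_01"))           # True
print(is_valid_namepath("/Some...thing/is wrong here!"))  # False
```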
Each namepath corresponds to a "table type". A table type defines the columns and the number of rows. The idea behind a "table type" is that while data values may change, "the shape" of the table never does.
The term "table" is ambiguous when talking about data, and especially so when the topic is databases. CCDB therefore uses the term "table type" for the definition of the data (the shape), and "data set" or "constant set" for the data itself, i.e. the values of a table type.
In the example with target position, the namepath "/target/position" is a table type, defining data arranged in three columns of type "float" and one row. Whatever the values of /target/position are, they are always presented as one row of "x", "y", "z".
Each column has a type and, optionally, a name.
The "/target/position" example has 3 columns named "x", "y", and "z". They could also be identified by index as 0, 1, 2.
As another example, /FDC/driftvelocity/timewalk_parameters may have members "slope", "offset", and "exponent".
By contrast, a set of constants with a namepath "/FDC/CathodeStrips/pedestals" may have 100 values identified simply as 0, 1, 2, 3, ...
Column types may be one of the following:
- int
- uint
- long
- ulong
- bool
- double
- string
Each table type may have multiple versions of constant sets, i.e. the data. Each constant set has at least one "assignment". The assignment holds the information specifying the association of the constant set with a context:
- run range - Runs for which the data is valid
- creation time - Creation time of the assignment (used for the history feature)
- variation - Name of the variation
- comment - Any useful comments about the assignment
One could say that the assignment shows the context within which the data "is correct". One could also say that while a constant set is data, an associated assignment is a header for this data.
A particular constant set can have several assignments. This allows the same constants to be used for different run ranges and helps avoid "update anomalies."
To summarize the concepts above: one namepath may have several versions of data (constant sets), and each constant set is connected to one or more assignments. These assignments hold the information about the right context (run number, variation, date) for each constant set.
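The relationship between table types, constant sets, and assignments can be sketched as a small data model. This is an illustration only: the class names, field names, the `find_assignment` helper, and all numeric values are hypothetical and do not reflect the actual CCDB schema or API. The lookup applies the rule described earlier, that the last assignment made for a run wins by default.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class TableType:
    """The 'shape' of the data: fixed columns and row count."""
    namepath: str
    columns: List[Tuple[str, str]]  # (column name, column type)
    rows: int


@dataclass
class ConstantSet:
    """The data itself: values laid out according to a table type."""
    table: TableType
    values: List[List[float]]


@dataclass
class Assignment:
    """The 'header' that ties a constant set to a context."""
    constant_set: ConstantSet
    run_min: int
    run_max: int
    variation: str
    created: datetime
    comment: str = ""


def find_assignment(assignments: List[Assignment], run: int,
                    variation: str = "default") -> Optional[Assignment]:
    """Return the most recently created assignment that covers the run."""
    matching = [a for a in assignments
                if a.run_min <= run <= a.run_max and a.variation == variation]
    return max(matching, key=lambda a: a.created, default=None)


# Made-up values for illustration only.
target = TableType("/target/position",
                   [("x", "double"), ("y", "double"), ("z", "double")], rows=1)
old = ConstantSet(target, [[0.0, 0.0, 65.0]])
new = ConstantSet(target, [[0.1, -0.2, 65.3]])

assignments = [
    Assignment(old, 0, 999, "default", datetime(2011, 1, 10)),
    Assignment(new, 500, 999, "default", datetime(2011, 6, 1)),
    # The same constant set reused for another run range.
    Assignment(old, 2000, 2999, "default", datetime(2011, 6, 2)),
]

print(find_assignment(assignments, 700).constant_set.values)  # [[0.1, -0.2, 65.3]]
```

Because `old` is referenced by two assignments rather than copied, correcting its values in one place updates every run range that uses it, which is how the update anomalies mentioned above are avoided.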
There are two conflicting use cases for accessing data:
- In general, getting constants should be as easy as saying "Give me the /target/position constants for June 2011"
- There should be a way to ask for one particular set of constants. This means that there should be a unique key for every set of data.
CCDB uses so-called "requests" to solve both problems. The full form of a request is a "unique composite key" for a particular set of data values.
The full form of the request is
</path/to/data>:<run>:<variation>:<time>
However, to get the data a user may specify only part of the request. The minimal request is just /path/to/data
One may omit any part of the request except the namepath:
- /path/to/data - just the path to the data; no run, variation, or timestamp specified
- /path/to/data::mc - no run specified, variation is "mc", no date specified
- /path/to/data:::2029 - only the path and the date (year) are specified
As the examples above show, to specify a path and a variation while using the default run, one skips the run number but keeps its place as "::":
               +-- variation
               |
               |
/path/to/data::mc
              ^
              |
              +-- place where run number should be
And the request
/path/to/data:::2029
means that the path and the date are specified, but the run number and the variation have to be deduced. See the next chapter (Default values).
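Splitting a request into its four parts can be sketched as follows. `Request` and `parse_request` are hypothetical names used only for this illustration, not CCDB functions. Omitted parts come back as `None` so that defaults can be applied later, and the time part is kept as a raw string because it may itself contain ':'.

```python
from typing import NamedTuple, Optional


class Request(NamedTuple):
    path: str
    run: Optional[int]
    variation: Optional[str]
    time: Optional[str]


def parse_request(request: str) -> Request:
    """Split '<path>:<run>:<variation>:<time>' into its parts."""
    # At most 3 splits, so ':' separators inside the time part survive.
    parts = request.split(":", 3)
    parts += [""] * (4 - len(parts))  # pad the omitted trailing parts
    path, run, variation, time = parts
    if not path:
        raise ValueError("the namepath is the only mandatory part")
    return Request(path,
                   int(run) if run else None,
                   variation or None,
                   time or None)


print(parse_request("/path/to/data::mc"))
# Request(path='/path/to/data', run=None, variation='mc', time=None)
print(parse_request("/path/to/data:::2029"))
# Request(path='/path/to/data', run=None, variation=None, time='2029')
```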
The time is parsed as:
YYYY:MM:DD-hh:mm:ss
Any non-digit character may be used as a separator instead of ':' and '-', so all of these time strings are equivalent:
2029/06/17-22:03:05
2029-06-17-22-03-05
2029/06/17:22/03/05
2029a06b17c22d03e05
One can omit any part of the time from the right. In this case the omitted parts are set to their latest possible values.
Example:
"2011" (year 2011, everything else omitted) is interpreted as the timestamp 2011/12/31-23:59:59, so the latest constants for the year 2011 are returned.
"2012/05/21" is interpreted as 2012/05/21-23:59:59, meaning the latest constants for 21 May 2012.
CCDB searches for the closest constants created before or at the timestamp provided.
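The time rules above can be sketched in a few lines. `parse_request_time` is a hypothetical helper, not the CCDB implementation. It treats every run of digits as one field, whatever the separator, and fills omitted fields with their latest values. Taking the last calendar day of the month when the day is omitted is an assumption that follows from the "latest date" rule.

```python
import calendar
import re
from datetime import datetime


def parse_request_time(text: str) -> datetime:
    """Parse 'YYYY[sep MM[sep DD[sep hh[sep mm[sep ss]]]]]'."""
    # Any non-digit character acts as a separator.
    fields = [int(f) for f in re.findall(r"\d+", text)]
    if not 1 <= len(fields) <= 6:
        raise ValueError(f"cannot parse time: {text!r}")

    year = fields[0]
    month = fields[1] if len(fields) > 1 else 12
    # Omitted day: last day of that month (leap years included).
    day = fields[2] if len(fields) > 2 else calendar.monthrange(year, month)[1]
    hour = fields[3] if len(fields) > 3 else 23
    minute = fields[4] if len(fields) > 4 else 59
    second = fields[5] if len(fields) > 5 else 59
    return datetime(year, month, day, hour, minute, second)


print(parse_request_time("2011"))                 # 2011-12-31 23:59:59
print(parse_request_time("2012/05/21"))           # 2012-05-21 23:59:59
print(parse_request_time("2029a06b17c22d03e05"))  # 2029-06-17 22:03:05
```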
WARNING - Use requests (anything with ':' after the namepath) only to manage constants or to debug C++/Python/Java code. Never use requests (instead of a simple namepath) in C++ or Java production code. See the next chapter!
There are two general cases of using the requests:
- To read out constants in physics analysis
- To manage constants with ccdb console or python
In the case of physics analysis, the software most probably provides the run number being processed and allows a variation and other parameters to be set through command-line arguments or environment variables, as in this JANA example:
> export JANA_CALIB_CONTEXT="variation=mc"
> hd_root data_for_run_5000.evio
# now JANA knows the preferred variation and run number
CCDB therefore applies the following defaults and priorities (first is highest):
Run number:
- Run number specified in a request (if you use the "/path/to/data:100" request, constants for run 100 are returned regardless of the run being processed)
- Global default run number set by the software (if run 10200 is being processed and you use "/path/to/data", data for run 10200 is returned)
- 0 (run number 0).
Variation:
- Variation specified in a request (if you use the "/path/to/data::mc" request, constants for variation mc are used regardless of the variation set by the software)
- Global preferred variation set by software.
- the "default" variation.
Time stamp:
- Time specified in a request
- Constants time set by the software
- Current time
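The three priority lists follow the same pattern: take the value from the request if present, otherwise the value set by the software, otherwise a built-in fallback. A minimal sketch, with hypothetical function and parameter names that are not part of CCDB:

```python
from datetime import datetime


def first_set(*candidates):
    """Return the first candidate that is not None."""
    for candidate in candidates:
        if candidate is not None:
            return candidate
    return None


def resolve_context(request_run=None, request_variation=None,
                    request_time=None, software_run=None,
                    software_variation=None, software_time=None,
                    now=None):
    """Apply the request > software > fallback priority to each part."""
    run = first_set(request_run, software_run, 0)
    variation = first_set(request_variation, software_variation, "default")
    time = first_set(request_time, software_time, now, datetime.now())
    return run, variation, time


# Run 10200 is being processed, but the request asks for run 100.
run, variation, _ = resolve_context(request_run=100, software_run=10200,
                                    software_variation="mc")
print(run, variation)  # 100 mc
```

The explicit `is not None` test matters here: run number 0 is a legitimate value, so a plain `request_run or software_run` would wrongly skip it.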
When one uses the ccdb console tool in interactive mode, one can set the default run number by running the 'run' command.
Example:
sh> ccdb -r 100 -i # ccdb in interactive mode default run 100
> cat /path/to/data # commands get constants for run 100
> cat /path/to/data:333 # constants for run 333 are returned
# because request run has higher priority
> ccdb cat /path/to/data # constants for run 0 are returned - no other run given
Production physics analysis code should never use requests, so that the run, variation and possibly the date are provided by the software framework, as in the example above. Sometimes, however, it is vital to get particular data for debugging purposes. Use requests in this case, and don't forget to remove everything except the namepath after debugging.
calibration->GetCalib("/my/data"); // Good. Run number is