Provides storage of and access to the QA monitoring results for the CLAS12 experiment at Jefferson Lab
- How to Use the QADB in Your Analysis
- QA Information
- How to Access the QADB
- How to Access the Faraday Cup Charge
- Database Maintenance
- QA Ground Rules
- Contributions
The QADB is used to filter data based on Quality Assurance (QA) observations. The database stores information about the "defects" of each run: each run is subdivided into "QA bins", and for each bin, a set of "defect bits" may or may not be assigned. See the table of available data sets for which data are included in the QADB.
The user must decide which defect bits should be filtered out of their analysis. See the table of defect bits and decide which bits to use in the filter.
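Here is a minimal sketch of what such a filter does. It assumes a QA bin's defect bits are available as one integer, which is the `defect` value stored in the QADB JSON files described below. A bin passes the filter if none of the bits you chose to reject are set. The bit numbers come from the defect bit table. The helper names are hypothetical and are not part of the QADB's C++ or Groovy classes.

```python
def make_mask(bits):
    """Combine a list of defect bit numbers into one integer mask."""
    mask = 0
    for bit in bits:
        mask |= 1 << bit
    return mask


def passes_qa(defect, reject_mask):
    """True if none of the rejected defect bits are set in `defect`."""
    return (defect & reject_mask) == 0


# Example choice: reject the FD outlier bits (0, 1, 2) and LowLiveTime (4)
reject = make_mask([0, 1, 2, 4])
print(reject)                     # 23
print(passes_qa(0b1000, reject))  # True: only bit 3 is set
print(passes_qa(0b0010, reject))  # False: bit 1 is set
```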
Important
Special care must be taken with the `Misc` defect bit. It is assigned to runs (or parts of runs) with abnormal conditions, whether those conditions were found on the timelines or documented in the log book:
- Each QA bin that has the `Misc` defect bit set includes a comment in the QADB explaining why the bit was set
- The analyzer must decide whether or not to exclude data with the `Misc` defect bit from their analysis
- To help with this decision, use the `qadb-info misc` command or the `Misc` summary tables in each dataset's directory, which provide the comment(s) for each run
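One way to encode such a decision is shown in the Python sketch below. It assumes you have already reviewed the comments and compiled your own list of runs whose `Misc`-flagged data you accept. The sketch treats bit 5 (`Misc`, from the defect bit table) as a rejection bit everywhere except in those runs. The run numbers are placeholders, and `passes_misc_policy` is a hypothetical helper. You would apply this check alongside whatever other defect bits you reject.

```python
MISC_BIT = 5  # bit number of Misc, from the defect bit table

# Placeholder run numbers: runs whose Misc comments you reviewed and accept
MISC_ACCEPTED_RUNS = {99998, 99999}


def passes_misc_policy(run, defect):
    """Reject a bin whose Misc bit is set, unless its run is explicitly accepted."""
    misc_set = bool((defect >> MISC_BIT) & 1)
    return not misc_set or run in MISC_ACCEPTED_RUNS


print(passes_misc_policy(12345, 0b100000))  # False: Misc set, run not accepted
print(passes_misc_policy(99999, 0b100000))  # True: Misc set, but run accepted
```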
The QADB is available on `ifarm` as the `qadb` module:
```bash
module avail qadb
# then 'module load' the one you want
```
Alternatively, you may download and use this repository locally:
```bash
git clone --recurse-submodules https://github.com/JeffersonLab/clas12-qadb.git
source clas12-qadb/environ.sh  # or environ.csh, if using csh
```
The program `qadb-info` may be used to get information about the QADB, including:
- available data sets
- defect bits
- FC charge, filtered by QA defects chosen by the user
- queries of the QADB by run number, event number, and/or QA bin number

For usage guidance, just run:
```bash
qadb-info
```
Tip
If `qadb-info` is not found, either:
- it's at `./bin/qadb-info`, so type the full path to it
- add `bin/` to your `$PATH`, which you can do with:
```bash
source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh
```
Caution
Do not call `qadb-info` in an analysis event loop, because it runs too slowly there. Instead, use the provided software or operate on the QADB files directly.
The following tables describe the available data sets in the QADB. The columns are:
- Pass: the Pass number of the data set (higher is newer)
- Data Set Name: a unique name for the data-taking period; click it to see the corresponding QA timelines
  - Typically `[RUN_GROUP]_[RUN_PERIOD]`
  - `[RUN_PERIOD]` follows the convention `[SEASON(sp/su/fa/wi)][YEAR]` (e.g., `fa18`), and sometimes includes an additional keyword (e.g., `inbending`)
- Run range: the run numbers in this data set
- Status:
- Up-to-Date: this is the most recent Pass of these data, and the QADB has been updated for it
- Deprecated: a newer Pass exists for these data, but the QADB for this version is still preserved
- TO DO: the Pass for these data exists, but the QADB has not yet been updated for it
- Data Directory: the input data used for the QA; this is the top level directory, where trains (skim files) and full DSTs are stored
- Data Files: the specific files (e.g. train) used for the QA
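The naming convention can be checked with a short Python sketch. It assumes every name follows the convention, which the text above only calls "typical", and it assumes two-digit years fall in the 2000s. `parse_dataset_name` is a hypothetical helper.

```python
import re

# [RUN_GROUP]_[SEASON][YEAR], optionally followed by _[KEYWORD]
DATASET_RE = re.compile(r"^([^_]+)_(sp|su|fa|wi)(\d{2})(?:_(.+))?$")


def parse_dataset_name(name):
    """Split a data set name such as 'rga_fa18_inbending' into its parts."""
    m = DATASET_RE.match(name)
    if m is None:
        raise ValueError(f"unexpected data set name: {name}")
    group, season, year, keyword = m.groups()
    return {"run_group": group, "season": season,
            "year": 2000 + int(year),  # assumption: two-digit years are 20xx
            "keyword": keyword}        # None if there is no extra keyword


print(parse_dataset_name("rga_fa18_inbending"))
# {'run_group': 'rga', 'season': 'fa', 'year': 2018, 'keyword': 'inbending'}
print(parse_dataset_name("rgm_fa21")["keyword"])  # None
```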
Note
The tables below are for the latest version of this repository, which may not be in a tagged version yet. If you are on `ifarm`, the latest QADB version is available as the `qadb/dev` module, and you may switch to it via:
```bash
module switch qadb/dev
```
You may also check the currently loaded version of this `README` file on `ifarm`, which is found at `$QADB/README.md`.
Caution
The QADB for older data sets may have some issues, and may even violate the QA ground rules. It is HIGHLY recommended to check the known important issues to see if any issues impact your analysis.

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
---|---|---|---|---|---|
2 | rga_fa18_inbending | 5032 - 5419 | Up-to-Date | /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main | nSidis train |
2 | rga_fa18_outbending | 5422 - 5666 | Up-to-Date | /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2 | nSidis train |
2 | rga_sp19 | 6616 - 6783 | Up-to-Date | /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst | nSidis train |
1 | rga_fa18_inbending | 5032 - 5419 | Deprecated | /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass1 | full DST files |
1 | rga_fa18_outbending | 5422 - 5666 | Deprecated | /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass1 | full DST files |
1 | rga_sp19 | 6616 - 6783 | Deprecated | /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass1 | full DST files |

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Files |
---|---|---|---|---|
2 | rgb_sp19 | 6156 - 6603 | TO DO | /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/recon/ |
2 | rgb_fa19 | 11093 - 11300 | TO DO | |
2 | rgb_wi20 | 11323 - 11571 | TO DO | |
1 | rgb_sp19 | 6156 - 6603 | Up-to-Date | /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass1/v0/dst/recon |
1 | rgb_fa19 | 11093 - 11300 | Up-to-Date | /cache/clas12/rg-b/production/recon/fall2019/torus+1/pass1/v1/dst/recon |
1 | rgb_wi20 | 11323 - 11571 | Up-to-Date | /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass1/v1/dst/recon |

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
---|---|---|---|---|---|
1 | rgc_su22 | 16042 - 16771 | Up-to-Date | /cache/clas12/rg-c/production/summer22/pass1 | sidisdvcs train |

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
---|---|---|---|---|---|
1 | rgf_sp20_torusM1 | 12210 - 12329 | TO DO | /cache/clas12/rg-f/production/recon/spring2020/torus-1_solenoid-0.8/pass1v0/dst/recon | |
1 | rgf_su20_torusPh | 12389 - 12434 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus+0.5_solenoid-0.745/pass1v0/dst/recon | |
1 | rgf_su20_torusMh | 12436 - 12443 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus-0.5_solenoid-0.745/pass1v0/dst/recon | |
1 | rgf_su20_torusM1 | 12447 - 12951 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus-1_solenoid-0.745/pass1v0/dst/recon | |

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
---|---|---|---|---|---|
2 | rgk_fa18_7.5GeV | 5674 - 5870 | TO DO | | |
2 | rgk_fa18_6.5GeV | 5875 - 6000 | TO DO | | |
1 | rgk_fa18_7.5GeV | 5674 - 5870 | Up-to-Date | /cache/clas12/rg-k/production/recon/fall2018/torus+1/7546MeV/pass1/v0/dst/recon | full DST files |
1 | rgk_fa18_6.5GeV | 5875 - 6000 | Up-to-Date | /cache/clas12/rg-k/production/recon/fall2018/torus+1/6535MeV/pass1/v0/dst/recon | full DST files |

Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
---|---|---|---|---|---|
1 | rgm_fa21 | 15019 - 15884 | Up-to-Date | /cache/clas12/rg-m/production/pass1/allData_forTimelines/ | full DST files |

- QA information is stored for each QA bin, in the form of defect bits
- the user needs only the run number and event number to query the QADB
- A QA bin is:
  - the set of events between a fixed number of scaler readouts (roughly a time bin, although a bin's duration fluctuates)
  - for older QADBs (Pass 1 data of Run Groups A, B, K, and M), the QA bins were DST 5-files
- A defect bit is:
  - a bit (of a binary number) that is `1` if the QA bin exhibits the corresponding defect, or `0` if not
  - each defect bit corresponds to a different defect, as shown in the table below
  - many defects check the value of N/F, defined as the trigger electron yield N normalized by the DAQ-gated Faraday Cup charge F
Bit | Name | Description | Additional Notes |
---|---|---|---|
0 | TotalOutlier | Outlier FD electron N/F, but not TerminalOutlier or MarginalOutlier | |
1 | TerminalOutlier | Outlier FD electron N/F of first or last QA bin of run | |
2 | MarginalOutlier | Marginal FD electron outlier N/F, within one standard deviation of cut line | |
3 | SectorLoss¹ | FD electron N/F diminished for several consecutive QA bins | For older datasets (RG-A, B, K, M Pass 1), this bit replaced the assignment of TotalOutlier, TerminalOutlier, and MarginalOutlier; newer datasets only add the SectorLoss bit and do not remove the outlier bits. |
4 | LowLiveTime | Live time < 0.9 | The assignment of this bit may be correlated with a low fraction of events with a defined (nonzero) helicity. |
5 | Misc | Miscellaneous defect, documented as comment | This bit is often assigned to all QA bins within a run, but in some cases it may only be assigned to the relevant QA bins. The analyzer must decide whether data assigned with the Misc bit should be excluded from their analysis; the comment is provided for this purpose. Analyzers are also encouraged to check the Hall B log book for further details. Note that special runs, such as empty target or low luminosity runs, also typically have this bit set; for such runs, the other defect bits, in particular the outlier bits, may be meaningless. |
6 | TotalOutlierFT | Outlier FT electron N/F, but not TerminalOutlierFT or MarginalOutlierFT | cf. TotalOutlier. |
7 | TerminalOutlierFT | Outlier FT electron N/F of first or last QA bin of run | cf. TerminalOutlier. |
8 | MarginalOutlierFT | Marginal FT electron outlier N/F, within one standard deviation of cut line | cf. MarginalOutlier. |
9 | LossFT¹ | FT electron N/F diminished for several consecutive QA bins | cf. SectorLoss. |
10 | BSAWrong | Beam Spin Asymmetry is the wrong sign | This bit is assigned per run. The asymmetry is significant, but its sign is opposite to what is expected; analyzers must therefore flip the helicity sign. |
11 | BSAUnknown | Beam Spin Asymmetry is unknown, likely because of low statistics | This bit is assigned per run. There are not enough data to determine whether the helicity sign is correct for this run. |
12 | TSAWrong | Target Spin Asymmetry is the wrong sign | Not yet used. |
13 | TSAUnknown | Target Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
14 | DSAWrong | Double Spin Asymmetry is the wrong sign | Not yet used. |
15 | DSAUnknown | Double Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
16 | ChargeHigh | FC Charge is abnormally high | NOTE: the assignment criteria of this bit are still under study. |
17 | ChargeNegative | FC Charge is negative | The FC charge is calculated from the charge readouts at QA bin boundaries. Normally the later charge readout is higher than the earlier one; this bit is assigned when the opposite happens. |
18 | ChargeUnknown | FC Charge is unknown; the first and last QA bins always have this defect | QA bin boundaries are at scaler charge readouts. The first QA bin, before any readout, has no initial charge; the last QA bin, after all scaler readouts, has no final charge. Therefore, the first and last QA bins have an unknown, but likely very small, charge accumulation. |
19 | PossiblyNoBeam | Both N and F are low, indicating the beam was possibly off | NOTE: the assignment criteria of this bit are still under study. |

¹ This bit may not be reliably defined in later datasets; use the other outlier bits instead.
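To turn a bin's `defect` value into readable names, a lookup indexed by bit number is enough. This Python sketch hand-copies the names from the defect bit table, so it would need updating if bits are added. `defect_names` is a hypothetical helper and is not part of the provided classes. The example value 11 is the one used in the JSON format description below.

```python
# Defect bit names, indexed by bit number, copied from the defect bit table
DEFECT_NAMES = [
    "TotalOutlier", "TerminalOutlier", "MarginalOutlier", "SectorLoss",
    "LowLiveTime", "Misc", "TotalOutlierFT", "TerminalOutlierFT",
    "MarginalOutlierFT", "LossFT", "BSAWrong", "BSAUnknown",
    "TSAWrong", "TSAUnknown", "DSAWrong", "DSAUnknown",
    "ChargeHigh", "ChargeNegative", "ChargeUnknown", "PossiblyNoBeam",
]


def defect_names(defect):
    """List the names of all defect bits that are set in the integer `defect`."""
    return [name for bit, name in enumerate(DEFECT_NAMES) if (defect >> bit) & 1]


print(defect_names(11))  # ['TotalOutlier', 'TerminalOutlier', 'SectorLoss']
```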
You may access the QADB in many ways:
- human-readable tables are stored in `qadb/*/qaTree.json.table`; see the description of Table files below for details on how to read these files
- QADB JSON files are stored in `qadb/*/qaTree.json`
Classes in both C++ and Groovy are provided for access to the QADB within analysis code.
In either case, you need to set environment variables. If you are using an `ifarm` build, they have already been set for you; otherwise:
```bash
source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh
```
Then:
- for Groovy, follow `src/README.md`
- for C++, follow `srcC/README.md`
Important
C++ access needs `rapidjson`, provided as a submodule of this repository in `srcC/rapidjson`. If this directory is empty, you can clone the submodule by running:
```bash
git submodule update --init --recursive
```
The QADB files are organized by dataset: one subdirectory of `qadb/` per dataset.
Each directory contains:
- Summary tables of the `Misc` defect bit assignments, stored in `miscTable.md`; use these to help decide which runs with the `Misc` bit you want to omit from your analysis
- A human-readable table of the full QADB, stored in `qaTree.json.table`, a "Table File"; see below for how to interpret this file
- The QADB itself, stored in JSON files meant for programmatic access
The dataset directories are organized by cook number (pass):
- within `qadb/`, the `pass*/` directories are for each cook (`pass1`, `pass2`, etc.)
  - within each `pass*/` directory are subdirectories for each dataset
- the `latest/` directory contains symbolic links to the latest cook of each data set with a QADB
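The following Python sketch builds a file path from this layout. It assumes `$QADB` points to the top level of the repository, as in `$QADB/README.md` on `ifarm`. It also assumes each symlink in `latest/` is named after its dataset. `qadb_file_path` is a hypothetical helper.

```python
import os
from pathlib import Path


def qadb_file_path(dataset, pass_number=None, filename="qaTree.json", qadb_root=None):
    """Path to a QADB file, e.g. qadb/pass2/rga_sp19/qaTree.json.

    With pass_number=None, use the (assumed) qadb/latest/<dataset> symlink.
    If qadb_root is not given, read it from $QADB (KeyError if unset).
    """
    root = Path(qadb_root if qadb_root is not None else os.environ["QADB"])
    cook = "latest" if pass_number is None else f"pass{pass_number}"
    return root / "qadb" / cook / dataset / filename


print(qadb_file_path("rga_sp19", 2, qadb_root="/path/to/clas12-qadb"))
print(qadb_file_path("rga_sp19", filename="chargeTree.json", qadb_root="/path/to/clas12-qadb"))
```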
Human-readable format of the QA results, stored in `qaTree.json.table`:
- each run begins with the keyword `RUN:`; the lines below it are for each of that run's QA bins and their QA results, with the following syntax: `run_number bin_number defect_bits :: comment`
  - defect bits have the following form: `bit_number-defect_name[list_of_sectors]`, and `[all]` means that all 6 sectors have this defect
  - comments are usually associated with `Misc` defects, but not always
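If you want to read a Table File from a script rather than by eye, the syntax above can be parsed with a regular expression. This is a rough Python sketch that assumes fields are separated by whitespace and that sector numbers inside the brackets are separated by commas or spaces. Check both assumptions against a real `qaTree.json.table` before relying on it. For programmatic access, the JSON files are the intended source. `parse_table_line` is a hypothetical helper.

```python
import re

# bit_number-defect_name[list_of_sectors]
DEFECT_RE = re.compile(r"(\d+)-(\w+)\[([^\]]*)\]")


def parse_table_line(line):
    """Parse one QA-bin line; return None for other lines (e.g., 'RUN:' headers)."""
    body, _, comment = line.partition("::")
    fields = body.split()
    if len(fields) < 2 or not (fields[0].isdigit() and fields[1].isdigit()):
        return None
    defects = {}
    for bit, name, sectors in DEFECT_RE.findall(body):
        if sectors.strip() == "all":
            sector_list = list(range(1, 7))  # [all] means all 6 sectors
        else:
            sector_list = [int(s) for s in re.split(r"[,\s]+", sectors.strip()) if s]
        defects[int(bit)] = (name, sector_list)
    return {"run": int(fields[0]), "bin": int(fields[1]),
            "defects": defects, "comment": comment.strip()}


# Made-up example line
print(parse_table_line("99999 7 0-TotalOutlier[1,2] 5-Misc[all] :: made-up comment"))
```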
- The QADB itself is stored as JSON files in `qaTree.json`
  - the format is a tree:
    ```
    qaTree.json ─┬─ run number 1
                 ├─ run number 2 ─┬─ bin number 1
                 │                ├─ bin number 2
                 │                ├─ bin number 3 ─┬─ evnumMin
                 │                │                ├─ evnumMax
                 │                │                ├─ sectorDefects
                 │                │                ├─ defect
                 │                │                └─ comment
                 │                ├─ bin number 4
                 │                └─ bin number 5
                 ├─ run number 3
                 └─ run number 4
    ```
  - for each bin, the following variables are defined:
    - `evnumMin` and `evnumMax` represent the range of event numbers associated with this bin; use these to map a particular event number to a bin number
    - `sectorDefects` is a map with sector number keys paired with lists of associated defect bits
    - `defect` is a decimal representation of the `OR` of each sector's defect bits; for example, `11=0b1011` means that the `OR` of the defect bit lists is `[0,1,3]`
    - `comment` stores an optional comment regarding the QA result
- the charge is also stored in JSON files in `chargeTree.json`, with a similar format:
  ```
  chargeTree.json ─┬─ run number 1
                   ├─ run number 2 ─┬─ bin number 1
                   │                ├─ bin number 2
                   │                ├─ bin number 3 ─┬─ fcChargeMin
                   │                │                ├─ fcChargeMax
                   │                │                ├─ ufcChargeMin
                   │                │                ├─ ufcChargeMax
                   │                │                └─ nElec ─┬─ sector 1
                   │                │                          ├─ sector 2
                   │                │                          ├─ sector 3
                   │                │                          ├─ sector 4
                   │                │                          ├─ sector 5
                   │                │                          └─ sector 6
                   │                ├─ bin number 4
                   │                └─ bin number 5
                   ├─ run number 3
                   └─ run number 4
  ```
  - for each bin, the following variables are defined:
    - `fcChargeMin` and `fcChargeMax` represent the minimum and maximum DAQ-gated Faraday cup charge, in nC
    - `ufcChargeMin` and `ufcChargeMax` represent the minimum and maximum FC charge, but not gated by the DAQ
    - the difference between the maximum and minimum charge is the accumulated charge in that bin
    - `nElec` lists the number of electrons from each sector
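Below is a minimal Python sketch of reading these two trees together. It uses small made-up samples instead of real files; with a real dataset you would `json.load` the files from the dataset's directory. Three assumptions are made. Run, bin, and sector numbers are string keys, since JSON object keys must be strings. The event range `[evnumMin, evnumMax]` is inclusive on both ends. `nElec` is summed over all six sectors to form N/F, purely for illustration; the QA itself checks N/F for specific electron samples (FD, FT), as described in the defect bit table. The helper names are hypothetical and are not part of the QADB classes.

```python
import json

# Made-up samples shaped like qaTree.json and chargeTree.json (run 99999 is fake)
QA_TREE = json.loads("""
{"99999": {
  "1": {"evnumMin": 1, "evnumMax": 1000, "defect": 0, "comment": "",
        "sectorDefects": {"1": [], "2": [], "3": [], "4": [], "5": [], "6": []}},
  "2": {"evnumMin": 1001, "evnumMax": 2000, "defect": 11, "comment": "",
        "sectorDefects": {"1": [0, 1], "2": [3], "3": [], "4": [], "5": [], "6": []}}
}}
""")
CHARGE_TREE = json.loads("""
{"99999": {
  "2": {"fcChargeMin": 100.0, "fcChargeMax": 150.0,
        "ufcChargeMin": 110.0, "ufcChargeMax": 165.0,
        "nElec": {"1": 500, "2": 520, "3": 480, "4": 510, "5": 490, "6": 500}}
}}
""")


def find_bin(qa_tree, run, evnum):
    """Return (bin_number, bin) whose [evnumMin, evnumMax] contains evnum, else None."""
    for bin_number, qa_bin in qa_tree.get(str(run), {}).items():
        if qa_bin["evnumMin"] <= evnum <= qa_bin["evnumMax"]:
            return int(bin_number), qa_bin
    return None


def defect_from_sectors(sector_defects):
    """OR together every sector's defect bits; should reproduce the 'defect' field."""
    value = 0
    for bits in sector_defects.values():
        for bit in bits:
            value |= 1 << bit
    return value


def bin_charge(charge_bin, gated=True):
    """Accumulated charge in one bin (nC): maximum minus minimum."""
    prefix = "fc" if gated else "ufc"
    return charge_bin[prefix + "ChargeMax"] - charge_bin[prefix + "ChargeMin"]


def electrons_per_charge(charge_bin):
    """N/F: electrons summed over sectors, per nC of DAQ-gated charge."""
    return sum(charge_bin["nElec"].values()) / bin_charge(charge_bin)


bin_number, qa_bin = find_bin(QA_TREE, 99999, 1500)
charge_bin = CHARGE_TREE["99999"][str(bin_number)]
print(bin_number, qa_bin["defect"], defect_from_sectors(qa_bin["sectorDefects"]))  # 2 11 11
print(bin_charge(charge_bin), bin_charge(charge_bin, gated=False))                 # 50.0 55.0
print(electrons_per_charge(charge_bin))                                            # 60.0
```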
The charge is stored in the QADB for each QA bin, so that you can determine the accumulated charge for the data that satisfy your QA criteria. To calculate this charge, add up the charge from each bin that you include in your analysis. To help, you can either:
- use the command `qadb-info charge`, with its options to specify:
  - the dataset and/or list of runs
  - which defect bits you want to allow or reject
  - of the runs that only have the `Misc` bit, which ones you want to allow or reject
  - the output format
- use the software: see `chargeSum.groovy` or `chargeSum.cpp` for usage examples in an analysis event loop; basically:
  - call `QADB::AccumulateCharge()` within your event loop, after your QA cuts are satisfied; the QADB instance keeps track of the accumulated charge you analyzed (accumulation is performed per QA bin)
  - at the end of your event loop, the total accumulated charge you analyzed is given by `QADB::GetAccumulatedCharge()`
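The per-bin bookkeeping described above can be mimicked in a few lines of Python. This is a hypothetical sketch of the idea, not the actual QADB class. The accumulator remembers which (run, bin) pairs it has already counted. So calling it once per event adds each bin's DAQ-gated charge (`fcChargeMax - fcChargeMin`) only once.

```python
class ChargeAccumulator:
    """Hypothetical stand-in for the AccumulateCharge()/GetAccumulatedCharge() pattern."""

    def __init__(self, charge_tree):
        self.charge_tree = charge_tree  # parsed chargeTree.json (string keys)
        self.counted = set()            # (run, bin) pairs already added
        self.total = 0.0                # accumulated DAQ-gated charge, nC

    def accumulate(self, run, bin_number):
        """Call once per event that passes your QA cuts."""
        key = (run, bin_number)
        if key in self.counted:
            return
        self.counted.add(key)
        b = self.charge_tree[str(run)][str(bin_number)]
        self.total += b["fcChargeMax"] - b["fcChargeMin"]


# Made-up two-bin run: three events fall in bin 1 and one in bin 2
tree = {"99999": {"1": {"fcChargeMin": 0.0, "fcChargeMax": 10.0},
                  "2": {"fcChargeMin": 10.0, "fcChargeMax": 25.0}}}
acc = ChargeAccumulator(tree)
for run, bin_number in [(99999, 1), (99999, 1), (99999, 1), (99999, 2)]:
    acc.accumulate(run, bin_number)
print(acc.total)  # 25.0
```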
Caution
For the Pass 1 QA results of Run Groups A, B, K, and M, we find some evidence that the charge of consecutive bins may slightly overlap, or that there may be gaps in the accumulated charge between bins; overlaps lead to slight over-counting, and gaps lead to slight under-counting
- this issue is why we transitioned from using DST files as QA bins to using every nth scaler readout as a bin boundary
- corrections for this issue will not be applied to these older QADBs
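To check whether a particular run is affected, one hypothetical diagnostic is to compare consecutive bins in `chargeTree.json`. The comparison assumes bins are numbered in time order. Under that assumption, each bin's `fcChargeMin` would equal the previous bin's `fcChargeMax` if there were neither gaps nor overlaps. This Python sketch only reports mismatches; as stated above, no correction is applied to these QADBs.

```python
def charge_boundary_mismatches(run_tree, tol=0.0):
    """Compare consecutive QA bins of one run from chargeTree.json.

    Returns (bin, next_bin, diff) tuples, where diff is the next bin's
    fcChargeMin minus this bin's fcChargeMax: diff > 0 suggests a gap
    (under-counting), diff < 0 an overlap (over-counting).
    """
    bins = sorted(run_tree, key=int)  # bin numbers are string keys in JSON
    mismatches = []
    for this_bin, next_bin in zip(bins, bins[1:]):
        diff = run_tree[next_bin]["fcChargeMin"] - run_tree[this_bin]["fcChargeMax"]
        if abs(diff) > tol:
            mismatches.append((int(this_bin), int(next_bin), diff))
    return mismatches


# Made-up run: bins 2/3 overlap by 0.5 nC, bins 3/4 have a 1 nC gap
run_tree = {
    "1": {"fcChargeMin": 0.0, "fcChargeMax": 10.0},
    "2": {"fcChargeMin": 10.0, "fcChargeMax": 20.0},
    "3": {"fcChargeMin": 19.5, "fcChargeMax": 30.0},
    "4": {"fcChargeMin": 31.0, "fcChargeMax": 40.0},
}
print(charge_boundary_mismatches(run_tree))  # [(2, 3, -0.5), (3, 4, 1.0)]
```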
Documentation for QADB maintenance and revision:
- the QADB files are produced by `clas12-timeline`
- if you have produced QA results for a new data set and would like to add them to the QADB, or if you would like to update the results for an existing dataset, follow this procedure:
  - `mkdir qadb/pass${pass}/${dataset}/`, then copy the final `qaTree.json` and `chargeTree.json` to that directory
  - add/update a symlink to this dataset in `qadb/latest`, if this is a new Pass
  - running `util/makeTables.sh` is no longer necessary; a pre-commit hook will take care of this
  - updating customized QA criteria sets, such as `OkForAsymmetry`, is no longer necessary; this function is no longer maintained
  - update the above table of data sets
  - submit a pull request
- defect bits must be added in the following places:
  - Groovy:
    - `src/clasqa/Tools.groovy` (copy from the `clasqa` repository version)
    - `src/clasqa/QADB.groovy`
    - `src/examples/dumpQADB.groovy` (optional)
  - C++:
    - `srcC/include/QADB.h`
    - `srcC/examples/dumpQADB.cpp` (optional)
  - Documentation:
    - `qadb/defect_definitions.json`, then use `util/makeDefectMarkdown.rb` to generate the Markdown table for `README.md`
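The first two steps of the update procedure are creating the dataset directory with its JSON files and updating the `latest/` symlink. They can be sketched in Python as below. This is a hypothetical helper, not a script shipped with the repository. It assumes the links in `qadb/latest/` are relative symlinks named after the dataset. It does not cover the remaining steps, such as updating the data set table and opening the pull request.

```python
import shutil
from pathlib import Path


def stage_dataset(repo, pass_number, dataset, qa_tree_file, charge_tree_file,
                  update_latest=True):
    """Copy final QA results into qadb/pass<N>/<dataset>/ (hypothetical helper).

    If update_latest is True, (re)point qadb/latest/<dataset> at that directory.
    """
    repo = Path(repo)
    dest = repo / "qadb" / f"pass{pass_number}" / dataset
    dest.mkdir(parents=True, exist_ok=True)  # mkdir qadb/pass${pass}/${dataset}/
    shutil.copy(qa_tree_file, dest / "qaTree.json")
    shutil.copy(charge_tree_file, dest / "chargeTree.json")
    if update_latest:
        link = repo / "qadb" / "latest" / dataset
        link.parent.mkdir(parents=True, exist_ok=True)
        if link.is_symlink():
            link.unlink()  # replace the link to an older Pass
        # relative target, assumed to match the existing links in qadb/latest/
        link.symlink_to(Path("..") / f"pass{pass_number}" / dataset)
    return dest
```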
Important
The following rules are enforced for the QA procedure and the resulting QADB:
- The QA procedure runs on the data as they are and does not fix any of their problems.
- The QADB only provides defect identification and does not provide analysis-specific decisions.
- At least two people independently perform the "manual QA" part of the QA procedure, and the results are cross-checked and merged.
All contributions are welcome, whether to the code, examples, documentation, or the QADB itself. You are welcome to open an issue and/or a pull request. If the maintainer(s) do not respond in a reasonable time, send them an email.