QADB

CLAS12 Quality Assurance Database

Provides storage of and access to the QA monitoring results for the CLAS12 experiment at Jefferson Lab

Table of Contents

  1. How to Use the QADB in Your Analysis
  2. QA Information
  3. How to Access the QADB
  4. How to Access the Faraday Cup Charge
  5. QADB Maintenance
  6. QA Ground Rules
  7. Contributions

How to Use the QADB in Your Analysis

The QADB is used to filter data based on Quality Assurance (QA) observations. The database stores information about the "defects" of each run: each run is subdivided into "QA bins", and for each bin, a set of "defect bits" may or may not be assigned. See the table of available data sets for which data are included in the QADB.

The user must decide which defect bits should be filtered out of their analysis. See the table of defect bits and decide which bits to use in the filter.

Important

Special care must be taken for the Misc defect bit, which is assigned for runs (or parts of runs) that have abnormal conditions, whether found on the timelines or documented in the log book:

  • Each QA bin that has the Misc defect bit set includes a comment in the QADB, explaining why the bit was set
  • The analyzer must decide whether or not data with the Misc defect bit should be excluded from their analysis
  • To help with this decision-making, use the qadb-info misc command, or use the Misc summary tables found in each dataset's directory, which provide the comment(s) for each run

The QADB is available on ifarm as the qadb module:

module avail qadb
# then 'module load' the one you want

Alternatively, you may download and use this repository locally:

git clone --recurse-submodules https://github.com/JeffersonLab/clas12-qadb.git
source clas12-qadb/environ.sh  # or environ.csh, if using csh

QA Information

Information from qadb-info

The program qadb-info may be used to get information about the QADB, including:

  • available data sets
  • defect bits
  • FC charge, filtered by QA defects chosen by the user
  • QADB queries by run number, event number, and/or QA bin number

For usage guidance, just run:

qadb-info

Tip

If qadb-info is not found, either:

  • run it by its path, ./bin/qadb-info
  • add bin/ to your $PATH, which you can do with
source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh

Caution

Do not call qadb-info in an analysis event loop, since it will run too slowly. Instead, use the provided software or operate on the QADB files directly.

Available Data Sets

The following tables describe the available data sets in the QADB. The columns are:

  • Pass: the Pass number of the data set (higher is newer)
  • Data Set Name: a unique name for the data-taking period; click it to see the corresponding QA timelines
    • Typically [RUN_GROUP]_[RUN_PERIOD]
    • [RUN_PERIOD] follows the convention [SEASON(sp/su/fa/wi)]_[YEAR], and sometimes includes an additional keyword
  • Run range: the run numbers in this data set
  • Status:
    • Up-to-Date: this is the most recent Pass of these data, and the QADB has been updated for it
    • Deprecated: a newer Pass exists for these data, but the QADB for this version is still preserved
    • TO DO: the Pass for these data exists, but the QADB has not yet been updated for it
  • Data Directory: the input data used for the QA; this is the top level directory, where trains (skim files) and full DSTs are stored
  • Data Files: the specific files (e.g. train) used for the QA

Note

The tables below are for the latest version of this repository, which may not be in a tagged version yet. If you are on ifarm, the latest QADB version is found as the qadb/dev module, and you may switch to it via:

module switch qadb/dev

You may also check the currently loaded version of this README file on ifarm, which is found at $QADB/README.md.

Caution

The QADB for older data sets may have some issues, and may even violate the QA ground rules. It is HIGHLY recommended to check the known important issues to see if any issues impact your analysis.

Run Group A

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
| --- | --- | --- | --- | --- | --- |
| 2 | rga_fa18_inbending | 5032 - 5419 | Up-to-Date | /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass2/main | nSidis train |
| 2 | rga_fa18_outbending | 5422 - 5666 | Up-to-Date | /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass2 | nSidis train |
| 2 | rga_sp19 | 6616 - 6783 | Up-to-Date | /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass2/dst | nSidis train |
| 1 | rga_fa18_inbending | 5032 - 5419 | Deprecated | /cache/clas12/rg-a/production/recon/fall2018/torus-1/pass1 | full DST files |
| 1 | rga_fa18_outbending | 5422 - 5666 | Deprecated | /cache/clas12/rg-a/production/recon/fall2018/torus+1/pass1 | full DST files |
| 1 | rga_sp19 | 6616 - 6783 | Deprecated | /cache/clas12/rg-a/production/recon/spring2019/torus-1/pass1 | full DST files |

Run Group B

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Files |
| --- | --- | --- | --- | --- |
| 2 | rgb_sp19 | 6156 - 6603 | TO DO | /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass2/v0/dst/recon/ |
| 2 | rgb_fa19 | 11093 - 11300 | TO DO | |
| 2 | rgb_wi20 | 11323 - 11571 | TO DO | |
| 1 | rgb_sp19 | 6156 - 6603 | Up-to-Date | /cache/clas12/rg-b/production/recon/spring2019/torus-1/pass1/v0/dst/recon |
| 1 | rgb_fa19 | 11093 - 11300 | Up-to-Date | /cache/clas12/rg-b/production/recon/fall2019/torus+1/pass1/v1/dst/recon |
| 1 | rgb_wi20 | 11323 - 11571 | Up-to-Date | /cache/clas12/rg-b/production/recon/spring2020/torus-1/pass1/v1/dst/recon |

Run Group C

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
| --- | --- | --- | --- | --- | --- |
| 1 | rgc_su22 | 16042 - 16771 | Up-to-Date | /cache/clas12/rg-c/production/summer22/pass1 | sidisdvcs train |

Run Group F

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
| --- | --- | --- | --- | --- | --- |
| 1 | rgf_sp20_torusM1 | 12210 - 12329 | TO DO | /cache/clas12/rg-f/production/recon/spring2020/torus-1_solenoid-0.8/pass1v0/dst/recon | |
| 1 | rgf_su20_torusPh | 12389 - 12434 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus+0.5_solenoid-0.745/pass1v0/dst/recon | |
| 1 | rgf_su20_torusMh | 12436 - 12443 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus-0.5_solenoid-0.745/pass1v0/dst/recon | |
| 1 | rgf_su20_torusM1 | 12447 - 12951 | TO DO | /cache/clas12/rg-f/production/recon/summer2020/torus-1_solenoid-0.745/pass1v0/dst/recon | |

Run Group K

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
| --- | --- | --- | --- | --- | --- |
| 2 | rgk_fa18_7.5GeV | 5674 - 5870 | TO DO | | |
| 2 | rgk_fa18_6.5GeV | 5875 - 6000 | TO DO | | |
| 1 | rgk_fa18_7.5GeV | 5674 - 5870 | Up-to-Date | /cache/clas12/rg-k/production/recon/fall2018/torus+1/7546MeV/pass1/v0/dst/recon | full DST files |
| 1 | rgk_fa18_6.5GeV | 5875 - 6000 | Up-to-Date | /cache/clas12/rg-k/production/recon/fall2018/torus+1/6535MeV/pass1/v0/dst/recon | full DST files |

Run Group M

| Pass | Data Set Name and Timelines Link | Run Range | Status | Data Directory | Data Files |
| --- | --- | --- | --- | --- | --- |
| 1 | rgm_fa21 | 15019 - 15884 | Up-to-Date | /cache/clas12/rg-m/production/pass1/allData_forTimelines/ | full DST files |

Defect Bit Definitions

  • QA information is stored for each QA bin, in the form of defect bits
    • the user needs only the run number and event number to query the QADB
  • A QA bin is:
    • the set of events between a fixed number of scaler readouts (roughly a time bin, although there are fluctuations in a bin's duration)
    • for older QADBs (Pass 1 data of Run Groups A, B, K, and M), the QA bins were DST 5-files
  • A defect bit is:
    • a bit (of a binary number) that is 1 if the QA bin exhibits the corresponding defect or 0 if not
    • each defect bit corresponds to a different defect, as shown in the table below
    • many defects check the value of N/F, defined as the trigger electron yield N, normalized by the DAQ-gated Faraday Cup charge F

Table of Defect Bits

| Bit | Name | Description | Additional Notes |
| --- | --- | --- | --- |
| 0 | TotalOutlier | Outlier FD electron N/F, but not TerminalOutlier or MarginalOutlier | |
| 1 | TerminalOutlier | Outlier FD electron N/F of first or last QA bin of run | |
| 2 | MarginalOutlier | Marginal FD electron outlier N/F, within one standard deviation of cut line | |
| 3 | SectorLoss [1] | FD electron N/F diminished for several consecutive QA bins | For older datasets (RG-A,B,K,M pass 1), this bit replaced the assignment of TotalOutlier, TerminalOutlier, and MarginalOutlier; newer datasets only add the SectorLoss bit and do not remove the outlier bits. |
| 4 | LowLiveTime | Live time < 0.9 | The assignment of this bit may be correlated with a low fraction of events with a defined (nonzero) helicity. |
| 5 | Misc | Miscellaneous defect, documented as comment | This bit is often assigned to all QA bins within a run, but in some cases, may only be assigned to the relevant QA bins. The analyzer must decide whether data assigned with the Misc bit should be excluded from their analysis; the comment is provided for this purpose. Analyzers are also encouraged to check the Hall B log book for further details. Note that special runs, such as empty target or low luminosity runs, also typically have this bit set; for such runs, the other defect bits may be meaningless, namely the outlier bits. |
| 6 | TotalOutlierFT | Outlier FT electron N/F, but not TerminalOutlierFT or MarginalOutlierFT | cf. TotalOutlier. |
| 7 | TerminalOutlierFT | Outlier FT electron N/F of first or last QA bin of run | cf. TerminalOutlier. |
| 8 | MarginalOutlierFT | Marginal FT electron outlier N/F, within one standard deviation of cut line | cf. MarginalOutlier. |
| 9 | LossFT [1] | FT electron N/F diminished for several consecutive QA bins | cf. SectorLoss. |
| 10 | BSAWrong | Beam Spin Asymmetry is the wrong sign | This bit is assigned per run. The asymmetry is significant, but the sign is opposite to what is expected; analyzers must therefore flip the helicity sign. |
| 11 | BSAUnknown | Beam Spin Asymmetry is unknown, likely because of low statistics | This bit is assigned per run. There are not enough data to determine if the helicity sign is correct for this run. |
| 12 | TSAWrong | Target Spin Asymmetry is the wrong sign | Not yet used. |
| 13 | TSAUnknown | Target Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
| 14 | DSAWrong | Double Spin Asymmetry is the wrong sign | Not yet used. |
| 15 | DSAUnknown | Double Spin Asymmetry is unknown, likely because of low statistics | Not yet used. |
| 16 | ChargeHigh | FC Charge is abnormally high | NOTE: the assignment criteria of this bit are still under study. |
| 17 | ChargeNegative | FC Charge is negative | The FC charge is calculated from the charge readout at QA bin boundaries. Normally the later charge readout is higher than the earlier; this bit is assigned when the opposite happens. |
| 18 | ChargeUnknown | FC Charge is unknown; the first and last time bins always have this defect | QA bin boundaries are at scaler charge readouts. The first QA bin, before any readout, has no initial charge; the last QA bin, after all scaler readouts, has no final charge. Therefore, the first and last QA bins have an unknown, but likely very small charge accumulation. |
| 19 | PossiblyNoBeam | Both N and F are low, indicating the beam was possibly off | NOTE: the assignment criteria of this bit are still under study. |

  1. this bit may not be reliably defined in later datasets; use the other outlier bits instead
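
Because each defect is one bit of a binary number, a set of defects to reject can be combined into a single integer mask and tested with a bitwise AND. The following is a minimal Python sketch of that idea only; it is not part of the provided C++ or Groovy classes, the helper names make_mask and passes are hypothetical, and only a subset of the bit numbers from the table is copied in.

```python
# Bit numbers copied from the table of defect bits (subset shown).
DEFECT_BITS = {
    "TotalOutlier": 0,
    "TerminalOutlier": 1,
    "MarginalOutlier": 2,
    "SectorLoss": 3,
    "LowLiveTime": 4,
    "Misc": 5,
}


def make_mask(names):
    """Return an integer with one bit set for each named defect."""
    mask = 0
    for name in names:
        mask |= 1 << DEFECT_BITS[name]
    return mask


def passes(defect, reject_mask):
    """True if none of the rejected defect bits are set in `defect`."""
    return (defect & reject_mask) == 0


# Hypothetical analysis choice: reject three of the defects.
reject = make_mask(["TotalOutlier", "TerminalOutlier", "SectorLoss"])

print(bin(reject))         # bits 0, 1 and 3 are set
print(passes(4, reject))   # a bin with only MarginalOutlier (bit 2) set
print(passes(11, reject))  # a bin with bits 0, 1 and 3 set
```

Which bits belong in the mask is an analysis decision; the choice shown here is arbitrary.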

How to Access the QADB

You may access the QADB in many ways:

Text Access

  • human-readable tables are stored in qadb/*/qaTree.json.table; see the Table Files section below for details on how to read these files
  • QADB JSON files are stored in qadb/*/qaTree.json

Software Access

Classes in both C++ and Groovy are provided, for access to the QADB within analysis code. In either case, you need environment variables; if you are using an ifarm build, they have already been set for you, otherwise:

source environ.sh   # for bash, zsh
source environ.csh  # for csh, tcsh

Then:

Important

C++ access needs rapidjson, provided as a submodule of this repository in srcC/rapidjson. If this directory is empty, you can clone the submodule by running

git submodule update --init --recursive

QADB Files and Tables

The QADB files are organized by dataset: one subdirectory of qadb/ per dataset. Each directory contains:

  • Summary tables regarding the Misc defect bit assignment are stored in miscTable.md; use these to help decide which runs' Misc bits you want to omit from your analysis
  • A human-readable table of the full QADB is stored in qaTree.json.table, a "Table File"; see below for how to interpret this file
  • The QADB itself is stored in json files, meant for programmatic access

The dataset directories are organized by cook number (pass):

  • within qadb/, the pass*/ directories are for each cook (pass1, pass2, etc.)
    • within each pass*/ directory are subdirectories for each dataset
  • the latest/ directory contains symbolic links to the latest cook of each data set with a QADB
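
The directory layout above can be turned into a file path with a few lines of Python. This is a sketch under stated assumptions: the helper name qa_tree_path is hypothetical, and $QADB is taken to point at the top of this repository, which is inferred from the README being found at $QADB/README.md on ifarm.

```python
import os
from pathlib import Path


def qa_tree_path(dataset, cook="latest", root=None):
    """Build the path to a dataset's qaTree.json.

    `cook` is a subdirectory of qadb/, e.g. "latest", "pass1" or "pass2".
    `root` defaults to the $QADB environment variable (assumed to be the
    top-level directory of this repository).
    """
    base = Path(root if root is not None else os.environ["QADB"])
    return base / "qadb" / cook / dataset / "qaTree.json"


# Example with an explicit, hypothetical installation directory.
print(qa_tree_path("rga_sp19", root="/path/to/clas12-qadb"))
print(qa_tree_path("rga_sp19", cook="pass1", root="/path/to/clas12-qadb"))
```

Going through latest/ follows the symbolic links, so the result tracks the newest cook that has a QADB; naming a pass*/ directory pins a specific cook instead.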

Table Files

Human-readable format of QA result, stored in qaTree.json.table

  • each run begins with the keyword RUN:; lines below are for each of that run's QA bins and their QA results, with the following syntax:
    • run_number bin_number defect_bits :: comment
      • defect bits have the following form: bit_number-defect_name[list_of_sectors], and [all] means that all 6 sectors have this defect
      • comments are usually associated with Misc defects, but not always
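
The line syntax above can be read with a small regular expression. The Python sketch below is illustrative only: the function name parse_table_line is hypothetical, the example line is made up, and the separator used inside a list of sectors is not specified here, so any sector list other than [all] is returned as the raw string. Lines that begin with the RUN: keyword are headers and should be skipped before calling it.

```python
import re

# One defect entry: bit_number-defect_name[list_of_sectors]
DEFECT_RE = re.compile(r"(\d+)-(\w+)\[([^\]]*)\]")


def parse_table_line(line):
    """Parse 'run_number bin_number defect_bits :: comment' into a dict."""
    body, _, comment = line.partition("::")
    fields = body.split()
    defects = []
    for bit, name, sectors in DEFECT_RE.findall(body):
        defects.append({
            "bit": int(bit),
            "name": name,
            # [all] means all 6 sectors; other lists are kept unparsed
            "sectors": list(range(1, 7)) if sectors == "all" else sectors,
        })
    return {
        "run": int(fields[0]),
        "bin": int(fields[1]),
        "defects": defects,
        "comment": comment.strip(),
    }


# Made-up line in the documented syntax (not taken from a real table file).
example = "5000 12 5-Misc[all] :: hypothetical comment text"
print(parse_table_line(example))
```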

JSON files

qaTree.json

  • The QADB itself is stored as JSON files in qaTree.json
  • the format is a tree:
qaTree.json ─┬─ run number 1
             ├─ run number 2 ─┬─ bin number 1
             │                ├─ bin number 2
             │                ├─ bin number 3 ─┬─ evnumMin
             │                │                ├─ evnumMax
             │                │                ├─ sectorDefects
             │                │                ├─ defect
             │                │                └─ comment
             │                ├─ bin number 4
             │                └─ bin number 5
             ├─ run number 3
             └─ run number 4
  • for each bin, the following variables are defined:
    • evnumMin and evnumMax represent the range of event numbers associated with this bin; use this to map a particular event number to a bin number
    • sectorDefects is a map with sector number keys paired with lists of associated defect bits
    • defect is a decimal representation of the OR of each sector's defect bits, for example, 11=0b1011 means that the OR of the defect bit lists is [0,1,3]
    • comment stores an optional comment regarding the QA result
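
To make the tree layout concrete, here is a minimal Python sketch that maps an event number to its QA bin and decodes the defect value. It assumes that run and bin numbers appear as JSON object keys (and are therefore strings after parsing) and that evnumMin and evnumMax are both inclusive; the excerpt, including the run number and all values, is hand-written for illustration and is not real QADB content. The function names are hypothetical.

```python
import json

# Hand-written excerpt in the layout described above; all values are made up.
QA_TREE_TEXT = """
{
  "5000": {
    "0": {"evnumMin": 0, "evnumMax": 999,
          "sectorDefects": {"1": [], "2": [], "3": [], "4": [], "5": [], "6": []},
          "defect": 0, "comment": ""},
    "1": {"evnumMin": 1000, "evnumMax": 1999,
          "sectorDefects": {"1": [0, 1], "2": [3], "3": [], "4": [], "5": [], "6": []},
          "defect": 11, "comment": ""}
  }
}
"""


def decode_defect(defect):
    """List the bit numbers that are set in the decimal `defect` value."""
    return [bit for bit in range(defect.bit_length()) if (defect >> bit) & 1]


def find_bin(qa_tree, run, evnum):
    """Return (bin_number, record) for the bin containing `evnum`, else None."""
    for bin_number, rec in qa_tree.get(str(run), {}).items():
        if rec["evnumMin"] <= evnum <= rec["evnumMax"]:
            return bin_number, rec
    return None


def or_of_sectors(rec):
    """Recompute the OR of every sector's defect-bit list as one integer."""
    value = 0
    for bits in rec["sectorDefects"].values():
        for bit in bits:
            value |= 1 << bit
    return value


qa_tree = json.loads(QA_TREE_TEXT)
bin_number, rec = find_bin(qa_tree, 5000, 1500)
print(bin_number, decode_defect(rec["defect"]), or_of_sectors(rec))
```

A linear scan over the bins is enough for a sketch; for an analysis event loop, the provided C++ and Groovy classes are the intended route.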

chargeTree.json

  • the charge is also stored in JSON files in chargeTree.json, with a similar format:
chargeTree.json ─┬─ run number 1
                 ├─ run number 2 ─┬─ bin number 1
                 │                ├─ bin number 2
                 │                ├─ bin number 3 ─┬─ fcChargeMin
                 │                │                ├─ fcChargeMax
                 │                │                ├─ ufcChargeMin
                 │                │                ├─ ufcChargeMax
                 │                │                └─ nElec ─┬─ sector 1
                 │                │                          ├─ sector 2
                 │                │                          ├─ sector 3
                 │                │                          ├─ sector 4
                 │                │                          ├─ sector 5
                 │                │                          └─ sector 6
                 │                ├─ bin number 4
                 │                └─ bin number 5
                 ├─ run number 3
                 └─ run number 4
  • for each bin, the following variables are defined:
    • fcChargeMin and fcChargeMax represent the minimum and maximum DAQ-gated Faraday cup charge, in nC
    • ufcChargeMin and ufcChargeMax represent the minimum and maximum FC charge, but not gated by the DAQ
    • the difference between the maximum and minimum charge is the accumulated charge in that bin
    • nElec lists the number of electrons from each sector
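
As a worked example of the per-bin charge, the Python sketch below sums fcChargeMax minus fcChargeMin over the bins that pass a defect mask. The dictionaries are hand-written stand-ins for chargeTree.json and for the defect values in qaTree.json, with made-up numbers; the function name accumulated_charge is hypothetical.

```python
# Stand-in for chargeTree.json: run -> bin -> charge readouts in nC (made up).
charge_tree = {
    "5000": {
        "0": {"fcChargeMin": 0.0, "fcChargeMax": 10.0},
        "1": {"fcChargeMin": 10.0, "fcChargeMax": 25.0},
        "2": {"fcChargeMin": 25.0, "fcChargeMax": 30.0},
    }
}

# Stand-in for the `defect` values of the same bins in qaTree.json (made up).
# Bin 1 has bit 4 (LowLiveTime) set: 1 << 4 == 16.
qa_defects = {"5000": {"0": 0, "1": 16, "2": 0}}


def accumulated_charge(charge_tree, qa_defects, reject_mask):
    """Sum the DAQ-gated FC charge of all bins with no rejected defect bit."""
    total = 0.0
    for run, bins in charge_tree.items():
        for bin_number, rec in bins.items():
            if qa_defects[run][bin_number] & reject_mask:
                continue  # at least one rejected defect bit is set
            total += rec["fcChargeMax"] - rec["fcChargeMin"]
    return total


print(accumulated_charge(charge_tree, qa_defects, 1 << 4))  # bins 0 and 2 only
print(accumulated_charge(charge_tree, qa_defects, 0))       # nothing rejected
```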

How to Access the Faraday Cup Charge

The charge is stored in the QADB for each QA bin, so that it is possible to determine the amount of accumulated charge for data that satisfy your specified QA criteria. To calculate the charge, you'll need to add up the charge from each bin that you include in your analysis. To help, you can either:

  • use the command qadb-info charge; use its options to specify:
    • the dataset and/or list of runs
    • which defect bits you want to allow or reject
    • which of the runs that have only the Misc bit you want to allow or reject
    • the output format
  • use the software: see chargeSum.groovy or chargeSum.cpp for a usage example in an analysis event loop; basically:
    • call QADB::AccumulateCharge() within your event loop, after your QA cuts are satisfied; the QADB instance will keep track of the accumulated charge you analyzed (accumulation performed per QA bin)
    • at the end of your event loop, the total accumulated charge you analyzed is given by QADB::GetAccumulatedCharge()
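
The idea of accumulating per QA bin inside an event loop can be sketched independently of the provided classes: many events fall in the same bin, so each bin's charge must be added only once. The Python class below is a toy stand-in written for illustration; it is not the QADB class and does not reproduce its API, and all names and numbers in it are hypothetical.

```python
class ChargeAccumulator:
    """Toy stand-in that adds each QA bin's charge at most once."""

    def __init__(self, bin_charge):
        self.bin_charge = bin_charge  # {(run, bin_number): charge in nC}
        self.seen = set()             # bins already counted
        self.total = 0.0

    def accumulate(self, run, bin_number):
        """Call once per event that satisfied the QA cuts."""
        key = (run, bin_number)
        if key not in self.seen:
            self.seen.add(key)
            self.total += self.bin_charge[key]


# Made-up per-bin charges and a made-up list of (run, bin) pairs, one per event.
bin_charge = {(5000, 0): 10.0, (5000, 1): 15.0}
events = [(5000, 0), (5000, 0), (5000, 1), (5000, 1), (5000, 1)]

acc = ChargeAccumulator(bin_charge)
for run, bin_number in events:
    acc.accumulate(run, bin_number)

print(acc.total)  # each of the two bins is counted once
```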

Caution

For Pass 1 QA results for Run Groups A, B, K, and M, we find some evidence that the charge from bin to bin may slightly overlap, or there may be gaps in the accumulated charge between each bin; the former leads to a slight over-counting and the latter leads to a slight under-counting.

  • this issue is why we transitioned from using DST files as QA bins to using nth scaler readouts as bin boundaries
  • corrections of this issue to these older QADBs will not be applied

QADB Maintenance

Documentation for QADB maintenance and revision

Adding to or revising the QADB

  • the QADB files are produced by clas12-timeline
  • if you have produced QA results for a new data set, and would like to add them to the QADB, or if you would like to update results for an existing dataset, use the following procedure:
    • mkdir qadb/pass${pass}/${dataset}/, then copy the final qaTree.json and chargeTree.json to that directory
    • add/update a symlink to this dataset in qadb/latest, if this is a new Pass
    • run util/makeTables.sh (normally unnecessary: a pre-commit hook will take care of this)
    • update customized QA criteria sets, such as OkForAsymmetry (note: this function is no longer maintained)
    • update the above table of data sets
    • submit a pull request

Adding new defect bits

  • defect bits must be added in the following places:
    • Groovy:
      • src/clasqa/Tools.groovy (copy from clasqa repository version)
      • src/clasqa/QADB.groovy
      • src/examples/dumpQADB.groovy (optional)
    • C++:
      • srcC/include/QADB.h
      • srcC/examples/dumpQADB.cpp (optional)
    • Documentation:
      • qadb/defect_definitions.json, then use util/makeDefectMarkdown.rb to generate the Markdown table for README.md

QA Ground Rules

Important

The following rules are enforced for the QA procedure and the resulting QADB:

  1. The QA procedure runs on the data as they are and does not fix any of their problems.
  2. The QADB only provides defect identification and does not provide analysis-specific decisions.
  3. At least two people independently perform the "manual QA" part of the QA procedure, and the results are cross checked and merged.

Contributions

All contributions are welcome, whether to the code, examples, documentation, or the QADB itself. You are welcome to open an issue and/or a pull request. If the maintainer(s) do not respond in a reasonable time, send them an email.