
Quality Tags #79

Open · mmaiers-nmdp opened this issue Sep 26, 2018 · 5 comments

@mmaiers-nmdp (Contributor)

Categorize the list (HH2016) further. Some are just "descriptive statistics" or "features"; some are indicators of how "good" the data is.

At DaSH8 we should implement a few of these using AWS Lambda.

Simple examples (a minimal sketch of the first follows after this list):

  RES_MISS_LOCI - depends on GT
  Wn statistic - global 2-locus pairwise LD (depends on GT)
  DIV_50_REL - depends on HT only
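
As a rough illustration of the GT-dependent case, here is a minimal sketch of RES_MISS_LOCI as an AWS Lambda handler. The event shape, the locus panel, and the averaging are assumptions for illustration, not part of the PHYCUS spec:

```python
# Hypothetical Lambda sketch for RES_MISS_LOCI: the average fraction of
# expected loci with no typing per subject. The event shape and the locus
# panel are assumptions, not the PHYCUS data model.

EXPECTED_LOCI = ["A", "B", "C", "DRB1", "DQB1"]  # assumed locus panel

def lambda_handler(event, context):
    """event["genotypes"]: list of dicts mapping locus name -> typing (or None)."""
    genotypes = event["genotypes"]
    if not genotypes:
        return {"metric": "RES_MISS_LOCI", "value": None}
    fractions = [
        sum(1 for locus in EXPECTED_LOCI if not gt.get(locus)) / len(EXPECTED_LOCI)
        for gt in genotypes
    ]
    return {"metric": "RES_MISS_LOCI", "value": sum(fractions) / len(fractions)}
```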

@fscheel (Contributor) commented Sep 26, 2018

The following mechanisms must be part of the implementation (see the registry sketch after this list):

  1. a central place to plug in further metric calculations during upload
  2. the ability to compute new metrics on existing datasets without service disruption
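
One way to satisfy point 1 is a single registry that the upload path iterates over; replaying the same registry over stored datasets would cover point 2. This is a minimal sketch under assumed names (register_metric, compute_all_metrics), not the actual PHYCUS architecture:

```python
# Minimal metric-registry sketch: new calculators register themselves in one
# place, and the upload pipeline runs every registered metric. All names here
# are assumptions for illustration only.
from typing import Callable, Dict

MetricFn = Callable[[dict], float]  # dataset -> metric value
_REGISTRY: Dict[str, MetricFn] = {}

def register_metric(name: str):
    """Decorator: plug a new metric calculation into the upload path."""
    def wrap(fn: MetricFn) -> MetricFn:
        _REGISTRY[name] = fn
        return fn
    return wrap

@register_metric("SAM_SIZE")
def sam_size(dataset: dict) -> float:
    # Toy example: sample size as the number of genotype records.
    return float(len(dataset.get("genotypes", [])))

def compute_all_metrics(dataset: dict) -> Dict[str, float]:
    """Run during upload; can also be replayed over existing datasets (point 2)."""
    return {name: fn(dataset) for name, fn in _REGISTRY.items()}
```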

@fscheel (Contributor) commented Sep 27, 2018

This is the list of defined "quality" metrics in PHYCUS right now:

  - DIV_LAMBDA
  - DIV_50
  - DIV_50_REL
  - SAM_SIZE
  - SAM_POP
  - DIV_PGD
  - DIV_HEAVY_TAIL
  - RES_TRS_COUNT
  - RES_TRS
  - RES_SHARE_AMBIG
  - RES_MISS_LOCI
  - DEV_HWE
  - ERR_STD
  - ERR_SAMP_80_100
  - SUM_FREQ_GAP
  - ERR_OFFSET
  - LD_MEASURE
  - KFOLD_IMPUTE
  - KFOLD_PRED_ACTUAL
  - KFOLD_N

It seems that we can calculate some of these ourselves, but right now we also accept all of them as input. How do we handle values that we receive but also calculate ourselves? Some options (a verification sketch follows after this list):

  • Verify them (and return an error or warning ❓ if they don't match)
  • Ignore them and use our own calculation
  • Trust the input
  • Error
  • Change the spec to disallow (a subset of ❓) metrics
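
For the first option, the check could be as small as the following sketch; the function name, tolerance, and warning format are assumptions, not anything agreed in this thread:

```python
# Sketch of "verify them and return a warning if they don't match".
# verify_submitted_metrics, the tolerance, and the warning strings are
# illustrative assumptions only.
import math
from typing import Dict, List

TOLERANCE = 1e-6  # assumed numeric tolerance for float comparison

def verify_submitted_metrics(submitted: Dict[str, float],
                             computed: Dict[str, float]) -> List[str]:
    """Compare client-submitted values against server-computed ones."""
    warnings = []
    for name, ours in computed.items():
        theirs = submitted.get(name)
        if theirs is not None and not math.isclose(theirs, ours, abs_tol=TOLERANCE):
            warnings.append(
                f"{name}: submitted {theirs} differs from computed {ours}")
    return warnings
```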

@hpeberhard (Collaborator)

I talked with Florian about these today and I would like to share a few thoughts with you.

  1. I believe it is important to come to a minimum viable product / minimum loveable product soon.
  2. Loveable includes not perfect ;-).
  3. Hackers at hackathons should have something to do that has the potential to lead to a minimum * product.
  4. Take one metric of each kind to start: no GT needed, sample size needed, GT needed - e.g. DIV_50(_REL), DIV_PGD, RES_MISS_LOCI.
  5. I would opt for "verify them and return a warning if they do not match". However, I can live with any other option you choose, too.

@sauter (Collaborator) commented Oct 7, 2018

I would suggest going through the list and classifying the metrics according to the inputs needed for their calculation (a possible classification sketch follows below). For those metrics that can be calculated on the fly during upload, the service should compute them and neither expect nor accept user input.
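
Such a classification could live in a small table in code. The groupings below follow the input categories mentioned in this thread (HT only, GT needed, sample metadata); the individual assignments marked as guesses are exactly that:

```python
# Sketch of classifying metrics by required inputs, per this suggestion.
# The assignments below are illustrative guesses, not an agreed mapping.
from enum import Enum

class RequiredInput(Enum):
    HT_ONLY = "haplotype frequencies only"
    GT = "genotype data needed"
    SAMPLE_META = "sample metadata needed"

METRIC_INPUTS = {
    "DIV_50_REL": RequiredInput.HT_ONLY,    # per the issue description
    "DIV_PGD": RequiredInput.HT_ONLY,       # guess
    "RES_MISS_LOCI": RequiredInput.GT,      # per the issue description
    "LD_MEASURE": RequiredInput.GT,         # Wn is 2-locus pairwise LD on GT
    "SAM_SIZE": RequiredInput.SAMPLE_META,  # guess
}

def server_computable(metric: str) -> bool:
    """Metrics computable from the upload itself should not accept user input."""
    return METRIC_INPUTS.get(metric) in (RequiredInput.HT_ONLY,
                                         RequiredInput.SAMPLE_META)
```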

@mmaiers-nmdp (Contributor, Author)

We (@mmaiers-nmdp & @fscheel) reviewed the list of quality metrics, and most of them should be computed by the server rather than accepted from the client. The spec does have a place for them in the hfc submission.

We see 4 options for how to deal with quality-list values submitted by the client (a sketch of option 3 follows after this list):

  1. The server CAN'T accept it because there is no place for it. (-) too much work to have two different structures/versions of the code; danger that someone changes one but not the other
  2. The server WON'T accept it - return an error and do not persist the data. (-) too strict
  3. The server will SILENTLY ignore it - persist the HF data, but not the quality-list entries for qualities that we want the server to compute. (+) easy; if someone complains, then change it
  4. The server will ignore the quality list and return a WARNING. (-) nobody will read it; over-engineered
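
Option 3 could be as small as a filter applied to the submitted quality list before persistence; the function name and the set of server-computed metrics below are hypothetical:

```python
# Sketch of option 3: silently drop client-submitted values for qualities
# the server computes itself, and persist the rest. All names are
# illustrative assumptions.
from typing import Dict, List

SERVER_COMPUTED = {"DIV_50", "DIV_50_REL", "DIV_PGD", "RES_MISS_LOCI", "SAM_SIZE"}

def filter_quality_list(submitted: List[Dict]) -> List[Dict]:
    """Keep only quality entries the server does not compute itself."""
    return [q for q in submitted if q.get("metric") not in SERVER_COMPUTED]
```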
