
Quality Tags #79

Open · mmaiers-nmdp opened this issue Sep 26, 2018 · 5 comments

@mmaiers-nmdp (Contributor)

Categorize the list (HH2016) further. Some are just "descriptive statistics" or "features"; some are indicators of how "good" the data is.

At DaSH8 we should implement a few of these using AWS Lambda.

Simple examples (a minimal sketch of the first follows after this list):

  RES_MISS_LOCI - depends on GT
  Wn statistic - global 2-locus pairwise LD (depends on GT)
  DIV_50_REL - depends on HT only
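
As a rough illustration of the GT-dependent case, here is a minimal sketch of RES_MISS_LOCI as an AWS Lambda handler. The event shape, the locus panel, and the averaging are assumptions for illustration, not part of the PHYCUS spec:

```python
# Hypothetical Lambda sketch for RES_MISS_LOCI: the average fraction of
# expected loci with no typing per subject. The event shape and the locus
# panel are assumptions, not the PHYCUS data model.

EXPECTED_LOCI = ["A", "B", "C", "DRB1", "DQB1"]  # assumed locus panel

def lambda_handler(event, context):
    """event["genotypes"]: list of dicts mapping locus name -> typing (or None)."""
    genotypes = event["genotypes"]
    if not genotypes:
        return {"metric": "RES_MISS_LOCI", "value": None}
    fractions = [
        sum(1 for locus in EXPECTED_LOCI if not gt.get(locus)) / len(EXPECTED_LOCI)
        for gt in genotypes
    ]
    return {"metric": "RES_MISS_LOCI", "value": sum(fractions) / len(fractions)}
```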

@fscheel (Contributor) commented Sep 26, 2018

The following mechanisms must be part of the implementation (see the registry sketch after this list):

  1. a central place to plug in further metric calculations during upload
  2. the ability to compute new metrics on existing datasets without service disruption
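
One way to satisfy point 1 is a single registry that the upload path iterates over; replaying the same registry over stored datasets would cover point 2. This is a minimal sketch under assumed names (register_metric, compute_all_metrics), not the actual PHYCUS architecture:

```python
# Minimal metric-registry sketch: new calculators register themselves in one
# place, and the upload pipeline runs every registered metric. All names here
# are assumptions for illustration only.
from typing import Callable, Dict

MetricFn = Callable[[dict], float]  # dataset -> metric value
_REGISTRY: Dict[str, MetricFn] = {}

def register_metric(name: str):
    """Decorator: plug a new metric calculation into the upload path."""
    def wrap(fn: MetricFn) -> MetricFn:
        _REGISTRY[name] = fn
        return fn
    return wrap

@register_metric("SAM_SIZE")
def sam_size(dataset: dict) -> float:
    # Toy example: sample size as the number of genotype records.
    return float(len(dataset.get("genotypes", [])))

def compute_all_metrics(dataset: dict) -> Dict[str, float]:
    """Run during upload; can also be replayed over existing datasets (point 2)."""
    return {name: fn(dataset) for name, fn in _REGISTRY.items()}
```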

@fscheel (Contributor) commented Sep 27, 2018

This is the list of defined "quality" metrics in PHYCUS right now:

  - DIV_LAMBDA
  - DIV_50
  - DIV_50_REL
  - SAM_SIZE
  - SAM_POP
  - DIV_PGD
  - DIV_HEAVY_TAIL
  - RES_TRS_COUNT
  - RES_TRS
  - RES_SHARE_AMBIG
  - RES_MISS_LOCI
  - DEV_HWE
  - ERR_STD
  - ERR_SAMP_80_100
  - SUM_FREQ_GAP
  - ERR_OFFSET
  - LD_MEASURE
  - KFOLD_IMPUTE
  - KFOLD_PRED_ACTUAL
  - KFOLD_N

It seems that we can calculate some of these ourselves, but right now we also accept all of them as input. How do we handle values that we receive but also calculate ourselves? Some options (a verification sketch follows after this list):

  • Verify them (and return an error or warning ❓ if they don't match)
  • Ignore them and use our own calculation
  • Trust the input
  • Error
  • Change the spec to disallow (a subset of ❓) metrics
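
For the first option, the check could be as small as the following sketch; the function name, tolerance, and warning format are assumptions, not anything agreed in this thread:

```python
# Sketch of "verify them and return a warning if they don't match".
# verify_submitted_metrics, the tolerance, and the warning strings are
# illustrative assumptions only.
import math
from typing import Dict, List

TOLERANCE = 1e-6  # assumed numeric tolerance for float comparison

def verify_submitted_metrics(submitted: Dict[str, float],
                             computed: Dict[str, float]) -> List[str]:
    """Compare client-submitted values against server-computed ones."""
    warnings = []
    for name, ours in computed.items():
        theirs = submitted.get(name)
        if theirs is not None and not math.isclose(theirs, ours, abs_tol=TOLERANCE):
            warnings.append(
                f"{name}: submitted {theirs} differs from computed {ours}")
    return warnings
```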

@hpeberhard (Collaborator)

I talked with Florian about these today and I would like to share a few thoughts with you.

  1. I believe it is important to come to a minimum viable product / minimum loveable product soon.
  2. Loveable includes not perfect ;-).
  3. Hackers at hackathons should have something to do that has the potential to lead to a minimum * product.
  4. Take one metric of each kind to start: no GT needed, sample size needed, GT needed - e.g. DIV_50(_REL), DIV_PGD, RES_MISS_LOCI.
  5. I would opt for "verify them and return a warning if they do not match". However, I can live with any other option you choose, too.

@sauter (Collaborator) commented Oct 7, 2018

I would suggest going through the list and classifying the metrics according to the inputs needed for their calculation (a possible classification sketch follows below). For those metrics that can be calculated on the fly during upload, the service should compute them and neither expect nor accept user input.
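
Such a classification could live in a small table in code. The groupings below follow the input categories mentioned in this thread (HT only, GT needed, sample metadata); the individual assignments marked as guesses are exactly that:

```python
# Sketch of classifying metrics by required inputs, per this suggestion.
# The assignments below are illustrative guesses, not an agreed mapping.
from enum import Enum

class RequiredInput(Enum):
    HT_ONLY = "haplotype frequencies only"
    GT = "genotype data needed"
    SAMPLE_META = "sample metadata needed"

METRIC_INPUTS = {
    "DIV_50_REL": RequiredInput.HT_ONLY,    # per the issue description
    "DIV_PGD": RequiredInput.HT_ONLY,       # guess
    "RES_MISS_LOCI": RequiredInput.GT,      # per the issue description
    "LD_MEASURE": RequiredInput.GT,         # Wn is 2-locus pairwise LD on GT
    "SAM_SIZE": RequiredInput.SAMPLE_META,  # guess
}

def server_computable(metric: str) -> bool:
    """Metrics computable from the upload itself should not accept user input."""
    return METRIC_INPUTS.get(metric) in (RequiredInput.HT_ONLY,
                                         RequiredInput.SAMPLE_META)
```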

@mmaiers-nmdp (Contributor, Author)

We (@mmaiers-nmdp & @fscheel) reviewed the list of quality metrics, and most of them should be computed by the server rather than accepted from the client. The spec does have a place for them in the hfc submission.

We see 4 options for how to deal with quality-list values submitted by the client (a sketch of option 3 follows after this list):

  1. The server CAN'T accept it because there is no place for it. (-) too much work to have two different structures/versions of the code; danger that someone changes one but not the other
  2. The server WON'T accept it - return an error and do not persist the data. (-) too strict
  3. The server will SILENTLY ignore it - persist the HF data, but not the quality-list entries for qualities that we want the server to compute. (+) easy; if someone complains, then change it
  4. The server will ignore the quality list and return a WARNING. (-) nobody will read it; over-engineered
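
Option 3 could be as small as a filter applied to the submitted quality list before persistence; the function name and the set of server-computed metrics below are hypothetical:

```python
# Sketch of option 3: silently drop client-submitted values for qualities
# the server computes itself, and persist the rest. All names are
# illustrative assumptions.
from typing import Dict, List

SERVER_COMPUTED = {"DIV_50", "DIV_50_REL", "DIV_PGD", "RES_MISS_LOCI", "SAM_SIZE"}

def filter_quality_list(submitted: List[Dict]) -> List[Dict]:
    """Keep only quality entries the server does not compute itself."""
    return [q for q in submitted if q.get("metric") not in SERVER_COMPUTED]
```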
