Current issues with lovd_getVariantInfo() #581

Open
4 of 5 tasks
ifokkema opened this issue Jan 10, 2022 · 4 comments

@ifokkema
Member

ifokkema commented Jan 10, 2022

There are some issues with the current version of lovd_getVariantInfo() in the improve/getVariantInfo branch. The variants below should be HGVS but are not recognized as such.

  • c.361A>T^362C>G and related, e.g., c.361^362C>G
  • c.3789_3797delins[NG_001212.4:g.6520_6528]
    NG is not allowed, NC is.
  • g.6128749_6128787delins[NC_000022.11:17178886_17178924]
    It has been clarified that this is not HGVS. We could have this fixed in lovd_fixHGVS(), though.
  • g.(32366565_32380940)_(32380940_32382772)dup
    I believe this should be accepted? (center positions are the same)
  • g.(33038290_33229611)_(33229611_?)del
    (idem)
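
For the last two cases, a minimal sketch of the position check, assuming the answer to the question above is "yes". It is written in Python purely for illustration (LOVD itself is PHP), and both the simplified pattern and the name `positions_are_ordered` are hypothetical, not taken from the LOVD code base:

```python
import re

# Hypothetical, heavily simplified pattern; it only covers descriptions of
# the form g.(A_B)_(C_D)del or g.(A_B)_(C_D)dup, where any position may be "?".
UNCERTAIN_RANGE = re.compile(
    r"^g\.\((\d+|\?)_(\d+|\?)\)_\((\d+|\?)_(\d+|\?)\)(del|dup)$"
)


def positions_are_ordered(variant: str) -> bool:
    """Return True if all known positions are in non-decreasing order."""
    match = UNCERTAIN_RANGE.match(variant)
    if match is None:
        return False
    # Unknown positions ("?") cannot be compared, so they are skipped.
    known = [int(p) for p in match.groups()[:4] if p != "?"]
    # Non-strict comparison: equal center positions (B == C) are accepted.
    return all(a <= b for a, b in zip(known, known[1:]))
```

With a strict comparison (`a < b`), both variants above would be rejected because their center positions are equal.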

We also need to decide how (if?) to handle variants like:

  • g.[57365055C>T];[57365055C>T]
  • g.[57365118C>G;57373613_57373615del]
  • g.[57367503C>T(;)57373597C>T]
  • g.112036755_112036756ins[112036782_112036805;112036797C>T]
  • 1122–1457 del 326 bp
  • m.1000_100del
  • Anything with the o prefix

For insertions, there are currently too many pieces of code that handle variant suffixes, with complex if()s and regular expressions.

  • lovd_getVariantInfo() handles far less than lovd_fixHGVS() does. It would make much more sense to give lovd_getVariantInfo() the power of lovd_fixHGVS() as well, so that it ALWAYS provides a proper suggestion instead of only sometimes. It would be better to suggest one fix for the entire suffix rather than one per piece. lovd_fixHGVS() can then be rewritten to simply copy the suggestion, so the logic stays in one place.
  • If the supported suffixes get any more complex, it may be time to pull the suffix handling into a separate function. Especially for insertions, both lovd_getVariantInfo() and lovd_fixHGVS() now have extensive code handling possible suffixes. The new function would parse the suffix and return an object: a length object, a sequence object, a variant object, an array of objects, etc. The rest of the code would then check whether that object fits the variant, etc.
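
A rough sketch of what such a suffix-parsing function could look like, written in Python for illustration only (LOVD is PHP). The class names, the function name `parse_suffix`, and the suffix forms it recognizes are all assumptions; the real set of supported suffixes is much larger:

```python
import re
from dataclasses import dataclass


@dataclass
class LengthSuffix:
    length: int


@dataclass
class SequenceSuffix:
    bases: str


@dataclass
class VariantSuffix:
    text: str  # positions or a nested variant, left unparsed in this sketch


def parse_suffix(suffix: str):
    """Parse an insertion suffix into one object or a list of objects."""
    if suffix.startswith("[") and suffix.endswith("]"):
        # Bracketed suffix: parse each ";"-separated piece on its own.
        # Nested brackets are NOT handled in this sketch.
        return [parse_suffix(piece) for piece in suffix[1:-1].split(";")]
    if re.fullmatch(r"\d+", suffix):
        return LengthSuffix(int(suffix))
    if re.fullmatch(r"[ACGTN]+", suffix):
        return SequenceSuffix(suffix)
    # Anything else is assumed to be a position range or a variant.
    return VariantSuffix(suffix)
```

The calling code could then inspect the returned object types to decide whether the suffix fits the variant, and build one suggestion for the whole suffix instead of one per piece.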
@loeswerkman
Collaborator

Regarding the handling of variants such as c.361A>T^362C>G:

Our best option is to:

  1. Extract the main regular expression from getVariantInfo and put it into another function, together with the creation of the associative array that holds all information separately.
  2. Then, when we find a ^ in a variant, we can split the description at each ^ and send each part separately to the regular expression.
    --> Each of the separated variants will then be missing either a prefix or a variant type (e.g. A>G). We should find and fill in the missing information.
  3. Once each separate variant has been checked, we can return true/false if $bCheckHGVS is set. If !$bCheckHGVS, we can build an $aResponse with the lowest start position, the highest end position, the type set to "^", and the warnings and errors combined from what we found in the separate variants.
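
The steps above can be sketched roughly as follows. This is Python for illustration only (LOVD is PHP); the pattern is a tiny stand-in for the real regular expression in lovd_getVariantInfo(), and the function names and dictionary keys are hypothetical. Warnings and errors are left out:

```python
import re

# Simplified stand-in for the main regular expression: an optional prefix,
# a start position, an optional end position, and whatever follows as "type".
PART = re.compile(
    r"^(?:(?P<prefix>[cgmn])\.)?(?P<start>\d+)(?:_(?P<end>\d+))?(?P<type>.*)$"
)


def split_or_variant(variant: str) -> list:
    """Split at each "^" and fill in the prefix or type a piece is missing."""
    pieces = []
    for text in variant.split("^"):
        match = PART.match(text)
        if match is None:
            raise ValueError(f"cannot parse {text!r}")
        pieces.append(match.groupdict())
    # A missing prefix is copied from the first piece that has one.
    prefix = next((p["prefix"] for p in pieces if p["prefix"]), None)
    # A missing type is copied from the nearest later piece that has one.
    later_type = ""
    for piece in reversed(pieces):
        piece["prefix"] = piece["prefix"] or prefix
        if piece["type"]:
            later_type = piece["type"]
        else:
            piece["type"] = later_type
    return pieces


def combine(pieces: list) -> dict:
    """Build a combined response: lowest start, highest end, type "^"."""
    starts = [int(p["start"]) for p in pieces]
    ends = [int(p["end"] or p["start"]) for p in pieces]
    return {"start": min(starts), "end": max(ends), "type": "^"}
```

For c.361^362C>G, the first piece has no variant type, so it inherits C>G from the second piece; for c.361A>T^362C>G, the second piece inherits only the c prefix.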

@ifokkema
Member Author

ifokkema commented Feb 9, 2022

The function above could be named lovd_parseVariant(). Now adding a stub to sort-of support the "or" type variants.

@ifokkema
Member Author

ifokkema commented Feb 9, 2022

Now adding a stub to sort-of support the allele notation (combined variants). Leaving the issue open for now, but the worst is handled.
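
For reference, a minimal sketch of how allele notation could be split into its component variants. This is not the stub mentioned above, only a Python illustration with a hypothetical function name. It assumes the description has no reference sequence in front of the prefix and starts with "[" right after it, and it ignores phase by treating "(;)" like ";":

```python
import re


def split_alleles(variant: str) -> list:
    """Return one list of component variants per allele."""
    prefix, _, body = variant.partition(".")
    if not body.startswith("["):
        # E.g. an insertion with a bracketed suffix; not allele notation.
        raise ValueError(f"{variant!r} is not in allele notation")
    result = []
    # Each "[...]" group is one allele.
    for allele in re.findall(r"\[([^\[\]]*)\]", body):
        # "(;)" marks unknown phase; the distinction is dropped in this sketch.
        parts = allele.replace("(;)", ";").split(";")
        result.append([f"{prefix}.{part}" for part in parts])
    return result
```

Each component could then be passed to the regular variant check on its own, and the results combined in the same way as for the "or" type variants.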

@ifokkema
Member Author

If the supported suffixes get any more complex, it may be time to pull the suffix handling into a separate function. Especially for insertions, both lovd_getVariantInfo() and lovd_fixHGVS() now have extensive code handling possible suffixes. The new function would parse the suffix and return an object: a length object, a sequence object, a variant object, an array of objects, etc. The rest of the code would then check whether that object fits the variant, etc. There are currently too many pieces of code that handle variant suffixes, with complex if()s and regular expressions.
