Extract direction of effect from variant annotations #20
I have finally located this: it is in the variant annotations, which look to be very useful. One example, with a few columns excluded:
|
The following is a summary of the notebook here, looking into variant annotation tables generally and direction of effect more specifically.

Overview

PharmGKB provides 3 variant annotation tables, described in the readme as follows:
These variant annotations (plus drug labels/guidelines, not covered here) provide evidence for the clinical annotations, connected via an evidence ID. The three types of variant annotations have different but overlapping columns; in general, each row describes an assertion made by a publication (PMID) about the effect of one or more alleles/genotypes. See examples here and some breakdown of how the information is provided here.

Taken together, the variant annotations provide evidence for all clinical annotations and some kind of direction of effect for nearly all (>96%). If we select either of the larger tables, we get evidence for about half of all clinical annotations.

Direction of effect representation

If we focus on only one of these tables, we need "only" go through an exercise of selecting which columns we want to extract, and which (if any) we want to map.

If we want to use all three tables, since the column schema differs among them, we could do the same thing and have lots of optional attributes to cover all three types of annotation, or just use the free-text sentence that is provided for each annotation.

If we want to use all three tables but present them in a unified and structured way, we need to come up with a generic representation of what "direction of effect" means. Based on the sentence breakdown I came up with one suggestion, but this obviously would require much more discussion:
This tells us the direction (1) and what the effect is (2&3). Of course we need some other fields to connect things, but this could be the core "direction of effect" concept. Of the above, (1) is always either "decreased" or "increased", (2) takes values in a relatively small but not fixed vocabulary (could perhaps be mapped to an EFO term), and (3) is open and would probably ideally be mapped. The values that appear can be found here.

Allele / genotype representation

This is really about how to connect the direction of effect to our current evidence strings. Associating variant annotations to clinical annotations via PMID or evidence ID is the most straightforward method. Logically, however, direction of effect annotations should be allele / genotype specific, so I looked briefly into how these are represented in the new tables: basically, can we get a direction of effect per genotype or haplotype ID?

I think this is mostly doable, as the representation is consistent with what we've seen in the clinical annotations tables. The exception is that sometimes they've provided a metabolizer type instead of an allele/genotype for comparison in the variant annotation. See here for an example. Honestly I have no idea how we could handle these right now.

Some questions to consider
|
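As a rough illustration of the three-part "direction of effect" idea discussed above, here is a minimal Python sketch. The class and field names (`DirectionOfEffect`, `effect_term`, `effect_detail`) are hypothetical and not taken from PharmGKB or the notebook; the only constraint taken from the discussion is that part (1) is always "decreased" or "increased". The example values are made up for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Direction(Enum):
    """Part (1): the discussion states this is always one of two values."""
    DECREASED = "decreased"
    INCREASED = "increased"


@dataclass(frozen=True)
class DirectionOfEffect:
    """Hypothetical container for the three parts of a direction of effect."""
    direction: Direction                  # (1) closed vocabulary
    effect_term: str                      # (2) small but not fixed vocabulary
    effect_detail: Optional[str] = None   # (3) open text, ideally mapped later


def parse_direction(raw: str) -> Direction:
    """Normalise a raw cell value; raises ValueError for anything unexpected."""
    return Direction(raw.strip().lower())


# Illustrative values only, not real PharmGKB data.
example = DirectionOfEffect(
    direction=parse_direction(" Increased "),
    effect_term="response to",
    effect_detail="example drug",
)
```

Keeping (2) and (3) as plain strings leaves room for a later mapping step (for instance to EFO terms) without changing the shape of the record.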
As discussed, this spreadsheet contains examples of clinical annotations with their allele/genotype annotations and variant annotations. There are 4 clinical annotations in total, including one with "metabolizer type" comparisons which we didn't have time to cover in the meeting. The data is split into tabs by variant annotation type and includes all columns, though I've hidden some to make things a bit easier to read. Let me know any questions or thoughts you have! |
@tcezard @DSuveges @ireneisdoomed @tskir - I've been looking into how to associate variant annotations with clinical annotations at the level of genotype/allele. We can go over this together in our next meeting, but meanwhile feel free to leave your thoughts and questions.

I've done a rough proof-of-concept of what the associations might look like. The algorithm is not particularly clever: it basically amounts to decomposing genotype/allele strings until we get to alleles, and doing exact string matching to line things up. It also makes a core assumption, which is that annotations on alleles can be associated with any genotype containing that allele. So if we have variant annotations on alleles

You can see the algorithm itself in the notebook here and the results on a handful of examples in the spreadsheet here (with columns removed for readability).

I also ran this on the entire dataset and found it could successfully associate 93.7% of variant annotations, and it found at least one variant annotation for 65.1% of clinical annotation genotypes. There are definitely some tricky cases I know it fails on, and probably some I don't know about! Also, the approximately 2/3 clinical annotation genotype coverage is not unexpected given that we often don't have variant annotations for the ref/ref genotype.

I've got at least a couple of questions for us to discuss:
|
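For readers who want a concrete picture of the decompose-and-match idea described in the comment above, here is a minimal Python sketch. It is not the notebook's code. The string formats (two-letter SNP genotypes like `AG`, slash-separated diplotypes like `*1/*2`), the rule that a multi-allele variant annotation must match the whole genotype, and all IDs are assumptions made for illustration.

```python
import re
from typing import Dict, List, Tuple


def decompose(allele_string: str) -> Tuple[str, ...]:
    """Break a genotype/allele string into a sorted tuple of alleles.

    Assumed formats: 'AG' (two single-base alleles) or 'x/y' (slash-separated).
    Anything else is treated as a single allele.
    """
    s = allele_string.strip()
    if re.fullmatch(r"[ACGT]{2}", s):
        parts = list(s)
    else:
        parts = [p.strip() for p in s.split("/")]
    return tuple(sorted(p for p in parts if p))


def matches(variant_allele_string: str, clinical_genotype: str) -> bool:
    """Exact string matching on decomposed alleles.

    Core assumption from the discussion: an annotation on a single allele
    applies to any genotype containing that allele. For annotations on a
    full genotype, this sketch requires the same set of alleles.
    """
    va = decompose(variant_allele_string)
    cg = decompose(clinical_genotype)
    if len(va) == 1:
        return va[0] in cg
    return va == cg


def associate(clinical_genotypes: List[str],
              variant_annotations: Dict[str, str]) -> Dict[str, List[str]]:
    """Map each clinical genotype to the IDs of matching variant annotations."""
    return {
        genotype: [va_id for va_id, alleles in variant_annotations.items()
                   if matches(alleles, genotype)]
        for genotype in clinical_genotypes
    }


# Hypothetical IDs and alleles, for illustration only.
result = associate(
    ["AA", "AG", "GG", "*1/*2"],
    {"va1": "A", "va2": "AG", "va3": "*2"},
)
```

In this toy input, `GG` ends up with no associated variant annotation, which mirrors the ref/ref situation mentioned above. Multi-base alleles, metabolizer types, and star alleles that need expanding into rs components are not handled here.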
I think we should start by reiterating what we are aiming to get out of the association between genotype and variant annotation:
This being said, the algorithm is good enough for a first pass, and without starting to explode the * alleles into their individual rs components we might not get much better.

The ref/ref genotypes should be associated with the evidence that mentions this allele and not the other. OT could, on its side, highlight this genotype as the ref/ref and so not associated with any evidence.

I thought we said previously that all clinical annotations had at least one piece of evidence associated from any of the 3 sources. Is the 65% due to the fact that we are counting the exploded clinical annotations per genotype ID? |
Yes that's exactly it. I just checked coverage of clinical annotations not exploded by genotype, and it's 99.3%. (Of course if we just wanted to associate variant annotations with clinical annotations, not exploding by allele/genotype, our coverage would be 100% as we can do this directly by IDs...) |
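To make the difference between the two coverage figures concrete, here is a small Python sketch, assuming associations are stored as a mapping from a hypothetical `(clinical_annotation_id, genotype)` pair to a list of variant annotation IDs. The data structure and the example values are made up; only the two ways of counting come from the discussion.

```python
from typing import Dict, List, Tuple


def coverage(assoc: Dict[Tuple[str, str], List[str]]) -> Tuple[float, float]:
    """Return (per-genotype coverage, per-clinical-annotation coverage).

    Per-genotype: share of (clinical annotation, genotype) rows with at
    least one variant annotation. Per-annotation: share of clinical
    annotations with at least one covered genotype.
    """
    if not assoc:
        return 0.0, 0.0
    covered_rows = sum(1 for va_ids in assoc.values() if va_ids)
    all_annotations = {ca_id for ca_id, _ in assoc}
    covered_annotations = {ca_id for (ca_id, _), va_ids in assoc.items() if va_ids}
    return (covered_rows / len(assoc),
            len(covered_annotations) / len(all_annotations))


# Illustrative data: ref/ref genotypes have no variant annotation.
example_assoc = {
    ("ca1", "AA"): [],
    ("ca1", "AG"): ["va1"],
    ("ca1", "GG"): ["va1"],
    ("ca2", "*1/*1"): [],
    ("ca2", "*1/*2"): ["va2"],
}
per_genotype, per_annotation = coverage(example_assoc)
```

With this toy input, three of five genotype rows are covered while both clinical annotations are, which is the same pattern as the gap between the exploded and non-exploded figures.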
@DSuveges @ireneisdoomed Hi all, based on past discussions I've looked a bit into how we might extract or summarise direction of effect information from associated variant annotations. I've written this up in a document along with some examples of what the evidence strings could look like, just as a first step. Please take a look and leave your comments. |
PGKB says they now provide the deconstructed sentences for the clinical annotations in their downloads, so it should be possible to pull out directionality related to the genotype-phenotype association.