Issue 18: Implement genotype IDs to support variants with multiple alleles #24

Merged 9 commits on Oct 20, 2023

@apriltuesday (Collaborator) commented Oct 9, 2023

Closes #18
Better expected output diff here

Note that in this implementation, ref/ref genotypes have no consequence or gene annotated; the consequence and gene are still annotated on the other genotypes associated with the same variant. For example:

| RSID | Genotype ID | Gene | Consequence | Annotation text |
| --- | --- | --- | --- | --- |
| rs3766246 | 21_36070377_G_A,A | ENSG00000159228 | SO_0001583 | "AA genotype has increased risk..." |
| rs3766246 | 21_36070377_G_A,A | ENSG00000185917 | SO_0001627 | "AA genotype has increased risk..." |
| rs3766246 | 21_36070377_G_A,G | ENSG00000159228 | SO_0001583 | "AG genotype has increased risk..." |
| rs3766246 | 21_36070377_G_A,G | ENSG00000185917 | SO_0001627 | "AG genotype has increased risk..." |
| rs3766246 | 21_36070377_G_G,G | . | . | "GG genotype has decreased risk..." |

We might need a follow-up issue to modify this behaviour.
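For readers skimming the thread, the genotype ID format in the example above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: `make_genotype_id` and `is_ref_ref` are hypothetical names, and sorting the alleles is an assumption made only because it matches the `A,G` ordering in the example rows.

```python
def make_genotype_id(chrom, pos, ref, alleles):
    """Return an ID of the form CHROM_POS_REF_ALLELE1,ALLELE2.

    Sorting the alleles is an assumption made here so that A/G and G/A
    map to the same ID; the actual implementation may order them differently.
    """
    return f"{chrom}_{pos}_{ref}_{','.join(sorted(alleles))}"


def is_ref_ref(ref, alleles):
    """True when every allele in the genotype equals the reference allele."""
    return all(allele == ref for allele in alleles)


print(make_genotype_id("21", 36070377, "G", ["A", "G"]))  # 21_36070377_G_A,G
print(is_ref_ref("G", ["G", "G"]))  # True
```

A check like `is_ref_ref` is one way to spot the rows that currently end up with `.` for gene and consequence.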

I've also added counts for multi-allelic variants, as requested by OT. I'll post the numbers once I've run the entire dataset, but here's what the report looks like for the test set:

Total clinical annotations: 10
	With RS: 9 (90.00%)
		1. Exploded by allele: 30 (3.3x)
		2. Exploded by drug: 66 (2.2x)
		3. Exploded by phenotype: 78 (1.2x)
Total evidence strings: 80
	With CHEBI: 62 (77.50%)
	With EFO phenotype: 30 (37.50%)
	With functional consequence: 39 (48.75%)
	With VEP gene: 39 (48.75%)
Gene comparisons per annotation
	With PGKB genes: 8 (80.00%)
	With VEP genes: 6 (60.00%)
	PGKB genes != VEP genes: 8 (80.00%)
Total RS: 9
	With parsed alleles: 7 (77.78%)
		With >2 alleles: 1 (14.29%)
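As a rough guide to reading the report, the percentages and "x" factors above are consistent with dividing each count by its parent total or by the count from the previous explosion step. The sketch below reproduces a few lines under that assumption; the helper names and the `alleles_by_rs` structure are hypothetical and are not taken from `counts.py`.

```python
def percent_line(label, count, total):
    """Format a count as a percentage of its parent total."""
    return f"{label}: {count} ({count / total:.2%})"


def fold_line(label, count, previous):
    """Format a count as a fold increase over the previous step's count."""
    return f"{label}: {count} ({count / previous:.1f}x)"


def count_multiallelic(alleles_by_rs):
    """Return (number of RS with parsed alleles, number of those with >2 alleles).

    alleles_by_rs maps an RS ID to the set of alleles parsed for it;
    an empty set stands for an RS whose alleles could not be parsed.
    """
    parsed = [alleles for alleles in alleles_by_rs.values() if alleles]
    multi = [alleles for alleles in parsed if len(alleles) > 2]
    return len(parsed), len(multi)


print(percent_line("With RS", 9, 10))          # With RS: 9 (90.00%)
print(fold_line("Exploded by allele", 30, 9))  # Exploded by allele: 30 (3.3x)
print(percent_line("With >2 alleles", 1, 7))   # With >2 alleles: 1 (14.29%)
```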

@apriltuesday changed the title from "Issue 18" to "Issue 18: Implement genotype IDs to support variants with multiple alleles" Oct 9, 2023
@apriltuesday marked this pull request as ready for review October 9, 2023 14:24
@apriltuesday self-assigned this Oct 9, 2023
@tcezard (Member) left a comment

Looks good.

opentargets_pharmgkb/evidence_generation.py (outdated, resolved)
opentargets_pharmgkb/counts.py (outdated, resolved)
opentargets_pharmgkb/variant_coordinates.py (outdated, resolved)
@M-casado (Collaborator) commented

> Note that in this implementation, ref/ref genotypes have no consequence or gene annotated

Couldn't we use the reference_genome term from SO? I assume getting the context gene would be fairly easy as well.

@apriltuesday (Collaborator, Author) commented

> Couldn't we use the reference_genome term from SO? I assume getting the context gene would be fairly easy as well.

I was going to ask OT what they prefer, but yes, we could get the gene and return an SO term (another possibility is no_sequence_alteration).
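One possible shape for that follow-up is sketched below, purely as an illustration of the idea discussed here and not as agreed behaviour. It assumes rows are dicts with `rsid`, `genotype_id`, `gene` and `consequence` keys (a hypothetical structure), borrows the genes from the other genotypes of the same RSID, and uses the term name `no_sequence_alteration` as a placeholder; a real implementation would presumably need the matching SO accession, which is not filled in here.

```python
from collections import defaultdict

# Term name mentioned in this thread; the SO accession is deliberately not guessed.
FALLBACK_TERM = "no_sequence_alteration"


def fill_ref_ref(rows, fallback=FALLBACK_TERM):
    """Give gene-less (ref/ref) rows the genes seen on sibling genotypes.

    Each gene-less row is expanded to one row per gene annotated on any
    other genotype of the same RSID, with `fallback` as the consequence.
    Rows with no annotated siblings are passed through unchanged.
    """
    genes_by_rsid = defaultdict(list)
    for row in rows:
        gene = row["gene"]
        if gene is not None and gene not in genes_by_rsid[row["rsid"]]:
            genes_by_rsid[row["rsid"]].append(gene)

    filled = []
    for row in rows:
        genes = genes_by_rsid.get(row["rsid"], [])
        if row["gene"] is not None or not genes:
            filled.append(row)
            continue
        for gene in genes:
            filled.append({**row, "gene": gene, "consequence": fallback})
    return filled
```

With the rs3766246 example from the description, the single `21_36070377_G_G,G` row would become two rows, one per Ensembl gene. Whether OT wants that expansion is exactly the open question.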

@M-casado (Collaborator) left a comment

LGTM, just a few small comments

opentargets_pharmgkb/evidence_generation.py (outdated, resolved)
opentargets_pharmgkb/validation.py (outdated, resolved)
opentargets_pharmgkb/variant_coordinates.py (outdated, resolved)
@apriltuesday merged commit 5bc4b31 into EBIvariation:main Oct 20, 2023
1 check passed
@apriltuesday deleted the issue-18 branch January 8, 2025 14:28
Development

Successfully merging this pull request may close these issues.

Handle multiple alt alleles
3 participants