Genotypes in probabilities? #16
Hi @Hoeze, you can use
Thanks for your fast response, @horta.

In [22]: from bgen_reader import *
In [23]: bgen = read_bgen("complex.bgen")
Reading samples|===========================================================================================================================================================================================================================|
Mapping variants: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 51590.46it/s]
In [24]: geno = bgen["genotype"][8].compute()
In [25]: v = bgen["variants"].compute().iloc[8]
In [26]: print(v)
id
rsid M9
chrom 01
pos 9
nalleles 8
allele_ids A,G,GT,GTT,GTTT,GTTTT,GTTTTT,GTTTTTT
vaddr 783
Name: 8, dtype: object
In [27]: print(geno["probs"])
[[ 1. 0. 0. 0. 0. 0. 0. 0. nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
[ 0. 0. 0. 0. 0. 0. 1. 0. nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
[ 0. 0. 0. 0. 1. 0. 0. 0. nan nan nan nan nan nan nan nan nan nan
nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 1. 0.
0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
In [28]: print(geno["probs"].shape)
(4, 36)

I already found the

In the end, my goal is to extract the most likely sequence for each individual within a given genomic range.
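A first step toward "most likely genotype per individual" is to pick, for each sample, the column with the highest probability while ignoring the `nan` padding. The sketch below is not a bgen-reader function: `most_likely_columns` is a hypothetical helper, and the `probs` array is rebuilt by hand from the printed `geno["probs"]` above so the example runs without `complex.bgen`. The column index alone still has to be mapped back to alleles using the ordering in the BGEN specification.

```python
import numpy as np


def most_likely_columns(probs):
    """Return the index of the most probable genotype column for each sample.

    NaN entries are treated as padding and ignored. Rows that are entirely
    NaN (e.g. samples with missing data) get -1 instead of raising, which is
    what np.nanargmax would do on such a row.
    """
    probs = np.asarray(probs, dtype=float)
    # Replace NaN padding with -inf so it can never win the argmax.
    padded = np.where(np.isnan(probs), -np.inf, probs)
    best = np.argmax(padded, axis=1)
    best[np.all(np.isnan(probs), axis=1)] = -1
    return best


# Rebuilt by hand from the printed geno["probs"] of variant 8 (shape (4, 36)).
probs = np.full((4, 36), np.nan)
probs[0, :8] = [1, 0, 0, 0, 0, 0, 0, 0]
probs[1, :8] = [0, 0, 0, 0, 0, 0, 1, 0]
probs[2, :8] = [0, 0, 0, 0, 1, 0, 0, 0]
probs[3, :] = 0.0
probs[3, 16] = 1.0

print(most_likely_columns(probs).tolist())  # [0, 6, 4, 16]
```

With a real file you would pass `geno["probs"]` directly instead of the hand-built array.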
That is the tricky part, because BGEN allows very general genotypes. The layout depends on whether the genotypes are phased or unphased, on the number of alleles, and on the ploidy. The quick start gives an idea of how probabilities map to genotypes (in particular, read the comments in the code), but the definitive answer is in the "Per-sample order of stored probabilities" section of the BGEN specification.
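A worked count helps when reading the `probs` array shown earlier. This is my reading of the specification, not a quote from it: with $K$ alleles and ploidy $Z$, an unphased sample has one probability per unordered set of alleles, listed in colex order (for three alleles A, B, C and $Z = 2$: AA, AB, BB, AC, BC, CC), while a phased sample has $K$ probabilities for each of its $Z$ haplotypes.

$$
N_{\text{unphased}}(K, Z) = \binom{K+Z-1}{Z}, \qquad N_{\text{phased}}(K, Z) = Z\,K
$$

$$
K = 8:\quad N(8, 1) = 8, \qquad N_{\text{unphased}}(8, 2) = \binom{9}{2} = 36, \qquad N_{\text{phased}}(8, 2) = 16
$$

For an unphased diploid sample with alleles $i \le j$ (0-based), that ordering puts the genotype at column

$$
c(i, j) = \frac{j(j+1)}{2} + i, \qquad 0 \le i \le j < K, \qquad \text{e.g. } c(1, 5) = 15 + 1 = 16.
$$

So a `(4, 36)` array for an 8-allele variant is consistent with three haploid samples (8 columns each, padded with `nan`) and one unphased diploid sample (36 columns). Under the assumed ordering, the 1 at column 16 of the last row would be the pair (1, 5), i.e. G/GTTTT.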
Hm, I see. How do you solve this problem in your projects?
Have a look at
But make sure your genotypes are unphased: "This function supports unphased genotypes only."
Ah, thank you for the hint, I found what I was looking for: bgen-reader-py/bgen_reader/_helper.py (line 1 at commit 2651d87)
Would it be possible to export this function publicly in genotype?
Sure. I will do it for the 3.1.0 release.
Hi, how do I retrieve the genotype column ID for genotype[x].compute()["probs"]?
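One possible answer, sketched under assumptions: for unphased data, a column index can be mapped back to allele labels if you know the variant's `allele_ids` (as printed by `bgen["variants"]` above) and the sample's ploidy, and if the columns follow the colex order I read in the BGEN specification's "Per-sample order of stored probabilities" section. `unphased_genotypes` is a hypothetical helper, not part of bgen-reader. Ploidy can differ between samples (the rows above have 8 or 36 non-`nan` columns), so it should be chosen per sample.

```python
from itertools import combinations_with_replacement


def unphased_genotypes(allele_ids, ploidy):
    """List unphased genotypes in the column order assumed for BGEN probabilities.

    Assumption: genotypes are sorted allele multisets in colex order, i.e.
    compared from their highest allele index down (diploid: AA, AB, BB, AC, ...).
    Returns one tuple of allele labels per probability column.
    """
    alleles = allele_ids.split(",")
    # Every unordered genotype, as a non-decreasing tuple of allele indices.
    combos = combinations_with_replacement(range(len(alleles)), ploidy)
    # Colex order: compare reversed tuples lexicographically.
    ordered = sorted(combos, key=lambda c: c[::-1])
    return [tuple(alleles[i] for i in combo) for combo in ordered]


allele_ids = "A,G,GT,GTT,GTTT,GTTTT,GTTTTT,GTTTTTT"
diploid = unphased_genotypes(allele_ids, ploidy=2)
print(len(diploid))   # 36
print(diploid[16])    # ('G', 'GTTTT')
```

Combined with a per-sample argmax over `probs`, this turns the winning column into allele labels; phased genotypes would need a different mapping.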