Genotypes in probabilities? #16

Hoeze · 2019-05-22T19:25:47Z

Hi, how do I find out which genotype each column of genotype[x].compute()["probs"] corresponds to?


horta · 2019-05-22T21:09:36Z

Hi @Hoeze, you can use x to index bgen["variants"] and retrieve the information you want about the x-th variant: bgen["variants"].iloc[x].

Hoeze · 2019-05-23T00:18:15Z

Thanks for your fast response @horta.
I tried your suggestion, as in the following example:

In [22]: from bgen_reader import *

In [23]: bgen = read_bgen("complex.bgen")
Reading samples|==========================================|
Mapping variants: 100%|██████████████████████████████| 10/10 [00:00<00:00, 51590.46it/s]

In [24]: geno = bgen["genotype"][8].compute()

In [25]: v = bgen["variants"].compute().iloc[8]

In [26]: print(v)
id                                                
rsid                                            M9
chrom                                           01
pos                                              9
nalleles                                         8
allele_ids    A,G,GT,GTT,GTTT,GTTTT,GTTTTT,GTTTTTT
vaddr                                          783
Name: 8, dtype: object

In [27]: print(geno["probs"])
[[ 1.  0.  0.  0.  0.  0.  0.  0. nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
 [ 0.  0.  0.  0.  0.  0.  1.  0. nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
 [ 0.  0.  0.  0.  1.  0.  0.  0. nan nan nan nan nan nan nan nan nan nan
  nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan nan]
 [ 0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  1.  0.
   0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.  0.]]

In [28]: print(geno["probs"].shape)
(4, 36)

I already found the allele_ids, but how do I know which column in geno["probs"] contains which genotype?
For example, column 1 could contain A/A, A/G, etc.

Ultimately, my goal is to extract the most likely sequence of each individual within a given genomic range.

horta · 2019-05-23T09:19:04Z

That is the tricky part, because BGEN allows for very general genotypes. The layout depends on whether the genotypes are unphased or phased, on the number of alleles, and on the ploidy. The quick start gives an idea of how to associate probabilities with genotypes (in particular, read the comments in the code), but the definitive answer is in the section "Per-sample order of stored probabilities" of the BGEN specification.
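A minimal sketch of that ordering, assuming the rules of the BGEN v1.2 specification as I understand them: unphased probabilities are listed per genotype, with genotypes (sorted allele-index tuples, equivalently allele-count vectors) in colex order; phased probabilities are listed as nalleles values per haplotype, one haplotype after another. The helper names unphased_genotypes and phased_column are hypothetical and not part of bgen_reader; check the output against the specification before relying on it.

```python
from itertools import combinations_with_replacement


def unphased_genotypes(ploidy, nalleles):
    """Allele-index tuples in the assumed order of unphased BGEN probabilities.

    Assumption (BGEN v1.2): genotypes are ordered colex, i.e. compared
    starting from their largest allele index.
    """
    genotypes = combinations_with_replacement(range(nalleles), ploidy)
    # Reversing each sorted tuple turns colex order into plain lexicographic order.
    return sorted(genotypes, key=lambda g: g[::-1])


def phased_column(column, nalleles):
    """Map a phased probability column to (haplotype index, allele index).

    Assumption: nalleles probabilities are stored per haplotype, haplotype by haplotype.
    """
    return divmod(column, nalleles)


alleles = ["A", "G", "T"]
for i, g in enumerate(unphased_genotypes(2, len(alleles))):
    print(i, "/".join(alleles[a] for a in g))
# prints, one per line: 0 A/A, 1 A/G, 2 G/G, 3 A/T, 4 G/T, 5 T/T
```

With 8 alleles and ploidy 2 the unphased enumeration has C(9, 2) = 36 genotypes, which would be consistent with the 36 columns of geno["probs"] in the example above.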

Hoeze · 2019-05-23T12:00:23Z

Hm, I see. How do you solve this problem in your projects?
Do you have a method which calculates it?
Otherwise, I could try to write one...
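A method like that could look roughly like the following sketch, which picks the most probable unphased genotype of one sample at one variant. It assumes the BGEN v1.2 colex ordering for unphased genotypes (the small ordering helper is repeated so the block runs on its own), that the row may be NaN-padded on the right as in the output above, and that the sample's ploidy is known and passed in explicitly. most_likely_genotype is a hypothetical name, not a bgen_reader function, and phased data is not handled.

```python
import math
from itertools import combinations_with_replacement


def unphased_genotypes(ploidy, nalleles):
    # Assumed BGEN v1.2 unphased order: sorted allele-index tuples in colex order.
    return sorted(combinations_with_replacement(range(nalleles), ploidy),
                  key=lambda g: g[::-1])


def most_likely_genotype(probs, allele_ids, ploidy):
    """Return the most probable unphased genotype of one sample as allele strings.

    probs      -- one row of geno["probs"]; trailing NaN padding is ignored
    allele_ids -- comma-separated alleles, e.g. the "allele_ids" field of a variant
    ploidy     -- ploidy of this sample (passed in explicitly here)
    Returns None if the sample's probabilities are missing (NaN).
    """
    alleles = allele_ids.split(",")
    genotypes = unphased_genotypes(ploidy, len(alleles))
    # Only the first len(genotypes) columns belong to this sample's genotypes.
    row = list(probs[: len(genotypes)])
    if any(math.isnan(p) for p in row):
        return None
    # Index of the largest probability (first one wins on ties).
    best = max(range(len(row)), key=row.__getitem__)
    return [alleles[a] for a in genotypes[best]]


nan = float("nan")
print(most_likely_genotype([0.1, 0.7, 0.2, nan], "A,G", ploidy=2))  # ['A', 'G']
```

Building a sequence over a genomic range would then mean calling this per variant and per sample, which is left out here.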

horta · 2019-05-23T13:12:07Z

Have a look at allele_expectation: https://bgen-reader.readthedocs.io/en/latest/expectation.html

horta · 2019-05-23T13:12:26Z

But make sure you have unphased genotypes: "This function supports unphased genotypes only."
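To illustrate the quantity involved (this is not bgen_reader's allele_expectation code), here is a sketch of the expected number of copies of each allele for one unphased sample: E[count of allele k] = sum over genotypes g of P(g) times the copies of k in g. It assumes the BGEN v1.2 colex ordering of unphased genotypes; expected_allele_counts is a hypothetical name.

```python
from itertools import combinations_with_replacement


def unphased_genotypes(ploidy, nalleles):
    # Assumed BGEN v1.2 unphased order: sorted allele-index tuples in colex order.
    return sorted(combinations_with_replacement(range(nalleles), ploidy),
                  key=lambda g: g[::-1])


def expected_allele_counts(probs, nalleles, ploidy):
    """Expected number of copies of each allele for one unphased sample."""
    expectation = [0.0] * nalleles
    # zip stops at the last genotype, so trailing NaN padding is ignored.
    for p, genotype in zip(probs, unphased_genotypes(ploidy, nalleles)):
        for allele in genotype:
            expectation[allele] += p
    return expectation


print(expected_allele_counts([0.25, 0.5, 0.25], nalleles=2, ploidy=2))  # [1.0, 1.0]
```

The per-genotype loop relies on knowing which genotype each column stands for, which is why the result is only meaningful for unphased data.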

Hoeze · 2019-05-23T14:15:18Z

Ah, thank you for the hint, I found what I was looking for:

bgen-reader-py/bgen_reader/_helper.py, line 1 in commit 2651d87:

def get_genotypes(ploidy, nalleles):

Would it be possible to have this function publicly exported in genotype?

horta · 2019-05-24T06:04:50Z

Sure. I will do it for the 3.1.0 release.

horta added the enhancement label May 24, 2019
