HLA typing on Nanopore or PacBio data #55

Open
kokyriakidis opened this issue Mar 22, 2021 · 7 comments

@kokyriakidis

Hi!

Is there a chance it will work with long reads like Nanopore or Pacbio?

@chbe-helix
Collaborator

Hi Kokyriakidis,

Great question! It might. We'd have to work with you to figure out a custom pipeline to get it to work. My largest concern would be with the error rate of common long read technologies. Here is a rough idea of some of the changes and customizations we'd need to consider:

  1. Get HISAT-genotype's custom genotype genome built with a long-read aligner.
  2. Possibly apply error correction to the long reads.
  3. Feed the alignments into HISAT-genotype (this is already possible).
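
The three steps above could be sketched roughly as follows. This is only an outline under assumptions: the file names (`genotype_genome.fa`, `reads.fastq.gz`), the `map-ont` preset, and the helper functions are hypothetical placeholders rather than anything HISAT-genotype prescribes. Step 2 is skipped and step 3 is left as a comment, because the exact HISAT-genotype invocation is not specified in this thread.

```shell
#!/bin/sh
# Sketch only: align long reads to the HISAT-genotype genotype genome.
set -eu

# Hypothetical inputs; override them through the environment.
GENOTYPE_GENOME="${GENOTYPE_GENOME:-genotype_genome.fa}"
READS="${READS:-reads.fastq.gz}"
THREADS="${THREADS:-4}"

# Derive the minimap2 index name from the FASTA name (x.fa -> x.mmi).
index_path() {
    printf '%s.mmi\n' "${1%.fa}"
}

# Derive the output BAM name from the reads file name.
bam_path() {
    name=$(basename "$1")
    name=${name%.gz}
    name=${name%.fastq}
    name=${name%.fq}
    printf '%s.genotype_genome.bam\n' "$name"
}

align_long_reads() {
    idx=$(index_path "$GENOTYPE_GENOME")
    bam=$(bam_path "$READS")
    sam="${bam%.bam}.sam"

    # Step 1: build a long-read index of the genotype genome.
    minimap2 -x map-ont -d "$idx" "$GENOTYPE_GENOME"

    # Step 2 (optional read error correction) is not shown.

    # Align, then coordinate-sort and index. An intermediate SAM file is
    # used instead of a pipe so that a minimap2 failure stops the script
    # even under a plain POSIX sh without pipefail.
    minimap2 -ax map-ont --MD -t "$THREADS" -o "$sam" "$idx" "$READS"
    samtools sort -@ "$THREADS" -o "$bam" "$sam"
    samtools index "$bam"
    rm -f "$sam"

    # Step 3: pass "$bam" to HISAT-genotype (invocation not shown here).
}

# Run only when explicitly requested, so loading the file has no side effects.
if [ "${RUN_ALIGNMENT:-0}" = "1" ]; then
    align_long_reads
fi
```

Running it with `RUN_ALIGNMENT=1 READS=my_reads.fastq.gz sh align.sh` (the script name is also a placeholder) would leave `my_reads.genotype_genome.bam` and its index in the working directory.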

It would be a little work to get it done but it may be possible. Let me know your thoughts.

Thanks,
Chris

@kokyriakidis
Author

kokyriakidis commented Mar 22, 2021

Hi Chris,

The current best practice is to align Nanopore reads with minimap2:

minimap2 -ax map-ont -z 600,200 --MD -t {threads}  \
            -R "@RG\\tID:{sample}\\tSM:{sample}"  \
            {reference} {query} | \
samtools sort -@ {threads} -o {output} -

and to align PacBio reads with pbmm2:

pbmm2 align --num-threads {threads} \
            --preset CCS \
            --rg "@RG\\tID:{sample}\\tSM:{sample}" \
            --log-level INFO \
            {extra} \
            {reference} \
            {query} \
            {bam}

PacBio HiFi reads do not need any error correction. Most recent PacBio data are HiFi nowadays.

Nanopore data may require error correction, but I am not sure whether that is going to mess with the alleles.

If I provide a phased, haplotagged BAM file, will HISAT-genotype be able to do HLA typing using the phasing information from that BAM?

(My end goal is to create a Pharmacogenomics workflow that can handle Illumina, Nanopore and PacBio data. I wanted to incorporate HISAT-genotype in this workflow as a tool that can handle all types of data)

Thanks,
Konstantinos

@chbe-helix
Collaborator

Hi Konstantinos,

If the BAM files are generated with coordinates that match the genotype genome reference, it could work. HISAT-genotype uses a custom Genotype Genome reference whose coordinates are shifted relative to GRCh38. So, if you can generate haplotagged BAM files with genotype genome coordinates, that would be ideal. We're working towards a GRCh38 to genotype genome (and vice versa) mapping to integrate with HISAT-genotype. That will likely happen in a later release, though, and may come too late for the timeline of your workflow.
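
A quick sanity check before handing over a BAM is to compare the `@SQ` lines in its header with the `.fai` index of the genotype genome FASTA. The sketch below assumes hypothetical file names (`sample.bam`, `genotype_genome.fa.fai`) and helper function names. A mismatch shows the BAM was aligned to a different reference; a match is necessary but does not by itself prove the coordinates are right.

```shell
#!/bin/sh
set -eu

# Print "name<TAB>length" for every @SQ line of a SAM header read from stdin.
header_contigs() {
    awk -F '\t' '$1 == "@SQ" {
        name = ""; len = ""
        for (i = 2; i <= NF; i++) {
            if ($i ~ /^SN:/) name = substr($i, 4)
            if ($i ~ /^LN:/) len = substr($i, 4)
        }
        print name "\t" len
    }'
}

# Print "name<TAB>length" from a .fai file (its first two columns).
fai_contigs() {
    cut -f 1,2 "$1"
}

# Succeed only if the header contigs (stdin) equal the .fai contigs, in order.
same_reference() {
    [ "$(header_contigs)" = "$(fai_contigs "$1")" ]
}

# Usage (requires samtools; file names are placeholders):
#   samtools view -H sample.bam | same_reference genotype_genome.fa.fai \
#       && echo "header matches the genotype genome"
```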

If you provide a phased haplotagged BAM file with genotype genome coordinates, I'd be happy to see if we can get HISAT-genotype working for your purposes.

Thanks,
Chris

@kokyriakidis
Author

Hmm, I need GRCh38 as input for later stages like variant calling. I will try to map with minimap2 against this custom genotype genome reference and compare the resulting variants with those obtained using GRCh38.

I will try to get you a phased, haplotagged BAM with genotype genome coordinates so we can see how useful the information it carries is for HISAT-genotype.
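
Before sharing such a file, it may be worth confirming that the reads actually carry haplotype tags. The sketch below counts records per `HP` tag value (the tag written by haplotagging tools such as `whatshap haplotag`). The function name and the BAM file name are hypothetical.

```shell
#!/bin/sh
set -eu

# Count SAM records per HP (haplotype) tag value, reading SAM text from stdin.
# Records without an HP:i tag are reported as "untagged".
count_haplotags() {
    awk -F '\t' '
        /^@/ { next }                      # skip header lines
        {
            hp = "untagged"
            for (i = 12; i <= NF; i++)     # optional fields start at column 12
                if ($i ~ /^HP:i:/) hp = "HP" substr($i, 6)
            n[hp]++
        }
        END { for (k in n) print k "\t" n[k] }
    ' | sort
}

# Usage (requires samtools; the file name is a placeholder):
#   samtools view sample.haplotagged.bam | count_haplotags
```

If nearly every record ends up as `untagged`, the phasing step probably did not cover the region of interest.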

Working directly with Nanopore or PacBio FASTQ reads needs a custom model from your end, if I understand correctly.

So, I will keep this issue open for research purposes.

Thanks,
Konstantinos

@chbe-helix
Collaborator

Hi Konstantinos,

Sounds good! I understand needing to use GRCh38 coordinates and, yes, I will likely need to make modifications to HISAT-genotype and its models to get it working with long reads. I'm happy to work with you on this endeavor. Let me know if you can get useful BAM files in genotype genome coordinates so I can edit the models, and I will let you know when I get the GRCh38 to genotype genome map working.

Thanks,
Chris

@adbeggs

adbeggs commented Aug 14, 2022

Hi both,
Any progress on this? I am happy to help @chbe-helix, as I also have an interest in this area and am working with Oxford Nanopore on long-read HLA typing.

Best wishes

Andrew

@alisamatisse

I would also be happy to learn more about how to run HISAT-genotype on long-read sequencing (PacBio HiFi) data...


4 participants