Invalid Data #6

Open
BrendaLee1 opened this issue Nov 24, 2024 · 8 comments

@BrendaLee1

Hi,
Thank you for this tool. I tried the pipeline you provided on my data, but during de novo detection I got a lot of warnings:
2024-11-24 20:55:02 [WARN] - Skipping invalid data in M
2024-11-24 20:55:02 [WARN] - Skipping invalid data in C
2024-11-24 20:55:02 [WARN] - Skipping invalid data in F

The files in the input path are named as follows:
Sample.spanning.bam
Sample.vcf.gz
Sample.sorted.vcf.gz
Sample.sorted.vcf.gz.csi
Sample.spanning.sorted.bam
Sample.spanning.sorted.bam.bai

But the de novo output looks very strange:
trid genotype denovo_coverage allele_coverage allele_ratio child_coverage child_ratio mean_diff_father mean_diff_mother father_dropout_prob mother_dropout_prob allele_origin denovo_status per_allele_reads_father per_allele_reads_mother per_allele_reads_child father_dropout mother_dropout child_dropout index father_MC mother_MC child_MC father_AL mother_AL child_AL father_overlap_coverage mother_overlap_coverage
chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC 0 0 0 0.0 0 0.0 0.0 0.0 0.0 0.0 . . . . . . . . 0 . . . . . . . .
chr19_2041258_2041346_GGCCCCAACCA 0 0 0 0.0 0 0.0 0.0 0.0 0.0 0.0 . . . . . . . . 0 . . . . . . . .

All values are set to zero throughout the whole output file.

I checked the VCF files of the trio data generated by TRGT; an example is shown below:

Father:
chr19 47047398 . CAGCCTCGCCCTTCTTTTCCTTCAAATGCCGCCATCTCCTACCGAGTATGGCCTGGGCCAATCCCATCCATGTCCTACCGAGTATGGCCTGGGCCAATCCCACCCACGTCCGTCCCCATTCACGTCCTTTACAAACAGCCC . 0 . TRID=chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;END=47047538;MOTIFS=CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;STRUC=(CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC)n GT:AL:ALLR:SD:MC:MS:AP:AM 0/0:140,140:138-142,140-140:20,20:2,2:0(30-102),0(30-102):0.5,0.5:.,.

Mother:
chr19 47047398 . CAGCCTCGCCCTTCTTTTCCTTCAAATGCCGCCATCTCCTACCGAGTATGGCCTGGGCCAATCCCATCCATGTCCTACCGAGTATGGCCTGGGCCAATCCCACCCACGTCCGTCCCCATTCACGTCCTTTACAAACAGCCC . 0 . TRID=chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;END=47047538;MOTIFS=CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;STRUC=(CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC)n GT:AL:ALLR:SD:MC:MS:AP:AM 0/0:140,140:138-142,140-140:19,19:2,2:0(30-102),0(30-102):0.5,0.5:.,.

Child:
chr19 47047398 . CAGCCTCGCCCTTCTTTTCCTTCAAATGCCGCCATCTCCTACCGAGTATGGCCTGGGCCAATCCCATCCATGTCCTACCGAGTATGGCCTGGGCCAATCCCACCCACGTCCGTCCCCATTCACGTCCTTTACAAACAGCCC . 0 . TRID=chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;END=47047538;MOTIFS=CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC;STRUC=(CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC)n GT:AL:ALLR:SD:MC:MS:AP:AM 0/0:140,140:138-141,140-140:14,14:2,2:0(30-102),0(30-102):0.5,0.5:.,.

best

@tmokveld
Collaborator

Hi, thank you for your interest!

Regarding the empty output, trgt-denovo will produce an empty output if it encounters any problem whatsoever at a given locus.

That said, the warning you encountered, Skipping invalid data in M, is unusual. I ran a test by creating VCFs from your sample data and running trgt-denovo on some of my own BAM data, but I could not reproduce the warning. Instead, I got the expected output:

trid	genotype	denovo_coverage	allele_coverage	allele_ratio	child_coverage	child_ratio	mean_diff_father	mean_diff_mother	father_dropout_prob	mother_dropout_prob	allele_origin	denovo_status	per_allele_reads_father	per_allele_reads_mother	per_allele_reads_child	father_dropout	mother_dropout	child_dropout	index	father_MC	mother_MC	child_MC	father_AL	mother_AL	child_AL	father_overlap_coverage	mother_overlap_coverage
chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC	0	0	8	0.0	16	0.0	0.0	0.0	0.0	0.0	?	X	8,8	8,8	8,8	N	N	N	0	2,2	2,2	2,2	140,140	140,140	140,140	7,7	7,7
chr19_47047398_47047538_CCATGTCCTACCGAGTATGGCCTGGGCCAATCCCAC	0	0	8	0.0	16	0.0	0.0	0.0	0.0	0.0	?	X	8,8	8,8	8,8	N	N	N	1	2,2	2,2	2,2	140,140	140,140	140,140	7,7	7,7
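
For anyone comparing the two outputs in this thread, a small script can list which loci came back empty. This is only a sketch, not part of trgt-denovo: the function name `empty_loci` is made up for illustration, and the rule for "empty" (child_coverage of 0 together with a denovo_status of ".") is an assumption taken from the rows shown in this issue.

```python
def empty_loci(lines):
    """Return the trid of every row that looks empty.

    Assumption (from the rows in this thread, not from trgt-denovo docs):
    an empty row has child_coverage == 0 and denovo_status == ".".
    Fields are split on any whitespace, so both the space-separated and
    the tab-separated variants shown above are accepted.
    """
    rows = [line.split() for line in lines if line.strip()]
    if not rows:
        return []
    header, records = rows[0], rows[1:]
    col = {name: i for i, name in enumerate(header)}
    empty = []
    for rec in records:
        if len(rec) != len(header):
            continue  # skip rows that do not match the header
        no_coverage = float(rec[col["child_coverage"]]) == 0
        no_status = rec[col["denovo_status"]] == "."
        if no_coverage and no_status:
            empty.append(rec[col["trid"]])
    return empty
```

Passing the lines of the output file (for example `empty_loci(open(path))`, with `path` pointing at your own file) returns the affected TRIDs, which makes it easier to pick one region to share.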

Could you share which TRGT version you used? Additionally, if possible, would you be willing to share the VCFs and BAMs you are using? I can also be reached at [email protected].

@BrendaLee1
Author

Hi,
Thank you for your reply. I first tried trgt-v1.3.0-x86_64-unknown-linux-gnu, and I am now testing trgt-v1.4.0-x86_64-unknown-linux-gnu, which has not finished yet. For the BAM data, do you mean the spanning BAM or the whole-genome BAM?

@tmokveld
Collaborator

Great! Yes, just the spanning BAM. The easiest option would be to include only the reads spanning the problematic region (using samtools view, or by running TRGT on only that one region).
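
As a rough sketch of the samtools view route, the snippet below builds the two commands that would copy the reads overlapping one region into a small BAM and index it. The helper name `region_subset_commands`, the output file name, and the region string (taken from the POS and END values of the VCF record quoted earlier) are illustrative assumptions; a region query also assumes the input BAM is sorted and indexed.

```python
import shlex


def region_subset_commands(bam_in, bam_out, region):
    """Build samtools commands that extract reads overlapping `region`.

    `samtools view -b -o OUT IN REGION` writes the overlapping reads as BAM,
    and `samtools index OUT` creates the index for the new file.
    """
    return [
        ["samtools", "view", "-b", "-o", bam_out, bam_in, region],
        ["samtools", "index", bam_out],
    ]


# Hypothetical file names and region, for illustration only.
commands = region_subset_commands(
    "Sample.spanning.sorted.bam",
    "Sample.chr19_47047398.bam",
    "chr19:47047398-47047538",
)
for cmd in commands:
    print(shlex.join(cmd))
```

The commands are only printed here; to execute them, each list can be passed to `subprocess.run(cmd, check=True)` on a machine where samtools is installed.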

@BrendaLee1
Author

BrendaLee1 commented Nov 25, 2024

Hi,
I attached vcf and bam data below:
trgt_test.tar.gz
trgt version: 1.4.0

When I use trgt-denovo 0.2.0, I get an empty output file and the "Skipping invalid data" warning.

Best.

@tmokveld
Collaborator

Thanks, I just had a look at it, and it seems that the TRGT BAM files have malformed reads. Specifically, the rq fields seem to have an invalid type, rq:d:-1 (d is not supported in the BAM spec). This is supposed to be rq:f:-1.0.
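
One way to check other BAM files for the same problem is to scan their text (SAM) form for optional fields whose type letter is not one the SAM specification defines for text records (A, i, f, Z, H, B). The sketch below does that for a single SAM line; the helper name `invalid_tags` is made up, and it assumes the offending tag appears in `samtools view` output in the same `rq:d:-1` form quoted above.

```python
import re

# Type letters the SAM specification defines for optional fields in text form.
SAM_TEXT_TYPES = set("AifZHB")

# TAG:TYPE:VALUE, where TAG is two characters and TYPE is a single letter.
TAG_RE = re.compile(r"^([A-Za-z][A-Za-z0-9]):([A-Za-z]):(.*)$")


def invalid_tags(sam_line):
    """Return (tag, type, value) for each optional field with an unknown type.

    The first 11 tab-separated columns of a SAM record are mandatory,
    so optional fields start at column 12.
    """
    fields = sam_line.rstrip("\n").split("\t")
    bad = []
    for field in fields[11:]:
        match = TAG_RE.match(field)
        if match and match.group(2) not in SAM_TEXT_TYPES:
            bad.append(match.groups())
    return bad
```

Running this over each line of `samtools view Sample.spanning.sorted.bam` output and printing any non-empty result should show whether a file still carries tags such as rq:d:-1.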

There should be an update of TRGT to 1.4.1 soon, I'll let you know when it is released.

@tmokveld
Collaborator

TRGT v1.4.1 was just released (https://github.com/PacificBiosciences/trgt/releases/tag/v1.4.1).

Would you be able to try this version and see if it fixes the problem?

@BrendaLee1
Author

BrendaLee1 commented Nov 26, 2024

Thank you for your efforts. The new version fixed the problem.
I noticed that the datasets you provided don't include some of the tandem repeat regions we are interested in. I wonder whether the TRGT tool can generate a repeat definition file for our own repeat regions?

@tmokveld
Collaborator

tmokveld commented Nov 26, 2024

To create your own repeat definitions, an option you could try is tr-solve (https://github.com/trgt-paper/tr-solve).
