Does it work with non-model species? #3

DuttaAnik · 2024-07-11T07:10:29Z

Hello,
Thanks for developing the tool. Does this tool work with non-model species of different ploidy?

JustinChu · 2024-07-11T19:46:53Z

Unfortunately, the code was designed specifically for diploid genomes. It considers whether a site is homozygous or heterozygous, though it can also handle missing sites. If you fed in only sites with 2 alleles whose frequencies are roughly equal (as a hack), it may provide some results, but I cannot guarantee that they will make sense.
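A minimal sketch of the hack described above, not part of the tool itself. It assumes the VCF carries an `AF=<value>` entry in the INFO column, and the 0.4 to 0.6 window is an arbitrary illustration of "roughly equal"; the function name `filter_balanced_biallelic` is hypothetical.

```shell
# Keep header lines plus biallelic SNPs whose alternate allele frequency
# falls inside [lo, hi]. Assumes INFO (column 8) contains AF=<float>.
# Usage: filter_balanced_biallelic snps.vcf [lo] [hi] > snps.balanced.vcf
filter_balanced_biallelic() {
    awk -F'\t' -v lo="${2:-0.4}" -v hi="${3:-0.6}" '
        /^#/ { print; next }                     # pass header lines through
        $4 !~ /^[ACGTacgt]$/ { next }            # REF must be a single base
        $5 !~ /^[ACGTacgt]$/ { next }            # exactly one single-base ALT allele
        {
            n = split($8, info, ";")
            for (i = 1; i <= n; i++) {
                if (info[i] ~ /^AF=/) {
                    af = substr(info[i], 4) + 0  # numeric value after "AF="
                    if (af >= lo && af <= hi) print
                    break
                }
            }
        }
    ' "$1"
}
```

Records without an `AF` entry are dropped in this sketch, since their allele balance cannot be checked.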

This does have me thinking about whether we could create a model to handle genomes with a generic ploidy, without sacrificing statistical power.

DuttaAnik · 2024-07-11T20:03:01Z

Thanks for the reply. Although it is a far-fetched idea, it would be really cool to have this option in the tool, along with handling of multi-allelic sites. To my knowledge, no good tools are available to detect sample swaps in non-model organisms.

JustinChu · 2024-07-11T21:02:32Z

I would be interested to know whether the tool gives back any meaningful results in your case if you run it (with the hack). If I were to guess, given enough sites with high enough variability, the worst case is that it will say everything is unrelated, so I don't think it would hurt.

DuttaAnik · 2024-07-15T07:34:06Z

Hi, I have a few questions. First, thanks for fixing the parsing bug. It works now.

So, in the following command:
scripts/generateSites name=prefix ref=reference.fa vcf=snps.vcf
I should use the multisample VCF file that contains SNPs from all the samples, right?

Then, in this command:
ntsmVCF -p prefix -s sites.fa -r reference.fa multiVCF.vcf
Should I use the same VCF that I used in the first command? This is a bit confusing. And the sites.fa I assume is created from the first command, right?

Lastly, can I use a list of raw FASTQ files instead of writing them one by one in the command below? If yes, what should be the format of the list file?
I ask because I have hundreds of FASTQ files.
ntsmCount -t 2 -s sites.fa sample_part1.fq sample_part2.fq > counts.txt

Thank you.

JustinChu · 2024-07-15T18:04:21Z

So, in this following command: scripts/generateSites name=prefix ref=reference.fa vcf=snps.vcf I should use the multisample VCF file that contains SNPs from all the samples, right?

Edit*: Actually, the VCF that is used here doesn't need to be a multisample VCF. It just needs the biallelic variants.

Then, in this command: ntsmVCF -p prefix -s sites.fa -r reference.fa multiVCF.vcf Should I use the same VCF that I used in the first command? This is a bit confusing. And the sites.fa I assume is created from the first command, right?

Edit*: The multisample VCF file here must have reliable genotyping results from a reliable set of samples to capture the population structure. It can be, but does not have to be, the same as the one above. Also, ideally, the multisample VCF should not contain any of the samples used in the sample swap detection process downstream. Your assumption about sites.fa is correct. I've changed the README to clarify where sites.fa comes from. I've also added text to mention that using a rotation matrix is optional.
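The two requirements above (a multisample VCF, ideally sharing no samples with the ones being checked) can be verified before running ntsmVCF. This is only a sketch: the helper names `vcf_samples` and `check_panel_vcf` are hypothetical, as is the idea of keeping the study sample names in a plain text file, one per line.

```shell
# Print the sample names of a VCF: columns 10 onward of the #CHROM header line.
vcf_samples() {
    awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Usage: check_panel_vcf multiVCF.vcf study_samples.txt
# Returns 1 if the VCF has fewer than 2 samples, 2 if it shares a sample
# name with the study list, 0 otherwise.
check_panel_vcf() {
    panel_vcf="$1"
    study_list="$2"
    n_samples="$(vcf_samples "$panel_vcf" | wc -l | tr -d ' ')"
    if [ "$n_samples" -lt 2 ]; then
        echo "not a multisample VCF: $n_samples sample column(s)"
        return 1
    fi
    # Exact, whole-line matches between panel samples and study samples.
    overlap="$(vcf_samples "$panel_vcf" | grep -Fx -f "$study_list" || true)"
    if [ -n "$overlap" ]; then
        echo "samples present in both sets: $overlap"
        return 2
    fi
    echo "ok: $n_samples samples, no overlap with study samples"
}
```

The check only compares sample names, so it cannot tell whether the genotypes themselves are reliable.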

Lastly, can I use a list of raw fastq files instead of writing them one by one in the code below? If yes, what should be the format of the list file? Because I have more than 100s of fastq files. ntsmCount -t 2 -s sites.fa sample_part1.fq sample_part2.fq > counts.txt

At the moment I don't have support for a file list. However, Unix globs (i.e. wildcards such as *) should work. Also, to be clear, each sample will need its own count file and thus a separate ntsmCount command.
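For hundreds of files, the glob-per-sample approach could be wrapped in a loop along these lines. This is a sketch under assumptions that do not come from the thread: files are named `<sample>_part<N>.fq` (mirroring the example command), all sit in one directory, and each count file is written to `<sample>_counts.txt`. The `FASTQ_DIR`, `SITES`, and `DRY_RUN` variables and both function names are hypothetical. By default it only prints the commands.

```shell
# One ntsmCount call per sample, with a glob collecting that sample's FASTQ parts.
FASTQ_DIR="${FASTQ_DIR:-.}"
SITES="${SITES:-sites.fa}"
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands only, 0 = run them

# Print the unique sample names found in a directory, one per line.
list_samples() {
    for f in "$1"/*_part*.fq; do
        [ -e "$f" ] || continue          # the glob matched nothing
        f="${f##*/}"                     # strip the directory
        printf '%s\n' "${f%_part*}"      # strip the trailing _part<N>.fq
    done | sort -u
}

# Run (or print) one ntsmCount command per sample.
run_counts() {
    list_samples "$1" | while IFS= read -r sample; do
        if [ "$DRY_RUN" = 1 ]; then
            echo "ntsmCount -t 2 -s $SITES $1/${sample}_part*.fq > ${sample}_counts.txt"
        else
            ntsmCount -t 2 -s "$SITES" "$1"/"${sample}"_part*.fq > "${sample}_counts.txt"
        fi
    done
}

run_counts "$FASTQ_DIR"
```

Setting `DRY_RUN=0` runs the commands instead of printing them; a different naming scheme only requires changing the two glob patterns.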

JustinChu added the question Further information is requested label Jul 11, 2024

JustinChu mentioned this issue Jul 15, 2024

unable to parses error #4

Closed

DuttaAnik changed the title ~~Does it work with plant samples?~~ Does it work with non-model species? Jul 19, 2024
