pNpS-per-gene.txt output is empty #10

franlat · 2022-02-21T01:34:24Z

Hello!
I have been using POGENOM for a while with prokaryotic MAGs and I had 0 problems with it. I have been generating my GFFs with Prodigal, VCF files with freebayes and everything is ok.

However, I am now running the same analyses on 4 different eukaryotic genomes. I am aware POGENOM wasn't designed with eukaryotic genomes in mind, but I would still like to see some pNpS ratios for those genes.

The first problem I encountered was that I couldn't use Prodigal to generate my GFF files, since these are not prokaryotic genomes, so I used AUGUSTUS instead, which supports eukaryotic gene prediction. My genes look fine, yet when I passed the GFF file to pogenom.pl it complained that there were duplicated gene IDs. To solve this I kept only the "gene" entries in the GFF file, excluding CDS, exons, introns, etc. (I know it's not optimal, but for now I just want to see what these values look like.)
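For reference, the filtering step described above can be sketched in Python as follows. This is a minimal illustration, not part of POGENOM: the function names `filter_gff` and `duplicated_ids` are hypothetical, and it assumes a tab-separated 9-column GFF whose last column carries an `ID=` attribute (if there is none, the whole attribute column is used as the ID).

```python
from collections import Counter


def filter_gff(lines, feature_type="gene"):
    """Keep comment lines and rows whose third column equals feature_type."""
    kept = []
    for line in lines:
        if line.startswith("#"):
            kept.append(line)
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == feature_type:
            kept.append(line)
    return kept


def duplicated_ids(lines):
    """Return the sorted feature IDs that occur on more than one row."""
    counts = Counter()
    for line in lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9:
            continue
        attrs = cols[8]
        feature_id = attrs  # fallback when no ID= attribute is present
        for field in attrs.split(";"):
            field = field.strip()
            if field.startswith("ID="):
                feature_id = field[3:]
                break
        counts[feature_id] += 1
    return sorted(i for i, n in counts.items() if n > 1)


if __name__ == "__main__":
    # Hypothetical file names, matching the simplified paths in this post.
    with open("genome.gff") as handle:
        all_lines = handle.readlines()
    print("duplicated IDs before filtering:", duplicated_ids(all_lines)[:10])
    gene_only = filter_gff(all_lines, "gene")
    print("duplicated IDs after filtering:", duplicated_ids(gene_only)[:10])
    with open("genome.onlygene.gff", "w") as out:
        out.writelines(gene_only)
```

In a multi-exon gene model, several CDS rows can share one ID, which would be one possible source of the duplicated-ID complaint; keeping only "gene" rows removes those repeats but also discards the exon structure.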

With this change, POGENOM now runs without any errors and gives me Fst values and frequencies; however, all the tables for pNpS and intradiversity are empty (just the sample headers).

If I modify the GFF and change the "gene" feature type to "CDS" (to emulate the Prodigal GFF), I get this error output:

Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value within %codon_aminoacid in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1479.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value within %codon_aminoacid in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1479.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
substr outside of string at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1474.
Use of uninitialized value $base in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1476.
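Perl prints "substr outside of string" when substr is asked for a position past the end of a string, so one thing that may be worth ruling out is whether the relabelled features fit the sequences in the FASTA file. The sketch below is only a sanity check under that assumption and says nothing about what pogenom.pl does internally. The function names `fasta_lengths` and `check_features` are hypothetical. It flags features on contigs missing from the FASTA, features ending beyond the contig, and features whose length is not a multiple of 3 (expected for whole "gene" spans that still contain introns).

```python
def fasta_lengths(fasta_lines):
    """Map each sequence name (first word of the header) to its length."""
    lengths = {}
    name = None
    for line in fasta_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            lengths[name] = 0
        elif name is not None:
            lengths[name] += len(line)
    return lengths


def check_features(gff_lines, lengths, feature_type="CDS"):
    """Return (contig, start, end, reason) for each suspicious feature row."""
    problems = []
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) != 9 or cols[2] != feature_type:
            continue
        contig, start, end = cols[0], int(cols[3]), int(cols[4])
        if contig not in lengths:
            problems.append((contig, start, end, "contig not in FASTA"))
        elif end > lengths[contig]:
            problems.append((contig, start, end, "end beyond contig length"))
        elif (end - start + 1) % 3 != 0:
            problems.append((contig, start, end, "length not a multiple of 3"))
    return problems


if __name__ == "__main__":
    # Hypothetical file names; the GFF here is the one with "gene" renamed to "CDS".
    with open("genome.fasta") as fasta, open("genome.cds_renamed.gff") as gff:
        issues = check_features(gff.readlines(), fasta_lengths(fasta.readlines()))
    for issue in issues[:20]:
        print(*issue, sep="\t")
    print(len(issues), "suspicious features")
```

Note that contig names are compared using only the first word of each FASTA header, so a mismatch between GFF column 1 and the full header text would show up as "contig not in FASTA".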

The problem with keeping only the original CDS lines of the GFF is that they do not include all the gene info, and they produce duplicated gene IDs.

Here's what my (simplified paths) command looks like:

perl pogenom.pl --min_count 10 --min_found 4 --vcf_file TA genome.fb.vcf --out genome.c10.s4 --gff_file genome.onlygene.gff --genetic_code_file standard_genetic_code.txt --fasta_file genome.fasta --genome_size ${gsize}

Could it be a problem with the GFF format? Unlike the prokaryotic genomes, the eukaryotic genomes have masked regions (runs of Ns). I removed the conflicting N regions when generating the VCFs to avoid overestimating SNPs, but could this be the issue? Or could it be a problem with the genetic code used?

Thank you in advance!
