pNpS-per-gene.txt output is empty #10

Open
franlat opened this issue Feb 21, 2022 · 0 comments

franlat commented Feb 21, 2022

Hello!
I have been using POGENOM for a while with prokaryotic MAGs and have had no problems with it. I generate my GFFs with Prodigal and my VCF files with freebayes, and everything works.

However, I am now running the same analyses on 4 different eukaryotic genomes. I am aware POGENOM wasn't designed with eukaryotic genomes in mind, but I would still like to see some pNpS ratios for those genes.

The first problem I encountered was that I couldn't use Prodigal to generate my GFF files, since these are not prokaryotic genomes, so I used AUGUSTUS instead, which does support eukaryotic gene prediction. My genes look fine, yet when I passed the GFF file to pogenom.pl it complained that there were duplicated gene IDs. To solve this I kept only the "gene" entries in the GFF file, excluding CDS, exons, introns, etc. (I know it's not optimal, but for now I just want to see what these values look like).
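A minimal sketch of this kind of filtering, assuming a standard tab-separated GFF with the feature type in the third column. The function name `keep_feature_type` and the example lines are hypothetical and only illustrate the step; they are not the actual AUGUSTUS output or the exact procedure used.

```python
def keep_feature_type(gff_lines, feature_type="gene"):
    """Keep comment lines and lines whose 3rd column equals feature_type."""
    kept = []
    for line in gff_lines:
        if line.startswith("#"):
            kept.append(line)  # header and comment lines pass through
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) >= 3 and cols[2] == feature_type:
            kept.append(line)
    return kept


# Hypothetical example lines (not real AUGUSTUS output)
example = [
    "##gff-version 3\n",
    "contig1\tAUGUSTUS\tgene\t1\t900\t.\t+\t.\tID=g1\n",
    "contig1\tAUGUSTUS\tCDS\t1\t300\t.\t+\t0\tID=g1.t1.cds;Parent=g1.t1\n",
    "contig1\tAUGUSTUS\tintron\t301\t600\t.\t+\t.\tParent=g1.t1\n",
]
filtered = keep_feature_type(example)
```

Here `filtered` holds only the header line and the "gene" line; the CDS and intron lines are dropped.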

With the gene-only GFF, POGENOM runs without any errors and gives me Fst values and frequencies; however, all the tables for pNpS or intradiversity are empty (just the sample headers).

If I instead modify the GFF and change the "gene" feature type to "CDS" (to emulate the Prodigal GFF), I get this error:

Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value within %codon_aminoacid in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1479.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value within %codon_aminoacid in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1479.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
Use of uninitialized value in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1481.
substr outside of string at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1474.
Use of uninitialized value $base in string eq at /home/flatorre/scratch/flatorre/POGENOM/pogenom.pl line 1476.

The problem with keeping only the original CDS lines of the GFF is that they do not include all the gene info and they produce duplicated gene IDs.
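To see which IDs are repeated, something like the sketch below could be used. It assumes GFF3-style attributes with an `ID=` key in the ninth column; other GFF/GTF flavours store identifiers differently, so the regular expression is an assumption, and `duplicated_ids` is a hypothetical helper, not part of POGENOM. In GFF3, the segments of a multi-exon CDS commonly share one ID, which is one way such repeats can arise.

```python
import re
from collections import Counter


def duplicated_ids(gff_lines, feature_type="CDS"):
    """Return {ID: count} for IDs seen more than once among feature_type lines."""
    counts = Counter()
    for line in gff_lines:
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        # Assumption: GFF3-style "ID=..." attribute in column 9
        match = re.search(r"(?:^|;)ID=([^;]+)", cols[8])
        if match:
            counts[match.group(1)] += 1
    return {fid: n for fid, n in counts.items() if n > 1}


# Hypothetical two-exon gene whose CDS segments share one ID
example_cds = [
    "contig1\tAUGUSTUS\tCDS\t1\t300\t.\t+\t0\tID=g1.t1.cds;Parent=g1.t1\n",
    "contig1\tAUGUSTUS\tCDS\t601\t900\t.\t+\t0\tID=g1.t1.cds;Parent=g1.t1\n",
    "contig1\tAUGUSTUS\tCDS\t1500\t1800\t.\t-\t0\tID=g2.t1.cds;Parent=g2.t1\n",
]
repeats = duplicated_ids(example_cds)
```

In this example only `g1.t1.cds` is reported, with a count of 2.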

Here's what my (simplified paths) command looks like:

perl pogenom.pl --min_count 10 --min_found 4 --vcf_file TA genome.fb.vcf --out genome.c10.s4 --gff_file genome.onlygene.gff --genetic_code_file standard_genetic_code.txt --fasta_file genome.fasta --genome_size ${gsize}

Could it be a problem with the GFF format? The eukaryotic genomes, unlike the prokaryotic ones, have masked regions (with Ns). I removed the conflicting N regions when generating the VCFs so as not to overestimate SNPs, but could this still be the issue? Or could it be a problem with the genetic code used?
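A few GFF properties can be checked independently of POGENOM. The sketch below flags features whose coordinates fall outside their contig, whose length is not a multiple of 3, or that overlap masked (N) bases. Whether any of these is related to the warnings above is an assumption and has not been confirmed from pogenom.pl; `check_features` and the `contigs` dictionary (contig name to sequence) are hypothetical names for illustration.

```python
def check_features(gff_lines, contigs, feature_type="CDS"):
    """Return (line_number, problem) tuples for features of the given type."""
    problems = []
    for number, line in enumerate(gff_lines, start=1):
        if line.startswith("#"):
            continue
        cols = line.rstrip("\n").split("\t")
        if len(cols) < 9 or cols[2] != feature_type:
            continue
        seqid, start, end = cols[0], int(cols[3]), int(cols[4])
        seq = contigs.get(seqid)
        if seq is None:
            problems.append((number, "contig not in FASTA"))
            continue
        if start < 1 or end > len(seq):
            problems.append((number, "coordinates outside contig"))
            continue
        region = seq[start - 1:end]  # GFF coordinates are 1-based, inclusive
        if len(region) % 3 != 0:
            problems.append((number, "length not a multiple of 3"))
        if "N" in region.upper():
            problems.append((number, "contains N"))
    return problems


# Hypothetical 12 bp contig with a masked stretch
contigs = {"contig1": "ATGAAANNNTAA"}
features = [
    "contig1\tAUGUSTUS\tCDS\t1\t6\t.\t+\t0\tID=a\n",
    "contig1\tAUGUSTUS\tCDS\t4\t11\t.\t+\t0\tID=b\n",
    "contig1\tAUGUSTUS\tCDS\t10\t20\t.\t+\t0\tID=c\n",
]
report = check_features(features, contigs)
```

In this toy input the first feature passes, the second is flagged twice (length and N content), and the third extends past the end of the contig. Note that a "gene" line relabelled as "CDS" still spans introns, so its length would not be expected to be a multiple of 3.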

Thank you in advance!
