ValueError: The gene family has not beed associated to a partition. #262

Closed
frdel1 opened this issue Aug 7, 2024 · 7 comments

@frdel1

frdel1 commented Aug 7, 2024

Hi,
I am experiencing the following error with ppanggolin all:
"ValueError: The gene family has not beed associated to a partition."

Steps to reproduce:

# get a bunch of genomes to create the pangenome
datasets download genome accession GCF_009935005.1 GCF_001015835.1 GCF_009933955.1 GCF_001932715.1 GCF_027863375.1 --include gbff

# create the organism.gbff.list file
# create the pangenome with ppanggolin all
conda activate ppanggolin-2.1.0
ppanggolin all --anno /path/to/organism.gbff.list --cpu 1 --identity 0.8 --output /path/to/output_ppanggolin_all
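The list-file step is only a comment in the steps above. A minimal sketch of it follows, assuming the list is a tab-separated file with one genome name and one annotation path per line, and assuming each GBFF file sits in a folder named after its accession (as in an unpacked `datasets` archive). The function name `make_gbff_list` and the directory layout are illustrative assumptions, not part of the original report.

```shell
# Sketch: print one "genome_name<TAB>path" line per GBFF file found under a directory.
# Assumption: the parent folder of each GBFF file is named after the genome accession.
make_gbff_list() {
    data_dir=$1
    find "$data_dir" -name '*.gbff' | sort | while IFS= read -r gbff; do
        # Use the enclosing folder name as the genome name.
        name=$(basename "$(dirname "$gbff")")
        printf '%s\t%s\n' "$name" "$gbff"
    done
}

# Example (placeholder paths):
# make_gbff_list ncbi_dataset/data > organism.gbff.list
```

Genome names must be unique in the list, which is why the sketch takes them from the per-accession folder rather than from the file name.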

Best wishes

@jpjarnoux
Member

Hi !

Sorry to hear about that.
Could you run your command again with the option --verbose 2 and share the output?

Thanks

@frdel1
Author

frdel1 commented Aug 7, 2024

consol.out.txt

Sure, here it is.
Are you able to reproduce the bug by downloading the genomes listed in the datasets download genome accession command and running ppanggolin all?

@jpjarnoux
Member

Hi!

Thanks for the output. As I suspected, you don't have enough genomes in your pangenome.
The partitioning method is based on the NEM algorithm, and to work with the default parameters, we suggest using at least 15 genomes. You can find more information about the PPanGGOLiN method in the publication here.

Yet all is not lost. First, add the -K 2 option to the 'all' command. This option forces PPanGGOLiN to compute only two partitions.
Then, if that does not work, I suggest following the step-by-step pangenome construction in the documentation (skip the workflow part), or, if you kept your pangenome, you could directly use the command explained here to customize the partitioning. @ggautreau will be of greater help than me at this stage.
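As a rough illustration of the first suggestion, the sketch below adds `-K 2` only when the genome list has fewer than the suggested 15 entries. The helper name `partition_args` is hypothetical, and the 15-genome threshold is just the guideline quoted above for default parameters. This does not guarantee that partitioning succeeds.

```shell
# Sketch: choose extra arguments for `ppanggolin all` from the number of genomes listed.
# Assumption: one genome per non-empty line in the list file.
partition_args() {
    list_file=$1
    # Count non-empty lines; `|| true` keeps a zero count from aborting under `set -e`.
    n=$(grep -c . "$list_file" || true)
    if [ "$n" -lt 15 ]; then
        # Small dataset: ask for only two partitions, as suggested above.
        printf '%s\n' '-K 2'
    fi
}

# Example (placeholder paths from the original report):
# ppanggolin all --anno /path/to/organism.gbff.list --cpu 1 --identity 0.8 \
#     --output /path/to/output_ppanggolin_all $(partition_args /path/to/organism.gbff.list)
```

With the five genomes from the report, the helper prints `-K 2`. With 15 or more it prints nothing and the defaults apply.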

@frdel1
Author

frdel1 commented Aug 8, 2024

Hi!
Thanks for the explanation and the tips, I will follow your advice and use at least 15 genomes then.

@jpjarnoux
Member

Another tip, if you don't mind me saying so.
You can build a pangenome with all genomes of your species from RefSeq or GenBank, for example, and project the pangenome on your five genomes of interest as explained here.

@frdel1
Author

frdel1 commented Aug 8, 2024

Thanks ! I have tried ppanggolin projection already, good stuff :)

@JeanMainguy
Member

Hi,
As of version 2.1.1, we've changed the log to show a warning instead of a debug message when the partition step fails, which makes the problem easier to spot.
Ideally, PPanGGOLiN should still work even if partitioning fails, as mentioned in issue #270.
