errors with guppy toc and large trees #366

Open
antgonza opened this issue Feb 6, 2019 · 4 comments

Comments

@antgonza

antgonza commented Feb 6, 2019

Hope this is the right place for this issue.

Anyway, as part of our Qiita archive releases (More Info -> BETA: download Archive files) we are building trees that contain all the fragments processed in the system. However, during the last release we encountered an error with guppy tog, and we cannot determine its source.

A bit more background: every single deblur sequence generated in the system is processed via fragment-insertion/SEPP, and the placements are stored in the DB; monthly, we retrieve all of those sequence placements and generate a full tree. The current issue is in the last step of SEPP, where guppy tog simply fails with no visible error. Note that this is the first month we are doing this, and we have run it with extra memory (used vs. requested). Also note that all of these placements have already been added to a tree for each individually processed dataset; it only fails when doing the full dump, which IMO points to size.

The file we are having issues with can be found here:

  • placements.json.gz (md5: b6afea7dcb4e2a20c8db7d13c4a932cd): 1.9G
  • placements.json (md5: c663fd36f49f19bf97a689237507ec0e): 6.4G

Any help will be greatly appreciated.

cc: @sjanssen2, @smirarab

@matsen
Owner

matsen commented Feb 6, 2019

@antgonza Thank you for the model bug report.

However, I'm afraid that this won't get fixed. Pplacer development stopped years ago and this sounds non-trivial. I suggest you explore the Stamatakis lab's recent work such as http://genesis-lib.org/ and their EPA-NG.

@antgonza
Author

antgonza commented Feb 6, 2019

Thank you for your prompt reply and kind words. Sad news, but I get it.

Out of curiosity, is there a "translator" of commands and formats (not sure if needed) from pplacer/guppy to genesis-lib?

@matsen
Owner

matsen commented Feb 6, 2019

I don't know of one. You should ping them!

@matsen
Owner

matsen commented Feb 6, 2019

I don't really get your use case, but I feel the need to remind you that pplacer is not a typical phylogenetic inference program and was never meant to be one. It sounds to me like you are trying to use it to build a full tree.

Our perspective was that its purpose is to place sequences on a tree, with the result being the tree with a collection of placements on it, which we thought of as a bunch of marker points on the tree. One then analyzes that object.

To drive the difference home: if we have two identical sequences, they will get placed at the same location on the tree. That's reasonable. But if you run guppy tog on them, you will get two branches, each with a non-trivial pendant branch length. That's silly. One should instead have a single branch off the reference tree, with the two sequences attaching to that branch with zero branch length.
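To make the identical-sequences example concrete, here is a minimal Python sketch contrasting the two layouts as Newick fragments. The Placement record, its fields, and the helpers tog_style and collapsed are hypothetical simplifications for illustration only; they are not pplacer or guppy APIs, and this is not how guppy tog is actually implemented. The sketch only builds the subtree hanging off each attachment point and ignores splitting the reference edge at distal_length.

```python
from collections import defaultdict
from typing import NamedTuple


class Placement(NamedTuple):
    """Hypothetical, simplified record for one placed sequence."""
    name: str
    edge: int              # reference-tree edge the sequence was placed on
    distal_length: float   # attachment position along that edge
    pendant_length: float  # length of the branch leading to the sequence


def tog_style(placements):
    """One pendant branch per sequence, even for identical placements."""
    return [f"{p.name}:{p.pendant_length:g}" for p in placements]


def collapsed(placements):
    """One pendant branch per distinct location; co-located sequences
    hang off it with zero-length terminal branches."""
    groups = defaultdict(list)
    for p in placements:
        # Exact float equality is a simplification for this sketch.
        groups[(p.edge, p.distal_length, p.pendant_length)].append(p.name)
    subtrees = []
    for (_edge, _distal, pendant), names in groups.items():
        if len(names) == 1:
            subtrees.append(f"{names[0]}:{pendant:g}")
        else:
            leaves = ",".join(f"{n}:0" for n in names)
            subtrees.append(f"({leaves}):{pendant:g}")
    return subtrees


# Two identical sequences placed at the same spot on edge 7.
reads = [Placement("seqA", 7, 0.01, 0.05), Placement("seqB", 7, 0.01, 0.05)]
print(tog_style(reads))  # ['seqA:0.05', 'seqB:0.05']
print(collapsed(reads))  # ['(seqA:0,seqB:0):0.05']
```

Grouping on the exact (edge, distal_length, pendant_length) tuple is the simplest possible rule; a real tool would likely need a tolerance or some notion of sequence identity rather than exact float comparison.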

To summarize, tog was put in somewhat begrudgingly so people could "see" placements with normal tree viz software. I don't think it should be used in production.
