Trouble to run phylosift on server #475

Open
dong5600 opened this issue Jan 22, 2016 · 7 comments

Comments

@dong5600

Hello,

I was able to run Phylosift successfully on my desktop. Because it is too slow there (at least 3-5 hours per sample) to make my work of >50 samples feasible, I tried to install the program on our institute's server, but so far without success.

After unzipping the Phylosift archive on the server, I did a test run with data that had worked on my desktop version, but it gave an error and the marker database was not downloaded during that first run. I searched online and followed the protocol in an earlier report (https://groups.google.com/forum/#!topic/phylosift/6DkF-rzKbdw): I manually downloaded the databases and uncompressed them. I still could not make any progress.

My error message and the --debug output are listed below. Any help is highly appreciated!

Results for the taxasummary.txt:

Taxon_ID Taxon_Rank Taxon_Name Probability_Mass

Unclassifiable Unknown Unknown 0

Error information:

PhyloSift -- Phylogenetic analysis of genomes and metagenomes
(c) 2011, 2012 Aaron Darling and Guillaume Jospin

CITATION:
PhyloSift. A. E. Darling, G. Jospin, E. Lowe, F. A. Matsen, H. M. Bik, J. A. Eisen. Submitted to PeerJ

PhyloSift incorporates several other software packages, please consider also citing the following papers:

    pplacer: linear time maximum-likelihood and Bayesian phylogenetic placement of sequences onto a fixed reference tree.
    Frederick A Matsen, Robin B Kodner, and E Virginia Armbrust
    BMC Bioinformatics 2010, 11:538

    Adaptive seeds tame genomic sequence comparison.
    SM Kielbasa, R Wan, K Sato, P Horton, MC Frith
    Genome Research 2011.

    Infernal 1.0: Inference of RNA alignments
    E. P. Nawrocki, D. L. Kolbe, and S. R. Eddy
    Bioinformatics 25:1335-1337 (2009)

    Bowtie: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome.
    Langmead B, Trapnell C, Pop M, Salzberg SL. Genome Biol 10:R25.

    HMMER 3.0 (March 2010); http://hmmer.org/
    Copyright (C) 2010 Howard Hughes Medical Institute.
    Freely distributed under the GNU General Public License (GPLv3).

    Phylogenetic Diversity within Seconds.
    Bui Quang Minh, Steffen Klaere and Arndt von Haeseler
    Syst Biol (2006) 55 (5): 769-773.

rm: cannot remove `/home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/.aa.1': No such file or directory

Debug info:

All systems are good to go, continuing the screening
deleting an old run
/home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa
MODE : all
Using updated markers
Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
Before runBlast 2016-01-22 16:53:07
USING 0
Input type is dna, fasta
Making fifos
Launching search process 1
Running /home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/bin/lastal -F15 -e75 -f0 /home/a-m/dong5600/share/phylosift/markers/replast "/home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/last_0.pipe" |Opening /home/a-m/dong5600/Yellowstone_omics/Yellowstone_omics/DNA/metagenomics/Pond_facies/Phylogenetic_bin/Phylosift/phylosift_v1.0.1/PS_temp/ACF_bin1.fa/blastDir/reads.fasta.1
Octopus is handing out sequences
Octopus handed out 172 sequences
Writing candidates from process 1
ReadsFile: ACF_bin1.fa
.lastal Got 0 markers with hits
.lastal Got 0 nucleotide markers with hits
After runBlast 2016-01-22 16:53:07
Before runAlign 2016-01-22 16:53:07
after marker prep
AFTER ALIGN and MASK
Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_list.txt
AFTER concatenateALI
After runAlign 2016-01-22 16:53:07
Before runPplacer 2016-01-22 16:53:07
After runPplacer 2016-01-22 16:53:07
Before runSummarize 2016-01-22 16:53:07

******STARTING SUMMARY

Writing sequences
Total classifiable probability mass is 0
Before runKrona 2016-01-22 16:53:07
Generating krona
After runKrona 2016-01-22 16:53:07
Debug lvl : 1
After runBlast 2016-01-22 16:53:07
MODE :: all

@dong5600
Author

BTW, my command was: ./phylosift all --isolate ACF_bin1.fa

@gjospin
Owner

gjospin commented Jan 23, 2016

Did you index the database before trying to run phylosift?

phylosift index --debug

That should do the trick. If you edited your phylosiftrc file to tell PS where to look for the database, then you might also need to add the following flag: --config new_phylosiftrc

This line is suspect because it should end with list.txt and not listtxt
Using a marker list file /home/a-m/dong5600/share/phylosift/markers/marker_listtxt
Not sure what happened there.

I'd try the indexing first. It is usually done automatically when the db is downloaded, but since the download didn't work, I would think the indexing didn't happen either.
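
For anyone scripting this, here is a minimal Python sketch that only assembles the two command lines mentioned above. The function name `build_index_command` and the launcher path `./phylosift` are hypothetical, and placing `--config` after `--debug` is an assumption; `index`, `--debug`, and `--config` are the only PhyloSift arguments used, all taken from this comment.

```python
def build_index_command(phylosift="./phylosift", config=None, debug=True):
    """Assemble the argument list for a `phylosift index` call.

    `phylosift` is an assumed path to the launcher script; `config` is an
    optional path to an edited phylosiftrc file (hypothetical name below).
    """
    cmd = [phylosift, "index"]
    if debug:
        cmd.append("--debug")
    if config is not None:
        # Only needed when the rc file was edited to relocate the database.
        cmd += ["--config", config]
    return cmd


if __name__ == "__main__":
    print(" ".join(build_index_command()))
    print(" ".join(build_index_command(config="new_phylosiftrc")))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` if you want the script to stop when indexing fails.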


@dong5600
Author

Thank you for the reply.

I have not indexed the database yet. While reading other posts, I also noticed that the automatically downloaded database is stored in a folder named .XXX/shared/phylosift (/home/a-m/dong5600/share/phylosift/ in my case). I am wondering whether I should download the marker files directly into this path, which is different from where Phylosift is installed on the server.

I will try the index command too.

I also have another question: since I have >50 samples, can I run multiple samples at the same time? I tried 2 on my desktop, but it did not work. Please advise!

Thank you and will update the status!

@gjospin
Owner

gjospin commented Jan 23, 2016

If you have a lot of memory at your disposal, you can run multiple instances at once.
Keep in mind that to run efficiently you might need around 24 GB of RAM per instance. That is because of the pplacer step.
The search step can be run on multiple CPUs, but that's the only section of the pipeline that can do that.
If you have less memory, pplacer is engineered to use temporary files written to disk so that it can still operate. I/O becomes the limiting step at that point.

We have the luxury of having a computer cluster, so I launch 1 phylosift instance per machine, using as many CPUs as the machine will allow. That way I can get through about 20-50 samples per day, depending on the availability of the cluster.
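
To make the memory arithmetic concrete, here is a small Python sketch. The function name is hypothetical, and the 24 GB default is only the rough per-instance figure quoted above, not a measured requirement.

```python
def max_parallel_instances(total_ram_gb, ram_per_instance_gb=24):
    """Return how many instances fit in RAM at the assumed per-instance cost.

    A result of 0 does not mean a run is impossible: per the comment above,
    pplacer can fall back to temporary files on disk, at the price of I/O.
    """
    if ram_per_instance_gb <= 0:
        raise ValueError("ram_per_instance_gb must be positive")
    return int(total_ram_gb // ram_per_instance_gb)


if __name__ == "__main__":
    for ram in (16, 64, 128):
        print(ram, "GB ->", max_parallel_instances(ram), "instance(s)")
```

Under that assumption, a 64 GB machine fits 2 instances and a 128 GB machine fits 5, while a 16 GB desktop fits none at full speed.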

I hope this helps.


@dong5600
Author

Sounds good. Let me figure out the index issue first. Will update the status. Thank you!

@dong5600
Author

Thank you very much for the suggestion; the program worked.

As a follow-up to the question about running samples in batch: could you suggest how to write a command that includes multiple files? I did not find related information in the tutorial. An alternative would be to write individual scripts and submit them as a batch. Please advise.

Thanks a lot!

@gjospin
Owner

gjospin commented Jan 23, 2016

I write an external wrapper script that writes and executes job scripts for
each file in my sample pool.
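
A wrapper of that kind could look roughly like the Python sketch below. This is not the actual script used here, only an illustration: the function name, the job directory, and the `./phylosift` path are hypothetical, and the `all --isolate` arguments are simply copied from the command posted earlier in this thread. How the scripts get submitted depends on your scheduler, so that part is left out.

```python
import shlex
from pathlib import Path


def write_job_scripts(samples, job_dir, phylosift="./phylosift"):
    """Write one shell script per sample file and return the script paths."""
    job_dir = Path(job_dir)
    job_dir.mkdir(parents=True, exist_ok=True)
    scripts = []
    for sample in samples:
        sample = Path(sample)
        # One script per input file, named after the file without extension.
        script = job_dir / f"{sample.stem}.sh"
        # Quote each argument so paths with spaces survive the shell.
        command = " ".join(
            shlex.quote(part)
            for part in (phylosift, "all", "--isolate", str(sample))
        )
        script.write_text(f"#!/bin/sh\n{command}\n")
        script.chmod(0o755)
        scripts.append(script)
    return scripts


if __name__ == "__main__":
    for path in write_job_scripts(["ACF_bin1.fa", "ACF_bin2.fa"], "jobs"):
        print(path)
```

Each generated script can then be handed to the cluster's submit command, or run one at a time on a single machine if memory is tight.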

