Tutorial
We will demonstrate an example run of PhaBOX2 on a small dataset here. We assume the tool has been installed according to the installation instructions.
First, let's download the example dataset example_contigs.fa, which contains 390 sequences, and the PhaBOX database (if you have not downloaded it yet):
wget https://github.com/KennthShang/PhaBOX/releases/download/v2/example_contigs.fa
wget https://github.com/KennthShang/PhaBOX/releases/download/v2/phabox_db_v2.zip
# unzip the database
unzip phabox_db_v2.zip > /dev/null
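As a quick sanity check on the download, the number of records in the FASTA file can be compared with the 390 sequences stated above. This is a minimal sketch; the helper name `count_fasta_records` is hypothetical and not part of PhaBOX2. It assumes every record starts with a `>` header line.

```shell
# Hypothetical helper: count FASTA records by counting header lines,
# i.e. lines that start with '>'.
count_fasta_records() {
    grep -c '^>' "$1"
}

# Only run the check if the example file is in the current directory.
# The tutorial states that the file holds 390 sequences.
if [ -f example_contigs.fa ]; then
    count_fasta_records example_contigs.fa
fi
```

If the printed count differs from 390, the download was probably truncated and is worth repeating.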
NOTE: If you are in mainland China, please download the example dataset and the database through your browser instead, then upload them to your HPC.
The first few lines of example_contigs.fa should look like:
>example_0
AGATACTAACTCTGCTGCATAGACAAGAAATTCGTCTTTGCGGGAATATTTACCTGCAAG
GTATATTTTCACATTAACCTCTCAAAAAGCGTTTAACCACTGCTGGTACAACCCCATTTC
·······
The next step is to run phabox2 to classify these contigs. phabox2 offers many options; here we demonstrate only the basic usage. Detailed options can be found in Options.
phabox2 --task end_to_end --contigs example_contigs.fa --outpth example_out --threads 40 --dbdir phabox_db_v2
We set the output folder to example_out and use the end_to_end mode to run phabox2. In this mode, phabox2 identifies the viruses and then assigns a taxonomic classification, host, and lifestyle (for prokaryotic viruses) to each identified virus.
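The thread count of 40 in the command above may exceed the cores available on a smaller machine. The sketch below builds the same command with a thread count taken from the system. It assumes a Linux system where `nproc` (GNU coreutils) is available and uses only the flags shown above; the helper name `phabox2_cmd` is hypothetical. It prints the command rather than running it, so it can be inspected first.

```shell
# Hypothetical helper: print the end_to_end command.
# $1 = contigs FASTA, $2 = output folder, $3 = database folder, $4 = threads
phabox2_cmd() {
    echo "phabox2 --task end_to_end --contigs $1 --outpth $2 --threads $4 --dbdir $3"
}

# Use the number of available cores if nproc exists, otherwise fall back to 1.
threads=$(nproc 2>/dev/null || echo 1)

# Print the command; run it once it looks right.
phabox2_cmd example_contigs.fa example_out phabox_db_v2 "$threads"
```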
During the run, the logs in the console will show the current step in phabox2:
Running program: PhaMer (virus identification)
[1/7] filtering the length of contigs...
[2/7] calling genes with prodigal...
[3/7] running all-against-all alignment...
[4/7] converting sequences to sentences for language model...
[5/7] Predicting the viruses...
······
Running program: PhaGCN (taxonomy classification)
······
Running program: CHERRY (Host prediction)
······
Running program: PhaTYP (Lifestyle prediction)
······
If the run completes successfully, the console output ends with messages pointing to the result file of each program:
PhaMer finished! please check the results in example_out/final_prediction/phamer_prediction.tsv
PhaGCN finished! please check the results in example_out/final_prediction/phagcn_prediction.tsv
Cherry finished! please check the results in example_out/final_prediction/cherry_prediction.tsv
PhaTYP finished! please check the results in example_out/final_prediction/phatyp_prediction.tsv
Summarized finished! please check the results in example_out/final_prediction/final_prediction_summary.tsv
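To confirm that all five result files listed above were actually written, a small check like the following may help. This is a sketch, not part of PhaBOX2: the helper name `check_phabox_outputs` is hypothetical, and the file names are taken from the console messages above.

```shell
# Hypothetical helper: report expected result files that are missing or
# empty under <outdir>/final_prediction. Returns 1 if any are, else 0.
check_phabox_outputs() {
    missing=0
    for name in phamer_prediction phagcn_prediction cherry_prediction \
                phatyp_prediction final_prediction_summary; do
        f="$1/final_prediction/$name.tsv"
        # -s is true only if the file exists and is not empty
        if [ ! -s "$f" ]; then
            echo "missing or empty: $f"
            missing=1
        fi
    done
    return "$missing"
}

# Run the check only when the tutorial's output folder exists.
if [ -d example_out ]; then
    check_phabox_outputs example_out && echo "all result files present"
fi
```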
The detailed output format is described in Output.