Protologger is an all-in-one genome description tool that simplifies gathering the data required for writing protologues. It provides taxonomic, functional and ecological insights, comprising the following outputs:
- 16S rRNA identity values
- 16S rRNA gene phylogenetic tree
- Taxonomic assignment based on GTDB (GTDB-Tk r89)
- Genomic tree (PhyloPhlAn 3)
- ANI values (FastANI)
- POCP values
- KEGG-based pathway reconstruction
- CAZyme profiling
- Antibiotic resistance profiling (CARD)
- Prevalence and average relative abundance across 1,000 samples for 19 unique environments (IMNGS)
- Occurrence based on MASH comparison to a database of >50,000 MAGs from thousands of samples
Hitch, T.C.A., Riedel, T., Oren, A. et al. Automated analysis of genomic sequences facilitates high-throughput and comprehensive description of bacteria. ISME Communications 1, 16 (2021). https://doi.org/10.1038/s43705-021-00017-z
According to the latest version of the International Code of Nomenclature of Prokaryotes (ICNP), publication of a novel taxon must include a description of that taxon's features. The format of this information is termed a 'protologue', of which some examples are provided on the example page. Whilst the ICNP is vague on the specifics of what should be included, protologues generally include the functional features, isolation source and taxonomic placement of the species in relation to existing validly named taxa. Protologger therefore provides the information necessary for writing protologues, reducing the burden on cultivation experts for the validation of names for novel taxa.
If you have single isolates that you wish to study, you can use our Galaxy web server, which is freely available at http://protologger.de/
Included on our website is an implementation of GAN (The Great Automatic Nomenclator) (https://github.com/telatin/gan), which accepts Protologger output to generate ecologically and functionally informed names.
Protologger has been rewritten in Python 3 for easy conda installation on Linux systems.
Four steps are required to get Protologger working:
- Create a Python 3 environment using the command:
conda create -n protologger python=3.7 prokka
- Install Protologger into this environment (activate it first with conda activate protologger):
conda install -c thitch protologger
- Once installed, download the databases using the following command:
setup-protologger.sh
- Make sure Usearch is installed (version 5.2.32 is the tested version) and is in your $PATH.
When the database download has finished (~5 hours, depending on your internet speed), Protologger will be ready to run. Additionally, the command protologger-update.sh can be run to download the latest validation list, which is updated monthly.
If Prokka has issues, please try the following commands:
export PERL5LIB=$CONDA_PREFIX/lib/perl5/site_perl/5.22.0/
conda remove blast
conda install -c bioconda blast=2.9.0
conda install -c conda-forge -c bioconda -c defaults prokka
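Before a first run, it can help to confirm that the external tools mentioned above are visible on $PATH. The following is a minimal sketch, not part of Protologger itself; the executable names `usearch` and `prokka` are assumptions and may differ on your system (for example, if the Usearch binary carries a version suffix).

```python
import shutil

# Executable names are assumptions for illustration; adjust them to match
# how the tools are actually named on your system.
REQUIRED_TOOLS = ("usearch", "prokka")


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the names from `tools` that cannot be found on $PATH."""
    return [name for name in tools if shutil.which(name) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Not found on $PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on $PATH.")
```

Run the script inside the activated protologger environment so that the conda-installed tools are included in the search.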
Protologger.py requires three inputs on the command line (-r, -g and -p) and accepts one optional flag (-q), all detailed below.
Input flag | File type | Description |
---|---|---|
-r | Nucleotide FASTA file | Provide a file containing the 16S rRNA gene sequence for your species of interest |
-g | Nucleotide FASTA file | Provide the genome file of your species of interest |
-p | String | Provide the name of your project which will be used to name your input and output folders |
-q | NA | Activates 'quick' mode, which skips both the GTDB-Tk and PhyloPhlAn analyses, meaning Protologger can be run on a desktop PC |
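As an illustration of how the flags in the table combine, the sketch below assembles a Protologger.py argument list from Python. It assumes the script is callable as `Protologger.py` from the activated environment; the helper name `build_protologger_command` and the file and project names are hypothetical placeholders.

```python
import shlex


def build_protologger_command(rrna_fasta, genome_fasta, project, quick=False):
    """Assemble an argument list using the -r, -g, -p and -q flags from the table."""
    command = [
        "Protologger.py",    # assumed to be on $PATH in the conda environment
        "-r", rrna_fasta,    # 16S rRNA gene sequence (nucleotide FASTA)
        "-g", genome_fasta,  # genome (nucleotide FASTA)
        "-p", project,       # project name used for input and output folders
    ]
    if quick:
        command.append("-q")  # 'quick' mode: no GTDB-Tk or PhyloPhlAn analysis
    return command


if __name__ == "__main__":
    # Hypothetical file and project names, for illustration only.
    cmd = build_protologger_command(
        "my_isolate_16S.fasta", "my_isolate_genome.fasta", "my_project", quick=True
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems with unusual file names.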
Within the publication we provide the Protologger output for four distinct datasets; the HBC, the BIO-ML collection, the Hungate1000 and the iMGMC.
The Protologger outputs from all four datasets are downloadable here.
Protologger has also been applied to characterise isolates from the chicken gut (CHiBAC), for which the complete Protologger outputs are available here.
We always aim to expand the MAG database used within Protologger. If there is an additional dataset you would like included, please contact us at [email protected]
The currently included datasets are listed below (MAG counts include both bacterial and archaeal genomes):
Publication | Number of MAGs | Description |
---|---|---|
Parks et al (2020) | 3,397 | Generic |
Woodcroft et al (2018) | 568 | Permafrost |
Anantharaman et al (2016) | 303 | Groundwater |
Crits-Christoph et al (2018) | 225 | Soil |
Dombrowski et al (2018) | 36 | Hydrothermal sediment |
Tully et al (2018) | 339 | Ocean |
Lesker et al (2020) | 831 | Mouse gut |
Wylensek et al (2020) | 589 | Pig gut |
Stewart et al (2018) | 488 | Bovine rumen |
Almeida et al (2019) | 39,891 | Human gut |
Pasolli et al (2019) | 225 | Human stool/vagina/skin/oral cavity |
Manara et al (2019) | 1,008 | Soil |
The datasets currently undergoing integration are:
Publication | Number of MAGs | Description |
---|---|---|
Wilkinson et al (2020) | 3,397 | African Boran rumen |
Chen et al (2021) | 1,358 | Pig gut |
Albanese et al (2021) | 263 | Cryptoendolithic community in Antarctica |
Levin et al (2021) | 1,209 | Gut of 180 wild species |
Jegousse et al (2021) | 219 | Icelandic marine water |
Robbins et al (2021) | ~1,200 | Coral sponge |
Wibowo et al (2021) | 498 | Ancient human gut |
Collins et al (2021) | 111 | Deep sea fish gut |
Nayfach et al (2021) | 52,515 | Earth associated microbiota |
Becraft et al (2021) | 126 | Deep subsurface |
Krüger et al (2019) | 3,101 | Algal blooms |
Lavrinienko et al (2020) | 254 | Bank vole gut |
Hitch, T.C.A., Riedel, T., Oren, A. et al. Automated analysis of genomic sequences facilitates high-throughput and comprehensive description of bacteria. ISME Communications 1, 16 (2021). https://doi.org/10.1038/s43705-021-00017-z
We ask that anyone who uses Protologger cite not only our publication but also the publications listed below, which provide the tools and databases integral to Protologger's operation: