deploy: e0209a0

bokulich-lab · Oct 13, 2024 · 58c457a · 58c457a
1 parent be2a327
commit 58c457a

Showing 5 changed files with 936 additions and 105 deletions.
diff --git a/_sources/chapters/functional-annotation.md b/_sources/chapters/functional-annotation.md
@@ -14,40 +14,128 @@ kernelspec:
 
 # Functional annotation
 
-Jupyter Book also lets you write text-based notebooks using MyST Markdown.
-See [the Notebooks with MyST Markdown documentation](https://jupyterbook.org/file-types/myst-notebooks.html) for more detailed instructions.
-This page shows off a notebook written in MyST Markdown.
+## Background
+Functional annotation of **metagenome-assembled genomes (MAGs)** involves the identification and classification of genes within these reconstructed genomes to understand their roles and potential functions. MAGs are recovered from DNA directly extracted from complex microbial communities, bypassing the need to culture the organisms.
 
-## An example cell
+This process provides insights into the genes that code for enzymes, transporters, and other proteins critical to the survival and function of the microbes in various ecosystems. Annotating these genomes allows for the study of their contributions to nutrient cycles, disease processes, or specialized ecological functions.
 
-With MyST Markdown, you can define code cells with a directive like so:
+This workflow outlines the step-by-step process for functional annotation of MAGs or contigs using tools like EggNOG and the DIAMOND aligner in QIIME 2. Each step is explained along with its relevant parameters.
+
+**FOR MICHAL: PASTE WORKFLOW IMAGE HERE?**
+
+```{note}
+Functional annotation can be performed on fully reconstructed **MAGs** or directly on **contigs** (the contiguous sequences assembled from sequencing reads). Annotating **contigs** can provide early insights into important functional genes even before complete genomes are assembled.
+
+In this tutorial, we will focus on functional annotation of our previously reconstructed MAGs (see the **Recovery of MAGs** section).
+```
+```{warning}
+Functional annotation can be highly resource-intensive. Ensure that your system has sufficient CPU and memory resources before running these commands.
+```
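Before launching the commands in this chapter, it can help to check what the machine actually offers. The snippet below is a minimal sketch that assumes a Linux host (it reads `/proc/meminfo`). The `wanted_cpus` value is a hypothetical threshold that simply mirrors the `--p-num-cpus 16` setting used in the search and annotation steps; it is not a documented requirement of any tool.

```shell
# Report CPU and memory before starting a heavy annotation job.
# Assumes Linux: nproc (or getconf as a fallback) and /proc/meminfo.
cpus=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
mem_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
mem_gb=$((mem_kb / 1024 / 1024))

echo "CPUs available: ${cpus}"
echo "Total memory:   ${mem_gb} GiB"

# Hypothetical threshold mirroring --p-num-cpus 16 from this chapter.
wanted_cpus=16
if [ "$cpus" -lt "$wanted_cpus" ]; then
    echo "Fewer than ${wanted_cpus} CPUs: consider lowering --p-num-cpus to ${cpus}."
fi
```

If memory is tight, dropping `--p-db-in-memory` from the commands is another option to consider, since that flag asks for the database to be held in memory.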
+**For more information on each tool used in this workflow, refer to their official documentation:**
+
+- EggNOG-mapper: [https://github.com/eggnogdb/eggnog-mapper](https://github.com/eggnogdb/eggnog-mapper)
+- DIAMOND: [https://github.com/bbuchfink/diamond](https://github.com/bbuchfink/diamond)
+- QIIME 2: [https://github.com/qiime2](https://github.com/qiime2)
+
+```{note}
+To examine your generated QIIME 2 visualizations, you can use QIIME 2 View (view.qiime2.org).
+```
+## Required databases
+To perform the functional annotation, we will need two reference databases: a DIAMOND-formatted EggNOG database for the search step and the EggNOG annotation database for the annotation step. Below you will find instructions on how to download them using the respective QIIME 2 actions.
+
+```{code-cell}
+qiime moshpit fetch-diamond-db \
+    --o-diamond-db eggnog-diamond-db.qza \
+    --verbose
+```
 
 ```{code-cell}
-print(2 + 2)
+qiime moshpit fetch-eggnog-db \
+    --o-eggnog-db eggnog-annot-db.qza \
+    --verbose
 ```
+Alternatively, you can use:
+- `qiime moshpit build-eggnog-diamond-db` to create a DIAMOND-formatted reference database for the specified taxon.
+- `qiime moshpit build-custom-diamond-db` to create a DIAMOND-formatted reference database from a FASTA input file.
+
+## EggNOG search using the DIAMOND aligner
+Here, our dereplicated MAGs are searched against the EggNOG database using the DIAMOND aligner to find the hits that will later be annotated.
 
-When your book is built, the contents of any `{code-cell}` blocks will be
-executed with your default Jupyter kernel, and their outputs will be displayed
-in-line with the rest of your content.
+```{code-cell}
+qiime moshpit eggnog-diamond-search \
+    --i-sequences dereplicated-mags-0.qza \
+    --i-diamond-db eggnog-diamond-db.qza \
+    --p-num-cpus 16 \
+    --p-db-in-memory \
+    --o-eggnog-hits eggnog-hits-dereplicated-mags-0.qza \
+    --o-table eggnog-hits-ft-dereplicated-mags-0.qza  \
+    --verbose
+```
+## Annotate orthologs against the EggNOG database
+Orthologs in dereplicated MAGs are annotated against the EggNOG database, providing functional insights into the genes and gene products present in the MAGs.
 
-```{seealso}
-Jupyter Book uses [Jupytext](https://jupytext.readthedocs.io/en/latest/) to convert text-based files to notebooks, and can support [many other text-based notebook files](https://jupyterbook.org/file-types/jupytext.html).
+```{code-cell}
+qiime moshpit eggnog-annotate \
+    --i-eggnog-hits eggnog-hits-dereplicated-mags-0.qza \
+    --i-eggnog-db eggnog-annot-db.qza \
+    --p-num-cpus 16 \
+    --p-db-in-memory \
+    --o-ortholog-annotations eggnog-annotations-dereplicated-mags-0.qza \
+    --verbose
 ```
+## Extract annotations
+This method extracts a specific annotation type from the table generated by EggNOG and calculates its frequency across all MAGs.
 
-## Create a notebook with MyST Markdown
+```{note}
+The `qiime moshpit extract-annotations` command allows us to extract specific types of functional annotations, such as **CAZymes**, **KEGG pathways**, **COG categories**, or other functional elements, and calculate their frequency across all dereplicated MAGs. 
 
-MyST Markdown notebooks are defined by two things:
+In this tutorial, we focus on demonstrating the extraction of **CAZymes**.
+```
+```{code-cell}
+qiime moshpit extract-annotations \
+    --i-ortholog-annotations eggnog-annotations-dereplicated-mags-0.qza \
+    --p-annotation caz \
+    --p-max-evalue 0.0001 \
+    --o-annotation-frequency caz-dereplicated-mags-0.qza \
+    --verbose
+```
 
-1. YAML metadata that is needed to understand if / how it should convert text files to notebooks (including information about the kernel needed).
-   See the YAML at the top of this page for example.
-2. The presence of `{code-cell}` directives, which will be executed with your book.
+## Multiply tables
+This step simply calculates the dot product of the `eggnog-hits-ft-dereplicated-mags-0.qza` and `caz-dereplicated-mags-0.qza` feature tables. This is useful for combining the annotation data (e.g., **CAZymes**) with other features (e.g., MAG hits) to determine how specific functional annotations are distributed across MAGs.
 
-That's all that is needed to get started!
+```{code-cell}
+qiime moshpit multiply-tables \
+    --i-table1 eggnog-hits-ft-dereplicated-mags-0.qza \
+    --i-table2 caz-dereplicated-mags-0.qza \
+    --o-result-table caz-ft-dereplicated-mags-0.qza \
+    --verbose
+```
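To make the table product concrete, here is a toy sketch in plain shell and `awk`, independent of QIIME 2. It assumes one table holds MAG-by-ortholog hit counts and the other holds ortholog-by-CAZyme-family counts; that layout, the file names, and all identifiers (`mag1`, `orthA`, `GH13`, ...) are made up for illustration and may not match the real artifact internals.

```shell
# Toy matrix product of two count tables stored as "row column value" triples.
# All MAG, ortholog, and CAZyme family names below are made up for illustration.
workdir=$(mktemp -d)

# Table 1 (assumed layout): MAG x ortholog hit counts
cat > "$workdir/hits.txt" <<'EOF'
mag1 orthA 2
mag1 orthB 1
mag2 orthB 3
EOF

# Table 2 (assumed layout): ortholog x CAZyme family counts
cat > "$workdir/caz.txt" <<'EOF'
orthA GH13 1
orthB GH13 1
orthB GT2 1
EOF

# result[mag, family] = sum over orthologs of hits[mag, ortholog] * caz[ortholog, family]
product=$(awk '
    NR == FNR { caz[$1, $2] = $3; families[$2] = 1; next }  # first file read: table 2
    {
        for (f in families)
            if (($2, f) in caz)
                result[$1 " " f] += $3 * caz[$2, f]
    }
    END { for (key in result) print key, result[key] }
' "$workdir/caz.txt" "$workdir/hits.txt" | LC_ALL=C sort)

echo "$product"
rm -r "$workdir"
```

In this toy case `mag1` ends up with a GH13 count of 3 (two via `orthA`, one via `orthB`), which is the kind of per-MAG annotation frequency the combined table is meant to capture.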
 
-## Quickly add YAML metadata for MyST Notebooks
+## Let's have a look at our CAZymes functional diversity!
+We will start by calculating a Bray-Curtis beta diversity distance matrix.
 
-If you have a markdown file and you'd like to quickly add YAML metadata to it, so that Jupyter Book will treat it as a MyST Markdown Notebook, run the following command:
+```{code-cell}
+qiime diversity beta \
+  --i-table caz-ft-dereplicated-mags-0.qza \
+  --p-metric braycurtis \
+  --o-distance-matrix braycurtis-caz-dereplicated-mags.qza
+```
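For intuition about the metric, the Bray-Curtis dissimilarity between two count vectors is the sum of absolute differences divided by the sum of all counts. The sketch below computes it with `awk` for two made-up CAZyme count vectors; the helper name `bray_curtis` is hypothetical, and this is not how QIIME 2 computes the matrix internally.

```shell
# Bray-Curtis dissimilarity between two equal-length count vectors:
#   BC = sum(|a_i - b_i|) / sum(a_i + b_i)
# The helper name and the input vectors are illustrative only.
bray_curtis() {
    # $1, $2: space-separated counts, one vector per argument
    printf '%s\n%s\n' "$1" "$2" | awk '
        NR == 1 { n = split($0, a, " "); next }
        {
            split($0, b, " ")
            for (i = 1; i <= n; i++) {
                d = a[i] - b[i]
                num += (d < 0 ? -d : d)
                den += a[i] + b[i]
            }
            printf "%.3f\n", num / den
        }'
}

# Two made-up MAGs with counts for two CAZyme families
bc_dist=$(bray_curtis "3 1" "3 3")
echo "Bray-Curtis(mag1, mag2) = ${bc_dist}"   # prints 0.200
```

A value of 0 means identical profiles and 1 means no shared counts. The sketch does not guard against two all-zero vectors, which would divide by zero.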
+
+Then, we will generate a PCoA from the Bray-Curtis distance matrix.
 
+```{code-cell}
+qiime diversity pcoa \
+  --i-distance-matrix braycurtis-caz-dereplicated-mags.qza  \
+  --o-pcoa braycurtis-caz-dereplicated-mags-pcoa.qza
 ```
-jupyter-book myst init path/to/markdownfile.md
+Visualization time!
+
+```{code-cell}
+qiime emperor plot \
+  --i-pcoa braycurtis-caz-dereplicated-mags-pcoa.qza \
+  --m-metadata-file metadata.qza \
+  --o-visualization braycurtis-caz-dereplicated-mags-pcoa.qzv
 ```
+```{tip}
+Try this visualization tip! We recommend visualizing the `braycurtis-caz-dereplicated-mags-pcoa.qzv` output in QIIME 2 View (view.qiime2.org). Once your visualization displays, click on the `Color` tab in the top right and select `scatter:seed` on the first tab to color your samples by seed. Then click on the `Animations` tab and choose `timepoint` as gradient and `seed` as trajectory. Now, press play!
+```