diff --git a/moshpit_docs/_toc.yml b/moshpit_docs/_toc.yml
index 7226f46..0cd6c28 100644
--- a/moshpit_docs/_toc.yml
+++ b/moshpit_docs/_toc.yml
@@ -29,3 +29,9 @@ chapters:
   sections:
   - file: chapters/04_functional_annotation/mags
     title: Dereplicated MAGs
+- file: chapters/05_interoperability/intro
+  sections:
+  - file: chapters/05_interoperability/import
+    title: Importing data
+  - file: chapters/05_interoperability/export
+    title: Exporting data
diff --git a/moshpit_docs/chapters/02_mag_reconstruction/reconstruction.md b/moshpit_docs/chapters/02_mag_reconstruction/reconstruction.md
index 8670ef9..b087c0c 100644
--- a/moshpit_docs/chapters/02_mag_reconstruction/reconstruction.md
+++ b/moshpit_docs/chapters/02_mag_reconstruction/reconstruction.md
@@ -11,6 +11,7 @@ kernelspec:
   language: python
   name: python3
 ---
+(mag-recovery)=
 # Recovery of MAGs
 In this part of the tutorial we will go through the steps required to recover metagenome-assembled genomes (MAGs) from metagenomic data. The workflow is divided into several steps, from contig assembly to binning and quality control.
diff --git a/moshpit_docs/chapters/05_interoperability/export.md b/moshpit_docs/chapters/05_interoperability/export.md
new file mode 100644
index 0000000..1348713
--- /dev/null
+++ b/moshpit_docs/chapters/05_interoperability/export.md
@@ -0,0 +1,53 @@
+---
+jupytext:
+  formats: md:myst
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.5
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+(data-export)=
+# Exporting data and connecting with other tools
+QIIME 2 offers various ways of visualizing and processing your data further, but sometimes you may want to use other tools
+that are not (yet) available through QIIME 2.
+This is, of course, possible and very easy to do: you can export your data
+from any QIIME 2 artifact and use it with any of your other favourite tools, as long as the underlying format is compatible.
+The formats that QIIME 2 supports are common and should be readable by most bioinformatics tools - most of the time, the
+artifacts will contain data in the original format that the underlying tool uses. Below are some examples of how you can
+export data from QIIME 2 and connect it with other tools.
+
+```{warning}
+QIIME 2 does not yet support exporting data from the cache. This means that you will need to manually copy the data from the
+cache directory to a location where you can access it with other tools. In our examples, the cache directory is located directly
+in the working directory and that is where we will copy the data from. Keep in mind that you should never tamper with the files
+in the cache directory directly, as this may lead to broken artifacts and failed analyses.
+```
+
+## Visualizing Kraken 2 reports with Pavian
+If you have used Kraken 2 to [classify your reads](kraken-reads), you can export the resulting reports from the corresponding
+QIIME 2 artifact and visualize them with [Pavian](https://github.com/fbreitwieser/pavian), which will allow you to explore the
+taxonomic composition of your samples in an interactive way. To export the Kraken 2 reports, you can use the following commands:
+```bash
+UUID=$(cat ./cache/keys/kraken_reports_reads | grep 'data' | awk '{print $2}')
+mkdir exported_reports
+cp -r ./cache/data/$UUID/data/* exported_reports/
+```
+This will find the UUID of the reports artifact, use it to locate the data within the cache directory, create a directory
+for the exported data and copy the files from the cache into it. You can then use those files (within the `exported_reports`
+directory) with Pavian.
+To give it a quick try, navigate to [Pavian's demo site](https://fbreitwieser.shinyapps.io/pavian/)
+and upload the exported files.
+
+## Microbial pangenomics with Anvi'o
+Another suite of tools you may be familiar with is the [Anvi'o](http://anvio.org/) platform. One of the workflows that Anvi'o
+provides is the microbial pangenomics analysis, which can be used to explore the gene clusters within your samples. You
+could export the MAGs obtained from the [binning step](mag-recovery) and use them as input to the `anvi-pan-genome` workflow, as
+described [here](https://merenlab.org/2016/11/08/pangenomics-v2/). To export the MAGs, you can use the following commands:
+```bash
+UUID=$(cat ./cache/keys/mags | grep 'data' | awk '{print $2}')
+mkdir exported_mags
+cp -r ./cache/data/$UUID/data/* exported_mags/
+```
diff --git a/moshpit_docs/chapters/05_interoperability/import.md b/moshpit_docs/chapters/05_interoperability/import.md
new file mode 100644
index 0000000..0461c5f
--- /dev/null
+++ b/moshpit_docs/chapters/05_interoperability/import.md
@@ -0,0 +1,89 @@
+---
+jupytext:
+  formats: md:myst
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.5
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+(data-import)=
+# Importing data from other tools
+The MOSHPIT pipeline allows you to start working directly with the NGS reads, which you can take through various analyses,
+like contig assembly, binning, and annotation. However, if you have already performed some of these steps outside of QIIME 2,
+you can import the results into an appropriate QIIME 2 artifact and continue from there. Below you can see some examples and
+use cases where this may be relevant.
+
+## Working with existing contigs
+In case you already have contigs assembled from your metagenomic data, you can import them into a `SampleData[Contigs]`
+artifact.
+This should not differ much from the typical import process (see [here](https://docs.qiime2.org/2024.10/tutorials/importing/)
+for more details on importing data), but the command may look like:
+```bash
+qiime tools cache-import \
+    --cache ./cache \
+    --key contigs \
+    --type "SampleData[Contigs]" \
+    --input-path ./
+```
+Some actions in the MOSHPIT pipeline assume that contig IDs are unique across your entire sample set. If this is not the case,
+you may use the `qiime assembly rename-contigs` action to rename contigs with unique identifiers:
+```bash
+qiime assembly rename-contigs \
+    --i-contigs ./cache:contigs \
+    --p-uuid-type shortuuid \
+    --o-renamed-contigs ./cache:contigs_renamed
+```
+From here, you should be able to continue with the rest of the MOSHPIT pipeline as described in our tutorials.
+
+## Working with existing MAGs
+You may also be interested in continuing your analysis with MAGs that you have already recovered using other tools.
+In this case, you can import the MAGs into a `SampleData[MAGs]` (non-dereplicated) or `FeatureData[MAG]` (dereplicated)
+artifact. Before you do that, you will need to rename each MAG's FASTA file using the [UUID4](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))
+format: this is required to ensure that MAG IDs are unique across your entire sample set.
+Here is a sample Python script
+which could be used for that purpose:
+```python
+import os
+from uuid import uuid4
+path = 'path/to/your/mag/directory/'  # directory holding one FASTA file per MAG
+
+for file in os.listdir(path):
+    os.rename(os.path.join(path, file), os.path.join(path, f'{uuid4()}.fa'))  # new name: <UUID4>.fa
+```
+Once you have renamed the MAGs, you can import them into a QIIME 2 artifact:
+```bash
+qiime tools cache-import \
+    --cache ./cache \
+    --key mags \
+    --type "SampleData[MAGs]" \
+    --input-path ./
+```
+for MAGs-per-sample, or:
+```bash
+qiime tools cache-import \
+    --cache ./cache \
+    --key mags \
+    --type "FeatureData[MAG]" \
+    --input-path ./
+```
+for dereplicated MAGs. From here, you should be able to continue with the rest of the MOSHPIT pipeline as described in our tutorials.
+
+## Importing other data
+If you have other data that you would like to import into QIIME 2, you can use the `qiime tools cache-import` command - no
+additional steps should be required. For example, you can import a set of Kraken 2 reports into a `SampleData[Kraken2Report % Properties('reads')]`
+artifact like this:
+```bash
+qiime tools cache-import \
+    --cache ./cache \
+    --key kraken2_reports_reads \
+    --type "SampleData[Kraken2Report % Properties('reads')]" \
+    --input-path ./
+```
+
+```{note}
+Remember: you can import any existing data into QIIME 2 artifacts, as long as it matches the format required by the respective
+QIIME 2 semantic type.
+```
diff --git a/moshpit_docs/chapters/05_interoperability/intro.md b/moshpit_docs/chapters/05_interoperability/intro.md
new file mode 100644
index 0000000..39cf3d0
--- /dev/null
+++ b/moshpit_docs/chapters/05_interoperability/intro.md
@@ -0,0 +1,18 @@
+---
+jupytext:
+  formats: md:myst
+  text_representation:
+    extension: .md
+    format_name: myst
+    format_version: 0.13
+    jupytext_version: 1.11.5
+kernelspec:
+  display_name: Python 3
+  language: python
+  name: python3
+---
+(interoperability)=
+# Interoperability with other tools
+While most of the typical steps in a metagenomic analysis can be performed within QIIME 2, there are cases where you
+might want to use other tools to perform certain tasks. In this chapter, we will show you how you can get some data in
+and out of the QIIME 2 artifacts to continue your analysis workflow elsewhere.