Add interoperability examples
misialq committed Nov 29, 2024
1 parent c9b3bae commit 71da24a
Showing 5 changed files with 167 additions and 0 deletions.
6 changes: 6 additions & 0 deletions moshpit_docs/_toc.yml
@@ -29,3 +29,9 @@ chapters:
sections:
- file: chapters/04_functional_annotation/mags
title: Dereplicated MAGs
- file: chapters/05_interoperability/intro
sections:
- file: chapters/05_interoperability/import
title: Importing data
- file: chapters/05_interoperability/export
title: Exporting data
@@ -11,6 +11,7 @@ kernelspec:
language: python
name: python3
---
(mag-recovery)=
# Recovery of MAGs
In this part of the tutorial we will go through the steps required to recover metagenome-assembled genomes (MAGs) from
metagenomic data. The workflow is divided into several steps, from contig assembly to binning and quality control.
53 changes: 53 additions & 0 deletions moshpit_docs/chapters/05_interoperability/export.md
@@ -0,0 +1,53 @@
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.5
kernelspec:
display_name: Python 3
language: python
name: python3
---
(data-export)=
# Exporting data and connecting with other tools
QIIME 2 offers various ways of visualizing and further processing your data, but sometimes you may want to use tools
that are not (yet) available through QIIME 2. This is straightforward: you can export the data from any QIIME 2 artifact
and use it with your other favourite tools, as long as they can read the underlying format. The formats that QIIME 2 uses
are common and should be readable by most bioinformatics tools; in most cases, an artifact stores its data in the same format
that the tool which produced it writes. Below are some examples of how you can export data from QIIME 2 and connect it with other tools.

```{warning}
QIIME 2 does not yet support exporting data from the cache. This means that you will need to manually copy the data from the
cache directory to a location where you can access it with other tools. In our examples, the cache directory is located in the
working directory, and that is where we will copy the data from. Keep in mind that you should never tamper with the files
inside the cache directory, as this may lead to broken artifacts and failed analyses.
```

## Visualizing Kraken 2 reports with Pavian
If you have used Kraken 2 to [classify your reads](kraken-reads), you can export the resulting reports from the corresponding
QIIME 2 artifact and visualize them with [Pavian](https://github.com/fbreitwieser/pavian), which lets you explore the
taxonomic composition of your samples interactively. To export the Kraken 2 reports, you can use the following commands:
```bash
UUID=$(grep 'data' ./cache/keys/kraken_reports_reads | awk '{print $2}')
mkdir -p exported_reports
cp -r "./cache/data/$UUID/data/"* exported_reports/
```
This will find the UUID of the reports artifact, use it to locate the data within the cache directory, create a directory
for the exported data and copy the files from the cache into it. You can then use those files (within the `exported_reports`
directory) with Pavian. To give it a quick try, navigate to [Pavian's demo site](https://fbreitwieser.shinyapps.io/pavian/)
and upload the exported files.
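If you prefer to script the export (for example, to export several keys in one go), the shell commands above can be mirrored
in plain Python. The sketch below is a minimal illustration, not a QIIME 2 API, and the function names `resolve_key` and
`export_from_cache` are our own. Like the `grep` call, it assumes that the key file contains a line of the form `data: <UUID>`
and that the artifact's files sit in `cache/data/<UUID>/data/`. It only reads from the cache and never modifies it.

```python
import shutil
from pathlib import Path


def resolve_key(cache_dir: Path, key: str) -> str:
    """Return the UUID of the artifact that a cache key points to.

    Assumes the key file contains a line such as ``data: <uuid>``
    (the same line the shell snippet extracts with grep/awk).
    """
    key_file = cache_dir / "keys" / key
    for line in key_file.read_text().splitlines():
        field, _, value = line.partition(":")
        if field.strip() == "data":
            return value.strip()
    raise KeyError(f"no 'data' entry found in {key_file}")


def export_from_cache(cache_dir: Path, key: str, out_dir: Path) -> list:
    """Copy the data files of a cached artifact into out_dir.

    The cache is only read from; all copies are written to out_dir.
    """
    artifact_uuid = resolve_key(cache_dir, key)
    payload = cache_dir / "data" / artifact_uuid / "data"
    # dirs_exist_ok=True behaves like `mkdir -p` (Python 3.8+)
    shutil.copytree(payload, out_dir, dirs_exist_ok=True)
    return sorted(p for p in out_dir.rglob("*") if p.is_file())
```

For example, `export_from_cache(Path("cache"), "kraken_reports_reads", Path("exported_reports"))` reproduces the shell
commands above and returns the list of exported files.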

## Microbial pangenomics with Anvi'o
Another suite of tools you may be familiar with is the [Anvi'o](http://anvio.org/) platform. One of the workflows that Anvi'o
provides is microbial pangenomics, which can be used to explore the gene clusters shared between (or unique to) a set of genomes.
You could export the MAGs obtained from the [binning step](mag-recovery) and use them as input to the `anvi-pan-genome` workflow, as
described [here](https://merenlab.org/2016/11/08/pangenomics-v2/). To export the MAGs, you can use the following commands:
```bash
UUID=$(grep 'data' ./cache/keys/mags | awk '{print $2}')
mkdir -p exported_mags
cp -r "./cache/data/$UUID/data/"* exported_mags/
```
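Anvi'o is generally strict about names. It expects ASCII letters, digits and underscores, and names that do not start with a
digit; please check the anvi'o documentation for the exact rules. Exported MAG files, however, are named with UUIDs, which
contain hyphens and may start with a digit. Below is a minimal sketch that pairs each exported FASTA file with a sanitised name
before you build the anvi'o databases. The function names and the two-column table it writes are hypothetical helpers, not an
anvi'o input format. We also assume that `SampleData[MAGs]` exports one subdirectory per sample, while `FeatureData[MAG]`
exports a flat directory.

```python
import csv
import re
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def anvio_safe_name(sample: str, mag_id: str) -> str:
    """Build a genome name containing only ASCII letters, digits and underscores
    and starting with a letter (assumed anvi'o naming constraints)."""
    raw = f"{sample}_{mag_id}" if sample else mag_id
    name = re.sub(r"[^A-Za-z0-9_]", "_", raw)
    return name if name[:1].isalpha() else f"MAG_{name}"


def collect_mags(export_dir: Path) -> list:
    """Pair every exported MAG FASTA file with a sanitised genome name.

    Handles both a flat directory and one subdirectory per sample.
    """
    pairs = []
    for path in sorted(export_dir.rglob("*")):
        if path.is_file() and path.suffix in FASTA_SUFFIXES:
            sample = "" if path.parent == export_dir else path.parent.name
            pairs.append((anvio_safe_name(sample, path.stem), path))
    return pairs


def write_name_table(pairs: list, out_file: Path) -> None:
    """Write a hypothetical two-column TSV: sanitised name -> FASTA path."""
    with out_file.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["name", "fasta_path"])
        writer.writerows((name, str(path)) for name, path in pairs)
```

Prefixing the sample name keeps genome names traceable to their sample. Adding `MAG_` only when needed guarantees a
leading letter without changing names that are already valid.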
89 changes: 89 additions & 0 deletions moshpit_docs/chapters/05_interoperability/import.md
@@ -0,0 +1,89 @@
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.5
kernelspec:
display_name: Python 3
language: python
name: python3
---
(data-import)=
# Importing data from other tools
The MOSHPIT pipeline allows you to start working directly with the NGS reads, which you can take through various analyses,
such as contig assembly, binning and annotation. However, if you have already performed some of these steps outside of QIIME 2,
you can import the results into an appropriate QIIME 2 artifact and continue from there. Below are some examples and
use cases where this may be relevant.

## Working with existing contigs
In case you already have contigs assembled from your metagenomic data, you can import them into a `SampleData[Contigs]`
artifact. This does not differ much from the typical import process (see [here](https://docs.qiime2.org/2024.10/tutorials/importing/)
for more details on importing data); the command may look like this:
```bash
qiime tools cache-import \
--cache ./cache \
--key contigs \
--type "SampleData[Contigs]" \
--input-path ./<directory with contig FASTA files>
```
Some actions in the MOSHPIT pipeline assume that contig IDs are unique across your entire sample set. If this is not the case,
you may use the `qiime assembly rename-contigs` action to rename contigs with unique identifiers:
```bash
qiime assembly rename-contigs \
--i-contigs ./cache:contigs \
--p-uuid-type shortuuid \
--o-renamed-contigs ./cache:contigs_renamed
```
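To find out whether renaming is needed at all, you could first look for contig IDs that occur in more than one file. This
happens, for example, with assemblers that number contigs separately for each sample. The sketch below is plain Python and
independent of QIIME 2. It assumes one uncompressed FASTA file (`.fa` or `.fasta`) per sample in a single directory, and
treats the header text up to the first whitespace as the contig ID. The function names are hypothetical.

```python
from collections import defaultdict
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta"}


def fasta_ids(path: Path):
    """Yield sequence IDs (header text up to the first whitespace) from a FASTA file."""
    with path.open() as fh:
        for line in fh:
            if line.startswith(">"):
                fields = line[1:].split()
                yield fields[0] if fields else ""


def find_duplicate_contig_ids(contig_dir: Path) -> dict:
    """Map each contig ID that occurs more than once to the files containing it."""
    seen = defaultdict(list)
    for fasta in sorted(contig_dir.iterdir()):
        if fasta.is_file() and fasta.suffix in FASTA_SUFFIXES:
            for contig_id in fasta_ids(fasta):
                seen[contig_id].append(fasta.name)
    return {cid: files for cid, files in seen.items() if len(files) > 1}
```

An empty result suggests that the IDs are already unique. Any entry means that running `rename-contigs` is advisable.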
From here, you should be able to continue with the rest of the MOSHPIT pipeline as described in our tutorials.

## Working with existing MAGs
You may also want to continue your analysis with MAGs that you have already recovered using other tools.
In this case, you can import the MAGs into a `SampleData[MAGs]` (non-dereplicated) or `FeatureData[MAG]` (dereplicated)
artifact. Before you do that, you will need to rename each MAG's FASTA file to a [UUID4](https://en.wikipedia.org/wiki/Universally_unique_identifier#Version_4_(random))
identifier: this ensures that MAG IDs are unique across your entire sample set. Here is a sample Python script
that could be used for that purpose:
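```python
import os
from uuid import uuid4

path = 'path/to/your/mag/directory/'

for file in os.listdir(path):
    # only rename FASTA files, so that other files are left untouched
    if file.endswith(('.fa', '.fasta', '.fna')):
        # give each MAG a random UUID4 as its new file name
        os.rename(os.path.join(path, file), os.path.join(path, f'{uuid4()}.fa'))
```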
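The script above renames files in a single directory and does not record the original names, so it becomes hard to trace
a MAG back to the bin it came from. Below is a slightly extended sketch. The function name `rename_mags` and the mapping
file are hypothetical choices of ours. It walks a whole directory tree, so it also works if your MAGs are stored in one
subdirectory per sample (the layout we assume for `SampleData[MAGs]`). It also writes a table of original and new names.

```python
import csv
from pathlib import Path
from uuid import uuid4

FASTA_SUFFIXES = {".fa", ".fasta", ".fna"}


def rename_mags(mag_root: Path, mapping_file: Path) -> dict:
    """Rename every MAG FASTA file below mag_root to '<uuid4>.fa'.

    Files stay in their own directory, so per-sample subdirectories are
    kept intact. Returns (and writes to mapping_file) original -> new paths.
    """
    # materialise the file list first so that renamed files are not visited again
    fasta_files = sorted(
        p for p in mag_root.rglob("*") if p.is_file() and p.suffix in FASTA_SUFFIXES
    )
    mapping = {}
    for fasta in fasta_files:
        new_path = fasta.with_name(f"{uuid4()}.fa")
        fasta.rename(new_path)
        mapping[fasta.relative_to(mag_root).as_posix()] = new_path.relative_to(mag_root).as_posix()
    # keep mapping_file outside mag_root so that it is not picked up by the import
    with mapping_file.open("w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t", lineterminator="\n")
        writer.writerow(["original", "renamed"])
        writer.writerows(mapping.items())
    return mapping
```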
Once you have renamed the MAGs, you can import them into a QIIME 2 artifact:
```bash
qiime tools cache-import \
--cache ./cache \
--key mags \
--type "SampleData[MAGs]" \
--input-path ./<directory with MAG FASTA files per sample>
```
for MAGs grouped by sample, or:
```bash
qiime tools cache-import \
--cache ./cache \
--key mags \
--type "FeatureData[MAG]" \
--input-path ./<directory with MAG FASTA files>
```
for dereplicated MAGs. From here, you should be able to continue with the rest of the MOSHPIT pipeline as described in our tutorials.
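If an import fails, or before running either command, you could check whether every MAG file is actually named with a
unique UUID4. This is a minimal sketch, not the validation QIIME 2 itself performs, and the function name is hypothetical.
It lists the offending files relative to the directory you are about to import, for both the flat and the per-sample layout.

```python
import uuid
from pathlib import Path

FASTA_SUFFIXES = {".fa", ".fasta"}


def invalid_mag_files(mag_root: Path) -> list:
    """List MAG FASTA files (relative to mag_root) that are not named
    '<uuid4>.<suffix>', or whose UUID was already used by another file."""
    problems, seen = [], set()
    for fasta in sorted(mag_root.rglob("*")):
        if not (fasta.is_file() and fasta.suffix in FASTA_SUFFIXES):
            continue
        rel = fasta.relative_to(mag_root).as_posix()
        try:
            parsed = uuid.UUID(fasta.stem)
        except ValueError:
            problems.append(rel)  # not a UUID at all
            continue
        # require the canonical hyphenated form, version 4, and no reuse
        if str(parsed) != fasta.stem.lower() or parsed.version != 4 or str(parsed) in seen:
            problems.append(rel)
        seen.add(str(parsed))
    return problems
```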

## Importing other data
If you have other data that you would like to import into QIIME 2, you can use the `qiime tools cache-import` command; no
additional steps should be required. For example, you can import a set of Kraken 2 reports into a `SampleData[Kraken2Report % Properties('reads')]`
artifact like this:
```bash
qiime tools cache-import \
--cache ./cache \
--key kraken2_reports_reads \
--type "SampleData[Kraken2Report % Properties('reads')]" \
--input-path ./<directory with Kraken 2 reports>
```
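A standard Kraken 2 report is a tab-separated text file with six columns: percentage of reads in the clade, reads in the
clade, reads assigned directly to the taxon, rank code, NCBI taxonomy ID, and the indented scientific name. It has eight
columns when Kraken 2 was run with `--report-minimizer-data`. Before importing, a quick check can catch files that break this
layout, for example reports that were re-saved with spaces instead of tabs. The sketch below is a minimal, hypothetical
helper. Which layouts the QIIME 2 format accepts is decided by the plugin, not by this check.

```python
from pathlib import Path

# 6 columns for standard reports, 8 with --report-minimizer-data
EXPECTED_COLUMNS = {6, 8}


def check_kraken2_report(path: Path) -> list:
    """Return a list of problems found in a Kraken 2 report (empty if it looks fine)."""
    problems = []
    with path.open() as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue
            fields = line.rstrip("\r\n").split("\t")
            if len(fields) not in EXPECTED_COLUMNS:
                problems.append(f"line {lineno}: {len(fields)} columns")
                continue
            try:
                float(fields[0])  # percentage of reads in the clade
                int(fields[1])  # reads in the clade
                int(fields[2])  # reads assigned directly to this taxon
            except ValueError:
                problems.append(f"line {lineno}: non-numeric count fields")
    return problems
```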

```{note}
Remember: you can import any existing data into QIIME 2 artifacts, as long as it matches the format required by the respective
QIIME 2 semantic type.
```
18 changes: 18 additions & 0 deletions moshpit_docs/chapters/05_interoperability/intro.md
@@ -0,0 +1,18 @@
---
jupytext:
formats: md:myst
text_representation:
extension: .md
format_name: myst
format_version: 0.13
jupytext_version: 1.11.5
kernelspec:
display_name: Python 3
language: python
name: python3
---
(interoperability)=
# Interoperability with other tools
While most of the typical steps in a metagenomic analysis can be performed within QIIME 2, there are cases where you
might want to use other tools for certain tasks. In this chapter, we show how to import data from other tools into QIIME 2
artifacts, and how to export data from artifacts so that you can continue your analysis elsewhere.
