ENH: integration of BUSCO as visualizer (#60)
* added assets from q2_checkm

* copy-pasted and commented plot functionality from checkm

* started adapting visualization action to busco

* Generate busco graphs.

* First draft of BUSCO plugin. Untested.

* black and flake8 formatting, precommit hook

* Successful cache of BUSCO. Still untested.

* Made BUSCO render HTML and resized plot.

* move auxiliary functions to utils

* include seaborn in package build?

* beginning test suite for busco

* Moves tests to busco folder. Ignore .vscode

* setup.py: exchanged checkm data for CI tests for busco data

* added busco to list of required packages for conda installation

* correction to BUSCO parameters in plugin_setup.py

* Same as last commit

* typo in busco/utils.py

* started developing the test suite for busco

* Updates to visualization. Tooltips and gaps addressed.

* Update to parameter valid ranges.

* Update to plot description in assets html

* paths are absolute, need for os.path.split

* range of BUSCO argument

* test_process_common_input_params new implementation

* busco/utils.py: Amends to docstrings.

* random amends

* added data for busco tests. all_run_summeries

* completed the test suite for busco

* indentation formatting change in ci.yml

* plugin_setup.py. added explanation to parameter

* set absolute paths for test data in busco tests

* Header for busco/__init__.py

* Revert "black and flake8 formatting, precommit hook"

This reverts commit 79b6d69.

* reformat busco-related files for flake8

* adding q2templates to meta.yaml

* added altair to meta.yaml

* new way of getting the path to assets

* changed the copytree function, hoping that this one works

* added assets to setup.py

* flake8 error. trailing white space removed

* First working draft for secondary plot.

* irrelevant changes to notebook

* Second plot added to busco. Working implementation.

* Update q2_moshpit/busco/utils.py

Typos in html plot description.

Co-authored-by: Michal Ziemski <[email protected]>

* Update q2_moshpit/busco/utils.py

Co-authored-by: Michal Ziemski <[email protected]>

* Update q2_moshpit/busco/utils.py

Typo in comment.

Co-authored-by: Michal Ziemski <[email protected]>

* Revert "indentation formatting change in ci.yml"

This reverts commit f81ada3.

* ignore notebooks

* indentation on parameter descriptions

* change to the notebook that I need to save

* moved test from busco to test_utils

* Bunch of work on reformatting the tests. Some parallel work on the source code.

* seaborn -> matplotlib + busco arg parsing bug

* Render debugging statement in busco tests

* trailing white spaces

* fixed bug on test_process_common_inputs_mix_with_falsy_values

* Include manifest files in order for busco tests to work

* updated the dictionary to reflect changes in the html render

* Update integration test so that viz output is possible.

* Update html base to work with base not tabbed

* Reduce the height of each bar in plot from 18 to 9

* Remove notebooks

* Spell out the word "fraction" in the bottom axis label

* Remove commented out code in base.html.

* fix test_draw_busco_plots_for_render

* Change static func for setUpClass method in busco tests.

* assert_frame_equal in busco test_collect_summaries_and_save

* removed print statement from busco draw_n_busco_plots

* eliminate choices from busco_params "mode"

* parse df columns function in busco utils + test

* check zipfiles with is_zipfile function

* get rid of mock_run_busco; use test data instead

* Added _parse_busco_params and reordered the code.

* Regret in last commit

* busco tests: replace with self.get_data_path()

* update docstring for _parse_busco_params in utils

* change command name to evaluate-busco

* assert calls of patches in busco tests

* trailing white spaces

* Show full uuid in downloadable plots

* Additional busco parsing test.

* Added making columns to parse function and updated test.

* Add package data for mock run busco test

* Add mock.ANY to patch calls.

* ignore notebook files rather than the notebooks dir

* fixing spaces in param descriptions

* Put integration test in separate test file.

---------

Co-authored-by: Michal Ziemski <[email protected]>
Sann5 and misialq authored Oct 9, 2023
1 parent 2b11966 commit 70afdfa
Showing 27 changed files with 2,350 additions and 201 deletions.
7 changes: 6 additions & 1 deletion .gitignore
@@ -25,7 +25,6 @@ share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
-MANIFEST
 
 # PyInstaller
 # Usually these files are written by a python script from a template
@@ -133,3 +132,9 @@ dmypy.json
 
 # Mac OS
 .DS_Store
+
+# VS code settings
+.vscode
+
+# Ignore notebooks
+**/*.ipynb
3 changes: 3 additions & 0 deletions ci/recipe/meta.yaml
@@ -23,10 +23,13 @@ requirements:
     - samtools
     - qiime2 {{ qiime2_epoch }}.*
     - q2-types-genomics {{ qiime2_epoch }}.*
+    - q2templates {{ qiime2_epoch }}.*
     - eggnog-mapper >=2.1.10
     - diamond
     - tqdm
     - xmltodict
+    - altair
+    - busco >=5.0.0
 
 test:
   requires:
3 changes: 2 additions & 1 deletion q2_moshpit/__init__.py
@@ -10,6 +10,7 @@
 from .kraken2 import bracken, classification, database
 from .metabat2 import metabat2
 from . import eggnog
+from . import busco
 
 
 from ._version import get_versions
@@ -18,5 +19,5 @@
 
 __all__ = [
     'metabat2', 'bracken', 'classification', 'database',
-    'dereplicate_mags', 'eggnog'
+    'dereplicate_mags', 'eggnog', 'busco',
 ]
14 changes: 10 additions & 4 deletions q2_moshpit/_utils.py
@@ -52,9 +52,15 @@ def _process_common_input_params(processing_func, params: dict) -> List[str]:
     """
     processed_args = []
     for arg_key, arg_val in params.items():
-        # bool is a subclass of int so to only reject ints we need to do:
-        if type(arg_val) != int and not arg_val:  # noqa: E721
-            continue
-        else:
+        # This if condition excludes arguments which are falsy
+        # (False, None, "", []), except for integers and floats.
+        if (  # noqa: E721
+            type(arg_val) == int or
+            type(arg_val) == float or
+            arg_val
+        ):
             processed_args.extend(processing_func(arg_key, arg_val))
+        else:
+            continue
+
     return processed_args
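To see which values survive the new condition, the logic can be exercised in isolation. The sketch below is a standalone copy of the filtering rule (written as `type(arg_val) in (int, float)`, which is equivalent to the two `==` comparisons); `filter_params` and `to_cli_flags` are hypothetical names, and `to_cli_flags` is only a stand-in for `_parse_busco_params`, whose real behavior this diff does not show.

```python
from typing import Callable, List


def filter_params(
    processing_func: Callable[[str, object], List[str]], params: dict
) -> List[str]:
    """Standalone copy of the filtering rule in _process_common_input_params."""
    processed_args = []
    for arg_key, arg_val in params.items():
        # type() rather than isinstance(): bool is a subclass of int,
        # so isinstance(False, int) would wrongly keep False.
        if type(arg_val) in (int, float) or arg_val:
            processed_args.extend(processing_func(arg_key, arg_val))
    return processed_args


def to_cli_flags(key: str, val: object) -> List[str]:
    # Hypothetical processing function (NOT the plugin's parser):
    # True becomes a bare flag, anything else becomes "--key value".
    if val is True:
        return [f"--{key}"]
    return [f"--{key}", str(val)]


params = {"cpu": 0, "evalue": 0.0, "force": False, "config": None,
          "lineage_dataset": "", "offline": True, "mode": "genome"}
print(filter_params(to_cli_flags, params))
# ['--cpu', '0', '--evalue', '0.0', '--offline', '--mode', 'genome']
```

Note that `0` and `0.0` are forwarded while `False`, `None` and `""` are dropped, which is the distinction the old `type(arg_val) != int` check could not make for floats.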
20 changes: 20 additions & 0 deletions q2_moshpit/assets/busco/css/styles.css
@@ -0,0 +1,20 @@
#plot {
    margin-top: 50px;
}

.vega-bind {
    margin-bottom: 15px;
}

.vega-bind-name {
    margin-right: 10px;
    white-space: nowrap;
}

.header-inline {
    display: inline-block;
    float: left;
    margin-right: 10px;
    margin-top: 8px;
    margin-bottom: 8px;
}
138 changes: 138 additions & 0 deletions q2_moshpit/assets/busco/index.html
@@ -0,0 +1,138 @@
{% extends "base.html" %} {% block head %}
<title>Embedding Vega-Lite</title>
<script src="js/bootstrapMagic.js" type="text/javascript"></script>
<link href="css/styles.css" rel="stylesheet" />
<script type="text/javascript">
// temporary hack to make it look good with Bootstrap 5
removeBS3refs();
</script>
<script
src="https://cdn.jsdelivr.net/npm//vega@5"
type="text/javascript"
></script>
<script
src="https://cdn.jsdelivr.net/npm//[email protected]"
type="text/javascript"
></script>
<script
src="https://cdn.jsdelivr.net/npm//vega-embed@6"
type="text/javascript"
></script>
<link
crossorigin="anonymous"
href="https://cdn.jsdelivr.net/npm/[email protected]/dist/css/bootstrap.min.css"
integrity="sha256-YvdLHPgkqJ8DVUxjjnGVlMMJtNimJ6dYkowFFvp4kKs="
rel="stylesheet"
/>
{% endblock %} {% block content %}
<script
crossorigin="anonymous"
integrity="sha256-9SEPo+fwJFpMUet/KACSwO+Z/dKMReF9q4zFhU/fT9M="
src="https://cdn.jsdelivr.net/npm/[email protected]/dist/js/bootstrap.bundle.min.js"
></script>

<div class="row row-cols-1 row-cols-md-2 g-4">
<div class="col-lg-12">
<div class="card mt-3 h-100">
<h5 class="card-header">Plot description</h5>
<div class="card-body">
<p>
The left plot shows the results generated by BUSCO for <b>all bins</b> and
<b>samples</b>. "BUSCO attempts to provide a quantitative assessment
of the completeness in terms of the expected gene content of a genome
assembly, transcriptome, or annotated gene set. The results are
simplified into categories of Complete and single-copy, Complete and
duplicated, Fragmented, or Missing BUSCOs. BUSCO completeness results
make sense only in the context of the biology of your organism". Visit the
<a
href="https://busco.ezlab.org/busco_userguide.html#interpreting-the-results"
>
BUSCO User Guide </a
>
for more information.
</p>
<p>
Hover over the graph to obtain information about the lineage dataset
used for each bin, and the number of genes in each BUSCO category.
</p>
<p>
The right barplot shows assembly statistics calculated for each bin using BBTools.
Specifically, it displays the statistics computed by the <b>stats.sh</b> procedure from BBMap.
View the
<a
href="https://github.com/BioInfoTools/BBMap/blob/master/sh/stats.sh"
>
source code and documentation
</a>
of stats.sh for more information.
</p>
<p>
Choose the assembly statistic that you wish to display from the drop-down menu below the graphs.
Hover over the graph to show the numerical values that each bar represents.
</p>

<div style="align-items: center; display: flex">
<span class="header-inline">Downloads</span>
<div class="col-lg-4">
<div
aria-label="Basic outlined example"
class="btn-group"
role="group"
>
<a
class="btn btn-outline-secondary"
href="all_batch_summaries.csv"
>BUSCO batch summary for all samples (csv)</a
>
<a class="btn btn-outline-secondary" href="busco_plots.zip"
>BUSCO plots for all samples (zip)</a
>
</div>
</div>
</div>
</div>
</div>
</div>
</div>

<div class="row">
{% if vega_plots_overview is defined %}
<div class="col-lg-6">
<div id="plot"></div>
<div id="plot-controls"></div>
</div>
{% else %}
<p>Unable to generate the completeness plot</p>
{% endif %}
</div>

{% if vega_plots_overview is defined %}
<script id="spec" type="application/json">
{{
vega_plots_overview
}}
</script>

<script type="text/javascript">
$(document).ready(function () {

const spec = JSON.parse(document.getElementById("spec").innerHTML);

vegaEmbed("#plot", spec)
.then(function (result) {
result.view.logLevel(vega.Warn);
window.v = result.view;

// move the drop-down controls into the #plot-controls container
const controls = document.getElementsByClassName("vega-bindings");
document.getElementById("plot-controls").appendChild(controls[0]);
})
.catch(function (error) {
// From 'js-error-handler.html'
handleErrors([error], $("#plot"));
});
});
</script>

{% endif %} {% endblock %} {% block footer %} {% set loading_selector =
'#loading' %} {% include 'js-error-handler.html' %} {% endblock %}
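The left plot described in this template summarizes each bin by the four BUSCO categories (Complete and single-copy, Complete and duplicated, Fragmented, Missing). As a rough sketch of the data shaping behind such a normalized stacked bar, and not the plugin's actual plotting code (which is not part of this excerpt), per-bin counts can be turned into long-format records holding one fraction per (bin, category) pair. The record keys (`bin_id`, `category`, `count`, `fraction`), the bin names and the counts are all assumptions made for illustration.

```python
from typing import Dict, List

# The four BUSCO categories shown in the left plot (keys are hypothetical).
CATEGORIES = {
    "single": "Complete and single-copy",
    "duplicated": "Complete and duplicated",
    "fragmented": "Fragmented",
    "missing": "Missing",
}


def to_long_fractions(counts: Dict[str, Dict[str, int]]) -> List[dict]:
    """Convert per-bin category counts into long-format records.

    Each record holds one (bin, category) pair and the fraction of the
    bin's BUSCO genes in that category, which is the shape a stacked
    bar chart expects.
    """
    records = []
    for bin_id, bin_counts in counts.items():
        total = sum(bin_counts[key] for key in CATEGORIES)
        for key, label in CATEGORIES.items():
            records.append({
                "bin_id": bin_id,
                "category": label,
                "count": bin_counts[key],
                # Guard against an empty bin to avoid division by zero.
                "fraction": bin_counts[key] / total if total else 0.0,
            })
    return records


# Made-up counts for two bins scored against a 124-gene lineage dataset.
example = {
    "bin1": {"single": 100, "duplicated": 4, "fragmented": 10, "missing": 10},
    "bin2": {"single": 62, "duplicated": 0, "fragmented": 31, "missing": 31},
}
for rec in to_long_fractions(example):
    print(rec["bin_id"], rec["category"], round(rec["fraction"], 3))
```

Long format keeps one row per bar segment, so a charting library such as Altair can map `category` to color and `fraction` to bar length without reshaping the data again.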
31 changes: 31 additions & 0 deletions q2_moshpit/assets/busco/js/bootstrapMagic.js
@@ -0,0 +1,31 @@
function removeBS3refs() {
    // remove Bootstrap 3 CSS/JS references (iterate backwards: collections are live)
    let head = document.getElementsByTagName("head")[0]
    let links = head.getElementsByTagName("link")
    for (let i = links.length - 1; i >= 0; i--) {
        if (links[i].href.includes("q2templateassets/css/bootstrap")) {
            links[i].remove()
        }
    }
    let scripts = head.getElementsByTagName("script")
    for (let i = scripts.length - 1; i >= 0; i--) {
        if (scripts[i].src.includes("q2templateassets/js/bootstrap")) {
            scripts[i].remove()
        }
    }
}

function adjustTagsToBS3() {
    // convert Bootstrap 3 style tabs to nav-item/nav-link markup
    let tabs = document.getElementsByClassName("nav nav-tabs")[0].children
    for (let i = 0; i < tabs.length; i++) {
        let isActive = tabs[i].className.includes("active")
        tabs[i].className = "nav-item"
        let link = tabs[i].getElementsByTagName("a")[0]
        if (isActive) {
            link.classList.add("active")
        }
        link.classList.add("nav-link")

    }
}
11 changes: 11 additions & 0 deletions q2_moshpit/busco/__init__.py
@@ -0,0 +1,11 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2022-2023, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------

from .busco import evaluate_busco

__all__ = ["evaluate_busco"]
121 changes: 121 additions & 0 deletions q2_moshpit/busco/busco.py
@@ -0,0 +1,121 @@
# ----------------------------------------------------------------------------
# Copyright (c) 2023, QIIME 2 development team.
#
# Distributed under the terms of the Modified BSD License.
#
# The full license is in the file LICENSE, distributed with this software.
# ----------------------------------------------------------------------------


import os
import tempfile
import q2_moshpit.busco.utils
from q2_moshpit.busco.utils import (
_parse_busco_params,
_render_html,
)
from q2_moshpit._utils import _process_common_input_params
from typing import List
from q2_types_genomics.per_sample_data._format import MultiMAGSequencesDirFmt


def evaluate_busco(
    output_dir: str,
    bins: MultiMAGSequencesDirFmt,
    mode: str = "genome",
    lineage_dataset: str = None,
    augustus: bool = False,
    augustus_parameters: str = None,
    augustus_species: str = None,
    auto_lineage: bool = False,
    auto_lineage_euk: bool = False,
    auto_lineage_prok: bool = False,
    cpu: int = 1,
    config: str = None,
    contig_break: int = 10,
    datasets_version: str = None,
    download: List[str] = None,
    download_base_url: str = None,
    download_path: str = None,
    evalue: float = 1e-03,
    force: bool = False,
    limit: int = 3,
    help: bool = False,
    list_datasets: bool = False,
    long: bool = False,
    metaeuk_parameters: str = None,
    metaeuk_rerun_parameters: str = None,
    miniprot: bool = False,
    offline: bool = False,
    quiet: bool = False,
    restart: bool = False,
    scaffold_composition: bool = False,
    tar: bool = False,
    update_data: bool = False,
    version: bool = False,
) -> None:
    """
    QIIME 2 visualization for the BUSCO assessment tool
    <https://busco.ezlab.org/>.
    Args:
        run `qiime moshpit evaluate-busco --help` to see all possible inputs
    Output:
        busco_plots.zip: zip file containing all of the BUSCO plots
        all_batch_summaries.csv: BUSCO summaries for all samples
        index.html: QIIME 2 HTML report rendering the plots
    """

    # Create dictionary with local variables
    # (kwargs passed to the function or their defaults) excluding
    # "output_dir" and "bins"
    kwargs = {
        k: v for k, v in locals().items() if k not in ["output_dir", "bins"]
    }

    # Drop falsy kwargs (False, None, "", []); ints and floats are kept
    common_args = _process_common_input_params(
        processing_func=_parse_busco_params, params=kwargs
    )

    # Create a temporary directory whose path is bound to 'tmp'
    with tempfile.TemporaryDirectory() as tmp:
        # Run BUSCO for every sample. Returns paths to the run summaries.
        # Result NOT included in final output
        busco_results_dir = os.path.join(tmp, "busco_output")
        path_to_run_summaries = q2_moshpit.busco.utils._run_busco(
            output_dir=busco_results_dir,
            mags=bins,
            params=common_args,
        )

        # Collect result for each sample and save to file.
        # Result included in final output (file for download)
        all_summaries_path = os.path.join(
            output_dir, "all_batch_summaries.csv"
        )
        all_summaries_df = q2_moshpit.busco.utils._collect_summaries_and_save(
            all_summaries_path=all_summaries_path,
            path_to_run_summaries=path_to_run_summaries,
        )

        # Draw BUSCO plots for all samples
        # Result NOT included in final output
        plots_dir = os.path.join(tmp, "plots")
        paths_to_plots = q2_moshpit.busco.utils._draw_busco_plots(
            path_to_run_summaries=path_to_run_summaries,
            plots_dir=plots_dir
        )

        # Zip graphs for user download
        # Result included in final output (file for download)
        zip_name = os.path.join(output_dir, "busco_plots.zip")
        q2_moshpit.busco.utils._zip_busco_plots(
            paths_to_plots=paths_to_plots,
            zip_path=zip_name
        )

        # Render the QIIME 2 HTML report
        # Result included in final output
        _render_html(output_dir, all_summaries_df)
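`evaluate_busco` gathers its keyword arguments by calling `locals()` before any other local variable is assigned, so the snapshot contains exactly the function's parameters. The toy function below (hypothetical, not part of the plugin) reproduces that pattern with a few of the same parameter names to show what ends up in `kwargs`.

```python
def run_tool(output_dir: str, bins: str, cpu: int = 1,
             force: bool = False, lineage_dataset: str = None) -> dict:
    # Hypothetical stand-in for evaluate_busco. At this point locals()
    # holds only the parameters, so the comprehension captures every
    # argument (or its default) except the two excluded names.
    kwargs = {
        k: v for k, v in locals().items() if k not in ["output_dir", "bins"]
    }
    # Any variable assigned before the snapshot (e.g. a temporary path)
    # would leak into kwargs, which is why the snapshot is taken first.
    return kwargs


print(run_tool("out", "mags", cpu=4))
# {'cpu': 4, 'force': False, 'lineage_dataset': None}
```

A side effect of this design is that any parameter added to the signature is forwarded automatically, at the cost of making the order of statements at the top of the function significant.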