Home
clinker is a tool for visualising gene cluster similarity, (hopefully) replacing the need for you to manually create such figures from scratch in PowerPoint or Illustrator. The following guide should get you up and running with the program and generating your own visualisations.
If you find clinker useful, please cite the pre-print:
Gilchrist, C.L.M., Chooi, Y.-H., 2020. clinker & clustermap.js: Automatic generation of gene cluster comparison figures. bioRxiv 2020.11.08.370650. https://doi.org/10.1101/2020.11.08.370650
clinker requires Python >3.5. If you do not have Python installed, you can go to https://www.python.org/ and download the appropriate installer for your operating system. On Windows, make sure you enable the option to put Python on your system PATH (usually a checkbox on the first page of the installer) so that you can access Python packages like clinker directly from the command line.
(Optional) To avoid conflicts with other installed Python packages, it is recommended to install clinker within a virtual environment. To do this, first create a new virtual environment:
python3 -m virtualenv my_env
Then activate it:
source my_env/bin/activate
Finally, install clinker:
pip install clinker
This will install clinker as well as all of its dependencies. If you have both Python 2 and 3 installed, you might have to specify pip3 instead of just pip, e.g.:
pip3 install clinker
clinker depends on the following Python packages to work:
- BioPython (>=1.75): used when performing pairwise sequence alignments. clinker requires at least version 1.75, because the substitution matrices used in the sequence alignments are stored in a different location in earlier versions of the BioPython package.
- SciPy (>=1.3.3) and NumPy (>=1.13.3): used when computing similarity scores of clusters and performing hierarchical clustering to determine the optimal display order. (Earlier versions should work fine, but are untested.)
The most up-to-date versions of these packages are installed automatically when you install clinker. If you have older versions of these packages, they can be updated by providing the --force-reinstall argument to pip. For example:
pip3 install --force-reinstall clinker
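To see whether the installed dependencies meet the minimum versions listed above, a short check along the following lines may help. This is a minimal sketch rather than part of clinker: it assumes Python 3.8 or newer (for `importlib.metadata`), assumes the packages are registered under the names `biopython`, `scipy` and `numpy`, and the helpers `as_tuple`, `meets_minimum` and `report` are hypothetical names chosen for illustration.

```python
from importlib import metadata  # in the standard library from Python 3.8

# Minimum versions as listed above; the distribution names are an assumption.
MINIMUMS = {"biopython": "1.75", "scipy": "1.3.3", "numpy": "1.13.3"}


def as_tuple(version):
    """Turn '1.13.3' into (1, 13, 3), stopping at any non-numeric suffix."""
    parts = []
    for piece in version.split("."):
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Compare two dotted version strings numerically."""
    return as_tuple(installed) >= as_tuple(minimum)


def report():
    """Print the installed version of each dependency and whether it suffices."""
    for name, minimum in MINIMUMS.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            print(f"{name}: not installed (need >= {minimum})")
            continue
        status = "ok" if meets_minimum(installed, minimum) else "too old"
        print(f"{name}: {installed} (need >= {minimum}) {status}")
```

Calling `report()` inside the environment where clinker is installed prints one line per dependency.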
clinker takes GenBank files as input. These will typically just be a single locus (i.e. a small region extracted from a larger genomic scaffold); however, multi-record GenBank files are also supported, allowing you to visualise gene clusters that may be split over multiple loci (e.g. due to fragmented genome assembly).
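If you are unsure whether a given GenBank file holds a single record or several, a quick count can be made before running clinker. The sketch below relies only on the fact that every GenBank record begins with a `LOCUS` line; it is a sanity check, not a parser, and the function names are hypothetical.

```python
def count_genbank_records(lines):
    """Count records by their LOCUS header lines.

    Every GenBank record starts with a line beginning with 'LOCUS',
    so counting those lines gives the number of records.
    """
    return sum(1 for line in lines if line.startswith("LOCUS"))


def count_in_file(path):
    """Count the records in the GenBank file at `path`."""
    with open(path) as handle:
        return count_genbank_records(handle)
```

For example, `count_in_file("file1.gbk")` would return 1 for a single-locus file and a larger number for a multi-record file.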
The clinker pipeline can be run as simply as:
clinker file1.gbk file2.gbk file3.gbk -p
This will read in your GenBank files (file1.gbk, file2.gbk, file3.gbk), align them, cluster them to determine display order, and generate the full clustermap.js visualisation in your web browser.
By default, the visualisation is dynamically served and you will have to interrupt clinker (using Ctrl + C) to stop it. A static HTML document containing the visualisation can be generated instead by providing a file name to the -p/--plot argument:
clinker file1.gbk file2.gbk file3.gbk -p my_plot.html
Once the visualisation is loaded in the web browser, you can play around with the settings in the sidebar to change its appearance and layout. Once you're happy with the figure, you can save an SVG image by clicking the save button.
A clinker session can be saved/reloaded using the -s/--session argument to avoid having to recompute gene cluster alignments:
clinker file1.gbk file2.gbk file3.gbk -s alignments.json
This is particularly useful if you want to add more clusters to an alignment. If a session file is loaded alongside new GenBank files, clinker will add them to the session, only performing the necessary alignments with the new files. The session file is then re-written with the new alignments. For example:
clinker -s alignments.json file4.gbk file5.gbk
clinker can be given either direct paths to input files or folders containing input files. For example, we could move our files 1–3 from the above examples into a folder and load them alongside 4–5 like so:
clinker input_folder/ file4.gbk file5.gbk -p
When given a folder, clinker will automatically look for all files within that folder; if folders are found inside the given folder, clinker will also look inside of those.
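To preview which files a mixed list of files and folders would expand to, the recursive search described above can be imitated with `pathlib`. This is a sketch of the described behaviour under that assumption, not clinker's own code, and `collect_files` is a hypothetical helper.

```python
from pathlib import Path


def collect_files(paths):
    """Expand a mix of file and folder paths into a flat list of files.

    Folders are searched recursively, mirroring the behaviour described
    above; files given directly are kept in the order they were passed.
    """
    files = []
    for entry in map(Path, paths):
        if entry.is_dir():
            # rglob("*") walks the folder and all of its sub-folders
            files.extend(sorted(p for p in entry.rglob("*") if p.is_file()))
        else:
            files.append(entry)
    return files
```

Calling `collect_files(["input_folder/", "file4.gbk", "file5.gbk"])` lists every file found under `input_folder/`, followed by the two files given directly.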
Another feature is the ability to use the order of input files instead of performing hierarchical clustering. This can be useful in situations where you would like to generate clinker visualisations matching the order of a matrix or phylogenetic tree without having to manually rearrange them within the visualisation. This can be done using the -ufo/--use_file_order flag:
clinker file3.gbk file1.gbk file2.gbk -ufo -p
If you have a long list of files, it is easier to create a text file containing the paths to each file in your desired order. For example, given a file named files.txt containing:
file3.gbk
file2.gbk
file1.gbk
We could then use a little Bash scripting to load it:
clinker $(cat files.txt) -ufo -p
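Note that `$(cat files.txt)` splits on any whitespace, so paths containing spaces would be broken apart. Where that is a concern, the same idea can be sketched in Python, reading one path per line and passing each as a single argument. The helper names below are hypothetical, `shlex.join` assumes Python 3.8 or newer, and only flags already shown above (-ufo, -p) are used.

```python
import shlex
import subprocess


def build_command(list_text, extra_args=("-ufo", "-p")):
    """Build a clinker argument list from newline-separated paths.

    Blank lines are skipped, and each remaining line is kept as one
    argument, so paths containing spaces stay intact.
    """
    paths = [line.strip() for line in list_text.splitlines() if line.strip()]
    return ["clinker", *paths, *extra_args]


def run_from_list(list_path):
    """Read the path list at `list_path` and run clinker in that order."""
    with open(list_path) as handle:
        command = build_command(handle.read())
    print("Running:", shlex.join(command))
    subprocess.run(command, check=True)
```

With the example file above, `run_from_list("files.txt")` would run clinker on file3.gbk, file2.gbk and file1.gbk in that order.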
clinker currently provides two options to change how alignments are performed. The first, -na/--no_align, will skip aligning altogether, reading in your GenBank files and generating the visualisation directly. Since no alignments are performed, clinker will not be able to colour the genes in your figure. However, this can be useful if to-scale cluster maps are all that is required.
The second, -i/--identity, is a threshold for sequence identity that must be met for a gene-gene alignment to be saved. By default, this is set to 0.3 (30%).
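The effect of the identity threshold can be pictured as a simple filter over gene-gene alignments. The sketch below is illustrative only: the alignment records are made up, the dictionary layout is hypothetical, and treating the threshold as inclusive (identity equal to the threshold is kept) is an assumption rather than documented clinker behaviour.

```python
def filter_by_identity(alignments, threshold=0.3):
    """Keep only alignments whose identity is at least `threshold`."""
    return [aln for aln in alignments if aln["identity"] >= threshold]


# Hypothetical gene-gene alignments, made up for illustration
alignments = [
    {"query": "geneA", "target": "geneX", "identity": 0.82},
    {"query": "geneB", "target": "geneY", "identity": 0.30},
    {"query": "geneC", "target": "geneZ", "identity": 0.12},
]

kept = filter_by_identity(alignments)
print([aln["query"] for aln in kept])
```

A stricter threshold can be passed on the command line in the same way, e.g. `clinker file1.gbk file2.gbk file3.gbk -i 0.5 -p`, which corresponds to `filter_by_identity(alignments, 0.5)` in this sketch.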
By default, clinker reports all alignment summaries to the terminal in human-readable format. However, clinker can also easily generate delimited files. For example, to generate a comma-separated file (CSV) that can be imported into spreadsheet software, we can use the -dl/--delimiter argument:
clinker *.gbk -o alignments.csv -dl ","
Note that the -o/--output argument can be used to save clinker output directly to a file. If the -f/--force flag is given, clinker will overwrite pre-existing output files.
clinker provides several other options to modify this output: alignment column headers can be hidden using the -hl/--hide_link_headers flag; cluster names can be hidden using the -ha/--hide_aln_headers flag; and the number of decimal places for score values can be set using the -dc/--decimals argument.
The clustermap.js visualisation used by clinker is designed to be very easy to customise. An overview of usage, as well as all changeable options, is provided in the visualisation sidebar. Briefly:
- Clusters can be rearranged vertically by dragging cluster names
- Loci can be moved or resized by hovering over them and dragging the box
- The visualisation can be anchored around a specific gene by clicking on it
- Clusters and similarity groups can be renamed by clicking on their text
- Similarity group colours can be changed by clicking on the circles in the legend
- Groups can be removed by right-clicking their label in the legend
- The scale bar can be resized by clicking its text and entering a new value (bp)
clinker provides numerous settings that can be changed to alter the layout and appearance of the visualisation. These are all listed inside the sidebar; any changes you make to these options will directly update the visualisation.