Skip to content

kmer based feature extraction tool for bioinformatics, metagenomics, AI/ML and more

License

Notifications You must be signed in to change notification settings

anuradhawick/kmertools

Repository files navigation

kmertools: DNA Vectorisation Tool

License: GPL v3 Cargo tests Clippy check install with bioconda Conda - Version Conda Downloads PyPI Downloads codecov PyPI - Version

$$\   $$\                                   $$$$$$$$\                     $$\           
$$ | $$  |                                  \__$$  __|                    $$ |          
$$ |$$  / $$$$$$\$$$$\   $$$$$$\   $$$$$$\     $$ |    $$$$$$\   $$$$$$\  $$ | $$$$$$$\ 
$$$$$  /  $$  _$$  _$$\ $$  __$$\ $$  __$$\    $$ |   $$  __$$\ $$  __$$\ $$ |$$  _____|
$$  $$<   $$ / $$ / $$ |$$$$$$$$ |$$ |  \__|   $$ |   $$ /  $$ |$$ /  $$ |$$ |\$$$$$$\  
$$ |\$$\  $$ | $$ | $$ |$$   ____|$$ |         $$ |   $$ |  $$ |$$ |  $$ |$$ | \____$$\ 
$$ | \$$\ $$ | $$ | $$ |\$$$$$$$\ $$ |         $$ |   \$$$$$$  |\$$$$$$  |$$ |$$$$$$$  |
\__|  \__|\__| \__| \__| \_______|\__|         \__|    \______/  \______/ \__|\_______/ 

Overview

kmertools is a k-mer based feature extraction tool designed to support metagenomics and other bioinformatics analytics. This tool leverages k-mer analysis to vectorize DNA sequences, facilitating the use of these vectors in various AI/ML applications.

Features

  • Oligonucleotide Frequency Vectors: Generate frequency vectors for oligonucleotides.
  • Minimiser Binning: Efficiently bin sequences using minimisers to reduce data complexity.
  • Chaos Game Representation (CGR): Compute CGR vectors for DNA sequences based on k-mers or whole sequence transformation.
  • Coverage Histograms: Create coverage histograms to analyze the depth of sequencing reads.
  • Python Binding: You can import kmertools functionality using import pykmertools as kt

Installation

Option 1: from bioconda (recommended)

You can install kmertools from Bioconda at https://anaconda.org/bioconda/kmertools. Make sure you have conda installed.

# create conda environment and install kmertools
conda create -n kmertools -c bioconda kmertools

# activate environment
conda activate kmertools

Option 2: from PyPI

You can install kmertools from PyPI at https://pypi.org/project/pykmertools/.

pip install pykmertools

Option 3: from sources

You can install kmertools directly from the source by cloning the repository and using Rust's package manager cargo.

git clone https://github.com/your-repository/kmertools.git
cd kmertools
cargo build --release

Now add the binary to path (you may modify ~/.bashrc or ~/.zshrc)

# to add to current terminal
export PATH=$PATH:$(pwd)/target/release/

# to save to ~/.bashrc
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.bashrc
source ~/.bashrc

# to save to ~/.zshrc for Mac
echo "export PATH=\$PATH:$(pwd)/target/release/" >> ~/.zshrc
source ~/.zshrc

To install the python bindings run the following commands. You can use either pip or conda directories for this.

# pip
cd pip
maturin build --release
# conda
cd conda
maturin build --release

Now move to parent directory using cd .. and run the following command.

pip install target/wheels/pykmertools-<VERSION>-cp39-abi3-manylinux_2_34_x86_64.whl

Test the installation

After setting up, run the following command to print out the kmertools help message.

kmertools --help

Help

Please read our comprehensive Wiki.

Authors

Citation

If you use kmertools please cite as follows.

@software{Wickramarachchi_kmertools_DNA_Vectorisation,
  author = {Wickramarachchi, Anuradha and Mallawaarachchi, Vijini},
  title = {{kmertools: DNA Vectorisation Tool}},
  url = {https://github.com/anuradhawick/kmertools},
  version = {0.1.4}
}

Please refer to the Wiki for citations of relevant algorithms.

Support and contributions

Please get in touch via author websites or GitHub issues. Thanks!