Overview
dparks1134 edited this page Sep 7, 2014 · 9 revisions
CheckM is run from the command line and provides a set of commands that support different analyses and workflows. The commands are organized into the related groups listed below. The two most common workflows assess genomes using either lineage-specific or taxonomic-specific marker sets.
Lineage-specific marker set
- tree: place bins in the reference genome tree
- tree_qa: assess phylogenetic markers found in each bin
- lineage_set: infer lineage-specific marker sets for each bin
Taxonomic-specific marker set
- taxon_list: list available taxonomic-specific marker sets
- taxon_set: infer taxonomic-specific marker set
Apply marker set to genome bins
- analyze: identify marker genes in bins
- qa: assess bins for contamination and completeness
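The step-by-step lineage-specific analysis can be sketched as the sequence of commands below. This is a minimal sketch that only builds and prints the command lines; it does not run CheckM. The positional argument order reflects the CheckM usage strings as best understood here and should be confirmed with `checkm COMMAND -h`. The function name, the folder names, and the marker file path (`lineage.ms`) are placeholders chosen for illustration.

```python
from typing import List


def lineage_specific_steps(bin_dir: str, out_dir: str, marker_file: str) -> List[List[str]]:
    """Build the argv lists for the individual lineage-specific steps.

    Assumed argument order (verify with `checkm COMMAND -h`):
      tree        <bin folder> <output folder>
      tree_qa     <output folder>
      lineage_set <output folder> <marker file>
      analyze     <marker file> <bin folder> <output folder>
      qa          <marker file> <output folder>
    """
    return [
        ["checkm", "tree", bin_dir, out_dir],
        ["checkm", "tree_qa", out_dir],
        ["checkm", "lineage_set", out_dir, marker_file],
        ["checkm", "analyze", marker_file, bin_dir, out_dir],
        ["checkm", "qa", marker_file, out_dir],
    ]


# Placeholder paths; nothing is executed, the commands are only printed.
for argv in lineage_specific_steps("./bins", "./checkm_out", "./checkm_out/lineage.ms"):
    print(" ".join(argv))
```

For a taxonomic-specific analysis, the `tree`, `tree_qa`, and `lineage_set` steps would be replaced by `taxon_set`, with `analyze` and `qa` applied to the resulting marker file in the same way.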
Common workflows (combines above commands)
- lineage_wf: runs tree, lineage_set, analyze, qa
- taxonomy_wf: runs taxon_set, analyze, qa
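The two combined workflows each collapse the steps above into a single command. The sketch below builds those command lines as Python lists. The helper names are hypothetical, the paths are placeholders, and the argument order is an assumption based on the CheckM usage strings that should be checked with `checkm lineage_wf -h` and `checkm taxonomy_wf -h`.

```python
from typing import List


def lineage_wf_cmd(bin_dir: str, out_dir: str) -> List[str]:
    """Command for the lineage-specific workflow (tree, lineage_set, analyze, qa)."""
    # Assumed usage: checkm lineage_wf <bin folder> <output folder>
    return ["checkm", "lineage_wf", bin_dir, out_dir]


def taxonomy_wf_cmd(rank: str, taxon: str, bin_dir: str, out_dir: str) -> List[str]:
    """Command for the taxonomic-specific workflow (taxon_set, analyze, qa)."""
    # Assumed usage: checkm taxonomy_wf <rank> <taxon> <bin folder> <output folder>
    return ["checkm", "taxonomy_wf", rank, taxon, bin_dir, out_dir]


# Placeholder inputs; the commands are printed, not executed.
print(" ".join(lineage_wf_cmd("./bins", "./lineage_out")))
print(" ".join(taxonomy_wf_cmd("domain", "Bacteria", "./bins", "./taxonomy_out")))
```

To actually run a workflow, a list built this way could be passed to `subprocess.run(argv, check=True)`, assuming `checkm` is installed and on the `PATH`.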
Bin QA plots
- bin_qa_plot: bar plot of bin completeness, contamination, and strain heterogeneity
Reference distribution plots
- gc_plot: create GC histogram and delta-GC plot
- coding_plot: create coding density (CD) histogram and delta-CD plot
- tetra_plot: create tetranucleotide distance (TD) histogram and delta-TD plot
- dist_plot: create image with GC, CD, and TD distribution plots together
General plots
- nx_plot: create Nx-plots
- len_plot: cumulative sequence length plot
- len_hist: sequence length histogram
- marker_plot: plot position of marker genes on sequences
- par_plot: parallel coordinate plot of GC and coverage
- gc_bias_plot: plot bin coverage as a function of GC
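As background for `nx_plot`, the Nx statistic (N50 being the familiar case) can be computed as in the sketch below. This illustrates the standard definition only and is not taken from CheckM's source; the function name `nx` and the example lengths are made up for illustration.

```python
from typing import Iterable


def nx(lengths: Iterable[int], x: float) -> int:
    """Return the Nx value: the largest length L such that sequences of
    length >= L together contain at least x percent of all bases."""
    if not 0 < x <= 100:
        raise ValueError("x must be in the range (0, 100]")
    ordered = sorted(lengths, reverse=True)
    threshold = sum(ordered) * x / 100.0
    running = 0
    for length in ordered:
        running += length
        if running >= threshold:
            return length
    raise ValueError("lengths must be non-empty")


# Made-up contig lengths for one bin (total = 290 bases).
contigs = [80, 70, 50, 40, 30, 20]
print("N50:", nx(contigs, 50))
print("N90:", nx(contigs, 90))
```

An Nx-plot shows this value for every x from 0 to 100, which gives a fuller picture of how fragmented a bin is than N50 alone.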
Sequence subspace plots
- cov_pca: PCA plot of coverage profiles
- tetra_pca: PCA plot of tetranucleotide signatures
Bin exploration and modification
- unique: ensure no sequences are assigned to multiple bins
- merge: identify bins with complementary sets of marker genes
- outliers: [Experimental] identify outliers in bins relative to reference distributions
- modify: [Experimental] modify sequences in a bin
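The check that `unique` performs can be pictured with the short sketch below, which reports any sequence ID that appears in more than one bin. It is an illustration of the idea under the assumption that each bin is available as a list of sequence IDs; it is not CheckM's implementation, and the function name and sample data are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def sequences_in_multiple_bins(bins: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Map each sequence ID found in more than one bin to the sorted bin names."""
    membership = defaultdict(set)
    for bin_name, seq_ids in bins.items():
        for seq_id in seq_ids:
            membership[seq_id].add(bin_name)
    return {
        seq_id: sorted(names)
        for seq_id, names in membership.items()
        if len(names) > 1
    }


# Hypothetical bins: contig_2 has been assigned to both.
example = {"bin_1": ["contig_1", "contig_2"], "bin_2": ["contig_2", "contig_3"]}
print(sequences_in_multiple_bins(example))
```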
Utility functions
- unbinned: identify unbinned sequences
- coverage: calculate coverage of sequences
- tetra: calculate tetranucleotide signature of sequences
- profile: calculate percentage of reads mapped to each bin
- join_tables: join tab-separated value tables containing bin information
- ssu_finder: identify SSU (16S/18S) rRNAs in sequences
- bin_compare: compare two sets of bins (e.g., from alternative binning methods)
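For readers unfamiliar with the tetranucleotide signature that `tetra` calculates, the sketch below shows the basic idea: the relative frequency of every 4-mer in a sequence. It is a simplified illustration, not CheckM's implementation. In particular, it treats all 256 4-mers separately and ignores reverse complements and ambiguous bases, which a production tool may handle differently.

```python
from collections import Counter
from itertools import product
from typing import Dict


def tetra_signature(seq: str) -> Dict[str, float]:
    """Return the relative frequency of each of the 256 A/C/G/T 4-mers.

    Overlapping windows are counted; windows containing other characters
    (for example N) are skipped.
    """
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    kmers = ["".join(p) for p in product("ACGT", repeat=4)]
    total = sum(counts[k] for k in kmers)
    return {k: (counts[k] / total if total else 0.0) for k in kmers}


# Toy sequence for illustration only.
signature = tetra_signature("ACGTACGT")
print(len(signature), signature["ACGT"])
```

Sequences from the same genome tend to have similar signatures, which is what makes distance and PCA plots based on them useful for spotting sequences that may not belong in a bin.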
For more information on any of these commands, type: > checkm COMMAND -h