From 552fbd7bd4da460715de56a031c8ff12f309c5d6 Mon Sep 17 00:00:00 2001 From: Robert Petit Date: Tue, 30 Apr 2024 18:38:41 -0600 Subject: [PATCH] update readme and adjust for latest camlhmp --- CHANGELOG | 5 + README.md | 216 +++++++++++++++++++++++++++++++++++++++++++- bin/sccmec | 8 ++ bin/sccmec-bioconda | 8 ++ data/sccmec.yaml | 19 ++-- 5 files changed, 248 insertions(+), 8 deletions(-) create mode 100644 CHANGELOG create mode 100755 bin/sccmec create mode 100755 bin/sccmec-bioconda diff --git a/CHANGELOG b/CHANGELOG new file mode 100644 index 0000000..d789a7c --- /dev/null +++ b/CHANGELOG @@ -0,0 +1,5 @@ +# Changelog + +## v1.0.0 rpetit3/sccmec "MRSA" 2024/04/30 + +- Initial release diff --git a/README.md b/README.md index 2cfaeae..e22b5ad 100644 --- a/README.md +++ b/README.md @@ -1,2 +1,216 @@ +sccmec - A tool for typing SCCmec cassettes in assemblies + # sccmec -A tool for typing SCCmec cassettes in assemblies + +`sccmec` is a tool for typing SCCmec cassettes in assemblies. It was designed to be easy to +use. Unlike its predecessor, [staphopia-sccmec](https://github.com/staphopia/staphopia-sccmec), +`sccmec` is much simpler to maintain and update. This is because of [camlhmp](https://github.com/rpetit3/camlhmp) +which allows a organization to be defined in a YAML file. + +## Contributing + +If you would like to become a curator for `sccmec`, please let me know! This could be in the +form of adding new SCCmec types, updating existing ones, or adjusting thresholds. I'm open +to any and all suggestions! + +## Supported SCCmec Types + +The following SCCmec types are supported by `sccmec`. + +| Type | Citation | +|------|----------| +| I | [Katayama et al. 2000](https://doi.org/10.1128/aac.44.6.1549-1555.2000) | +| II | [Katayama et al. 2000](https://doi.org/10.1128/aac.44.6.1549-1555.2000), [Ito et al. 2001](https://doi.org/10.1128%2FAAC.45.5.1323-1336.2001) | +| III | [Katayama et al. 2000](https://doi.org/10.1128/aac.44.6.1549-1555.2000) | +| IV | [Ma et al. 2002](https://doi.org/10.1128%2FAAC.46.4.1147-1152.2002) | +| V | [Ito et al. 2004](https://doi.org/10.1128/aac.48.7.2637-2651.2004) | +| VI | [Oliveira et al. 2006](https://doi.org/10.1128%2FAAC.00629-06) | +| VII | [Berglund et al. 2008](https://doi.org/10.1128%2FAAC.00087-08) | +| VIII | [Zhang et al. 2009](https://doi.org/10.1128%2FAAC.01118-08) | +| IX | [Li et al. 2011](https://doi.org/10.1128%2FAAC.01475-10) | +| X | [Li et al. 2011](https://doi.org/10.1128%2FAAC.01475-10) | +| XI | [Garcรญa-รlvarez et al. 2011](https://doi.org/10.1128/aac.01692-15) | +| XII | [Wu et al. 2015](https://doi.org/10.1128/JCM.39.2.607-612.2001) | +| XIII | [Baig et al. 2018](https://doi.org/10.1016/j.meegid.2018.03.013) | +| XIV | [Urushibara et al. 2020](https://doi.org/10.1093/jac/dkz406) | +| XV | [Wang et al. 2022](https://doi.org/10.1093/jac/dkab500) | + +## Installation + +You can install `sccmec` using `conda`: + +```bash +conda create -n sccmec -c conda-forge -c bioconda sccmec +conda activate sccmec +sccmec --help +``` + +__Note:__ `sccmec` is just a wrapper around [camlhmp-blast](https://github.com/rpetit3/camlhmp?tab=readme-ov-file#camlhmp-blast) +with the defaults for `--yaml` and `--targets` already set. Please don't let this confuse you +when you see all the camels! + +## Usage + +```bash + Usage: camlhmp-blast [OPTIONS] + + ๐Ÿช camlhmp-blast ๐Ÿช - Classify assemblies with a camlhmp schema using BLAST + +โ•ญโ”€ Options โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฎ +โ”‚ --version -V Show the version and exit. โ”‚ +โ”‚ * --input -i TEXT Input file in FASTA format to classify [required] โ”‚ +โ”‚ * --yaml -y TEXT YAML file documenting the targets and types | +| [default: bin/../data/sccmec.yaml] [required] โ”‚ +โ”‚ * --targets -t TEXT Query targets in FASTA format | +| [default: bin/../data/sccmec.fasta] [required] โ”‚ +โ”‚ --outdir -o PATH Directory to write output [default: ./] โ”‚ +โ”‚ --prefix -p TEXT Prefix to use for output files [default: camlhmp] โ”‚ +โ”‚ --min-pident INTEGER Minimum percent identity to count a hit [default: 95] โ”‚ +โ”‚ --min-coverage INTEGER Minimum percent coverage to count a hit [default: 95] โ”‚ +โ”‚ --force Overwrite existing reports โ”‚ +โ”‚ --verbose Increase the verbosity of output โ”‚ +โ”‚ --silent Only critical errors will be printed โ”‚ +โ”‚ --help Show this message and exit. โ”‚ +โ•ฐโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ•ฏ +``` + +As mentioned above, `sccmec` is just a wrapper around `camlhmp-blast`. Except, please note that +the `--yaml` and `--targets` options are already set to the SCCmec defaults. This means you only +need to provide the `--input` option with your assembly file. + +### Example Usage + +Here's an example of how to use `sccmec` using an assembly file (both uncompressed and GZip +compressed are supported): + +```bash +sccmec --input tests/fasta/type-Va-AB121219.fasta.gz --prefix type-v --force +Running camlhmp with following parameters: + --input tests/fasta/type-Va-AB121219.fasta.gz + --yaml bin/../data/sccmec.yaml + --targets bin/../data/sccmec.fasta + --outdir ./ + --prefix type-v + --min-pident 95 + --min-coverage 95 + +Starting camlhmp for SCCmec Typing... +Running blastn... +Processing hits... +Final Results... + SCCmec Typing +โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ณโ”โ”โ”โ”โ”โ”โ”โ”โ”โ”“ +โ”ƒ sample โ”ƒ type โ”ƒ targets โ”ƒ schema โ”ƒ version โ”ƒ comment โ”ƒ +โ”กโ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ•‡โ”โ”โ”โ”โ”โ”โ”โ”โ”โ”ฉ +โ”‚ type-v โ”‚ V โ”‚ ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 โ”‚ sccmec โ”‚ 1.0.0 โ”‚ โ”‚ +โ””โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”ดโ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜ +Writing outputs... +Final predicted type written to ./type-v.tsv +Results against each type written to ./type-v.details.tsv +blastn results written to ./type-v.blastn.tsv +``` + +If needed you could adjust the `--min-pident` and `--min-coverage` options to be more or less +depending on your needs. + +Once the tool has completed, you will find three output files in the current directory which +described below. + +### Output Files + +`camlhmp-blast` will generate three output files: + +| File Name | Description | +|------------------------|-------------------------------------------------| +| `{PREFIX}.tsv` | A tab-delimited file with the predicted type | +| `{PREFIX}.blast.tsv` | A tab-delimited file of all blast hits | +| `{PREFIX}.details.tsv` | A tab-delimited file with details for each type | + +#### Example {PREFIX}.tsv + +```tsv +sample type targets schema version comment +saureus V ccrC1,IS431,IS431_1,IS431_2,mecA,mecR1 sccmec 1.0.0 +``` + +| Column | Description | +|---------|--------------------------------------------------| +| sample | The sample name as determined by `--prefix` | +| type | The predicted type | +| targets | The targets for the given type that had a hit | +| schema | The schema used to determine the type | +| version | The version of the schema used | +| comment | A small comment about the result | + +#### Example {PREFIX}.blast.tsv + +```tsv +qseqid sseqid pident qcovs qlen slen length nident mismatch gapopen qstart qend sstart send evalue bitscore +ccrC1 AB121219.1 100.000 100 1623 28612 1623 1623 0 0 1 1623 16132 17754 0.0 2998 +IS431_1 AB121219.1 100.000 100 791 28612 791 791 0 0 1 791 8221 9011 0.0 1461 +IS431_1 AB121219.1 99.704 100 675 28612 675 673 2 0 1 675 2693 3367 0.0 1236 +IS431_1 AB121219.1 98.519 100 675 28612 675 665 10 0 1 675 8951 8277 0.0 1192 +... +``` + +This is the standard BLAST output with `-outfmt 6` + +#### Example {PREFIX}.details.tsv + +```tsv +sample type status targets missing schema version comment +type-v I False IS431,mecA,mecR1 ccrA1,ccrB1,IS1272 sccmec 1.0.0 +type-v II False IS431,mecA,mecR1 ccrA2,ccrB2,mecI sccmec 1.0.0 +type-v III False IS431,mecA,mecR1 ccrA3,ccrB3,mecI sccmec 1.0.0 +type-v IV False IS431,mecA,mecR1 ccrA2,ccrB2,IS1272 sccmec 1.0.0 +type-v V True ccrC1,IS431_1,mecA,mecR1,IS431_2 sccmec 1.0.0 +type-v VI False IS431,mecA,mecR1 ccrA4,ccrB4,IS1272 sccmec 1.0.0 +type-v VII False ccrC1,IS431_1,mecA,mecR1,IS431_2 IS12960D sccmec 1.0.0 +type-v VIII False IS431,mecA,mecR1 ccrA4,ccrB4,mecI sccmec 1.0.0 Excluded target ccrC1 found, failing type VIII +type-v IX False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB1 sccmec 1.0.0 +type-v X False IS431_1,mecA,mecR1,IS431_2 ccrA1,ccrB6 sccmec 1.0.0 +type-v XI False mecA,mecR1 ccrA1,ccrB3,blaZ,mecI sccmec 1.0.0 +type-v XII False IS431_1,mecA,mecR1,IS431_2 ccrC2 sccmec 1.0.0 +type-v XIII False IS431,mecA,mecR1 ccrC2,mecI sccmec 1.0.0 +type-v XIV False ccrC1,IS431,mecA,mecR1 mecI sccmec 1.0.0 +type-v XV False IS431,mecA,mecR1 ccrA1,ccrB6,mecI sccmec 1.0.0 +``` + +This file provides a detailed view of the results. The columns are: + +| Column | Description | +|---------|----------------------------------------------------| +| sample | The sample name as determined by `--prefix` | +| type | The predicted type | +| status | The status of the type (True if failed) | +| targets | The targets for the given type that had a match | +| missing | The targets for the given type that were not found | +| schema | The schema used to determine the type | +| version | The version of the schema used | +| comment | A small comment about the result | + +## Citations + +If you use `sccmec` in your research, please cite the following: + +* __[camlgmp](https://github.com/rpetit3/camlhmp)__ +๐ŸชClassification through yAML Heuristic Mapping Protocol ๐Ÿช +Petit III RA [camlhmp: Classification through yAML Heuristic Mapping Protocol](https://github.com/rpetit3/camlhmp) (GitHub) + +* __[BLAST](https://blast.ncbi.nlm.nih.gov/Blast.cgi)__ +Basic Local Alignment Search Tool +*Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, Madden TL [BLAST+: architecture and applications](http://dx.doi.org/10.1186/1471-2105-10-421). BMC Bioinformatics 10, 421 (2009)* + +## Naming + +I considered thinking of a fun name for this tool, but sometimes it's best to get straight +to the point! So, here we are with `sccmec`. + +## License + +I'm not a lawyer and MIT has always been my go-to license. So, MIT it is! + +## Curators + +- [Robert A. Petit III](https://www.robertpetit.com) + diff --git a/bin/sccmec b/bin/sccmec new file mode 100755 index 0000000..9c87b01 --- /dev/null +++ b/bin/sccmec @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +version="1.0.0" +sccmec_dir=$(dirname $0) + +CAML_YAML="${sccmec_dir}/../data/sccmec.yaml" \ +CAML_TARGETS="${sccmec_dir}/../data/sccmec.fasta" \ + camlhmp-blast \ + "${@:1}" diff --git a/bin/sccmec-bioconda b/bin/sccmec-bioconda new file mode 100755 index 0000000..39a09db --- /dev/null +++ b/bin/sccmec-bioconda @@ -0,0 +1,8 @@ +#!/usr/bin/env bash +version="1.0.0" +sccmec_dir=$(dirname $0) + +CAML_YAML="${sccmec_dir}/../share/sccmec/sccmec.yaml" \ +CAML_TARGETS="${sccmec_dir}/../share/sccmec/sccmec.fasta" \ + camlhmp-blast \ + "${@:1}" diff --git a/data/sccmec.yaml b/data/sccmec.yaml index 33a01cc..5d516e9 100644 --- a/data/sccmec.yaml +++ b/data/sccmec.yaml @@ -2,14 +2,17 @@ --- # metadata provides information about the schema metadata: + id: "sccmec" # id of the schema name: "SCCmec Typing" # name of the schema - description: "A partial schema for SCCmec typing" # description of the schema - version: "0.1" # version of the schema + description: "A schema for SCCmec typing" # description of the schema + version: "1.0.0" # version of the schema curators: # A list of curators of the schema - "Robert Petit" + # engine provides information about the tool and parameters used engine: tool: blastn # The tool used to generate the data + # targets provides a list of sequence targets (primers, genes, proteins, etc...) targets: - "ccrA1" @@ -32,6 +35,7 @@ targets: - "mecI" - "mecR1" - "blaZ" + # aliases allow for grouping of targets under a common name aliases: - name: "ccr Type 1" # name of the alias @@ -58,10 +62,11 @@ aliases: targets: ["IS431_1", "mecA", "mecR1", "IS431_2"] # mec class C1 and C2 have same targets - name: "mec Class E" targets: ["blaZ", "mecA", "mecR1", "mecI"] -# profiles includes the final typing designations based on targets and aliases -profiles: - - name: "I" # name of the profile - targets: # list of targets that are part of the profile + +# types includes the final typing designations based on targets and aliases +types: + - name: "I" # name of the types + targets: # list of targets that are part of the types - "ccr Type 1" - "mec Class B" - name: "II" @@ -80,7 +85,7 @@ profiles: targets: - "ccr Type 5" - "mec Class C" - excludes: + excludes: # list of targets that if present will exclude the type - "ccr Type 3" - "mecI" - "IS12960D"