Commit

Shapemapper version 2.3 Update
Added ability for ShapeMapper to isolate, process, and analyze N1/3
and N7-G mutations simultaneously. Additionally, made some minor changes
to QC thresholds and functionality. See internals/changelog.md for
an in-depth explanation of all changes to functionality.
lucaskearns committed Nov 22, 2024
1 parent 4b84d35 commit 732b4d0
Showing 41 changed files with 3,605 additions and 1,934 deletions.
4 changes: 2 additions & 2 deletions LICENSE.txt
@@ -1,13 +1,13 @@
Copyright 2023
Copyright 2024

Contributors
Steven Busan
Anthony Mustoe
David Mitchell III
Patrick Irving
Thomas Miller
Lucas Kearns


Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
44 changes: 36 additions & 8 deletions README.md
@@ -9,7 +9,7 @@ This is a github-flavored markdown file not meant to be easily readable.
![](docs/images/header_profile.png)
**ShapeMapper2**
===============
*Copyright 2019 Steven Busan; 2022 Anthony Mustoe*. This project is licensed under the terms of the
*Copyright 2019 Steven Busan; 2024 Anthony Mustoe*. This project is licensed under the terms of the
MIT license.

ShapeMapper automates the calculation of RNA chemical probing reactivities
@@ -40,15 +40,15 @@ Installation
------------
ShapeMapper will only run on 64-bit Linux systems (Mac and Windows are not currently supported).

- Download latest [release](https://github.com/Weeks-UNC/shapemapper2/releases/download/2.2.0/shapemapper2-2.2.tar.gz)
- On most systems, typing `wget https://github.com/Weeks-UNC/shapemapper2/releases/download/2.2.0/shapemapper2-2.2.tar.gz`
- Download latest [release](https://github.com/Weeks-UNC/shapemapper2/releases/download/2.2/shapemapper2-2.3.tar.gz)
- On most systems, typing `wget https://github.com/Weeks-UNC/shapemapper2/releases/download/2.2/shapemapper2-2.3.tar.gz`
will download the file on the commandline.
- Be sure to download from the `shapemapper2-2.2.tar.gz` link, _not_ the source code-only links, which
- Be sure to download from the `shapemapper2-2.3.tar.gz` link, _not_ the source code-only links, which
do not include executables.

- Extract release tarball using

`tar -xvf shapemapper2-2.2.tar.gz`
`tar -xvf shapemapper2-2.3.tar.gz`

- Add shapemapper executable to PATH (optional - google this if you don't know how)

@@ -58,10 +58,12 @@ ShapeMapper will only run on 64-bit Linux systems (Mac and Windows are not curre
and `shapemapper_temp`

- To run all unit and end-to-end tests, run `internals/test/run_all_tests.sh`.
This should take about 5-15 minutes. (optional)
This should take about 25-30 minutes. (optional)

- Occasionally, a single module failure detection test may fail. We attribute this to idiosyncrasies of the computational environment rather than to an issue with ShapeMapper, so the error message may be safely ignored.

- Alternatively, you can build ShapeMapper if the provided binaries do not run on your platform.
Building is relatively straightforward using conda. See <a href="https://github.com/Weeks-UNC/shapemapper2/blob/master/docs/building.md">building</a>.
Building is relatively straightforward using conda. See <a href="https://github.com/Weeks-UNC/shapemapper2/blob/v23-release/docs/building.md">building</a>.

<!-- #### -->

@@ -285,6 +287,25 @@ shapemapper <parameters> <inputs> | --version | --help
--serial Run pipeline components one at a time and write all intermediate files
to disk. Useful for debugging, but not generally recommended, as this will
use large amounts of disk space. Default=False
--N7 Add N7 information to data visualization. Adds a graph of mutation rates
and reactivities specific to N7 data in profiles.pdf. Prior to usage,
ensure proper protocol was followed to generate valid N7 data.
--output-temp
Preserves temp files. Default=False.
--pernt-norm-factor-threshold
Set the number of NTs needed for effective per-nt normalization factor
calculation. May need to change in the case of short RNAs. Default=20
--ignore_low_N7
Bypass N7 quality control filters.
--bypass_filters
Bypass N7 quality control filters and set threshold for NTs needed for effective
per-nt normalization factor calculation to 1.
(Equivalent to "--ignore_low_N7 --pernt-norm-factor-threshold 1")
```
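
The relationship between `--bypass_filters` and the two options it stands in for can be sketched with Python's `argparse`. This is only an illustration under stated assumptions, not ShapeMapper's actual argument handling: the flag names and the default of 20 come from the help text above, while `build_parser` and `resolve_filters` are hypothetical names introduced here.

```python
import argparse


def build_parser():
    # Hypothetical parser covering only the three flags discussed above.
    # Defaults are taken from the help text (threshold default = 20).
    parser = argparse.ArgumentParser(prog="shapemapper-sketch")
    parser.add_argument("--pernt-norm-factor-threshold", type=int, default=20)
    parser.add_argument("--ignore_low_N7", action="store_true")
    parser.add_argument("--bypass_filters", action="store_true")
    return parser


def resolve_filters(args):
    """Expand --bypass_filters into the two settings it is documented to imply."""
    if args.bypass_filters:
        args.ignore_low_N7 = True
        args.pernt_norm_factor_threshold = 1
    return args


args = resolve_filters(build_parser().parse_args(["--bypass_filters"]))
print(args.ignore_low_N7, args.pernt_norm_factor_threshold)  # True 1
```

Without `--bypass_filters`, the sketch leaves both settings at whatever the user passed, or at their defaults.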

&nbsp;&nbsp;&nbsp;&nbsp;
@@ -343,6 +364,9 @@ see [FAQ](docs/FAQ.md)
### DMS mode
see [DMSmode](docs/dmsmode.md)

### N7-G related functionality
see [N7-G](docs/N7-G.md)

### Low-quality profile warning message
If ShapeMapper gives a red warning message about possible low-quality
reactivity profiles, read the log file to see which quality control
@@ -396,7 +420,11 @@ Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM. Selective 2'-hydroxyl acylat

#### For DMS-specific analyses, please cite:

David Mitchell, III et al, Mutation signature filtering enables high-fidelity RNA structure probing at all four nucleobases with DMS, Nucleic Acids Research, 2023;, gkad522,
David Mitchell, III et al, Mutation signature filtering enables high-fidelity RNA structure probing at all four nucleobases with DMS, Nucleic Acids Research, 2023, gkad522.
[link](https://doi.org/10.1093/nar/gkad522)

#### For msDMS_MaP (N7-G) analyses, please cite:

Irfana Saleem, Thomas Miller, Lucas Kearns, David Mitchell, Ritwika Bose, Chase Weidman, Anthony Mustoe. Title to be determined. Journal to be determined. 202X.

&nbsp;&nbsp;&nbsp;&nbsp;
9 changes: 9 additions & 0 deletions docs/FAQ.md
@@ -165,6 +165,15 @@ Siegfried NA, Busan S, Rice GM, Nelson JA, Weeks KM. RNA motif discovery by SHAP
Smola MJ, Rice GM, Busan S, Siegfried NA, Weeks KM. Selective 2'-hydroxyl acylation analyzed by primer extension and mutational profiling (SHAPE-MaP) for direct, versatile and accurate RNA structure analysis. _Nat Protoc_. 2015, 10(11):1643-69.
[link](http://www.ncbi.nlm.nih.gov/pubmed/21979276)

#### For DMS-specific analyses, please cite:

David Mitchell, III et al, Mutation signature filtering enables high-fidelity RNA structure probing at all four nucleobases with DMS, Nucleic Acids Research, 2023, gkad522.
[link](https://doi.org/10.1093/nar/gkad522)

#### For msDMS_MaP (N7-G) analyses, please cite:

Irfana Saleem, Thomas Miller, Lucas Kearns, David Mitchell, Ritwika Bose, Chase Weidman, Anthony Mustoe. Title to be determined. Journal to be determined. 202X.

---

&nbsp;&nbsp;&nbsp;&nbsp;
71 changes: 71 additions & 0 deletions docs/N7-G.md
@@ -0,0 +1,71 @@
<!---
NOTE:
If you're reading this, instead try opening README.html in a web browser
or view this file from within the github repository website.
This is a github-flavored markdown file not meant to be easily readable.
-->

N7 mode
========

Overview
--------
The --N7 option was added in v2.3 to allow processing and analysis of N7-G
modifications induced under the experimental conditions specified for DMS
treatment (see [DMSmode](docs/dmsmode.md)). Traditionally, measurement and analysis
of N7-G modifications on RNA has involved harsh biochemical processing separate
from conventional N1/3 modification analysis. We have developed msDMS-MaP to
allow simultaneous detection and analysis of both N1/3 and N7-G modifications.
We have shown that N7-G reactivity is informative about RNA tertiary and quaternary
structure. The N7-G reactivity analysis complements the traditional N1/3 reactivity
analysis which is conventionally interpreted in the context of secondary structure.

Under the aforementioned experimental conditions, these N7-G modifications manifest
as G>A mutations. ShapeMapper isolates and processes these modifications in a
channel separate from N1/3 modifications. When the --N7 flag is used, N7-G related
information will be written to a profile.txtga file (and one or more .mutga files
if the --output-parsed-mutations flag is used). Additionally, the N7-G data will
be visualized alongside N1/3 data in the profile.pdf output file.
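
As a rough guide to which files to look for after an --N7 run, the sketch below assembles the expected names from the filename patterns given in the file formats documentation (`<name>_<RNA>_profile.txtga` and `<name>_<sample>_<RNA>_parsed.mutga`). The helper name `expected_n7_outputs` and the example run, RNA, and sample names are hypothetical, and the sketch does not check whether ShapeMapper's N7-G quality control filter removed any of the files.

```python
def expected_n7_outputs(name, rna, samples=(), parsed_mutations=False):
    """List output filenames suggested by the documented naming patterns.

    name, rna, and samples are placeholders supplied by the caller; this
    helper only formats strings and does not touch the filesystem.
    """
    outputs = [
        f"{name}_{rna}_profile.txt",    # N1/3 channel
        f"{name}_{rna}_profile.txtga",  # N7-G channel (written with --N7)
    ]
    if parsed_mutations:  # corresponds to --output-parsed-mutations
        for sample in samples:
            outputs.append(f"{name}_{sample}_{rna}_parsed.mut")
            outputs.append(f"{name}_{sample}_{rna}_parsed.mutga")
    return outputs


for filename in expected_n7_outputs("run1", "rnaA", ["Modified"], parsed_mutations=True):
    print(filename)
```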


Experimental conditions
------------------
see [DMSmode](docs/dmsmode.md)


Normalization
-------------
Please see [place publication here] for a detailed description of N7-G normalization.

Due to the way these sites are normalized, the raw rate has an inverse correlation
with normalized reactivity. In other words, an N7-G position with a low raw rate will
have a high normalized reactivity. We term sites with high normalized reactivity "protected".
Additionally, N7-G reactivity normalization incorporates a log2 transformation. Thus,
each increment of 1 corresponds to a doubling of the N7-G normalized reactivity.
For example, a position with a normalized reactivity of 2 is twice as protected as a
position with a normalized reactivity of 1.
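
The log2 relationship can be made concrete with a small calculation. The sketch below only converts a difference between two normalized reactivities into a fold difference in protection; it is not the normalization procedure itself, and `fold_protection` is a hypothetical helper name.

```python
def fold_protection(reactivity_a, reactivity_b):
    """Fold difference in protection implied by two log2-scale reactivities."""
    return 2.0 ** (reactivity_a - reactivity_b)


print(fold_protection(2.0, 1.0))  # 2.0 (one unit apart: twice as protected)
print(fold_protection(3.0, 1.0))  # 4.0 (two units apart: four times as protected)
```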

Based on prior experiments, we have set thresholds to determine how protected each N7-G
position is. Cutoffs have been set at 1.6 and 2.3, corresponding to bases which
are protected and highly protected, respectively. In the profiles.pdf data visualization,
unprotected bases (N reactivity < 1.6) are colored black, protected bases
(1.6 <= N reactivity < 2.3) are colored pink, and highly protected bases (N reactivity >= 2.3)
are colored purple.
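
The cutoffs above translate directly into a three-way classification. The following is a minimal sketch that uses the thresholds and colors stated in this section; the function name `classify_n7` and the handling of missing values (NaN reported as "no data") are assumptions made for illustration, not a description of how ShapeMapper renders figures.

```python
import math

PROTECTED_CUTOFF = 1.6         # lower bound for "protected"
HIGHLY_PROTECTED_CUTOFF = 2.3  # lower bound for "highly protected"


def classify_n7(reactivity):
    """Return (category, color) for a normalized N7-G reactivity."""
    if math.isnan(reactivity):
        # Assumption: positions without data are reported separately.
        return ("no data", None)
    if reactivity >= HIGHLY_PROTECTED_CUTOFF:
        return ("highly protected", "purple")
    if reactivity >= PROTECTED_CUTOFF:
        return ("protected", "pink")
    return ("unprotected", "black")


for value in (0.4, 1.6, 2.3):
    print(value, classify_n7(value))
```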


Further Analysis
------------------
Additional analysis of N7-G data may be performed in [RingMapper](https://github.com/Weeks-UNC/RingMapper), [DanceMapper](https://github.com/MustoeLab/DanceMapper), and [ArcPlot](https://github.com/MustoeLab/StructureAnalysisTools).

Each of these packages has functionality specific to N7-G data processing.


Citation and reference
----------------------
Please cite Saleem and Miller et al, Journal To Be Determined, 202X, for publications using the --N7 option

Please cite Mitchell et al, Nucleic Acids Research, 2023, for publications using the --dms option

Bicine buffering conditions were first described in Mustoe et al, PNAS 2019

5 changes: 5 additions & 0 deletions docs/analysis_steps.md
@@ -26,6 +26,11 @@ Steps are listed in order of execution, with the exception of
reference sequence correction, which is a larger preprocessing
stage that includes many of the other listed stages.

Note: analysis steps will vary if the --N7 or --dms options are used.
See [N7-G](docs/N7-G.md) and [DMSmode](docs/dmsmode.md) for further details.



---
&nbsp;&nbsp;&nbsp;&nbsp;

57 changes: 57 additions & 0 deletions docs/changelog.md
@@ -9,6 +9,62 @@ This is a github-flavored markdown file not meant to be easily readable.
Version history
---------------

### Version 2.3
August 30, 2024

ADDED:

Added capacity for Shapemapper to process and output N7-G modification
information (see *profile.txtga, *.mutga files).

Added ability for renderfigures to visualize raw and normalized N7-G reactivity rate.

Added --N7 flag. Enables toggling of N7 data processing and output.

Added --pernt-norm-factor-threshold flag. Sets minimum number of nucleotides with quality
reactivity information needed for per-nt normalization.

Temp files are now deleted upon run completion by default. Add --output-temp to
arguments to keep them.

C++ code modified to ignore mutations involving positions within 3 nt of forward
and reverse primer sites. These nucleotides are effectively set to "no data".

Added an N7-G quality control filter detecting low N7-G reactivity. If triggered,
N7-G output files will be deleted and N7-G reactivity will not be visualized. This
may be bypassed with the --ignore_low_N7 flag.

Added --bypass_filters flag. Equivalent to passing --ignore_low_N7 and
--pernt-norm-factor-threshold 1.

CHANGED:

Bowtie2 wrapper output filtering changed to remove warnings about reads
failing to align because they are < 2 characters long or because length was
<= seed mismatches.

If normalized reactivities are found to be infinite, they are set to np.nan instead.
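
The effect of this change can be illustrated with a short sketch. It uses only the Python standard library (`math.nan` in place of `np.nan`) so that it runs without NumPy, and `replace_infinite` is a hypothetical helper, not a function from the ShapeMapper code base. With a NumPy array `arr`, the same effect is commonly written as `arr[np.isinf(arr)] = np.nan`.

```python
import math


def replace_infinite(values):
    """Return a copy of values with +inf and -inf replaced by NaN."""
    return [math.nan if math.isinf(v) else v for v in values]


cleaned = replace_infinite([0.5, math.inf, -math.inf, 1.2])
print(cleaned)
```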

High mutation threshold in the N1/3 DMS quality control check has been modified.
(A: 0.02 -> 0.05; C: 0.02 -> 0.05; U: 0.005 -> 0.01)

Updated normalization scheme to account for purine and pyrimidine in N7-G
reactivity normalization. Additionally, highly protected bases (bases in
which background is higher than the modified treatment) are set to a normalized
value of 3.32.

FIXED:

Fixed inability of --N7 runs to be performed in parallel when using bowtie2 aligner.

Fixed error arising in per-nucleotide normalization when the 90th and 95th
percentiles of reactivities are the same.

Fixed error arising when fastq files had additional text besides "+" in the
third field.

Fixed error that prevented --amplicon and --correct-seq from being used
simultaneously.
### 2.2.1 (August 2022)
- Added dmsmode option for DMS-specific data processing workflow to obtain
highly specific probing signals at all four nucleobases. Read dmsmode
@@ -22,6 +78,7 @@ Version history
mode is selected, --serial mode is automatically turned on
- Updated build instructions to make it easier to build binaries


### 2.1.5 (August 2019)
No future updates are planned beyond this final release.

6 changes: 3 additions & 3 deletions docs/dmsmode.md
@@ -33,7 +33,7 @@ Probing conditions
------------------
Appropriate buffering conditions are required to support robust U and G modification.
Optimized Bicine buffer conditions are described in Mustoe et al, PNAS 2019, and
Mitchell et al, XX.
Mitchell et al, Nucleic Acids Research, 2023.

ShapeMapper --dms will automatically check for sufficient modification at U and G.
If modification rates at one or both nucleotides are too low (for example, because
@@ -42,7 +42,7 @@ from downstream analyses. In this scenario, other nucleotides types (particularl
contain usable data and will be reported.


MaP enzmye choice
MaP enzyme choice
-----------------
The pipeline is optimized for datasets collected using the MarathonRT enzyme, which we
recommend. But --dms analysis also benefits datasets collected using TGIRT-III and
@@ -83,7 +83,7 @@ generated by ShapeMapper --dms. These parameter files are distributed in... [TOD
Citation and reference
----------------------

Please cite Mitchell et al, XXX, for publications using the --dms option
Please cite Mitchell et al, Nucleic Acids Research, 2023, for publications using the --dms option

Bicine buffering conditions were first described in Mustoe et al, PNAS 2019

10 changes: 8 additions & 2 deletions docs/file_formats.md
@@ -77,7 +77,10 @@ amplicon primer pair read depths.

<a name="profile-format"> </a>

### `<name>_<RNA>_profile.txt`
### `<name>_<RNA>_profile.txt` and `<name>_<RNA>_profile.txtga`

Note: the profile.txtga file contains information about N7-G sites only.
See [N7-G](docs/N7-G.md).

Tab-delimited text columns. First line is column names.
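
Because both files are tab-delimited with a header line, they can be read with Python's standard `csv` module. The sketch below is a minimal reader under that assumption only; the helper name `read_profile` is hypothetical, and the two-column sample table is illustrative data, not the full set of columns ShapeMapper writes.

```python
import csv
import io


def read_profile(handle):
    """Read a tab-delimited profile table into a list of dicts keyed by column name."""
    return list(csv.DictReader(handle, delimiter="\t"))


# Illustrative two-column table; real profile files contain more columns.
sample = "Nucleotide\tSequence\n1\tG\n2\tA\n"
rows = read_profile(io.StringIO(sample))
print(rows[0]["Sequence"])  # G
```

To read a file from disk, pass an open file handle instead of the `io.StringIO` object, for example `open(path, newline="")`.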

@@ -184,7 +187,10 @@ Format: SAM

Commandline option: <kbd>--output-parsed-mutations</kbd>

Filename: `<name>_<sample>_<RNA>_parsed.mut`
Filename: `<name>_<sample>_<RNA>_parsed.mut` and `<name>_<sample>_<RNA>_parsed.mutga`

Note: parsed.mutga files contain only information pertaining to N7-G modification.
See [N7-G](docs/N7-G.md).

#### Format:

4 changes: 3 additions & 1 deletion internals/bin/bowtie2_wrapper.sh
@@ -4,7 +4,9 @@
# (This works, but puts stderr into stdout.
# Tried redirecting stderr using process substitution, but that results in
# bowtie2-align getting called with a final arg 2/dev/fd/63 and crashing)
bowtie2 "$@" 2>&1 | sed '/^Warning: skipping mate #/d'

# 28 April 2023 - appended sed expressions to remove odd bowtie2 seed mismatch glitch stemming from
bowtie2 "$@" 2>&1 | sed '/^Warning: skipping mate #/d' | sed '/ because it was < 2 characters long$/d' | sed '/ because length (1) <= # seed mismatches (0)$/d'
exit ${PIPESTATUS[0]}
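
For readers less familiar with `sed`, the three delete expressions in the wrapper can be restated as a plain Python predicate. This is a sketch for illustration only and is not part of the commit: `keep_line` is a hypothetical name, and the sample lines are illustrative rather than captured bowtie2 output. It mirrors the anchors used by the `sed` patterns: `^` becomes a prefix check and `$` becomes a suffix check.

```python
def keep_line(line):
    """Return False for lines that the wrapper's sed expressions would delete."""
    text = line.rstrip("\n")
    if text.startswith("Warning: skipping mate #"):
        return False
    if text.endswith(" because it was < 2 characters long"):
        return False
    if text.endswith(" because length (1) <= # seed mismatches (0)"):
        return False
    return True


sample_lines = [
    "Warning: skipping mate #1 of read 'r1' because it was < 2 characters long",
    "10 reads; of these:",
    "Warning: skipping read 'r2' because it was < 2 characters long",
    "Warning: skipping read 'r3' because length (1) <= # seed mismatches (0)",
    "100.00% overall alignment rate",
]
for line in filter(keep_line, sample_lines):
    print(line)
```

Only the two summary lines survive the filter. As in the shell wrapper, the filtering touches the log text only; the exit status is taken from bowtie2 itself.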

