---
output: github_document
---
<!-- README.md is generated from README.Rmd. Please edit that file -->
```{r, include = FALSE}
knitr::opts_chunk$set(
collapse = TRUE,
comment = "#>",
fig.path = "man/figures/README-",
out.width = "100%"
)
```
![Stay proActiv!](man/figures/proActiv_design.png)
![Stay proActiv!](man/figures/proActiv_name.png)
## proActiv: Estimation of Promoter Activity from RNA-Seq data
<!-- badges: start -->
[![GitHub release (latest by date)](https://img.shields.io/github/v/release/GoekeLab/proActiv)](https://github.com/GoekeLab/proActiv/releases/)
[![Maintained?](https://img.shields.io/badge/Maintained%3F-Yes-brightgreen)](https://github.com/GoekeLab/proActiv/graphs/contributors)
[![Install](https://img.shields.io/badge/Install-Github-brightgreen)](#installation)
<!-- badges: end -->
proActiv is an R package that estimates promoter activity from RNA-Seq data. proActiv uses aligned reads and genome annotations as input, and provides absolute and relative promoter activity as output. The package can be used to identify active promoters and alternative promoters. Details of the method are described in [Demircioglu et al.][Reference]
HTML documentation of proActiv, including a complete step-by-step workflow and a function manual, is available at https://goekelab.github.io/proActiv/.
Additional data on differential promoters in tissues and cancers from TCGA, ICGC, GTEx, and PCAWG is available at https://jglab.org/data-and-software/.
### Content
- [Installation]
- [Quick Start]
- [Creating a Promoter Annotation object]
- [Complete Analysis Workflow: Analyzing Alternative Promoters]
- [Limitations]
- [Release History]
- [Reference]
- [Contributors]
### Installation
proActiv can be installed from Bioconductor:
```{r, eval=FALSE}
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("proActiv")
```
### Quick Start
proActiv estimates promoter activity from RNA-Seq data. Promoter activity is defined as the total amount of transcription initiated at each promoter. proActiv takes as input either BAM files or junction files (TopHat2 or STAR), together with a promoter annotation object for the relevant genome. An optional argument `condition` can be supplied to describe the experimental condition corresponding to each input file. Here we demonstrate proActiv with STAR junction files (human genome GRCh38, GENCODE v34) as input. These files are taken from the [SGNEx project](https://github.com/GoekeLab/sg-nex-data) but restricted to the chr1:10,000,000-30,000,000 region, and can be found at `extdata/vignette/junctions`:
- `extdata/vignette/junctions/SGNEx_A549_Illumina_replicate1-run1.subset.SJ.out.tab.gz`
- `extdata/vignette/junctions/SGNEx_A549_Illumina_replicate3-run1.subset.SJ.out.tab.gz`
- `extdata/vignette/junctions/SGNEx_A549_Illumina_replicate5-run1.subset.SJ.out.tab.gz`
- `extdata/vignette/junctions/SGNEx_HepG2_Illumina_replicate2-run1.subset.SJ.out.tab.gz`
- `extdata/vignette/junctions/SGNEx_HepG2_Illumina_replicate4-run1.subset.SJ.out.tab.gz`
- `extdata/vignette/junctions/SGNEx_HepG2_Illumina_replicate5-run1.subset.SJ.out.tab.gz`
```{r QuickStart, eval=FALSE}
library(proActiv)

## List of STAR junction files as input
files <- list.files(system.file('extdata/vignette/junctions',
                                package = 'proActiv'), full.names = TRUE)

## Vector describing experimental condition
condition <- rep(c('A549','HepG2'), each=3)

## Promoter annotation for human genome GENCODE v34
promoterAnnotation <- promoterAnnotation.gencode.v34.subset

result <- proActiv(files = files,
                   promoterAnnotation = promoterAnnotation,
                   condition = condition)
```
`result` is a SummarizedExperiment object which can be accessed as follows:
- `assays(result)` returns raw/normalized promoter counts, absolute/relative promoter activity and gene expression data
- `rowData(result)` returns promoter metadata and summarized absolute promoter activity by conditions
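
The accessors above are generic SummarizedExperiment functions, so they can be tried on any such object. The sketch below builds a small toy object to show the access pattern. The assay names (`counts`, `logCounts`), the `geneId` column, and all values are placeholders invented for illustration and are not the names proActiv uses; call `assayNames(result)` on a real result to see what is stored.

```r
library(SummarizedExperiment)

## Toy stand-in for `result`: 3 promoters x 2 samples, made-up values.
## Assay and column names here are placeholders, not proActiv's own.
counts <- matrix(c(10, 0, 5, 12, 1, 4), nrow = 3,
                 dimnames = list(paste0("promoter", 1:3), c("s1", "s2")))
toy <- SummarizedExperiment(
    assays  = list(counts = counts, logCounts = log2(counts + 1)),
    rowData = DataFrame(geneId = c("g1", "g1", "g2")),
    colData = DataFrame(condition = c("A549", "HepG2"))
)

## Names of the stored assays
assayNames(toy)

## One assay as a promoter-by-sample matrix
assay(toy, "logCounts")

## Per-promoter metadata
rowData(toy)

## Subset the whole object to the promoters of one gene
toyG1 <- toy[rowData(toy)$geneId == "g1", ]
dim(toyG1)
```

Subsetting rows this way keeps every assay and the row metadata in sync, which is convenient when filtering promoters.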
proActiv can also be run with BAM files as input, but an additional parameter `genome` must be supplied:
```{r, eval=FALSE}
## From BAM files - genome parameter must be provided
files <- list.files(system.file('extdata/testdata/bam', package = 'proActiv'),
                    full.names = TRUE)
result <- proActiv(files = files,
                   promoterAnnotation = promoterAnnotation.gencode.v34.subset,
                   genome = 'hg38')
```
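
When `condition` is supplied, it needs one label per input file, in the same order as `files`. As a hedged sketch, the labels for the junction files listed above could be derived from the file names instead of being typed by hand. This assumes the cell line is always the second underscore-separated field of the file name, which holds for these six files but is not a rule enforced by proActiv.

```r
## Junction file names as listed above (directory part omitted)
files <- c(
    "SGNEx_A549_Illumina_replicate1-run1.subset.SJ.out.tab.gz",
    "SGNEx_A549_Illumina_replicate3-run1.subset.SJ.out.tab.gz",
    "SGNEx_A549_Illumina_replicate5-run1.subset.SJ.out.tab.gz",
    "SGNEx_HepG2_Illumina_replicate2-run1.subset.SJ.out.tab.gz",
    "SGNEx_HepG2_Illumina_replicate4-run1.subset.SJ.out.tab.gz",
    "SGNEx_HepG2_Illumina_replicate5-run1.subset.SJ.out.tab.gz"
)

## Assumption: the cell line is the second underscore-separated field
condition <- vapply(strsplit(basename(files), "_", fixed = TRUE),
                    function(parts) parts[2], character(1))

## One label per file, in the same order as `files`
stopifnot(length(condition) == length(files))
table(condition)
```

For these files the derived vector matches `rep(c('A549','HepG2'), each=3)`.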
### Creating a Promoter Annotation object
To quantify promoter activity, proActiv uses a set of promoters based on genome annotations. proActiv allows the creation of a promoter annotation object for any genome from a TxDb object or from a GTF file with the `preparePromoterAnnotation` function. Users can either pass the file path of the GTF/GFF or TxDb to be used, or use the TxDb object directly as input. proActiv includes pre-calculated promoter annotations for the human genome (GENCODE v34). However, due to size constraints, the annotation is restricted to the chr1:10,000,000-30,000,000 region. Users can build full annotations by downloading GTF files from the [GENCODE](https://www.gencodegenes.org) page and following the steps below.
Here, we demonstrate creating the subsetted promoter annotation for the human genome (GENCODE v34) with both GTF and TxDb:
```{r CreateAnnotation, eval=FALSE}
library(proActiv)
library(AnnotationDbi)  # provides loadDb()

## From GTF file path
gtf.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.gtf.gz',
                        package = 'proActiv')
promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(file = gtf.file,
                                                                   species = 'Homo_sapiens')

## From TxDb object
txdb.file <- system.file('extdata/vignette/annotations/gencode.v34.annotation.subset.sqlite',
                         package = 'proActiv')
txdb <- loadDb(txdb.file)
promoterAnnotation.gencode.v34.subset <- preparePromoterAnnotation(txdb = txdb,
                                                                   species = 'Homo_sapiens')
```
The `PromoterAnnotation` object has 3 slots:
- `intronRanges` : Intron ranges, giving the corresponding transcripts of each intron
- `promoterIdMapping` : An ID mapping between transcripts, promoter IDs and gene IDs
- `promoterCoordinates` : Promoter coordinates (TSS) and internal promoter state, along with the 3' coordinate of the first exon
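
The slots listed above can be inspected with R's generic S4 tools. This is a minimal sketch that assumes proActiv is installed and uses the bundled `promoterAnnotation.gencode.v34.subset` object; it relies only on `slotNames()` and `slot()` from the `methods` package rather than on any proActiv-specific accessor.

```r
library(proActiv)

## Bundled annotation subset used throughout this README
promoterAnnotation <- promoterAnnotation.gencode.v34.subset

## List the slots, then pull each one out by name
slotNames(promoterAnnotation)
introns     <- slot(promoterAnnotation, "intronRanges")
idMapping   <- slot(promoterAnnotation, "promoterIdMapping")
coordinates <- slot(promoterAnnotation, "promoterCoordinates")

## Peek at the transcript / promoter / gene ID mapping
head(idMapping)
```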
### Complete Analysis Workflow: Analyzing Alternative Promoters
Most human genes have multiple promoters that control the expression of distinct isoforms. The use of these alternative promoters enables the regulation of isoform expression pre-transcriptionally. Alternative promoters have been found to be important in a wide range of cell types and diseases. proActiv includes a workflow to identify and visualize alternative promoter usage between conditions. This workflow is described in detail [here](https://goekelab.github.io/proActiv/articles/proActiv.html). Documentation of all functions can be found [here](https://goekelab.github.io/proActiv/reference/index.html).
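
To make the idea of alternative promoter usage concrete, here is a toy base-R illustration. It is not proActiv's implementation: the gene, promoters, and values are invented, and it assumes a simplified definition in which gene expression is the sum of a gene's promoter activities and relative activity is each promoter's share of that sum.

```r
## Toy absolute promoter activity for one hypothetical gene with two
## promoters (rows) across six samples (columns); values are made up.
condition <- rep(c("A549", "HepG2"), each = 3)
absolute <- matrix(
    c(8, 9, 7, 2, 1, 3,   # promoter P1
      2, 1, 3, 8, 9, 7),  # promoter P2
    nrow = 2, byrow = TRUE,
    dimnames = list(c("P1", "P2"), paste0(condition, "_", 1:3))
)

## Assumption: gene expression is the sum over the gene's promoters
geneExpression <- colSums(absolute)

## Relative activity: each promoter's share of the gene total per sample
relative <- sweep(absolute, 2, geneExpression, "/")

## Mean relative activity of each promoter within each condition
meanRelative <- sapply(split(seq_along(condition), condition),
                       function(idx) rowMeans(relative[, idx, drop = FALSE]))
meanRelative
```

In this toy data the gene total is the same in every sample, yet P1 carries most of the activity in A549 and P2 most of it in HepG2. A switch of this kind, visible in relative but not in gene-level expression, is what the alternative promoter workflow is meant to surface.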
## Release History
**Release 1.1.18**
Release date: 7th April 2021
Changes in version 1.1.18:
- Gene expression data is now stored in the `assays` of the SummarizedExperiment object returned by `proActiv`, which makes filtering the SummarizedExperiment object easier. The metadata slot is now empty.
- Plotting promoter activity: Implementation of `boxplotPromoters` function to plot boxplots of absolute promoter activity, relative promoter activity, and gene expression.
- Identification of alternative promoters: Implementation of `getAlternativePromoters`, used to identify promoters that may exhibit alternative usage.
**Release 1.0.0**
Release date: 28th October 2020
Released with Bioconductor 3.12
**Release 0.99.0**
Release date: 21st August 2020
Changes in version 0.99.0:
- Workflow: The wrapper function `proActiv` performs all steps to estimate promoter activity
and calculates promoter metadata. A `condition` argument can be supplied for `proActiv` to
summarize promoter counts and activity across conditions. These results are returned as a
SummarizedExperiment object.
- BAM file usage: In addition to junction files, `proActiv` now allows BAM files as input.
However, users should note that this function is not fully optimized and may have a long run time.
- Promoter annotation: Improved efficiency in generating promoter annotations without the need for parallelization with the `preparePromoterAnnotation` function. The promoter annotation object is also trimmed to preserve only the information essential for running `proActiv`, in order to comply with Bioconductor guidelines concerning package size.
- Plotting promoter activity: The plotting function `plotPromoters` visualizes promoter activity across conditions. It accepts the SummarizedExperiment object returned by `proActiv` along with a gene of interest and gene annotations as arguments. This allows users to visualize promoter activity and identify instances of alternative promoter usage.
- Vignette: proActiv now comes with a vignette, documenting a complete step-by-step workflow in identifying active and alternative promoter usage. This includes guidance on running `proActiv`, creating promoter annotations and identifying alternative promoter usage. Various visualizations of promoter activity are also offered.
**Initial Release 0.1.0**
Release date: 19th May 2020
This release corresponds to the proActiv version used by [Demircioglu et al.][Reference]
## Limitations
proActiv does not provide promoter activity estimates for promoters that are not uniquely identifiable from splice junctions (single-exon transcripts, and promoters that overlap internal exons).
## Reference
If you use proActiv, please cite:
[Demircioğlu, Deniz, et al. "A Pan-cancer Transcriptome Analysis Reveals Pervasive Regulation through Alternative Promoters." *Cell* 178.6 (2019): 1465-1477.](https://www.cell.com/cell/fulltext/S0092-8674(19)30906-7)
## Contributors
proActiv is developed and maintained by [Deniz Demircioglu](https://github.com/dnzdmrcgl), [Joseph Lee](https://github.com/jleechung), and [Jonathan Göke](https://github.com/jonathangoeke).
![Stay proActiv!](man/figures/proActiv_logoName.png)