forked from YuLab-SMU/clusterProfiler-book
-
Notifications
You must be signed in to change notification settings - Fork 0
/
13-dplyr.Rmd
123 lines (87 loc) · 2.9 KB
/
13-dplyr.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
# dplyr verbs for clusterProfiler {#chapter13}
[clusterProfiler.dplyr](https://github.com/YuLab-SMU/clusterProfiler.dplyr) package.
```{r include=F}
library(DOSE)
library(clusterProfiler.dplyr)
library(ggplot2)
library(forcats)
library(enrichplot)
theme_set(theme_grey())
select <- clusterProfiler.dplyr:::select.enrichResult
```
```{r}
library(DOSE)
data(geneList)
de = names(geneList)[1:100]
x = enrichDO(de)
```
## filter
```{r}
library(clusterProfiler.dplyr)
filter(x, p.adjust < .05, qvalue < 0.2)
```
## arrange
```{r}
mutate(x, geneRatio = parse_ratio(GeneRatio)) %>%
arrange(desc(geneRatio))
```
## select
```{r}
select(x, -geneID) %>% head
```
## mutate
```{r dot-richfactor, fig.width=7, fig.height=7}
y <- mutate(x, richFactor = Count / as.numeric(sub("/\\d+", "", BgRatio)))
y
library(ggplot2)
library(forcats)
library(enrichplot)
ggplot(y, showCategory = 20,
aes(richFactor, fct_reorder(Description, richFactor))) +
geom_segment(aes(xend=0, yend = Description)) +
geom_point(aes(color=p.adjust, size = Count)) +
scale_color_viridis_c(guide=guide_colorbar(reverse=TRUE)) +
scale_size_continuous(range=c(2, 10)) +
theme_minimal() +
xlab("rich factor") +
ylab(NULL) +
ggtitle("Enriched Disease Ontology")
```
A very similar concept is Fold Enrichment, which is defined as the ratio of two proportions, (k/n) / (M/N). Using `mutate` to add the fold enrichment variable is also easy:
```r
mutate(x, FoldEnrichment = parse_ratio(GeneRatio) / parse_ratio(BgRatio))
```
## slice
We can use `slice` to choose rows by their ordinal position in the enrichment result. Grouped result use the ordinal position with the group.
In the following example, a GSEA result of Reactome pathway was sorted by the absolute values of NES and the result was grouped by the sign of NES. We then extracted first 5 rows of each groups. The result was displayed in Figure \@ref(fig:sliceNES).
(ref:sliceNESscap) Choose pathways by ordinal positions.
(ref:sliceNEScap) **Choose pathways by ordinal positions.**
```{r sliceNES, fig.width=9, fig.height=5, fig.cap="(ref:sliceNEScap)", fig.scap="(ref:sliceNESscap)"}
library(ReactomePA)
data(geneList)
x <- gsePathway(geneList)
library(clusterProfiler.dplyr)
y <- arrange(x, abs(NES)) %>%
group_by(sign(NES)) %>%
slice(1:5)
library(forcats)
library(ggplot2)
library(ggstance)
library(enrichplot)
ggplot(y, aes(NES, fct_reorder(Description, NES), fill=qvalues), showCategory=10) +
geom_barh(stat='identity') +
scale_fill_continuous(low='red', high='blue', guide=guide_colorbar(reverse=TRUE)) +
theme_minimal() + ylab(NULL)
```
## summarise
```{r p-bar, fig.width=8, fig.height=5}
pi=seq(0, 1, length.out=11)
mutate(x, pp = cut(pvalue, pi)) %>%
group_by(pp) %>%
summarise(cnt = n()) %>%
ggplot(aes(pp, cnt)) + geom_col() +
theme_minimal() +
xlab("p value intervals") +
ylab("Frequency") +
ggtitle("p value distribution")
```