-
Notifications
You must be signed in to change notification settings - Fork 34
/
08-applying-rnaseq.Rmd
87 lines (70 loc) · 3.42 KB
/
08-applying-rnaseq.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
---
layout: page
title: "RNA-seq analysis in R"
subtitle: "Analysis of Pasilla Knock-down Experiment in Drosophila"
author: "Belinda Phipson and Jovana Maksimovic"
date: "21, 23 September 2016"
minutes: 120
---
**Author: Belinda Phipson and Jovana Maksimovic**
## Data files needed
* counts_Drosophila.txt
* SampleInfo_Drosophila.txt
Available from [https://figshare.com/s/e08e71c42f118dbe8be6](https://figshare.com/s/e08e71c42f118dbe8be6)
## Libraries needed
* limma
* edgeR
* org.Dm.eg.db (annotation for Drosophila)
* gplots
* RColorBrewer
**If you have your own dataset, make sure you have a counts matrix and the sample information. You can look for the annotation package for the species you're studying on the Bioconductor website.**
[http://bioconductor.org/packages/3.3/BiocViews.html#___AnnotationData](http://bioconductor.org/packages/3.3/BiocViews.html#___AnnotationData) (Search for "org." to see the organism packages)
## Introduction
The RNA-Seq data we will be analysing today comes from this published paper:
Brooks, A.N., Yang, L., Duff, M.O., Hansen, K.D., Park, J.W., Dudoit, S.,
Brenner, S.E. and Graveley, B.R. (2011) Conservation of an rna regulatory
map between drosophila and mammals. Genome Research, 21(2), 193-202.
http://www.ncbi.nlm.nih.gov/pubmed/20921232
This is a publicly available dataset, deposited in the
Short Read Archive. The RNA-sequence data are available from GEO under accession nos. GSM461176-GSM461181. The authors combined RNAi and RNASeq
to identify exons regulated by Pasilla, the Drosophila
melanogaster ortholog of mammalian NOVA1 and NOVA2.
They showed that the RNA regulatory map of Pasilla and
NOVA1/2 is highly conserved between insects and mammals.
NOVA1 and NOVA2 are best known for being involved
in alternative splicing. Cells from S2-DRSC, which is an
embryonic cell line, were cultured and subjected to a treatment
in order to knock-down Pasilla. The four untreated and
three treated RNAi samples were used in the
analysis. The treated samples had Pasilla knocked
down by approximately 60% compared to the untreated
samples. Some of the samples had undergone paired
end sequencing while other samples were sequenced from one end only.
The reads were aligned to the Drosophila reference genome,
downloaded from Ensembl, using the tophat aligner.
The reads were summarised at the gene-level using
htseq-count, a function from the tool [HTSeq](http://wwwhuber.embl.de/users/anders/HTSeq/doc/overview.html).
For the purpose of today's workshop, we will be analysing the gene level counts.
## Tasks
* Reading the data into R
* Read in targets file (if you have your own dataset, make a simple tab delimited file with sample information in excel to read into R)
* Filtering out lowly expressed genes
* Quality control
+ Check library sizes
+ Check boxplots of log2 cpm
+ Check MDSplots
* Hierarchical clustering with heatmap.2
* Normalisation
* Testing for differential expression
+ Set up appropriate design matrix
+ voom transform the data
+ Fit linear model and get DE genes
+ Add annotation (different from day 1 - we will show you how)
* Plots after DE
+ Check top gene that is DE between Treated and Untreated
+ Check expression of *pasilla* (ensemble gene id: FBgn0261552)
+ Mean-difference plot (plotMD)
+ Volcano plot
* Testing relative to a threshold
* Gene set testing with goana
* Try some interactive plots from Glimma and save the topTable output in a spreadsheet.