-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathempirical_data_analysis.Rmd
100 lines (72 loc) · 3.96 KB
/
empirical_data_analysis.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
---
title: "Empirical data analysis"
author: "Shai Pilosof"
date: "April 16, 2020"
output: github_document
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE, cache=TRUE)
knitr::opts_knit$set(root.dir = "/Users/Shai/GitHub/ecomplab/CRISPR_networks/")
knitr:::read_chunk('/Users/Shai/GitHub/ecomplab/CRISPR_networks/empirical_data_analysis.R')
```
```{r LOAD PACKAGES, include=FALSE, echo=FALSE}
library(tidyverse)
library(magrittr)
library(sqldf)
library(cowplot)
library(igraph)
library(bipartite)
library(infomapecology)
```
```{r Load functions, echo=FALSE}
<<load_functions>>
```
```{r connect to data base}
db <- dbConnect(SQLite(), dbname = '/Users/Shai/GitHub/ecomplab/CRISPR_networks/data/CRISPR_database_NEE.sqlite')
```
# Analysis
We analyze the **weighted-nested structure of immunity networks** and the **modular structure of host-spacer networks** in three data sets: Yellowstone, Pseudomonas with virus cluster 3 and Russia 2010. In the figures below the rows correspond to these 3 data sets, respectively.
```{r Load analysis, echo=FALSE, include=FALSE, eval=TRUE}
<<analysis>>
```
## Immunity networks
In immunity networks we test for significance in nestedness by shuffling the values of interactions (model `r00_samp`). We find that all significantly modular.
```{r plots immunity}
plot_grid(Yellowstone$p4, NULL,
Pseudomonas$p4, Pseudomonas$p3,
Russia2010$p4, Russia2010$p3,
ncol=2, nrow=3, align = 'vh', labels = c('a','',letters[2:5]), label_size = 16, scale = 0.95)
```
## Host-spacr networks
In host spacer networks we test for significance in modularity by shuffling the host-spacer network with an equiprobable algorithm. We find that all the networks are significantly modular.
```{r plots host-spacer}
plot_grid(Yellowstone$p1, Yellowstone$p2,
Pseudomonas$p1, Pseudomonas$p2,
Russia2010$p1, Russia2010$p2,
ncol=2, nrow=3, align = 'vh', labels = letters[1:6], label_size = 16, scale = 0.95)
```
### Phylogenetic signal within modules
We look for phylogenetic signal in the host-spacer network of the Russia 2010 data set. The tree:
```{r phylogenetic_analysis, echo=FALSE, include=FALSE, eval=TRUE, warning=FALSE, message=FALSE}
<<phylogenetic_analysis>>
```
```{r plot tree}
plot(tree)
```
But not all the strains appear in the tree and in the network.
```{r strains_tree_network, echo=TRUE, include=TRUE, eval=TRUE, warning=FALSE, message=FALSE}
phylo_dist_signif$nodes_in_modules # nodes_in_modules
phylo_dist_signif$nodes_in_tree # nodes_in_tree
phylo_dist_signif$overlapping # Overlapping
```
There are only `r length(phylo_dist_signif$overlapping)` overlapping strains out of `r length(phylo_dist_signif$nodes_in_modules)` in the network.
First we calculate the mean phylogenetic distance between hosts within modules. Note that not all modules are represented. This is because some of the modules have a single strain (distance requires minimum 2), and because not all the strains in the original network are included due to low overlap with strains in the tree.
We then create permuted modules of the same size but with different host strains and recalculate the mean PD again within modules per permutation. The null hypothesis is that the permuted distance is smaller than the observed for each module (i.e., no signal). If we **reject** this hypothesis then there is phylogenetic signal because the observed PD beteween hosts within each module would be smaller than expected by chance (closely related hosts share a module). We obtain a distribution per module and estimate significane per module with Bonferroni correction (reducing the threshold for significance by dividing 0.05 by the number of modules).
```{r within_modules, echo=TRUE, include=TRUE, eval=TRUE, warning=FALSE, message=FALSE}
left_join(phylo_dist_signif$result_within_moduels, phylo_dist_signif$D_obs) %>%
select(m, mod_size, Signif_Bonferroni)
phylo_dist_signif$plt_within_modules
```
```{r disconnect from data base}
dbDisconnect(db)
```