diff --git a/docs/404.html b/docs/404.html index 184d7db..240352d 100644 --- a/docs/404.html +++ b/docs/404.html @@ -79,7 +79,7 @@
@@ -153,7 +153,7 @@library(metagam)
Create 5 datasets from standard gam example #1. The noise parameter is scaled, such that the first GAM has lowest noise and the last has highest noise. In a dominance plot, we plot the relative contributions of each dataset on the prediction. By the way we generated the data, we expect largest influence by dataset 1 and lowest by dataset 5.
-## simulate datasets -set.seed(123) -datasets <- lapply(1:num.datasets, function(x) mgcv::gamSim(scale = x, verbose = FALSE))
+## simulate datasets
+set.seed(123)
+datasets <- lapply(1:num.datasets, function(x) mgcv::gamSim(scale = x, verbose = FALSE))
Delete all data in the first dataset that has values lower than .2 on the dimension x2
. Thus, we expect a low (i.e., zero) contribution of the first dataset on low values of dimensions x2
.
df <- datasets[[1]] -df[df$x2<0.2,] <- NA -datasets[[1]] <- df
+df <- datasets[[1]]
+df[df$x2<0.2,] <- NA
+datasets[[1]] <- df
Next, delete all values of the second dataset for large values (x2
>.8).
df <- datasets[[2]] -df[df$x2 > 0.8, ] <- NA -datasets[[2]] <- df
+df <- datasets[[2]]
+df[df$x2 > 0.8, ] <- NA
+datasets[[2]] <- df
Then, we analyze the term s(x2)
.
meta_analysis <- metagam(models, grid_size = 500, terms = "s(x2)", intercept = TRUE)
Then, we analyze the term s(x2)
.
+meta_analysis <- metagam(models, grid_size = 500, terms = "s(x2)", intercept = TRUE)
## Loading required package: Matrix
+## Loading 'metafor' package (version 2.4-0). For an overview
+## and introduction to the package please type: help(metafor).
Finally, we create a dominance plot that tells us how much the points on axis x2
are influenced by the individual GAMs. Since plot_dominance
returns a ggplot object, we can modify the colors using, e.g. the viridis package. We see that the influence of the GAMs is graded according to the simulated noise levels. Second, we see that on the left-hand side, the influence of the first dataset is almost zero whereas the influence of the second dataset is almost zero on the righthand side.
library(viridis)
## Loading required package: viridisLite
-plot_dominance(meta_analysis) + - scale_fill_viridis(discrete = TRUE)
+plot_dominance(meta_analysis) +
+ scale_fill_viridis(discrete = TRUE)
library(metagam)
We start by simulating 5 datasets using the gamSim()
function from mgcv. We use the response \(y\) and the explanatory variable \(x_{2}\), but add an additional shift \(\beta x_{2}^{2}\) where \(\beta_{2}\) differs between datasets, yielding heterogenous data.
library(mgcv) ++ dat +})+library(mgcv) #> Loading required package: nlme -#> This is mgcv 1.8-31. For overview type 'help("mgcv-package")'. -set.seed(1233) -shifts <- c(0, .5, 1, 0, -1) -datasets <- lapply(shifts, function(x) { +#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'. +set.seed(1233) +shifts <- c(0, .5, 1, 0, -1) +datasets <- lapply(shifts, function(x) { ## Simulate data - dat <- gamSim(scale = .1, verbose = FALSE) + dat <- gamSim(scale = .1, verbose = FALSE) ## Add a shift - dat$y <- dat$y + x * dat$x2^2 + dat$y <- dat$y + x * dat$x2^2 ## Return data - dat -})
Next, we analyze all datasets, and strip individual participant data.
-models <- lapply(datasets, function(dat){ - b <- gam(y ~ s(x2, bs = "cr"), data = dat) - strip_rawdata(b) -})
+models <- lapply(datasets, function(dat){
+ b <- gam(y ~ s(x2, bs = "cr"), data = dat)
+ strip_rawdata(b)
+})
Next, we meta-analyze the models. Since we only have a single smooth term, we use type = "response"
to get the response function. This is equivalent to using type = "iterms"
and intercept = TRUE
.
meta_analysis <- metagam(models, type = "response")
+meta_analysis <- metagam(models, type = "response")
+#> Loading required package: Matrix
+#> Loading 'metafor' package (version 2.4-0). For an overview
+#> and introduction to the package please type: help(metafor).
Next, we plot the separate estimates together with the meta-analytic fit. We clearly see that dataset 3, which had a positive shift \(\beta=1 x_{2}^2\), lies above the others for \(x_{2}\) close to 1, and opposite for dataset 5.
-plot(meta_analysis)
+plot(meta_analysis)
The plotting function for the meta-analysis object is a ggplot2-object, and can thus be altered using standard ggplot syntax. To learn more about customization of ggplot2-objects, please see the ggplot2 documentation.
-plot(meta_analysis) + - ggplot2::scale_colour_brewer(palette = "Set1")
+plot(meta_analysis) +
+ ggplot2::scale_colour_brewer(palette = "Set1")
We can investigate this further using a heterogeneity plot, which visualizes Cochran’s Q-test (Cochran (1954)) as a function of \(x_{2}\). By default, the test statistic (Q), with 95 % confidence bands, is plotted. We can see that the confidence band for Q is above 0 for \(x_{2}\) larger than about 0.7.
-plot_heterogeneity(meta_analysis)
+plot_heterogeneity(meta_analysis)
We can also plot the \(p\)-value of Cochran’s Q-test. The dashed line shows the value \(0.05\). The \(p\)-value plot is in full agreement with the Q-statistic plot above: There is evidence that the underlying functions from each dataset are different for values from about 0.7 and above.
-plot_heterogeneity(meta_analysis, type = "p")
+plot_heterogeneity(meta_analysis, type = "p")
Assume some data of interest are located in three different cohorts. In order to increase statistical power and hence be more able to detect relationships in the data, we would ideally fit a GAM to all three datasets combined, using a model on the form y ~ s(x0) + s(x1) + s(x2)
, where y
is an outcome of interest and x1
and x2
are explanatory variables. The smooth functions s()
allow the outcome to vary nonlinearly as a function of each explanatory variable. When all three datasets are not available in a single location, we cannot fit a GAM using this mega-analytic approach. The metagam package provides a flexible solution to this problem, which here will be illustrated.
We start by simulation three datasets using the gamSim()
function from mgcv.
library(metagam) -library(mgcv) ++set.seed(123) +datasets <- lapply(1:3, function(x) gamSim(scale = 3, verbose = FALSE))+library(metagam) +library(mgcv) ## simulate three datasets -set.seed(123) -datasets <- lapply(1:3, function(x) gamSim(scale = 3, verbose = FALSE))
In each data location, we assume a GAM with the generic form y~s(x0)+s(x1)+s(x2)
is fit to the data. Notably, model parameters like knot locations, number of basis functions, and smoothing method does not need to be identical in each separate fit. Instead, the parameters can be optimized independently to fit the data in each location.
Here is an example:
-## Data location 1 -fit1 <- gam(y ~ s(x0, k = 8, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), - data = datasets[[1]]) ++fit3 <- gam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), + data = datasets[[3]], method = "ML")+## Data location 1 +fit1 <- gam(y ~ s(x0, k = 8, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), + data = datasets[[1]]) ## Data location 2, use P-splines for the first and third term -fit2 <- gam(y ~ s(x0, bs = "ps") + s(x1, k = 20, bs = "cr") + s(x2, bs = "bs"), - data = datasets[[2]]) +fit2 <- gam(y ~ s(x0, bs = "ps") + s(x1, k = 20, bs = "cr") + s(x2, bs = "bs"), + data = datasets[[2]]) ## Data location 3, use maximum likelihood for smoothing -fit3 <- gam(y ~ s(x0, bs = "cr") + s(x1, bs = "cr") + s(x2, bs = "cr"), - data = datasets[[3]], method = "ML")
The gam
objects fit1
, fit2
, and fit3
contain individual participant data in various forms, and hence there are many cases in which these should not be shared. The function strip_rawdata()
from metagam removes all such rawdata. We here illustrate how this function can be applied at each data location in order to obtain a model fit that can be shared.
## Data location 1 -fit_no_raw1 <- strip_rawdata(fit1) ++fit_no_raw3 <- strip_rawdata(fit3)+## Data location 1 +fit_no_raw1 <- strip_rawdata(fit1) ## Data location 2 -fit_no_raw2 <- strip_rawdata(fit2) +fit_no_raw2 <- strip_rawdata(fit2) ## Data location 3 -fit_no_raw3 <- strip_rawdata(fit3)
Now assume that the objects fit_no_raw1
, fit_no_raw2
, and fit_no_raw3
have been gathered in a single location. First, we can inspect each of the objects.
summary(fit_no_raw1) ++#> GCV = 10.013 Scale est. = 9.645 n = 400+summary(fit_no_raw1) #> GAM stripped for individual participant data with strip_rawdata(). #> For meta-analysis of smooth terms, use the following identifiers: s(x0), s(x1), s(x2). #> @@ -154,37 +159,43 @@
#> #> Approximate significance of smooth terms: #> edf Ref.df F p-value -#> s(x0) 3.746 4.565 5.782 8.48e-05 *** +#> s(x0) 3.746 4.565 5.782 9.26e-05 *** #> s(x1) 2.466 3.070 49.388 < 2e-16 *** #> s(x2) 7.493 8.416 33.457 < 2e-16 *** #> --- #> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 #> #> R-sq.(adj) = 0.537 Deviance explained = 55.3% -#> GCV = 10.013 Scale est. = 9.645 n = 400
We can now perform a meta-analysis of these fits using the metagam()
function. We gather them in a list:
models <- list(cohort1 = fit_no_raw1, - cohort2 = fit_no_raw2, - cohort3 = fit_no_raw3)
+models <- list(cohort1 = fit_no_raw1,
+ cohort2 = fit_no_raw2,
+ cohort3 = fit_no_raw3)
It is typically most convenient to analyze a single smooth term at a time. We start with the term s(x0)
, and set grid_size=100
to get 100 equally spaced values of x0
within the range of values encountered in the three model fits. The summary method prints out some information as well as meta-analytic p-values for the term.
metafit <- metagam(models, terms = "s(x0)") -summary(metafit) ++#> |Test |s(x0) | +#> |:---------------------|:---------| +#> |Stouffer's sum of z |1.499e-05 | +#> |Edgington's sum of p |1.176e-04 | +#> |Wilkinson's maximum p |2.721e-04 | +#> |Wilkinson's minimum p |2.777e-04 | +#> |logit p method |2.190e-05 | +#> |Fisher's sum of logs |2.038e-05 |+metafit <- metagam(models, terms = "s(x0)") +#> Loading required package: Matrix +#> Loading 'metafor' package (version 2.4-0). For an overview +#> and introduction to the package please type: help(metafor). +summary(metafit) #> Meta-analysis of GAMs from 3 cohorts, using method FE. #> #> Smooth terms analyzed: s(x0) . #> #> Meta-analytic p-values of smooth terms: #> -#> Test s(x0) -#> ---------------------- ---------- -#> Stouffer's sum of z 1.435e-05 -#> Edgington's sum of p 1.186e-04 -#> Wilkinson's maximum p 2.717e-04 -#> Wilkinson's minimum p 2.545e-04 -#> logit p method 2.070e-05 -#> Fisher's sum of logs 1.906e-05
The default plotting function shows the fits on the separate datasets together with the meta-analytic fit.
-plot(metafit)
+plot(metafit)
Dominance plots and heterogeneity plots can also be created. These are described in separate vignettes.
library(metagam)
This vignette demonstrates how to meta-analyze multivariate smooth terms. In particular, we will focus on tensor interaction terms (Wood (2006)). We start by loading mgcv and generate five example datasets with somewhere between 100 and 1000 observations.
-library(mgcv) ++#> This is mgcv 1.8-33. For overview type 'help("mgcv-package")'. +set.seed(123) +datasets <- lapply(1:5, function(x) gamSim(eg = 2, n = sample(100:1000, 1), + verbose = FALSE)$data)+library(mgcv) #> Loading required package: nlme -#> This is mgcv 1.8-31. For overview type 'help("mgcv-package")'. -set.seed(123) -datasets <- lapply(1:5, function(x) gamSim(eg = 2, n = sample(100:1000, 1), - verbose = FALSE)$data)
Here are the first few rows from the first dataset:
- +y | @@ -160,8 +164,10 @@
---|
method | -Method of meta analysis, passed on to |
+ Method of meta analysis, passed on to |
---|---|---|
intercept | @@ -218,27 +218,30 @@