---
title: "Principal components and factor analysis"
output:
  html_document:
    fig_caption: no
    number_sections: yes
    toc: yes
    toc_float: false
    collapsed: no
---
```{r set-options, echo=FALSE}
options(width = 105)
knitr::opts_chunk$set(dev='png', dpi=300, cache=FALSE)
pdf.options(useDingbats = TRUE)
klippy::klippy(position = c('top', 'right'))
```
```{r load, echo=FALSE, cache=FALSE}
load(".Rdata")
```
<p><span style="color: #00cc00;">NOTE: This page has been revised for Winter 2021, but may undergo further edits.</span></p>
# Introduction #
*Principal components analysis (PCA)* is a widely used multivariate analysis method, the general aim of which is to reveal systematic covariations among a group of variables. The analysis can be motivated in a number of different ways, including (in geographical contexts) finding groups of variables that measure the same underlying dimensions of a data set, describing the basic anomaly patterns that appear in spatial data sets, or producing a general index of the common variation of a set of variables.
- Example: Davis' boxes ([data](https://pjbartlein.github.io/GeogDataAnalysis/images/Davis_PCA_data.png), [plot](https://pjbartlein.github.io/GeogDataAnalysis/images/Davis_PCA_boxes.png), [scatter](https://pjbartlein.github.io/GeogDataAnalysis/images/Davis_PCA_scatter.png), [components](https://pjbartlein.github.io/GeogDataAnalysis/images/Davis_PCA_scores.png)) (Davis, J.C., 2002, *Statistics and Data Analysis in Geology*, Wiley)
# Derivation of principal components and their properties #
The formal derivation of principal components analysis requires the use of matrix algebra.
- [[Matrix algebra]](https://pjbartlein.github.io/GeogDataAnalysis/topics/matrix.pdf)
- [[Derivation of principal components]](https://pjbartlein.github.io/GeogDataAnalysis/topics/pca.html)
Because the components are derived by solving a particular optimization problem, they naturally have some "built-in" properties that are desirable in practice (e.g. maximum variability). In addition, a number of other properties of the components can be derived (several are illustrated in the short sketch after this list):
- the *variances* of each component, and the *proportion of the total variance* of the original variables, are given by the eigenvalues;
- component *scores* may be calculated, which give the value of each component at each observation;
- component *loadings*, which describe the correlation between each component and each variable, may also be obtained;
- the *correlations among the original variables* can be reproduced by the *p* components, as can that part of the correlations "explained" by the first *q* components;
- the *original data can be reproduced* by the *p* components, as can those parts of the original data "explained" by the first *q* components;
- the components can be "*rotated*" to increase the interpretability of the components.
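Here is a minimal sketch (not part of the lecture itself, using a small artificial data set) illustrating several of these properties with `princomp()`:
```{r}
# minimal sketch: properties of components, on a small artificial data set
set.seed(42)
X <- matrix(rnorm(100 * 3), ncol = 3)
X[, 2] <- X[, 1] + 0.5 * X[, 2]     # induce some correlation
pca <- princomp(X, cor = TRUE)
pca$sdev^2                          # eigenvalues = component variances
pca$sdev^2 / sum(pca$sdev^2)        # proportions of total variance
head(pca$scores)                    # component scores
V <- unclass(loadings(pca))         # eigenvectors
V %*% diag(pca$sdev)                # loadings: variable-component correlations
V %*% diag(pca$sdev^2) %*% t(V)     # reproduces the correlation matrix ...
cor(X)                              # ... compare
```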
[[Back to top]](lec16.html)
# PCA Examples #
## PCA of a two-variable matrix ##
A classic data set for illustrating PCA is one that appears in John C. Davis's 2002 book *Statistics and Data Analysis in Geology*, Wiley (UO Library, QE48.8 .D38 2002). The data consist of 25 boxes or blocks with random dimensions (the long, intermediate and short axes of the boxes), plus some derived variables, like the length of the longest diagonal that can be contained within a box.
Here's the data: [[boxes.csv]](https://pjbartlein.github.io/GeogDataAnalysis/data/csv/boxes.csv)
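In the chunks below, the data frame `boxes` is assumed to already be in the workspace (it is part of the course `.Rdata` file); if it isn't, it can be read directly (the file name here is assumed from the link above):
```{r, eval=FALSE}
# read the boxes data directly (if not already in the workspace)
boxes <- read.csv("https://pjbartlein.github.io/GeogDataAnalysis/data/csv/boxes.csv")
```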
In this example, a simple two-variable (long-axis length and diagonal length) data set is created using Davis' artificial data.
```{r pca02}
#boxes_pca -- principal components analysis of Davis boxes data
boxes_matrix <- data.matrix(cbind(boxes[,1],boxes[,4]))
dimnames(boxes_matrix) <- list(NULL, c("long","diag"))
```
Matrix scatter plot of the data (which in this case is a single panel), and the correlation matrix:
```{r pca03, fig.asp=1}
plot(boxes_matrix)
cor(boxes_matrix)
```
PCA using the `princomp()` function from the `stats` package. The `loadings()` function extracts the *loadings*, or the correlations between the input variables and the new components, and the `biplot()` function creates a *biplot*, a single figure that plots the loadings as vectors and the *component scores* as points labelled by the observation numbers.
```{r pca04}
boxes_pca <- princomp(boxes_matrix, cor=T)
boxes_pca
summary(boxes_pca)
print(loadings(boxes_pca), cutoff=0.0)
biplot(boxes_pca)
```
Note the angle between the vectors--the correlation between two variables is equal to the cosine of the angle between the vectors (*θ*), or *r = cos(θ)*. Here the angle is `r acos(cor(boxes_matrix[,1],boxes_matrix[,2]))/((2*pi)/360)`, which is found by the following R code: `acos(cor(boxes_matrix[,1],boxes_matrix[,2]))/((2*pi)/360)`.
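To make the relationship explicit, here's a minimal check:
```{r}
# verify r = cos(theta): recover the correlation from the angle
r <- cor(boxes_matrix[, 1], boxes_matrix[, 2])
theta <- acos(r)       # angle between the vectors, in radians
theta * 180 / pi       # ... in degrees
cos(theta)             # recovers r
```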
The components can be drawn on the scatter plot as follows,
```{r pca05}
# get parameters of component lines (after Everitt & Rabe-Hesketh)
load <- boxes_pca$loadings
slope <- load[2,]/load[1,]
mn <- apply(boxes_matrix,2,mean)
intcpt <- mn[2]-(slope*mn[1])
# scatter plot with the two new axes added
par(pty="s") # square plotting frame
xlim <- range(boxes_matrix) # overall min, max
plot(boxes_matrix, xlim=xlim, ylim=xlim, pch=16, col="purple") # both axes same length
abline(intcpt[1],slope[1],lwd=2) # first component solid line
abline(intcpt[2],slope[2],lwd=2,lty=2) # second component dashed
legend("right", legend = c("PC 1", "PC 2"), lty = c(1, 2), lwd = 2, cex = 1)
# projections of points onto PCA 1
y1 <- intcpt[1]+slope[1]*boxes_matrix[,1]
x1 <- (boxes_matrix[,2]-intcpt[1])/slope[1]
y2 <- (y1+boxes_matrix[,2])/2.0
x2 <- (x1+boxes_matrix[,1])/2.0
segments(boxes_matrix[,1],boxes_matrix[,2], x2, y2, lwd=2,col="purple")
```
This plot illustrates the idea of the first (or "principal") component providing an optimal summary of the data--no other line drawn on this scatter plot would produce a set of projected values of the data points onto the line with less variance. The first component also has an application in reduced major axis (RMA) regression, in which both the x- and y-variables are assumed to have errors or uncertainties, or in which there is no clear distinction between a predictor and a response.
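As a small aside (not in the original notes): for two variables the RMA slope is *sign(r)·s~y~/s~x~*, the slope of the first component of the *standardized* data mapped back into the original units. A minimal sketch:
```{r}
# minimal sketch: reduced major axis (RMA) slope for the two-variable case
r <- cor(boxes_matrix[, 1], boxes_matrix[, 2])
sign(r) * sd(boxes_matrix[, 2]) / sd(boxes_matrix[, 1])  # RMA slope
```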
## A second example using the large-cities data set ##
A second example of a simple PCA analysis can be illustrated using the large-cities data set [[cities.csv]](https://pjbartlein.github.io/GeogDataAnalysis/data/csv/cities.csv)
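As with the boxes data, the `cities` data frame is assumed to already be in the workspace; if not, it can be read from the link above:
```{r, eval=FALSE}
# read the cities data directly (if not already in the workspace)
cities <- read.csv("https://pjbartlein.github.io/GeogDataAnalysis/data/csv/cities.csv")
```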
Create a data matrix that omits the city names and look at the data and the correlation matrix.
```{r pca06, fig.asp=1}
cities_matrix <- data.matrix(cities[,2:12])
rownames(cities_matrix) <- cities[,1] # add city names as row labels
```
### Examining the correlation matrix ###
```{r pca06b, fig.asp=1}
plot(cities[,2:12], pch=16, cex=0.6)
cor(cities[,2:12])
```
Matrix scatter plots, particularly those for a data set with a large number of variables, are sometimes difficult to interpret. Two alternative plots are available: 1) a generalized depiction of the correlation matrix using the `corrplot()` function, and 2) a plot of the correlations as a network of links ("edges") between variables ("nodes") provided by the `qgraph()` function in the package of the same name.
The `corrplot()` function displays the correlation matrix using a set of little ellipses that provide a generalized depiction of the strength and sign of each correlation.
```{r pca07}
library(corrplot)
corrplot(cor(cities[,2:12]), method="ellipse")
```
An alternative is to simply fill each cell with an appropriate color and shade.
```{r pca08}
corrplot(cor(cities[,2:12]), method="color")
```
Rectangular areas of similar sign and magnitude of the correlation identify groups of variables that tend to covary together across the observations. For example, the three population variables are positively correlated with one another, and are inversely correlated with `Growth`, `Food` and `PersRoom`.
Another way of depicting the correlations is as a network of line segments, which are drawn to illustrate the strength and sign of the correlations between each pair of variables. This plotting procedure uses the `qgraph` package for plotting networks, defined here by the correlations among variables (and later, among variables and components).
```{r pca09, messages=FALSE}
library(qgraph)
qgraph(cor(cities[,2:12]))
```
Note that in the above plot, the variable names are abbreviated using just three characters. Most of the time this is enough.
A modification of the basic `qgraph()` plot involves arranging the nodes in a way that locates more highly correlated variables closer to one another (a "force-directed" layout), specified by the `layout="spring"` argument. This approach uses the "Fruchterman-Reingold" algorithm, which invokes the concept of spring tension pulling the nodes of the more highly correlated variables toward one another. The sign of the correlations is indicated by color: positive correlations are green, negative magenta (or red).
```{r pca10}
qgraph(cor(cities[,2:12]), layout="spring", shape="rectangle", posCol="darkgreen", negCol="darkmagenta")
```
### PCA of the cities data ###
Here's the principal components analysis of the cities data:
```{r pca11}
cities_pca <- princomp(cities_matrix, cor=T)
cities_pca
summary(cities_pca)
screeplot(cities_pca)
loadings(cities_pca)
biplot(cities_pca, col=c("black","red"), cex=c(0.7,0.8))
```
In this case, there were two "important" components, and a third that was also fairly important, as evidenced by the break in slope of the "screeplot". The biplot displays both the *loadings* (correlations between the original variables and the components) as labelled vectors, and the component *scores* as either symbols or, as here when the matrix has rownames, as labels.
The biplot and table of component loadings indicate that the first component includes variables that (more-or-less) trade off developed-world cities against developing-world ones. (Note the opposing directions of the vectors that are sub-parallel to the x-axis.) The second component (population) is marked by vectors and loadings that are (more-or-less) at right angles to the first set, and sub-parallel to the y-axis. (But note that the vector for `Pop.1980` is actually more parallel to the x-axis than to the y-axis.)
An alternative visualization of the principal components and their relationships with the original variables is provided by the `qgraph()` function. The original variables are indicated by three-character abbreviations, and the components by numbered nodes.
To implement this, it is convenient to create a function that produces a `qgraph` loadings plot. This is done in two stages, the first creating a standard `qgraph` plot, and the second applying the `layout = "spring"` argument. Here is the function:
```{r, eval=FALSE, echo=TRUE}
# define a function for producing qgraphs of loadings
qgraph_loadings_plot <- function(loadings_in, title) {
  ld <- loadings(loadings_in)
  qg_pca <- qgraph(ld, title=title,
    posCol = "darkgreen", negCol = "darkmagenta", arrows = FALSE,
    labels=attr(ld, "dimnames")[[1]])
  qgraph(qg_pca, title=title,
    layout = "spring",
    posCol = "darkgreen", negCol = "darkmagenta", arrows = FALSE,
    node.height=0.5, node.width=0.5, vTrans=255, label.cex=1.0,
    width=7, height=5, normalize=TRUE, edge.width=0.75)
}
```
The function has two arguments: a principal components object, from which loadings can be extracted, plus a title.
The `princomp()` function returns all *p* components, but the initial analysis suggests that only two, or possibly three, components are meaningful. To manage the number of components that are "extracted" (as well as to implement other aspects of the analysis), we can use the `psych` package:
```{r}
library(psych)
```
The `qgraph` plot is then created by extracting only two components (in this case), and then sending the object created by the `principal()` function to the `qgraph_loadings_plot()` function we created above.
```{r pca12}
# unrotated components
cities_pca_unrot <- principal(cities_matrix, nfactors=2, rotate="none")
cities_pca_unrot
summary(cities_pca_unrot)
qgraph_loadings_plot(cities_pca_unrot, "Unrotated component loadings")
```
The first plot shows the two components in the middle and the original variables arranged in an ellipse or circle around the components. Positive loadings are shown in green, negative in magenta, with the magnitude of the loadings shown by the width of the network "edges". The second plot uses the "spring" method to rearrange the plot so that the length of the edges and arrangement of components and variables reflects the magnitude of the loadings.
## "Rotation" of principal components ##
The interpretation of the components (which is governed by the loadings--the correlations of the original variables with the newly created components) can be enhanced by "rotation", which can be thought of as a set of coordinated adjustments of the vectors on a biplot. There is no single optimal way of doing rotations, but probably the most common approach is "varimax" rotation, in which the components are adjusted in a way that makes each loading either high positive (or negative) or near zero, while keeping the components uncorrelated or orthogonal. One side-product of rotation is that the first, or principal, component is no longer optimal or the most efficient single-variable summary of the data set, but losing that property is often worth the increase in interpretability. The `principal()` function in the `psych` package also implements rotation of principal components.
Note the location and linkages of `Pop.1980` in the plot.
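As an aside, the rotation itself can be examined directly; here's a minimal sketch (not part of the original analysis) applying `stats::varimax()` to the first two unrotated loadings from the earlier `princomp()` fit:
```{r}
# minimal sketch: varimax rotation of the first two component loadings
L2 <- unclass(loadings(cities_pca))[, 1:2]
L2_rot <- varimax(L2)
L2_rot$loadings                     # rotated loadings
zapsmall(crossprod(L2_rot$rotmat))  # the rotation matrix is orthogonal
```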
Here's the result with rotated components:
```{r pca16}
# rotated components
cities_pca_rot <- principal(cities_matrix, nfactors=2, rotate="varimax")
cities_pca_rot
summary(cities_pca_rot)
biplot.psych(cities_pca_rot, labels=rownames(cities_matrix), col=c("black","red"), cex=c(0.7,0.8),
xlim.s=c(-3,3), ylim.s=c(-2,4))
```
... and the `qgraph()` plot of the rotated-component results:
```{r pca17}
qgraph_loadings_plot(cities_pca_rot, "Rotated component loadings")
```
Notice that all three population variables are now "most highly loaded on" the second component.
[[Back to top]](lec16.html)
# Factor analysis (FA) and PCA #
*Factor analysis (FA)* can be thought of as a parallel analysis to PCA, and in some ways PCA can be viewed as a special case of FA. Despite their names being used indiscriminately, the two analyses do have different underlying models:
- PCA: maximum variance, maximum simultaneous resemblance motivations
- Factor analysis: variables are assembled from two major components, common "factors" and "unique" factors, e.g.
**X** = **m** + **Lf** + **u**, where **X** is a matrix of data, **m** is the (vector) mean of the variables, **L** is a *p* x *k* matrix of factor loadings, and **f** and **u** are random vectors representing the underlying common and unique factors.
The model underlying factor analysis is:
*data = common factors + unique factors*
The common factors in factor analysis are much like the first few principal components, and are often defined that way in initial phases of the analysis.
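To make the model concrete, here's a minimal sketch (an illustration only, with made-up loadings) that generates data from the factor model and then recovers the structure with `factanal()`:
```{r}
# minimal sketch: simulate data from X = m + Lf + u, then recover L
set.seed(1)
n <- 500; p <- 6; k <- 2
L <- matrix(c(0.9, 0.8, 0.7, 0.0, 0.0, 0.0,
              0.0, 0.0, 0.0, 0.9, 0.8, 0.7), nrow = p, ncol = k)
f <- matrix(rnorm(n * k), n, k)            # common factors
u <- matrix(rnorm(n * p, sd = 0.5), n, p)  # unique factors
X <- f %*% t(L) + u                        # mean m = 0 here
factanal(X, factors = 2)$loadings          # approximately recovers L
```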
The practical difference between the two analyses now lies mainly in the decision whether to rotate the principal components to emphasize the "simple structure" of the component loadings:
- easier interpretation
- in geographical data: regionalization
## Example of a factor analysis ##
Here's a factor analysis of the large-cities data set using the `factanal()` function:
```{r fa02}
# cities_fa1 -- factor analysis of cities data -- no rotation
cities_fa1 <- factanal(cities_matrix, factors=2, rotation="none", scores="regression")
cities_fa1
biplot(cities_fa1$scores[,1:2], loadings(cities_fa1), cex=c(0.7,0.8))
```
Notice that the biplot looks much the same as that for PCA (at least as far as the loadings go; they have the same interpretation--as correlations between the factors and the original variables). A new element of the factor analysis output is the "Uniquenesses" table, which, as its name suggests, describes the uniqueness of individual variables: values near 1.0 indicate variables that tend to measure unique properties in the data set, while values near 0.0 indicate variables that are, in a sense, duplicated by other variables in the data set.
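A minimal check of that relationship (the uniqueness of each variable is 1 minus its communality, the sum of its squared loadings):
```{r}
# minimal check: uniqueness = 1 - communality (sum of squared loadings)
communality <- rowSums(loadings(cities_fa1)[, 1:2]^2)
cbind(1 - communality, cities_fa1$uniquenesses)
```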
Here's the `qgraph` plot:
```{r fa03}
qgraph_loadings_plot(cities_fa1, "Factor analysis")
```
Note the "external" line segments that are scaled to the uniqueness values of each variable, and represent sources of variability extraneous to (or outside of) that generated by the factors.
Here is a "rotated" factor analysis:
```{r fa04}
# cities_fa2 -- factor analysis of cities data -- varimax rotation
cities_fa2 <- factanal(cities_matrix, factors=2, rotation="varimax", scores="regression")
cities_fa2
biplot(cities_fa2$scores[,1:2], loadings(cities_fa2), cex=c(0.7,0.8))
```
... and the `qgraph` plot:
```{r fa05}
library(qgraph)
qgraph_loadings_plot(cities_fa2, "Factor analysis -- rotated factors")
```
[[Back to top]](lec16.html)
# A second example: Controls of mid-Holocene aridity in Eurasia #
The data set here is a "stacked" data set of output from thirteen "PMIP3" simulations of mid-Holocene climate (in particular, the long-term mean differences between the mid-Holocene simulations and those for the pre-industrial period) for a region in northern Eurasia. The objective of the analysis was to examine the systematic relationship among a number of different climate variables as part of understanding the mismatch between the simulations and paleoenvironmental observations (reconstructions), where the simulations were in general drier and warmer than the climate reconstructed from the data. (See Bartlein, P.J., S.P. Harrison and K. Izumi, 2017, Underlying causes of Eurasian mid-continental aridity in simulations of mid-Holocene climate, *Geophysical Research Letters* 44:1-9, [http://dx.doi.org/10.1002/2017GL074476](http://dx.doi.org/10.1002/2017GL074476))
The variables include:
- `Kext`: Insolation at the top of the atmosphere
- `Kn`: Net shortwave radiation
- `Ln`: Net longwave radiation
- `Qn`: Net radiation
- `Qe`: Latent heating
- `Qh`: Sensible heating
- `Qg`: Substrate heating
- `bowen`: Bowen ratio (`Qh/Qe`)
- `alb`: Albedo
- `ta`: 850 mb temperature
- `tas`: Near-surface (2 m) air temperature
- `ts`: Surface (skin) temperature
- `ua`: Eastward wind component, 500 hPa level
- `va`: Northward wind component, 500 hPa level
- `omg`: 500 hPa vertical velocity
- `uas`: Eastward wind component, surface
- `vas`: Northward wind component, surface
- `slp`: Mean sea-level pressure
- `Qdv`: Moisture divergence
- `shm`: Specific humidity
- `clt`: Cloudiness
- `pre`: Precipitation rate
- `evap`: Evaporation rate
- `pme`: P-E rate
- `sm`: Soil moisture
- `ro`: Runoff
- `dS`: Change in moisture storage
- `snd`: Snow depth
The data set was assembled by stacking the monthly long-term mean differences from each model on top of one another, creating a (13 x 12 =) 156-row array with one column for each variable. This arrangement of the data will reveal the common variations in the seasonal cycles of the long-term mean differences.
## Read and transform the data ##
Load necessary packages.
```{r, message=FALSE}
# pca of multimodel means
library(corrplot)
library(qgraph)
library(psych)
```
Read the data:
```{r eval=TRUE, echo=FALSE}
# read data
datapath <- "/Users/bartlein/Documents/geog495/data/csv/"
csvfile <- "aavesModels_ltmdiff_NEurAsia.csv"
input_data <- read.csv(paste(datapath, csvfile, sep=""))
mm_data_in <- input_data[,3:30]
names(mm_data_in)
summary(mm_data_in)
```
```{r read RF, eval=FALSE, echo=TRUE}
# read data
datapath <- "/Users/bartlein/Documents/geog495/data/csv/"
csvfile <- "aavesModels_ltmdiff_NEurAsia.csv"
input_data <- read.csv(paste(datapath, csvfile, sep=""))
mm_data_in <- input_data[,3:30]
names(mm_data_in)
summary(mm_data_in)
```
There are a few fix-ups to do. Recode a few missing (NA) values of snow depth to 0.0:
```{r}
# recode a few NA's to zero
mm_data_in$snd[is.na(mm_data_in$snd)] <- 0.0
```
Remove some unnecessary or redundant variables:
```{r}
# remove unnecessary variables
dropvars <- names(mm_data_in) %in% c("Kext","bowen","sm","evap")
mm_data <- mm_data_in[!dropvars]
names(mm_data)
```
## Correlations among variables ##
It's useful to look at the correlations among the long-term mean differences among the variables. This could be done using a matrix scatterplot (`plot(mm_data, pch=16, cex=0.5)`), but there are enough variables (24) to make that difficult to interpret. Another approach is to look at a `corrplot()` image:
```{r}
cor_mm_data <- cor(mm_data)
corrplot(cor_mm_data, method="color")
```
The plaid-like appearance of the plot suggests that there are several groups of variables whose variations of long-term mean differences throughout the year are similar.
The correlations can also be illustrated by plotting the correlations as a network graph using the `qgraph()` function, with the strength of the correlations indicated by the width of the lines (or "edges"), and the sign by the color (green = positive and magenta = negative).
```{r}
qgraph(cor_mm_data, title="Correlations, multi-model ltm differences over months",
# layout = "spring",
posCol = "darkgreen", negCol = "darkmagenta", arrows = FALSE,
node.height=0.5, node.width=0.5, vTrans=255, edge.width=0.75, label.cex=1.0,
width=7, height=5, normalize=TRUE, edge.width=0.75 )
```
A useful variant of this plot is provided by using the strength of the correlations to arrange the nodes (i.e. the variables). This is done by using the "Fruchterman-Reingold" algorithm that invokes the concept of spring tension pulling the nodes of the more highly correlated variables toward one another.
```{r}
qgraph(cor_mm_data, title="Correlations, multi-model ltm differences over months",
layout = "spring", repulsion = 0.75,
posCol = "darkgreen", negCol = "darkmagenta", arrows = FALSE,
node.height=0.5, node.width=0.5, vTrans=255, edge.width=0.75, label.cex=1.0,
width=7, height=5, normalize=TRUE, edge.width=0.75 )
```
We can already see that there are groups of variables that are highly correlated with one another. For example, the three temperature variables (`tas` (2-m air temperature), `ts` (surface or "skin" temperature), and `ta` (850 mb temperature)) are plotted close to one another, as are some of the surface energy-balance variables (e.g. `Qn` and `Kn`, net radiation and net shortwave radiation).
## PCA of the PMIP 3 data ##
Do a principal components analysis of the long-term mean differences using the `principal()` function from the `psych` package. Initially, extract eight components:
```{r}
nfactors <- 8
mm_pca_unrot <- principal(mm_data, nfactors = nfactors, rotate = "none")
mm_pca_unrot
```
The analysis suggests that only five components are as "important" as any of the original (standardized) variables, so repeat the analysis, extracting just five components:
```{r}
nfactors <- 5
mm_pca_unrot <- principal(mm_data, nfactors = nfactors, rotate = "none")
mm_pca_unrot
loadings(mm_pca_unrot)
```
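(The "importance" criterion here is the Kaiser rule: a component whose eigenvalue exceeds 1 accounts for more variance than any single standardized variable. A minimal check, assuming that reading of the output:)
```{r}
# minimal check: the Kaiser rule (eigenvalue > 1)
evals <- eigen(cor(mm_data))$values
sum(evals > 1)
```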
This result could also be confirmed by looking at the "scree plot":
```{r}
scree(mm_data, fa = FALSE, pc = TRUE)
```
## qgraph plot of the principal components ##
The first plot below shows the components as square nodes, and the original variables as circular nodes. The second modifies that first plot by applying the "spring" layout.
```{r}
qgraph_loadings_plot(mm_pca_unrot, title="Unrotated component loadings, all models over all months")
```
This result suggests (as does the scree plot) that there is one very important (i.e. "principal") component that is highly correlated with (or "loads heavily on") a number of variables, with each subsequent component correlated mainly with a single variable.
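A minimal check of that dominance (assuming, as for `psych::principal()` objects, that `$values` holds the full set of eigenvalues):
```{r}
# minimal check: share of total variance carried by the first component
mm_pca_unrot$values[1] / sum(mm_pca_unrot$values)
```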
## Rotated components ##
The interpretability of the components can often be improved by "rotation" of the components, which amounts to slightly moving the PCA axes relative to the original variable axes, while still maintaining the orthogonality (or "uncorrelatedness") of the components. This has the effect of reducing the importance of the first component(s) (because the adjusted axes are no longer optimal), but the trade-off is usually worth it.
Here are the results:
```{r}
nfactors <- 4
mm_pca_rot <- principal(mm_data, nfactors = nfactors, rotate = "varimax")
mm_pca_rot
qgraph_loadings_plot(mm_pca_rot, "Rotated component loadings, all models over all months")
```
[[Back to top]](lec16.html)
# Readings #
- Chapter 25, Multivariate Statistics, in Crawley, M.J. (2013) *The R Book*, Wiley. To get to the book, visit [http://library.uoregon.edu](http://library.uoregon.edu), login if necessary, and search for the 2013 edition of the book. Here's a direct link, once you're logged on: [[Link]](http://onlinelibrary.wiley.com/book/10.1002/9781118448908)
- Everitt and Hothorn (2011) *An Introduction to Applied Multivariate Analysis with R*, ch. 3 and 5. To get to the book, visit [http://library.uoregon.edu](http://library.uoregon.edu), search for the book, and login if necessary. Here's a direct link, once you're logged on: [[Link]](https://na01.alma.exlibrisgroup.com/view/action/uresolver.do?operation=resolveService&package_service_id=23665350240001852&institutionId=1852&customerId=1840)
- Peng (2016) *Exploratory Data Analysis with R*, ch. 13. See readings tab for a link to the e-book.
[[Back to top]](lec16.html)