---
title: "Code practice. From t-tests to data modeling"
output:
  html_document:
    theme: lumen
    highlight: tango
  pdf_document: default
---
Linguistic data: Quantitative analysis and visualization
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Universal linguistic hierarchies: a case of Modern Greek (Standard and Cypriot dialects)
Data ([responses](https://raw.githubusercontent.com/LingData2019/LingData2020/master/data/greek-word-order-mono-acceptability-coded-rt.txt), [questionnaire](https://raw.githubusercontent.com/LingData2019/LingData2020/master/data/greek-word-order-mono_socio.txt)) adapted from the survey:
Leivada, Evelina; Westergaard, Marit, 2019, [Universal linguistic hierarchies are not innately wired](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6679903/#fn-1). PeerJ, v.7.
Source of data: TROLLing repository:
Leivada, Evelina; Westergaard, Marit, 2019, "Replication Data for: Universal linguistic hierarchies are not innately wired", https://doi.org/10.18710/NTLLUF, DataverseNO, V1
#### Disclaimer
Tables and figures produced by your code may look slightly different from those in the article.
This concerns absolute numbers, the size of columns, the color of lines, themes, etc. Still, reproduce the visualization type (e.g. barplot, violin plot) and the order of elements/groups
just as plotted by Leivada and Westergaard.
#### Constructions with two adjectives
In English, the order of two adjectives in phrases like:
```
a big black bag # ok
*a black big bag # unacceptable: ill-formed or semantically anomalous
```
is governed by the semantic class of the adjectives (e.g. the `color` adjective sits closer to the noun than the `size` adjective).
A syntactic hierarchy of closeness to the noun in Chomsky's Universal Grammar
suggests the following order and is claimed to be innate and universal (= valid for all languages).
```
Subjective Comment > Evidential > Size > Length
> Height > Speed > Depth > Width > Temperature > Wetness > Age
> Shape > Color > Nationality/Origin > Material
# (adapted from Scott, 2002: 114)
```
The goal of Leivada & Westergaard's research is to identify what happens when people process orderings that either comply with the hierarchy or violate it.
#### Method
In the first experiment, 140 neurotypical, adult speakers completed a timed forced choice task that featured stimuli showing a combination of two adjectives and a concrete noun (e.g., *I bought a square black table*). Two types of responses were collected:
(i) acceptability judgments on a 3-point Likert scale that featured the options
1. wrong,
2. neither correct nor wrong,
3. correct;
(ii) reaction times (RT).
The task featured three conditions: 1. size adjective > nationality adjective, 2. color adjective > shape adjective, 3. subjective comment adjective > material adjective. Each condition had two orders. In the congruent order, the adjective pair was ordered in agreement with what is traditionally accepted as dictated by the universal hierarchy. In the incongruent order, the ordering was reversed, thus the hierarchy was violated.
In the second experiment, 30 bidialectals (native speakers of Standard and Cypriot Greek) were tested in both language varieties, 36 observations per participant, 18 for each variety.
Two kinds of [fillers](https://www.hlp.rochester.edu/resources/BCS152-Tutorial/Fillers.html) were used in both experiments, FillerAcceptable and FillerUnacceptable -- sentences that included well-formed and ungrammatical structures, respectively. In both tasks the ratio of fillers to actual test structures was 2:1.
#### Data
```{r}
library(tidyverse)
mono_socio <- read_csv2("https://raw.githubusercontent.com/LingData2019/LingData2020/master/data/greek-word-order-mono_socio.txt")
mono <- read_csv2("https://raw.githubusercontent.com/LingData2019/LingData2020/master/data/greek-word-order-mono-acceptability-coded-rt.txt")
```
see also [reading key for the data](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6679903/bin/peerj-07-7438-s001.txt)
```{r}
mono_socio
```
## 1. Data overview
### 1.1
Use `mono_socio` dataframe to answer the following questions:
1. How many participants are mentioned in this dataframe?
2. How many of them are males and females?
3. Which education levels are mentioned in the dataframe?
4. How many participants of each education level are present?
5. How many left- and right-handed participants are present?
The following functions from tidyverse can be useful for this problem: `filter`, `group_by`, `count` and `distinct`. (Another approach is to use `pivot_wider`.)
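As a sketch of the `pivot_wider` route, here is a toy dataframe in the same long shape as `mono_socio` (hypothetical participant IDs and answers, not the real data); spreading each participant's answers into one row makes the counts straightforward:

```{r}
library(tidyverse)

# Toy data in the same long shape as mono_socio:
# one row per participant per socio question.
toy_socio <- tibble(
  ParticipantID = rep(c("p1", "p2", "p3"), each = 2),
  QuestionCategory = rep(c("sex", "handedness"), times = 3),
  Responce = c("F", "Right", "M", "Right", "F", "Left")
)

# One row per participant, one column per question.
toy_wide <- toy_socio %>%
  pivot_wider(names_from = QuestionCategory, values_from = Responce)

toy_wide %>% count(sex)        # 2 F, 1 M
toy_wide %>% count(handedness) # 2 Right, 1 Left
```

The same pattern applies to `mono_socio` directly, since it shares these column names.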
```{r}
mono_socio %>%
pull(ParticipantID) %>%
n_distinct()
```
```{r}
mono_socio %>%
filter(QuestionCategory == "sex") %>%
group_by(Responce) %>%
summarise(n = n())
```
```{r}
mono_socio %>%
filter(QuestionCategory == "education") %>%
group_by(Responce) %>%
summarise(n = n())
```
```{r}
mono_socio %>%
filter(QuestionCategory == "handedness") %>%
group_by(Responce) %>%
summarise(n = n())
```
Compare your overview with that reported in Table 1 of the article. Sometimes replication data provided by authors does not allow one to reproduce their results.
**The dataset `mono_socio` corresponds to the contents of Table 2. The figures are correct, except for the education level: the value types differ between the table and the dataset, and the number of college students does not match the number of participants with secondary education in the table (3 versus 6).**
Let's look at another dataframe, `mono`, that contains results of experiment 1.
```{r}
mono
```
### 1.2
Create a plot that shows the RT distribution in experiment 1 (all participants and conditions taken together). What kind of plot would you choose? Use ggplot() for this problem.
```{r}
library(ggplot2)
library(ggpubr)
ggline(mono, x = "TypeOfStimuli", y = "RT", color = "WordOrder",
add = c("mean_se")) +
theme(axis.text.x = element_text(size = 8))
```
```{r}
mono %>%
ggplot(aes(RT)) +
geom_histogram(binwidth = 80)
```
Can we say that RT approximately follows a normal distribution? Which features of the RT distribution contradict this assumption? (E.g. long left tail, long right tail, outliers, skewness, etc.)
**Outliers on the right, the histogram has a long right tail (right-skewed), and the distribution is not symmetric around the median.**
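A common complementary check (not required by the task) is a Q-Q plot. A minimal sketch on synthetic right-skewed data, similar in spirit to raw reaction times:

```{r}
# Synthetic log-normal sample: long right tail, like raw RTs.
set.seed(3)
skewed <- rlnorm(500, meanlog = 3, sdlog = 0.3)

# For a normal sample the points hug the reference line;
# here they bend away on the right, revealing the long right tail.
qqnorm(skewed)
qqline(skewed)
```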
### 1.3
Normalise data applying the logarithm with base 10 (RTlog = log10(RT)). Use `mutate`.
```{r}
mono <- mono %>%
mutate(RTlog = log10(RT))
```
### 1.4
Create a density plot that shows the RTlog distribution.
```{r}
ggplot(mono) +
geom_density(aes(x = RTlog)) +
geom_rug(aes(x = RTlog, y = 0), position = position_jitter(height = 0))
```
Can we say that RTlog approximately follows a normal distribution? What features of the RTlog distribution contradict this assumption? (E.g. long left tail, long right tail, outliers, skewness, etc.)
**The distribution is slightly skewed to the left.**
### 1.5
Give a summary of `RTlog` distribution (min, max, mean, median, standard deviation)
```{r}
# hint: sd()
mono %>%
summarise(minRT = min(RTlog),
maxRT = max(RTlog),
meanRT = mean(RTlog),
medianRT = median(RTlog),
sdRT = sd(RTlog))
```
### 1.6
To filter out outliers, remove from the table the following observations:
* responses RT of which is below 600 ms (i.e., when a button is pressed too fast, without allowing enough time for actual consideration of the presented stimuli)
* responses RTlog of which deviates from the mean value of RTlog for more than 3 standard deviations
* fillers (both acceptable and unacceptable)
Convert relevant variables to factors and save the filtered data as `mono1`.
```{r}
sdRT <- mono %>%
summarise(sdRT = sd(RTlog)) %>%
pull(sdRT)
meanRT <- mono %>%
summarise(mrt = mean(RTlog)) %>%
pull(mrt)
tripleSD <- sdRT * 3
mono1 <- mono %>%
filter(RT > 600, RTlog < meanRT + tripleSD, RTlog > meanRT - tripleSD) %>%
filter(!startsWith(TypeOfStimuli, "Filler"))
mono1 <- mono1 %>%
select(ParticipantID, TypeOfStimuli, WordOrder, AcceptabilityJ = ResponseAcceptabilityJudgement, RTlog) %>%
mutate(ParticipantID = as.factor(ParticipantID),
TypeOfStimuli = as.factor(TypeOfStimuli),
WordOrder = as.factor(WordOrder),
AcceptabilityJ = as.factor(AcceptabilityJ))
```
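For reference, the 3-SD cutoff can also be written with `scale()`, which returns z-scores; keeping rows with `|z| <= 3` is equivalent to the mean ± 3·sd filter. A toy sketch on synthetic values (not the homework data):

```{r}
# 100 typical values around 3 plus one extreme value.
set.seed(4)
x <- c(rnorm(100, mean = 3, sd = 0.1), 9.9)

# scale() centers and divides by sd; |z| <= 3 drops the extreme value.
z <- as.numeric(scale(x))
kept <- x[abs(z) <= 3]
length(kept)  # the extreme value is dropped
```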
### 1.7
Calculate the number of observations in `mono1`.
```{r}
mono1 %>%
summarise(n_observations = n())
```
### 1.8
Reproduce Figure 1 from the article using `ggplot`.
Hint: You can make a summary and use `geom_col()` (see example [here](https://r-graphics.org/recipe-colors-mapping)).
Use either facet_wrap or facet_grid to make six plots.
Note that the figures created in 1.8-1.10 may look different from what is plotted in the article.
```{r}
mono1 %>%
ggplot(aes(x = TypeOfStimuli, fill = AcceptabilityJ))+
geom_bar(stat = "count", position=position_dodge()) +
facet_grid(. ~ WordOrder) +
theme(axis.text.x = element_text(size = 6))
```
### 1.9
Reproduce Figure 2 from the article using ggplot.
```{r}
mono1 %>%
ggplot(aes(x = WordOrder, fill = AcceptabilityJ))+
geom_bar(stat = "count", position=position_dodge())
```
### 1.10
Reproduce Figure 7 from the article using ggplot.
```{r}
mono1 %>%
ggplot(aes(x = AcceptabilityJ, y = RTlog, fill = WordOrder))+
geom_violin(trim=FALSE, position = position_dodge(width = 0.9)) +
geom_boxplot(width = 0.2, position = position_dodge(width = 0.9))
```
### 1.11
For the same data, draw a lineplot for group means and standard errors using `ggline()`:
```{r}
mono1 %>%
ggline(x = "AcceptabilityJ", y = "RTlog", color = "WordOrder", add = "mean_se") +
theme(axis.text.x = element_text(size = 8))
```
## 2. Difference in reaction time
Let us test whether there is any difference in reaction time between congruent and incongruent orders. Reaction time is a numeric variable, so we can use a t-test to compare means. One option is the two-sample t-test. However, since we have data for congruent and incongruent orders from *the same participants*, it is better to use a *paired t-test* here. In a paired t-test, for each participant we find the difference between their reaction times in the congruent and incongruent orders, and compare these differences with 0 using a 1-sample t-test. To make sure that our data satisfy the assumptions of the t-test (the values we compare are independent samples from approximately normal distributions), we will find the mean logarithm of reaction time for each participant (across observations in all conditions) and treat these means as our new sample.
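The logic above can be illustrated on synthetic data (hypothetical numbers, not the homework data): each simulated participant has their own baseline speed, and the incongruent condition adds a small constant delay. The two-sample test must overcome the large between-participant variance, while the paired test isolates the delay:

```{r}
set.seed(1)
n <- 30
baseline <- rnorm(n, mean = 3.3, sd = 0.3)            # participant-specific level
congruent   <- baseline + rnorm(n, sd = 0.02)
incongruent <- baseline + 0.03 + rnorm(n, sd = 0.02)  # +0.03 constant delay

# Two-sample test: the 0.03 shift drowns in between-participant variance.
t.test(incongruent, congruent, alternative = "greater")$p.value

# Paired test: within-participant differences are ~0.03 with tiny spread,
# so the same shift becomes clearly visible.
t.test(incongruent, congruent, alternative = "greater", paired = TRUE)$p.value
```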
### 2.1 Summarising
Use `group_by` and `summarise` to find the mean logarithm of reaction time for each participant and each word order. Put this dataframe into the variable `mean_rtlog_long`. It should look like
```
# A tibble: 280 x 3
ParticipantID WordOrder RTlog
<fct> <fct> <dbl>
1 00e0b159cf5b9abcc73b92506d8b1c38 Congruent 3.24
2 00e0b159cf5b9abcc73b92506d8b1c38 Incongruent 3.47
3 021a49cde484f8fa18439f026ec99459 Congruent 3.22
4 021a49cde484f8fa18439f026ec99459 Incongruent 3.21
...
```
```{r}
mean_rtlog_long <- mono1 %>%
group_by(ParticipantID, WordOrder) %>%
summarise(RTlog = mean(RTlog))
```
### 2.2. Pivoting
Use `pivot_wider` to spread the values of `RTlog` in `mean_rtlog_long` into two columns: `Congruent` and `Incongruent`. Put the new dataframe in the variable `mean_rtlog`. It should look like
```
# A tibble: 140 x 3
ParticipantID Congruent Incongruent
<fct> <dbl> <dbl>
1 00e0b159cf5b9abcc73b92506d8b1c38 3.24 3.47
2 021a49cde484f8fa18439f026ec99459 3.22 3.21
3 02810ff2a65eae2b3e54ac57d906309d 3.46 3.36
```
```{r}
mean_rtlog <- mean_rtlog_long %>%
pivot_wider(names_from = WordOrder, values_from = RTlog)
```
### 2.3. Two-sample t-test
Let us try to apply a two-sample t-test to our data. Consider the values in the columns `Congruent` and `Incongruent` as two independent samples. Our null hypothesis is that these two samples come from populations with equal means. Alternative hypothesis: the population mean for the incongruent word order is larger (people need more time to 'parse' it). Use the `t.test` function to perform the test. Don't forget to specify `alternative`.
```{r}
t.test(mean_rtlog$Incongruent, mean_rtlog$Congruent, alternative = 'greater')
```
Would you reject the null hypothesis (at the 5% significance level) according to this test?
**Definitely not**
What claim about the logarithms of reaction time for Congruent and Incongruent stimuli can you make according to this test?
**The difference in means is not significant: this test gives no evidence that the population means of the two groups differ (assuming both distributions are approximately normal).**
### 2.4. Paired t-test: manually
To use a paired t-test, let us find the difference between the logarithms of reaction time for each participant. Use `mutate` to add a variable `diff` with the aforementioned meaning to the dataframe `mean_rtlog`. Save the result as `mean_rtlog` again. Then compare the mean of `diff` with 0 using a 1-sample t-test. (Use the appropriate alternative.)
```{r}
mean_rtlog <- mean_rtlog %>%
mutate(diff = Incongruent - Congruent)
t.test(mean_rtlog$diff, mu = 0, alternative = "greater")
```
Would you reject the null hypothesis?
**Absolutely**
What claim about logarithms of reaction time for Congruent and Incongruent stimuli can you make now?
**The values in the Incongruent group are generally higher (assuming the differences are approximately normally distributed).**
How can you interpret the difference from the result of 2.3?
**The paired test compares within-participant differences, which removes the large between-participant variance; with much less variance masking it, the same mean difference becomes statistically significant, unlike in the independent two-sample t-test.**
#### 2.5. Paired t-test out of the box
In fact, we can avoid manually computing the differences and perform a paired t-test using the `t.test` function with the parameter `paired = TRUE`. Apply this function to your data and make sure you get the same result as in 2.4.
```{r}
t.test(mean_rtlog$Incongruent, mean_rtlog$Congruent, alternative = 'greater', paired=TRUE)
```
## 3. Difference between conditions
Now we will consider reaction time for the Incongruent word order only. Let us check whether there are any statistically significant differences in the logarithm of reaction time across conditions (types of stimuli).
### 3.1 Data preparation
Filter only observations with the `Incongruent` word order, then find the average logarithm of reaction time for each participant and each type of stimuli. Save the new dataframe as the variable `incong_rtlog`. It should look like the following table:
```
# A tibble: 420 x 3
ParticipantID TypeOfStimuli RTlog
<fct> <fct> <dbl>
1 00e0b159cf5b9abcc73b92506d8b1c38 Shape-Color 3.34
2 00e0b159cf5b9abcc73b92506d8b1c38 Size-Nationality 3.20
3 00e0b159cf5b9abcc73b92506d8b1c38 SubjectiveComment-Material 3.19
4 021a49cde484f8fa18439f026ec99459 Shape-Color 3.20
```
```{r}
incong_rtlog <- mono1 %>%
filter(WordOrder == "Incongruent") %>%
group_by(ParticipantID, TypeOfStimuli) %>%
summarise(RTlog = mean(RTlog))
```
### 3.2 Statistical testing
Use an appropriate statistical test to answer the following question: are there any statistically significant differences in the logarithm of reaction time between conditions (types of stimuli)? Choose the test and justify your choice. Provide your code, results and interpretation. What is your answer to the question?
```{r}
summary(aov_model <- aov(RTlog ~ TypeOfStimuli, incong_rtlog))
```
**A p-value of 0.0125 shows that some statistically significant difference between the groups might be present.**
```{r}
pairwise.t.test(incong_rtlog$RTlog, incong_rtlog$TypeOfStimuli, p.adjust.method = "BH")
```
**A pairwise t-test with adjusted p-values shows that the differences in the pairs 'Size-Nationality vs SubjectiveComment-Material' and 'Shape-Color vs SubjectiveComment-Material' are statistically significant at an alpha of 0.05. This can be interpreted as an indication that different violations are perceived differently, e.g. some of them delay the reaction more than others do.**
### 3.3 Post-hoc analysis: which differences are significant?
If we compare means for several (more than two) groups and reject the null hypothesis that the corresponding population means are all equal, the next natural question is to find all pairs of groups whose difference is statistically significant. As discussed in the lecture, pairwise t-tests cannot be used here without appropriate corrections. Instead, one can use Tukey's Honest Significant Differences. It reports adjusted confidence intervals for the differences between group means for each pair of groups, as well as p-values for the null hypothesis 'the difference is equal to zero'.
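A self-contained toy illustration of `aov` + `TukeyHSD` (synthetic groups, not the homework data):

```{r}
# Three groups: A and B share a mean, C is shifted by 2.
set.seed(2)
toy_groups <- data.frame(
  y = c(rnorm(20, mean = 0), rnorm(20, mean = 0), rnorm(20, mean = 2)),
  g = factor(rep(c("A", "B", "C"), each = 20))
)

toy_fit <- aov(y ~ g, data = toy_groups)
# Adjusted CIs and p-values per pair: C-A and C-B should stand out.
TukeyHSD(toy_fit)
```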
Apply `TukeyHSD` to the result of 3.2.
```{r}
TukeyHSD(aov_model)
```
Interpret the results of your analysis in 3.2 and 3.3 here. Do not forget to report the p-values obtained. Report which pair of conditions has a statistically significant difference between logarithms of reaction time.
```
The pair "SubjectiveComment-Material vs Shape-Color" has a p-value of 0.0151; the pair "SubjectiveComment-Material vs Size-Nationality" has a p-value of 0.0582. So, at the 5% level, TukeyHSD confirms a significant difference only for the first pair, while the second falls just above the threshold.
```
### 4. Multivariate linear regression
#### 4.1
Using the `mono1` data, fit and compare two models that predict RTlog:
* using Acceptability Judgements as predictor, and
* using Acceptability Judgements and TypeOfStimuli as predictors
```{r}
summary(glm(RTlog ~ AcceptabilityJ, data = mono1))
```
```{r}
summary(glm(RTlog ~ AcceptabilityJ + TypeOfStimuli, data = mono1))
```
**Both models report that the levels "Neither" and "Wrong" of the predictor `AcceptabilityJ` are significant, and this does not change when additional factors are introduced. The second model shows that the stimulus types "SubjectiveComment-Material" and "Size-Nationality" are also associated with reaction time and are likely to influence it.**
#### 4.2
Add interaction of two predictors in the model.
```{r}
summary(glm(RTlog ~ AcceptabilityJ + TypeOfStimuli + AcceptabilityJ:TypeOfStimuli, data = mono1))
```
#### 4.3
Which of the models fits data the best?
```
Judging by the p-values, the model with the interaction does not add any terms that strongly influence the prediction, so the interaction of the two factors is redundant and can be dropped. This makes the second model the most appropriate one.
```
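Besides eyeballing p-values, models are often compared with AIC. A quick self-contained sketch on the built-in `mtcars` data (not the homework data), where a lower AIC means a better fit/complexity trade-off:

```{r}
m1 <- glm(mpg ~ wt, data = mtcars)       # one predictor
m2 <- glm(mpg ~ wt + hp, data = mtcars)  # two predictors
m3 <- glm(mpg ~ wt * hp, data = mtcars)  # plus their interaction

AIC(m1, m2, m3)  # compare the AIC column row by row; adding hp helps here
```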
### 5. Binary classification
#### 5.1
It can happen that some parts of the data are not provided by the authors. Let us assume that WordOrder is a variable one wants to predict (data: `mono1`).
Suggest at least one type of model to predict this dependent variable. Run the code and find the minimal optimal model (the model with predictors that show statistical significance).
```{r}
fit <- glm(WordOrder ~ AcceptabilityJ + TypeOfStimuli + RTlog, mono1, family = "binomial")
summary(fit)
```
#### 5.2
Interpret the summary of this model. Write down your conclusions.
```
The previous tests showed that the acceptability judgement correlates strongly with reaction time. Since word order directly affects reaction time, the acceptability values can be used to predict word order as well. As with the reaction time prediction, the levels "Neither" and "Wrong" are the most informative. On the interpretation level, this may show that the acceptability judgement is a derivative of the correct or incorrect word order, much like the reaction time.
```
### R code cookbook
**Violin plot**
-- similar to a box plot, except that it also shows an approximation of the probability density of the data at different values. Typically, violin plots include a marker for the median of the data and a box indicating the interquartile range, as in standard box plots.
```{r}
ToothGrowth$dose <- as.factor(ToothGrowth$dose)
p <- ggplot(ToothGrowth, aes(x=dose, y=len)) +
geom_violin()
p + stat_summary(fun = mean, geom = "point", shape = 23, size = 2)
p + stat_summary(fun = median, geom = "point", size = 2, color = "red")
p + geom_boxplot(width=0.1)
# violin plot with dot plot
p + geom_dotplot(binaxis='y', stackdir='center', dotsize=1)
# violin plot with jittered points
# 0.2 : degree of jitter in x direction
p + geom_jitter(shape=16, position=position_jitter(0.2))
```
**Line plot**
You can create a line plot of mean +/- error using the function `ggline()` from the ggpubr package.
Basic line plots of means +/- se with jittered points:
```{r}
library(ggpubr)
ggline(diamonds[1:300,], x = "color", y = "price",
add = c("mean_se", "jitter"))
```
Split and color by group:
```{r}
ggline(diamonds, x = "color", y = "price",
add = "mean_se",
color = "cut", palette = "jco")
```
**Using fill_palette**
```{r}
ggplot(iris, aes(Species, Sepal.Length))+
geom_boxplot(aes(fill = Species), color = "white", alpha = 0.5)+
fill_palette("jco")
```
See more examples of ggpubr plots: [here](https://rpkgs.datanovia.com/ggpubr/index.html).