diff --git "a/docs/2024-11/\344\270\255\344\273\213\345\217\230\351\207\217\346\210\226\350\200\205\345\233\240\345\217\230\351\207\217\346\230\257\345\233\240\345\255\220\345\217\230\351\207\217\347\232\204\344\270\255\344\273\213\345\210\206\346\236\220.md" "b/docs/2024-11/\344\270\255\344\273\213\345\217\230\351\207\217\346\210\226\350\200\205\345\233\240\345\217\230\351\207\217\346\230\257\345\233\240\345\255\220\345\217\230\351\207\217\347\232\204\344\270\255\344\273\213\345\210\206\346\236\220.md" deleted file mode 100644 index de149a7a..00000000 --- "a/docs/2024-11/\344\270\255\344\273\213\345\217\230\351\207\217\346\210\226\350\200\205\345\233\240\345\217\230\351\207\217\346\230\257\345\233\240\345\255\220\345\217\230\351\207\217\347\232\204\344\270\255\344\273\213\345\210\206\346\236\220.md" +++ /dev/null @@ -1,15 +0,0 @@ ---- -title: "中介变量或者因变量是因子变量的中介分析" -date: 2024-11-04T00:47:46Z -draft: ["false"] -tags: [ - "fetched", - "R 学习践行者" -] -categories: ["Acdemic"] ---- -中介变量或者因变量是因子变量的中介分析 by R 学习践行者 ------- -
在中介变量或因变量为类别变量的中介分析中,需要使用 Logistic 回归取代通常的线性回归;此时因为a、b、c’系数存在尺度上的问题所以需要进行系数转换,使得回归系数变成具有相同的尺度。
下面使用示例展示这种类型的中介分析过程,示例中自变量是多分类变量,中介变量是连续变量,因变量是二分类变量。
library(mediation)
data(jobs)
我们仍然depress1作为自变量,econ_hard作为中介变量,depress2作为因变量;同时把depress1和depress2转换成因子变量:
hist(jobs$depress2)
jobs$depress1<-ifelse(
jobs$depress1<1.5,"d1",
ifelse(jobs$depress1>=1.5 & jobs$depress1<2,"d2",
ifelse(jobs$depress1>=2 & jobs$depress1<2.5,
"d3","d4")))
jobs$depress1<-factor(jobs$depress1,levels = c("d1","d2","d3","d4"))
jobs$depress2<-ifelse(jobs$depress2>2,1,0)
table(jobs$depress2)
0 1
655 244
# 转换多类别变量 X 为虚拟变量
X_dummy <- model.matrix(~ depress1 - 1, data = jobs) # 创建虚拟变量矩阵
jobs<- cbind(jobs, X_dummy) # 合并虚拟变量至原数据集中
1.拟合模型
第一个模型:
model.0 <- glm(depress2 ~ depress1d2+depress1d3+depress1d4,
data=jobs,
family = binomial(link = "logit"))
summary(model.0)
Call:
glm(formula = depress2 ~ depress1d2 + depress1d3 + depress1d4,
family = binomial(link = "logit"), data = jobs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0534 0.1822 -11.272 < 2e-16 ***
depress1d2 0.7300 0.2531 2.884 0.00393 **
depress1d3 1.4504 0.2258 6.424 1.32e-10 ***
depress1d4 2.1163 0.2416 8.760 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1051.2 on 898 degrees of freedom
Residual deviance: 952.3 on 895 degrees of freedom
AIC: 960.3
Number of Fisher Scoring iterations: 4
第二个模型:
model.1 <- lm(econ_hard~ depress1d2 + depress1d3 + depress1d4,
data=jobs)
summary(model.1)
Call:
lm(formula = econ_hard ~ depress1d2 + depress1d3 + depress1d4,
data = jobs)
Residuals:
Min 1Q Median 3Q Max
-2.35006 -0.66874 -0.00874 0.66126 2.38124
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.61876 0.05410 48.410 < 2e-16 ***
depress1d2 0.36431 0.08610 4.231 2.56e-05 ***
depress1d3 0.71998 0.08052 8.942 < 2e-16 ***
depress1d4 0.73130 0.09181 7.965 4.99e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.9354 on 895 degrees of freedom
Multiple R-squared: 0.1038, Adjusted R-squared: 0.1008
F-statistic: 34.55 on 3 and 895 DF, p-value: < 2.2e-16
第三个模型:
model.2 <- glm(depress2~ econ_hard+depress1d2 +
depress1d3 + depress1d4,
data=jobs,
family = binomial(link = "logit"))
summary(model.2)
Call:
glm(formula = depress2 ~ econ_hard + depress1d2 + depress1d3 +
depress1d4, family = binomial(link = "logit"), data = jobs)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.63074 0.29810 -8.825 < 2e-16 ***
econ_hard 0.21455 0.08527 2.516 0.0119 *
depress1d2 0.65628 0.25527 2.571 0.0101 *
depress1d3 1.30567 0.23268 5.612 2.01e-08 ***
depress1d4 1.97533 0.24750 7.981 1.45e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 1051.22 on 898 degrees of freedom
Residual deviance: 945.92 on 894 degrees of freedom
AIC: 955.92
Number of Fisher Scoring iterations: 4
根据输出的结果我们可以绘制出一下这张图:
我们可以看到未进行系数转换,总效应与直接效应+间接效应之和差异较大。
2.回归系数标准
#对自变量b2水平的b、c'、c系数标准化
b2_var<-var(jobs$depress1d2)
b2<-0.73^2*b2_var+pi^2/3
SD1_b2<-sqrt(b2)
m_var<-var(jobs$econ_hard)
b2_m_cor<-cov(jobs$depress1d2,jobs$econ_hard)
b2_1<-0.65628^2*b2_var+0.21455^2*m_var+2*0.65628*0.21455*b2_m_cor+pi^2/3
SD2_b2<-sqrt(b2_1)
#对b、c'、c系数标准化
b2_b<-0.21455/SD2_b2
b2_c1<-0.65628/SD2_b2
b2_c<-0.73/SD1_b2
b2_b;b2_c1;b2_c
[1] 0.116264
[1] 0.3556361
[1] 0.3970391
#对自变量b3水平的b、c'、c系数标准化
b3_var<-var(jobs$depress1d3)
b3<-1.4504^2*b3_var+pi^2/3
SD1_b3<-sqrt(b3)
m_var<-var(jobs$econ_hard)
b3_m_cor<-cov(jobs$depress1d3,jobs$econ_hard)
b3_1<-1.30567^2*b3_var+0.21455^2*m_var+2*1.30567*0.21455*b3_m_cor+pi^2/3
SD2_b3<-sqrt(b3_1)
#对b、c'、c系数标准化
b3_b<-0.21455/SD2_b3
b3_c1<-1.30567/SD2_b3
b3_c<-1.4504/SD1_b3
b3_b;b3_c1;b3_c
[1] 0.1112067
[1] 0.6767617
[1] 0.7531668
#对自变量b4水平的b、c'、c系数标准化
b4_var<-var(jobs$depress1d4)
b4<-2.1163^2*b4_var+pi^2/3
SD1_b4<-sqrt(b4)
m_var<-var(jobs$econ_hard)
b4_m_cor<-cov(jobs$depress1d4,jobs$econ_hard)
b4_1<-1.97533^2*b4_var+0.21455^2*m_var+2*1.97533*0.21455*b4_m_cor+pi^2/3
SD2_b4<-sqrt(b4_1)
#对b、c'、c系数标准化
b4_b<-0.21455/SD2_b4
b4_c1<-1.97533/SD2_b4
b4_c<-2.1163/SD1_b4
b4_b;b4_c1;b4_c
[1] 0.107921
[1] 0.9936122
[1] 1.065822
经过系数标准化后,虽然总效应不完全等于直接效应+间接效应,但是两者差异已经非常小了,如下图所示。因此我们可以采用整个流程并使用自助法计算直接效应(标准化的c’)、间接效应(a*b(标准化的))、总效应(标准化的c)的置信区间。
3.使用RMediation包检验Za*Zb