Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

中介变量或者因变量是因子变量的中介分析 #5875

Closed
ixxmu opened this issue Nov 4, 2024 · 1 comment
Closed

中介变量或者因变量是因子变量的中介分析 #5875

ixxmu opened this issue Nov 4, 2024 · 1 comment

Comments

@ixxmu
Copy link
Owner

ixxmu commented Nov 4, 2024

https://mp.weixin.qq.com/s/jHg1lo_AuFG9WyENLj8x-Q

@ixxmu
Copy link
Owner Author

ixxmu commented Nov 4, 2024

中介变量或者因变量是因子变量的中介分析 by R 学习践行者

    在中介变量或因变量为类别变量的中介分析中,需要使用 Logistic 回归取代通常的线性回归;此时因为a、b、c’系数存在尺度上的问题所以需要进行系数转换,使得回归系数变成具有相同的尺度。

    下面使用示例展示这种类型的中介分析过程,示例中自变量是多分类变量,中介变量是连续变量,因变量是二分类变量。

library(mediation)
data(jobs)

    我们仍然depress1作为自变量,econ_hard作为中介变量,depress2作为因变量;同时把depress1和depress2转换成因子变量:

hist(jobs$depress2)

jobs$depress1<-ifelse(
jobs$depress1<1.5,"d1",
ifelse(jobs$depress1>=1.5 & jobs$depress1<2,"d2",
ifelse(jobs$depress1>=2 & jobs$depress1<2.5,
"d3","d4")))

jobs$depress1<-factor(jobs$depress1,levels = c("d1","d2","d3","d4"))

jobs$depress2<-ifelse(jobs$depress2>2,1,0)

table(jobs$depress2)
  
0 1
655 244
# 转换多类别变量 X 为虚拟变量
X_dummy <- model.matrix(~ depress1 - 1, data = jobs) # 创建虚拟变量矩阵
jobs<- cbind(jobs, X_dummy) # 合并虚拟变量至原数据集中

1.拟合模型

第一个模型:

model.0 <- glm(depress2 ~ depress1d2+depress1d3+depress1d4, 
data=jobs,
family = binomial(link = "logit"))
summary(model.0)
  
Call:
glm(formula = depress2 ~ depress1d2 + depress1d3 + depress1d4,
family = binomial(link = "logit"), data = jobs)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.0534 0.1822 -11.272 < 2e-16 ***
depress1d2 0.7300 0.2531 2.884 0.00393 **
depress1d3 1.4504 0.2258 6.424 1.32e-10 ***
depress1d4 2.1163 0.2416 8.760 < 2e-16 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1051.2 on 898 degrees of freedom
Residual deviance: 952.3 on 895 degrees of freedom
AIC: 960.3

Number of Fisher Scoring iterations: 4

第二个模型:

model.1 <- lm(econ_hard~ depress1d2 + depress1d3 + depress1d4,
data=jobs)
summary(model.1)
  
Call:
lm(formula = econ_hard ~ depress1d2 + depress1d3 + depress1d4,
data = jobs)

Residuals:
Min 1Q Median 3Q Max
-2.35006 -0.66874 -0.00874 0.66126 2.38124

Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 2.61876 0.05410 48.410 < 2e-16 ***
depress1d2 0.36431 0.08610 4.231 2.56e-05 ***
depress1d3 0.71998 0.08052 8.942 < 2e-16 ***
depress1d4 0.73130 0.09181 7.965 4.99e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 0.9354 on 895 degrees of freedom
Multiple R-squared: 0.1038, Adjusted R-squared: 0.1008
F-statistic: 34.55 on 3 and 895 DF, p-value: < 2.2e-16

第三个模型:

model.2 <- glm(depress2~ econ_hard+depress1d2 + 
depress1d3 + depress1d4,
data=jobs,
family = binomial(link = "logit"))
summary(model.2)
  
Call:
glm(formula = depress2 ~ econ_hard + depress1d2 + depress1d3 +
depress1d4, family = binomial(link = "logit"), data = jobs)

Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -2.63074 0.29810 -8.825 < 2e-16 ***
econ_hard 0.21455 0.08527 2.516 0.0119 *
depress1d2 0.65628 0.25527 2.571 0.0101 *
depress1d3 1.30567 0.23268 5.612 2.01e-08 ***
depress1d4 1.97533 0.24750 7.981 1.45e-15 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

Null deviance: 1051.22 on 898 degrees of freedom
Residual deviance: 945.92 on 894 degrees of freedom
AIC: 955.92

Number of Fisher Scoring iterations: 4

    根据输出的结果我们可以绘制出一下这张图:

     我们可以看到未进行系数转换,总效应与直接效应+间接效应之和差异较大。


2.回归系数标准

#对自变量b2水平的b、c'、c系数标准化
b2_var<-var(jobs$depress1d2)
b2<-0.73^2*b2_var+pi^2/3
SD1_b2<-sqrt(b2)

m_var<-var(jobs$econ_hard)
b2_m_cor<-cov(jobs$depress1d2,jobs$econ_hard)

b2_1<-0.65628^2*b2_var+0.21455^2*m_var+2*0.65628*0.21455*b2_m_cor+pi^2/3

SD2_b2<-sqrt(b2_1)

#对b、c'、c系数标准化
b2_b<-0.21455/SD2_b2
b2_c1<-0.65628/SD2_b2
b2_c<-0.73/SD1_b2

b2_b;b2_c1;b2_c
  [1] 0.116264
  [1] 0.3556361
  [1] 0.3970391
#对自变量b3水平的b、c'、c系数标准化
b3_var<-var(jobs$depress1d3)
b3<-1.4504^2*b3_var+pi^2/3
SD1_b3<-sqrt(b3)

m_var<-var(jobs$econ_hard)
b3_m_cor<-cov(jobs$depress1d3,jobs$econ_hard)

b3_1<-1.30567^2*b3_var+0.21455^2*m_var+2*1.30567*0.21455*b3_m_cor+pi^2/3

SD2_b3<-sqrt(b3_1)

#对b、c'、c系数标准化
b3_b<-0.21455/SD2_b3
b3_c1<-1.30567/SD2_b3
b3_c<-1.4504/SD1_b3

b3_b;b3_c1;b3_c
  [1] 0.1112067
  [1] 0.6767617
  [1] 0.7531668
#对自变量b4水平的b、c'、c系数标准化
b4_var<-var(jobs$depress1d4)
b4<-2.1163^2*b4_var+pi^2/3
SD1_b4<-sqrt(b4)

m_var<-var(jobs$econ_hard)
b4_m_cor<-cov(jobs$depress1d4,jobs$econ_hard)

b4_1<-1.97533^2*b4_var+0.21455^2*m_var+2*1.97533*0.21455*b4_m_cor+pi^2/3

SD2_b4<-sqrt(b4_1)

#对b、c'、c系数标准化
b4_b<-0.21455/SD2_b4
b4_c1<-1.97533/SD2_b4
b4_c<-2.1163/SD1_b4

b4_b;b4_c1;b4_c
  [1] 0.107921
  [1] 0.9936122
  [1] 1.065822

    经过系数标准化后,虽然总效应不完全等于直接效应+间接效应,但是两者差异已经非常小了,如下图所示。因此我们可以采用整个流程并使用自助法计算直接效应(标准化的c’)、间接效应(a*b(标准化的))、总效应(标准化的c)的置信区间。


3.使用RMediation包检验Za*Zb

@ixxmu ixxmu changed the title archive_request 中介变量或者因变量是因子变量的中介分析 Nov 4, 2024
@ixxmu ixxmu closed this as completed Nov 4, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant