Different p-values with different starting seed numbers #48

Open
shabnamhossein opened this issue Jun 5, 2024 · 3 comments

Comments

@shabnamhossein

Hello,

I have a question about the reliability of the p-values reported for the comparison of individual edges between two networks. I have 10 nodes and about 100 subjects who go through a treatment, and I want to compare a specific edge between the networks at the two time points. If I run the NCT function more than once (or use different seeds), I get different p-values for this comparison; some of them are below 0.05 and some are not (after correcting for multiple comparisons). Can you help me understand why that might be the case? Here is my code for estimating the networks and running NCT on them. The data come from a depression questionnaire and are ordinal.

library(bootnet)                # for estimateNetwork()
library(NetworkComparisonTest)  # for NCT()

Network1 <- estimateNetwork(data_ketamine_baseline, default = "EBICglasso", threshold = FALSE, corMethod = "spearman")
Network2 <- estimateNetwork(data_ketamine_24hr, default = "EBICglasso", threshold = FALSE, corMethod = "spearman")

set.seed(500)
nct_N1N2 <- NCT(Network1, Network2, it = 1000, paired = TRUE, weighted = TRUE,
                abs = FALSE, test.edges = TRUE, edges = "all", p.adjust.methods = "fdr")

The specific p-value I am interested in (nct_N1N2$einv.pvals[1,3]) can range from ~0.04 to ~0.2.
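For illustration (a sketch assuming the networks above are already estimated), the spread can be reproduced by re-running NCT under a handful of seeds:

# Collect the p-value of interest across 10 different seeds
pvals <- sapply(1:10, function(s) {
  set.seed(s)
  nct <- NCT(Network1, Network2, it = 1000, paired = TRUE, weighted = TRUE,
             abs = FALSE, test.edges = TRUE, edges = "all",
             p.adjust.methods = "fdr")
  nct$einv.pvals[1, 3]
})
range(pvals)  # in the case described here, roughly 0.04 to 0.2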

Thanks,
Shabnam

@pinusm

pinusm commented Jun 5, 2024

Try increasing the number of iterations to 5K or 10K. That should reduce the variability in the resulting p-values.
Note that the variability is due to random sampling, so getting different p-values with different random seeds is the expected behavior, especially when the number of iterations is low (you can try decreasing the number of iterations to 10 or 100 and experiment).
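For intuition (an aside not in the original thread): the permutation p-value is a Monte Carlo estimate, so across seeds it fluctuates with a binomial standard error of roughly sqrt(p(1 - p)/it), meaning quadrupling the iterations halves the noise. A quick back-of-the-envelope in R:

# Approximate Monte Carlo standard error of a permutation p-value near 0.05
p <- 0.05
for (it in c(100L, 1000L, 5000L, 10000L)) {
  se <- sqrt(p * (1 - p) / it)
  cat(sprintf("it = %6d: SE ~ %.4f, ~95%% range [%.3f, %.3f]\n",
              it, se, max(0, p - 1.96 * se), p + 1.96 * se))
}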

@shabnamhossein
Author

Thanks for the quick response. That makes sense, especially given my low sample size. Given this variability, though, what is your suggestion for reporting these p-values? In most manuscripts I have seen so far in which NCT has been used, people do not report this variability; they report just one p-value with a random seed they have chosen. I wonder whether a better way of reporting these p-values in a manuscript would be to take the average over, for example, 100 runs of the NCT function (with 5K iterations each)? Or is the median a better estimate of this p-value to report?
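Concretely, a hypothetical sketch of that averaging idea (see the reply below for why a single long run may be preferable):

# 100 NCT runs at 5,000 iterations each; no fixed seed, so each run differs
pvals <- replicate(100, {
  nct <- NCT(Network1, Network2, it = 5000, paired = TRUE, weighted = TRUE,
             abs = FALSE, test.edges = TRUE, edges = "all",
             p.adjust.methods = "fdr")
  nct$einv.pvals[1, 3]
})
mean(pvals)
median(pvals)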

Thanks,
Shabnam

@KarolineHuth
Collaborator

Hi Shabnam,

As @pinusm suggested, I would increase the number of iterations. And instead of reporting a mean/median of 100 × 1,000 bootstraps, I'd suggest just running it once with 100,000 iterations and reporting the p-value from that long run along with the seed.
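A minimal sketch of that suggestion, reusing the networks from the original post (the seed is arbitrary, but reporting it makes the result reproducible):

set.seed(500)  # any seed works; report it alongside the result
nct_long <- NCT(Network1, Network2, it = 100000, paired = TRUE, weighted = TRUE,
                abs = FALSE, test.edges = TRUE, edges = "all",
                p.adjust.methods = "fdr")
nct_long$einv.pvals[1, 3]  # edge-specific p-value from the long run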
