Different p-values with different starting seed numbers #48

Open
shabnamhossein opened this issue Jun 5, 2024 · 3 comments

Comments

@shabnamhossein

Hello,

I have a question about the reliability of the p-values reported for the comparison of individual edges between two networks. I have 10 nodes and about 100 subjects who go through a treatment, and I want to compare a specific edge between the networks at the two time points. If I run the NCT function more than once (or use different seeds), I get different p-values for this comparison; some of them are below 0.05 and some are not (after correcting for multiple comparisons). Can you help me understand why that might be the case? Here is my code for estimating the networks and running NCT on them. The data come from a depression questionnaire and are ordinal.

library(bootnet)                # for estimateNetwork()
library(NetworkComparisonTest)  # for NCT()

Network1 <- estimateNetwork(data_ketamine_baseline, default = "EBICglasso", threshold = FALSE, corMethod = "spearman")
Network2 <- estimateNetwork(data_ketamine_24hr, default = "EBICglasso", threshold = FALSE, corMethod = "spearman")

set.seed(500)
nct_N1N2 <- NCT(Network1, Network2, it = 1000, paired = TRUE, weighted = TRUE,
                abs = FALSE, test.edges = TRUE, edges = "all", p.adjust.methods = "fdr")

The specific p-value I am interested in (nct_N1N2$einv.pvals[1,3]) can range from ~0.04 to ~0.2.
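For illustration (a sketch assuming the networks above are already estimated), the spread can be reproduced by re-running NCT under a handful of seeds:

# Collect the p-value of interest across 10 different seeds
pvals <- sapply(1:10, function(s) {
  set.seed(s)
  nct <- NCT(Network1, Network2, it = 1000, paired = TRUE, weighted = TRUE,
             abs = FALSE, test.edges = TRUE, edges = "all",
             p.adjust.methods = "fdr")
  nct$einv.pvals[1, 3]
})
range(pvals)  # in the case described here, roughly 0.04 to 0.2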

Thanks,
Shabnam

@pinusm

pinusm commented Jun 5, 2024

Try increasing the number of iterations to 5K or 10K. That should reduce the variability in the resulting p-values.
Note that the variability is due to random sampling, so getting different p-values with different random seeds is the expected behavior, especially when the number of iterations is low (you can try decreasing the number of iterations to 10 or 100 and experiment).
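For intuition (an aside not in the original thread): the permutation p-value is a Monte Carlo estimate, so across seeds it fluctuates with a binomial standard error of roughly sqrt(p(1 - p)/it), meaning quadrupling the iterations halves the noise. A quick back-of-the-envelope in R:

# Approximate Monte Carlo standard error of a permutation p-value near 0.05
p <- 0.05
for (it in c(100L, 1000L, 5000L, 10000L)) {
  se <- sqrt(p * (1 - p) / it)
  cat(sprintf("it = %6d: SE ~ %.4f, ~95%% range [%.3f, %.3f]\n",
              it, se, max(0, p - 1.96 * se), p + 1.96 * se))
}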

@shabnamhossein
Author

Thanks for the quick response. That makes sense, especially given my low sample size. Given this variability, though, what is your suggestion for reporting these p-values? In most manuscripts I have seen so far in which NCT has been used, people do not report this variability; they report just one p-value with a random seed they have chosen. I wonder whether a better way of reporting these p-values in a manuscript would be to take the average over, for example, 100 runs of the NCT function (with 5K iterations each)? Or is the median a better estimate of this p-value to report?
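Concretely, a hypothetical sketch of that averaging idea (see the reply below for why a single long run may be preferable):

# 100 NCT runs at 5,000 iterations each; no fixed seed, so each run differs
pvals <- replicate(100, {
  nct <- NCT(Network1, Network2, it = 5000, paired = TRUE, weighted = TRUE,
             abs = FALSE, test.edges = TRUE, edges = "all",
             p.adjust.methods = "fdr")
  nct$einv.pvals[1, 3]
})
mean(pvals)
median(pvals)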

Thanks,
Shabnam

@KarolineHuth
Collaborator

Hi Shabnam,

As @pinusm suggested, I would increase the number of iterations. And instead of reporting a mean/median of 100 × 1,000 bootstraps, I'd suggest just running it once with 100,000 iterations and reporting the p-value from that long run along with the seed.
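A minimal sketch of that suggestion, reusing the networks from the original post (the seed is arbitrary, but reporting it makes the result reproducible):

set.seed(500)  # any seed works; report it alongside the result
nct_long <- NCT(Network1, Network2, it = 100000, paired = TRUE, weighted = TRUE,
                abs = FALSE, test.edges = TRUE, edges = "all",
                p.adjust.methods = "fdr")
nct_long$einv.pvals[1, 3]  # edge-specific p-value from the long run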
