
Reproducibility of gcm.evaluate_causal_model #1288

Open
NMahajan85 opened this issue Dec 2, 2024 · 2 comments
Labels: question (Further information is requested), stale

Comments

@NMahajan85

Ask your question
When assessing model quality with gcm.evaluate_causal_model after fitting (with mechanisms assigned via gcm.auto.assign_causal_mechanisms), rerunning gcm.evaluate_causal_model gives different results each time: the DAG is rejected in some runs but not in others. I suspect the use of permutations has something to do with this. Is there a way to set a seed to get reproducible results?
Also, how does one interpret the statement that the DAG is informative?
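
For reference, a minimal sketch of the workflow described above (the DAG edges and the data file name are illustrative assumptions; the column names follow the output below):

import networkx as nx
import pandas as pd
from dowhy import gcm

# Hypothetical DAG over the columns in the output below (edges are assumptions)
causal_graph = nx.DiGraph([
    ("Ad Spend", "Page Views"),
    ("Page Views", "Sold Units"),
    ("Unit Price", "Sold Units"),
    ("Sold Units", "Revenue"),
    ("Revenue", "Profit"),
    ("Operational Cost", "Profit"),
])

causal_model = gcm.StructuralCausalModel(causal_graph)
data = pd.read_csv("data.csv")  # placeholder for the actual dataset

gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

# Rerunning this call yields a different falsification result each time:
print(gcm.evaluate_causal_model(causal_model, data))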

+-------------------------------------------------------------------------------------------------------+
| Falsification Summary |
+-------------------------------------------------------------------------------------------------------+
| The given DAG is informative because 0 / 100 of the permutations lie in the Markov |
| equivalence class of the given DAG (p-value: 0.00). |
| The given DAG violates 6/18 LMCs and is better than 80.0% of the permuted DAGs (p-value: 0.20). |
| Based on the provided significance level (0.2) and because the DAG is informative, |
| we do not reject the DAG. |
+-------------------------------------------------------------------------------------------------------+
{'Page Views': (np.float64(1.0), np.False_, 0.05), 'Sold Units': (np.float64(1.0), np.False_, 0.05), 'Revenue': (np.float64(1.0), np.False_, 0.05), 'Profit': (np.float64(1.0), np.False_, 0.05), 'Unit Price': (np.float64(0.9626791405827025), np.False_, 0.05), 'Ad Spend': (np.float64(1.0), np.False_, 0.05), 'Operational Cost': (np.float64(1.0), np.False_, 0.05)}
Evaluating causal mechanisms...: 100%|██████████| 8/8 [00:00<00:00, 7981.55it/s]
Test permutations of given graph: 100%|██████████| 100/100 [00:23<00:00, 4.30it/s]
overall_kl_divergence - 1.0752892331747705
+-------------------------------------------------------------------------------------------------------+
| Falsification Summary |
+-------------------------------------------------------------------------------------------------------+
| The given DAG is informative because 0 / 100 of the permutations lie in the Markov |
| equivalence class of the given DAG (p-value: 0.00). |
| The given DAG violates 6/18 LMCs and is better than 68.0% of the permuted DAGs (p-value: 0.32). |
| Based on the provided significance level (0.2) and because the DAG is informative, |
| we reject the DAG. |
+-------------------------------------------------------------------------------------------------------+
{'Page Views': (np.float64(1.0), np.False_, 0.05), 'Sold Units': (np.float64(1.0), np.False_, 0.05), 'Revenue': (np.float64(1.0), np.False_, 0.05), 'Profit': (np.float64(1.0), np.False_, 0.05), 'Unit Price': (np.float64(0.9626791405817909), np.False_, 0.05), 'Ad Spend': (np.float64(1.0), np.False_, 0.05), 'Operational Cost': (np.float64(1.0), np.False_, 0.05)}
Evaluating causal mechanisms...: 100%|██████████| 8/8 [00:00<00:00, 4003.15it/s]
Test permutations of given graph: 100%|██████████| 100/100 [00:24<00:00, 4.04it/s]
overall_kl_divergence - 1.3110770186869007

Expected behavior
Shouldn't the results be reproducible? Even when running the example notebooks, I get results that differ from the outputs shown in the notebooks.

Version information:

  • DoWhy version 0.1
  • Python version 3.11
  • Windows


@bloebp (Member) commented Dec 2, 2024

Hi, thanks for your question! The model evaluation is based on sampling, so it is expected to differ (slightly) between runs. Generally, if it differs a lot (high variance), that can also be interpreted as a large confidence interval (e.g., in the sense of obtaining a confidence interval via bootstrapping).

Is there a way to set a seed to get reproducible results?

You can set a random seed via

from dowhy import gcm
gcm.util.general.set_random_seed(0)

That should make it deterministic; let me know if not.
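
A quick usage sketch, assuming the causal_model and data objects from the reproduction sketch above (re-seed before each call, since every evaluation consumes random state):

from dowhy import gcm

gcm.util.general.set_random_seed(0)
summary_1 = gcm.evaluate_causal_model(causal_model, data)

gcm.util.general.set_random_seed(0)
summary_2 = gcm.evaluate_causal_model(causal_model, data)
# With the same seed, both runs should report the same falsification outcome.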

Also how does one interpret that DAG is informative?

This is an indicator of whether checking violations of the local Markov condition (LMC) via independence tests can give an 'informative'/'valuable' insight in the first place. Since the main idea is to look for violations under permutations of the graph, this only gives insight if permuting the nodes significantly changes the graph structure. For instance, in a fully connected graph, a permutation of the nodes would not change anything about the number of violations, i.e., it is not really informative. In a rather sparse graph, a permutation would drastically change how specific nodes are connected, i.e., it would introduce more/fewer violations, so statements about such a graph are more informative.
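
To illustrate, the falsification step can also be run on its own via dowhy.gcm.falsify.falsify_graph; a sketch assuming the n_permutations and significance_level keyword arguments match your installed version's signature:

from dowhy import gcm
from dowhy.gcm.falsify import falsify_graph

# Run only the permutation-based falsification that produces the
# "Falsification Summary" box above; seeded for reproducibility.
gcm.util.general.set_random_seed(0)
result = falsify_graph(causal_graph, data, n_permutations=100, significance_level=0.2)
print(result)  # reports informativeness and whether the DAG is rejected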


github-actions bot commented Jan 2, 2025

This issue is stale because it has been open for 30 days with no activity.
