
Reproducibility of gcm.evaluate_causal_model #1288

Open
NMahajan85 opened this issue Dec 2, 2024 · 2 comments
Labels: question (Further information is requested), stale

Comments

@NMahajan85

Ask your question
When assessing model quality with gcm.evaluate_causal_model after fitting (with mechanisms assigned via gcm.auto.assign_causal_mechanisms), rerunning gcm.evaluate_causal_model gives different results each time: the DAG is rejected in some runs but not in others. I suspect the use of permutations has something to do with this. Is there a way to set a seed to get reproducible results?
Also, how does one interpret the statement that the DAG is informative?
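
For reference, a minimal sketch of the workflow described above (the DAG edges and the data file name are illustrative assumptions; the column names follow the output below):

import networkx as nx
import pandas as pd
from dowhy import gcm

# Hypothetical DAG over the columns in the output below (edges are assumptions)
causal_graph = nx.DiGraph([
    ("Ad Spend", "Page Views"),
    ("Page Views", "Sold Units"),
    ("Unit Price", "Sold Units"),
    ("Sold Units", "Revenue"),
    ("Revenue", "Profit"),
    ("Operational Cost", "Profit"),
])

causal_model = gcm.StructuralCausalModel(causal_graph)
data = pd.read_csv("data.csv")  # placeholder for the actual dataset

gcm.auto.assign_causal_mechanisms(causal_model, data)
gcm.fit(causal_model, data)

# Rerunning this call yields a different falsification result each time:
print(gcm.evaluate_causal_model(causal_model, data))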

+-------------------------------------------------------------------------------------------------------+
| Falsification Summary |
+-------------------------------------------------------------------------------------------------------+
| The given DAG is informative because 0 / 100 of the permutations lie in the Markov |
| equivalence class of the given DAG (p-value: 0.00). |
| The given DAG violates 6/18 LMCs and is better than 80.0% of the permuted DAGs (p-value: 0.20). |
| Based on the provided significance level (0.2) and because the DAG is informative, |
| we do not reject the DAG. |
+-------------------------------------------------------------------------------------------------------+
{'Page Views': (np.float64(1.0), np.False_, 0.05), 'Sold Units': (np.float64(1.0), np.False_, 0.05), 'Revenue': (np.float64(1.0), np.False_, 0.05), 'Profit': (np.float64(1.0), np.False_, 0.05), 'Unit Price': (np.float64(0.9626791405827025), np.False_, 0.05), 'Ad Spend': (np.float64(1.0), np.False_, 0.05), 'Operational Cost': (np.float64(1.0), np.False_, 0.05)}
Evaluating causal mechanisms...: 100%|██████████| 8/8 [00:00<00:00, 7981.55it/s]
Test permutations of given graph: 100%|██████████| 100/100 [00:23<00:00, 4.30it/s]
overall_kl_divergence - 1.0752892331747705
+-------------------------------------------------------------------------------------------------------+
| Falsification Summary |
+-------------------------------------------------------------------------------------------------------+
| The given DAG is informative because 0 / 100 of the permutations lie in the Markov |
| equivalence class of the given DAG (p-value: 0.00). |
| The given DAG violates 6/18 LMCs and is better than 68.0% of the permuted DAGs (p-value: 0.32). |
| Based on the provided significance level (0.2) and because the DAG is informative, |
| we reject the DAG. |
+-------------------------------------------------------------------------------------------------------+
{'Page Views': (np.float64(1.0), np.False_, 0.05), 'Sold Units': (np.float64(1.0), np.False_, 0.05), 'Revenue': (np.float64(1.0), np.False_, 0.05), 'Profit': (np.float64(1.0), np.False_, 0.05), 'Unit Price': (np.float64(0.9626791405817909), np.False_, 0.05), 'Ad Spend': (np.float64(1.0), np.False_, 0.05), 'Operational Cost': (np.float64(1.0), np.False_, 0.05)}
Evaluating causal mechanisms...: 100%|██████████| 8/8 [00:00<00:00, 4003.15it/s]
Test permutations of given graph: 100%|██████████| 100/100 [00:24<00:00, 4.04it/s]
overall_kl_divergence - 1.3110770186869007

Expected behavior
Shouldn't the results be reproducible? Even when running the example notebooks, I get results that differ from the outputs shown in the notebooks.

Version information:

  • DoWhy version 0.1
  • Python version 3.11
  • Windows


@bloebp (Member) commented Dec 2, 2024

Hi, thanks for your question! The model evaluation is based on sampling, so it is expected to differ (slightly) between runs. Generally, if it differs a lot (high variance), that can also be interpreted as a large confidence interval (e.g., in the sense of obtaining a confidence interval via bootstrapping).

Is there a way to set a seed to get reproducible results?

You can set a random seed via

from dowhy import gcm
gcm.util.general.set_random_seed(0)

That should make it deterministic; let me know if not.
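
A quick usage sketch, assuming the causal_model and data objects from the reproduction sketch above (re-seed before each call, since every evaluation consumes random state):

from dowhy import gcm

gcm.util.general.set_random_seed(0)
summary_1 = gcm.evaluate_causal_model(causal_model, data)

gcm.util.general.set_random_seed(0)
summary_2 = gcm.evaluate_causal_model(causal_model, data)
# With the same seed, both runs should report the same falsification outcome.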

Also how does one interpret that DAG is informative?

This is an indicator of whether checking violations of the local Markov condition (LMC) via independence tests can give an 'informative'/'valuable' insight in the first place. Since the main idea is to look for violations under permutations of the graph, this only gives insight if permuting the nodes significantly changes the graph structure. For instance, in a fully connected graph, a permutation of the nodes would not change anything about the number of violations, i.e., it is not really informative. In a rather sparse graph, a permutation would drastically change how specific nodes are connected, i.e., it would introduce more/fewer violations, so statements about such a graph are more informative.
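
To illustrate, the falsification step can also be run on its own via dowhy.gcm.falsify.falsify_graph; a sketch assuming the n_permutations and significance_level keyword arguments match your installed version's signature:

from dowhy import gcm
from dowhy.gcm.falsify import falsify_graph

# Run only the permutation-based falsification that produces the
# "Falsification Summary" box above; seeded for reproducibility.
gcm.util.general.set_random_seed(0)
result = falsify_graph(causal_graph, data, n_permutations=100, significance_level=0.2)
print(result)  # reports informativeness and whether the DAG is rejected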


github-actions bot commented Jan 2, 2025

This issue is stale because it has been open for 30 days with no activity.
