Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

nparent1 · 2024-09-01T21:04:35Z

This PR extends the has_directed_path method to accept multiple treatment and outcome nodes. The approach taken here is as recommended from Issue #654:

(2) more aggressively: every one of the treatments has at least one direct path to one of the outcomes. And, every one of the outcomes has a direct path from at least one of the treatments.

I opted to implement this by computing the list of all nodes reachable from our treatments, plus the list of all nodes that can reach our outcomes, rather than iteratively using networkx's has_path method, since under the hood has_path is computing shortest paths, but we really do not need to incur the additional runtime of computing these shortest paths.

Related Issue: #654

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 · 2024-09-10T02:56:44Z

This should also fix #1250, added a test for that issue here: 0b6288c

nparent1 · 2024-09-12T03:19:15Z

I dont see an option to request a reviewer - any chance @bloebp or @amit-sharma can review? I see that you both are regular maintainers of this repo

Signed-off-by: Nicholas Parente <[email protected]>

bloebp · 2024-10-01T14:40:52Z

Hi, thanks for the PR!

I am wondering, if we have a treatment variable that is, e.g., completely disconnected, but another one that has a path to an outcome, we would return True here. An alternative would be to only return True if all have a directed path (instead of any), but I am not sure in which context we are currently using the method.
Can you briefly summarize where the function is currently being used (sorry, I don't have the full overview :))?

Signed-off-by: Nicholas Parente <[email protected]>

dowhy/causal_graph.py

nparent1 · 2024-10-03T02:15:02Z

Hi @bloebp !

I am wondering, if we have a treatment variable that is, e.g., completely disconnected, but another one that has a path to an outcome, we would return True here. An alternative would be to only return True if all have a directed path (instead of any), but I am not sure in which context we are currently using the method.

I believe this PR accomplishes the latter. In this PR we enforce that we should only return True if all treatment nodes have a directed path to an outcome node (and that all outcome nodes have a directed path from a treatment node).

We check that all treatment nodes have a directed path to an outcome node by getting the set of ancestors of the outcome nodes, and making sure the treatment nodes are a subset of this set. E.g. every single treatment node must be an ancestor of some outcome node. In the example where one treatment node is completely disconnected, then the action_node_candidates will never contain that treatment node, and so we will fail the check for the set of treatment nodes being a subset of action_node_candidates .

Can you briefly summarize where the function is currently being used

Currently, the has_directed_path function is only used in one place, here, at the start of the identify_effect_auto() method. Specifically, we check if there is no directed path from the treatment nodes to the outcome nodes, and if so, we return a message saying the causal effect is zero.

Before this PR we were only checking for a directed path between the first treatment node and the first outcome node. So, before this PR this function would return True if the first treatment variable had a directed path to the first outcome node, regardless of any other treatment or outcome variables which might be disconnected. This creates some very inconsistent behavior when calling the identify_effect_auto() method (see issue #1250).

As an aside, there is a duplicate definition of exactly this function which is not being used anywhere in the repository, in the CausalGraph class here. To prevent maintaining the same code in two places, I modified this method to just call the has_directed_path method from the graph.py file.

bloebp · 2024-10-03T14:09:44Z

Thanks a lot for the context, that was really helpful and it makes sense!

Maybe one thing, let's rename nodes1 and nodes2 to something more descriptive, like action_nodes and outcome_nodes. This makes it clearer in which order the arguments are expected.

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 · 2024-10-04T01:09:41Z

Done @bloebp !

bloebp · 2024-10-07T14:22:35Z

Thanks a lot! Can you briefly check if the test tests/causal_identifiers/test_efficient_backdoor_identifier.py::test_fail_multivar_outcome_efficient_backdoor_algorithms works on your side? It seems to be failing in the builds.

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 · 2024-10-08T04:55:04Z

Hi @bloebp

It turned out that test did need to be modified for my PR. I modified the test just now in this commit: 31bf34d

Basically, that unit test is testing for this assert, in which it wants the method to throw an exception because the efficient backdoor identifier only supports singleton outcome sets.

However, the outcome set used in the unit test does not have a directed path from the treatment to the second outcome node, so our new has_directed_path method exits the identify_effect() method gracefully before it can even raise that exception.

All I did to modify the test was to change the outcome set to be such that a directed path does exist from all of the action nodes to all of the outcome nodes, this way the unit test is actually reaching the code which would throw an exception for there being more than one outcome node.

bloebp

Thanks for addressing the issues!

bloebp · 2024-10-09T16:09:37Z

@all-contributors please add @nparent1 for code.

allcontributors · 2024-10-09T16:09:47Z

@bloebp

I've put up a pull request to add @nparent1! 🎉

nparent1 added 2 commits September 1, 2024 17:03

Modifies has_directed_path and adds test

943f219

Signed-off-by: Nicholas Parente <[email protected]>

Adding test for auto identify

0b6288c

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 mentioned this pull request Sep 10, 2024

Order of Action Nodes effects Identified Estimand #1250

Closed

emrekiciman requested review from amit-sharma and bloebp September 12, 2024 17:38

nparent1 added 2 commits September 14, 2024 00:35

poe format

ff7f3ec

Signed-off-by: Nicholas Parente <[email protected]>

remove dependency change

daa868f

Signed-off-by: Nicholas Parente <[email protected]>

Cleaning up and refactoring causal_learns has_directed_path

9be439e

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 commented Oct 3, 2024

View reviewed changes

dowhy/causal_graph.py Outdated Show resolved Hide resolved

nparent1 added 2 commits October 3, 2024 21:07

rename arguments

a29e93d

Signed-off-by: Nicholas Parente <[email protected]>

rename arguments pt 2

706ff09

Signed-off-by: Nicholas Parente <[email protected]>

bloebp previously approved these changes Oct 6, 2024

View reviewed changes

Modify other test

31bf34d

Signed-off-by: Nicholas Parente <[email protected]>

nparent1 dismissed bloebp’s stale review via 31bf34d October 8, 2024 04:15

nparent1 requested a review from bloebp October 8, 2024 04:55

bloebp approved these changes Oct 9, 2024

View reviewed changes

allcontributors bot mentioned this pull request Oct 9, 2024

docs: add nparent1 as a contributor for code #1261

Merged

bloebp merged commit a5b56fb into py-why:main Oct 9, 2024
18 of 30 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

nparent1 commented Sep 1, 2024

nparent1 commented Sep 10, 2024

nparent1 commented Sep 12, 2024

bloebp commented Oct 1, 2024

nparent1 commented Oct 3, 2024

bloebp commented Oct 3, 2024

nparent1 commented Oct 4, 2024

bloebp commented Oct 7, 2024

nparent1 commented Oct 8, 2024 •

edited

Loading

bloebp left a comment

bloebp commented Oct 9, 2024

allcontributors bot commented Oct 9, 2024

Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

Conversation

nparent1 commented Sep 1, 2024

nparent1 commented Sep 10, 2024

nparent1 commented Sep 12, 2024

bloebp commented Oct 1, 2024

nparent1 commented Oct 3, 2024

bloebp commented Oct 3, 2024

nparent1 commented Oct 4, 2024

bloebp commented Oct 7, 2024

nparent1 commented Oct 8, 2024 • edited Loading

bloebp left a comment

Choose a reason for hiding this comment

bloebp commented Oct 9, 2024

allcontributors bot commented Oct 9, 2024

nparent1 commented Oct 8, 2024 •

edited

Loading