Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247

Merged
merged 8 commits into from
Oct 9, 2024

Conversation

nparent1
Copy link
Contributor

@nparent1 nparent1 commented Sep 1, 2024

This PR extends the has_directed_path method to accept multiple treatment and outcome nodes. The approach taken here is as recommended from Issue #654:

(2) more aggressively: every one of the treatments has at least one direct path to one of the outcomes. And, every one of the outcomes has a direct path from at least one of the treatments.

I opted to implement this by computing the list of all nodes reachable from our treatments, plus the list of all nodes that can reach our outcomes, rather than iteratively using networkx's has_path method, since under the hood has_path is computing shortest paths, but we really do not need to incur the additional runtime of computing these shortest paths.

Related Issue: #654

@nparent1
Copy link
Contributor Author

This should also fix #1250, added a test for that issue here: 0b6288c

@nparent1
Copy link
Contributor Author

I dont see an option to request a reviewer - any chance @bloebp or @amit-sharma can review? I see that you both are regular maintainers of this repo

Signed-off-by: Nicholas Parente <[email protected]>
Signed-off-by: Nicholas Parente <[email protected]>
@bloebp
Copy link
Member

bloebp commented Oct 1, 2024

Hi, thanks for the PR!

I am wondering, if we have a treatment variable that is, e.g., completely disconnected, but another one that has a path to an outcome, we would return True here. An alternative would be to only return True if all have a directed path (instead of any), but I am not sure in which context we are currently using the method.
Can you briefly summarize where the function is currently being used (sorry, I don't have the full overview :))?

dowhy/causal_graph.py Outdated Show resolved Hide resolved
@nparent1
Copy link
Contributor Author

nparent1 commented Oct 3, 2024

Hi @bloebp !

I am wondering, if we have a treatment variable that is, e.g., completely disconnected, but another one that has a path to an outcome, we would return True here. An alternative would be to only return True if all have a directed path (instead of any), but I am not sure in which context we are currently using the method.

I believe this PR accomplishes the latter. In this PR we enforce that we should only return True if all treatment nodes have a directed path to an outcome node (and that all outcome nodes have a directed path from a treatment node).

We check that all treatment nodes have a directed path to an outcome node by getting the set of ancestors of the outcome nodes, and making sure the treatment nodes are a subset of this set. E.g. every single treatment node must be an ancestor of some outcome node. In the example where one treatment node is completely disconnected, then the action_node_candidates will never contain that treatment node, and so we will fail the check for the set of treatment nodes being a subset of action_node_candidates .

Can you briefly summarize where the function is currently being used

Currently, the has_directed_path function is only used in one place, here, at the start of the identify_effect_auto() method. Specifically, we check if there is no directed path from the treatment nodes to the outcome nodes, and if so, we return a message saying the causal effect is zero.

Before this PR we were only checking for a directed path between the first treatment node and the first outcome node. So, before this PR this function would return True if the first treatment variable had a directed path to the first outcome node, regardless of any other treatment or outcome variables which might be disconnected. This creates some very inconsistent behavior when calling the identify_effect_auto() method (see issue #1250).

As an aside, there is a duplicate definition of exactly this function which is not being used anywhere in the repository, in the CausalGraph class here. To prevent maintaining the same code in two places, I modified this method to just call the has_directed_path method from the graph.py file.

@bloebp
Copy link
Member

bloebp commented Oct 3, 2024

Thanks a lot for the context, that was really helpful and it makes sense!

Maybe one thing, let's rename nodes1 and nodes2 to something more descriptive, like action_nodes and outcome_nodes. This makes it clearer in which order the arguments are expected.

Signed-off-by: Nicholas Parente <[email protected]>
Signed-off-by: Nicholas Parente <[email protected]>
@nparent1
Copy link
Contributor Author

nparent1 commented Oct 4, 2024

Done @bloebp !

bloebp
bloebp previously approved these changes Oct 6, 2024
@bloebp
Copy link
Member

bloebp commented Oct 7, 2024

Thanks a lot! Can you briefly check if the test tests/causal_identifiers/test_efficient_backdoor_identifier.py::test_fail_multivar_outcome_efficient_backdoor_algorithms works on your side? It seems to be failing in the builds.

Signed-off-by: Nicholas Parente <[email protected]>
@nparent1
Copy link
Contributor Author

nparent1 commented Oct 8, 2024

Hi @bloebp

It turned out that test did need to be modified for my PR. I modified the test just now in this commit: 31bf34d

Basically, that unit test is testing for this assert, in which it wants the method to throw an exception because the efficient backdoor identifier only supports singleton outcome sets.

However, the outcome set used in the unit test does not have a directed path from the treatment to the second outcome node, so our new has_directed_path method exits the identify_effect() method gracefully before it can even raise that exception.

All I did to modify the test was to change the outcome set to be such that a directed path does exist from all of the action nodes to all of the outcome nodes, this way the unit test is actually reaching the code which would throw an exception for there being more than one outcome node.

@nparent1 nparent1 requested a review from bloebp October 8, 2024 04:55
Copy link
Member

@bloebp bloebp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for addressing the issues!

@bloebp
Copy link
Member

bloebp commented Oct 9, 2024

@all-contributors please add @nparent1 for code.

Copy link
Contributor

@bloebp

I've put up a pull request to add @nparent1! 🎉

@bloebp bloebp merged commit a5b56fb into py-why:main Oct 9, 2024
18 of 30 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants