-
Notifications
You must be signed in to change notification settings - Fork 939
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for non-singleton treatment/outcome sets to "has_directed_path" method #1247
Conversation
Signed-off-by: Nicholas Parente <[email protected]>
Signed-off-by: Nicholas Parente <[email protected]>
I dont see an option to request a reviewer - any chance @bloebp or @amit-sharma can review? I see that you both are regular maintainers of this repo |
Signed-off-by: Nicholas Parente <[email protected]>
Signed-off-by: Nicholas Parente <[email protected]>
Hi, thanks for the PR! I am wondering, if we have a treatment variable that is, e.g., completely disconnected, but another one that has a path to an outcome, we would return |
Signed-off-by: Nicholas Parente <[email protected]>
Hi @bloebp !
I believe this PR accomplishes the latter. In this PR we enforce that we should only return True if all treatment nodes have a directed path to an outcome node (and that all outcome nodes have a directed path from a treatment node). We check that all treatment nodes have a directed path to an outcome node by getting the set of ancestors of the outcome nodes, and making sure the treatment nodes are a subset of this set. E.g. every single treatment node must be an ancestor of some outcome node. In the example where one treatment node is completely disconnected, then the action_node_candidates will never contain that treatment node, and so we will fail the check for the set of treatment nodes being a subset of action_node_candidates .
Currently, the has_directed_path function is only used in one place, here, at the start of the identify_effect_auto() method. Specifically, we check if there is no directed path from the treatment nodes to the outcome nodes, and if so, we return a message saying the causal effect is zero. Before this PR we were only checking for a directed path between the first treatment node and the first outcome node. So, before this PR this function would return True if the first treatment variable had a directed path to the first outcome node, regardless of any other treatment or outcome variables which might be disconnected. This creates some very inconsistent behavior when calling the identify_effect_auto() method (see issue #1250). As an aside, there is a duplicate definition of exactly this function which is not being used anywhere in the repository, in the CausalGraph class here. To prevent maintaining the same code in two places, I modified this method to just call the has_directed_path method from the graph.py file. |
Thanks a lot for the context, that was really helpful and it makes sense! Maybe one thing, let's rename |
Signed-off-by: Nicholas Parente <[email protected]>
Signed-off-by: Nicholas Parente <[email protected]>
Done @bloebp ! |
Thanks a lot! Can you briefly check if the test |
Signed-off-by: Nicholas Parente <[email protected]>
Hi @bloebp It turned out that test did need to be modified for my PR. I modified the test just now in this commit: 31bf34d Basically, that unit test is testing for this assert, in which it wants the method to throw an exception because the efficient backdoor identifier only supports singleton outcome sets. However, the outcome set used in the unit test does not have a directed path from the treatment to the second outcome node, so our new has_directed_path method exits the identify_effect() method gracefully before it can even raise that exception. All I did to modify the test was to change the outcome set to be such that a directed path does exist from all of the action nodes to all of the outcome nodes, this way the unit test is actually reaching the code which would throw an exception for there being more than one outcome node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for addressing the issues!
@all-contributors please add @nparent1 for code. |
I've put up a pull request to add @nparent1! 🎉 |
This PR extends the has_directed_path method to accept multiple treatment and outcome nodes. The approach taken here is as recommended from Issue #654:
I opted to implement this by computing the list of all nodes reachable from our treatments, plus the list of all nodes that can reach our outcomes, rather than iteratively using networkx's has_path method, since under the hood has_path is computing shortest paths, but we really do not need to incur the additional runtime of computing these shortest paths.
Related Issue: #654