-
Notifications
You must be signed in to change notification settings - Fork 1
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Graphfcns #31
Graphfcns #31
Conversation
Update empty branch from main
-Create graph_utilities.py -Create tests -Create parameters
I don't understand the justification for the need to treat graph functions like this. From the development (especially typing) and maintenance perspective, this approach can make debugging difficult and reduce readability. What's the trade-off for this? If you want to apply some functions in a specific sequence, why not just have a wrapper function that does so? |
Yes I probably should have motivated it a bit:
Hopefully it is clear? |
I see. For this case, I recommend defining a base class with some required methods (can be enforced by using python's built-in |
Thanks for the suggestions. The register is implemented but it was minimal work and easy to change. I think how this interacts with the config file #10 is probably the most critical aspect to useability and will be the hardest bit to code (saving this for now). I think a class based approach could make sense. I will have a look at @dalonsoa - any views on this? Edit: |
OK here is a testable mockup of how this could work in practice to give everyone a better idea - would be great to hear thoughts: import networkx as nx
from swmmanywhere import parameters
from abc import ABC, abstractmethod
class BaseGraphFunction(ABC):
@abstractmethod
def __init__(self):
self.required_attributes = []
self.adds_attributes = []
@abstractmethod
def __call__(self,
G: nx.Graph,
**kwargs) -> nx.Graph:
return G
def validate_requirements(self,
attributes: set):
for attribute in self.required_attributes:
assert attribute in attributes, "{0} not in attributes".format(
attribute)
def add_graphfcn(self,
attributes: set):
self.validate_requirements(attributes)
return attributes.union(self.adds_attributes)
graphfcns = {}
def register_graphfcn(cls):
graphfcns[cls.__name__] = cls
return cls
def get_osmid_id(data):
id_ = data.get('osmid', data.get('id'))
if isinstance(id_, list):
id_ = id_[0]
return id_
@register_graphfcn
class assign_id(BaseGraphFunction):
def __init__(self):
super().__init__()
self.required_attributes = ['osmid']
self.adds_attributes = ['id']
def __call__(self,
G: nx.Graph,
**kwargs):
for u, v, data in G.edges(data=True):
data['id'] = get_osmid_id(data)
return G
@register_graphfcn
class format_osmnx_lanes(BaseGraphFunction):
def __init__(self):
super().__init__()
self.required_attributes = ['lanes']
self.adds_attributes = ['width', 'nlanes']
def __call__(self,
G: nx.Graph,
subcatchment_derivation: parameters.SubcatchmentDerivation,
**kwargs):
G = G.copy()
for u, v, data in G.edges(data=True):
lanes = data.get('lanes',1)
if isinstance(lanes, list):
lanes = sum([float(x) for x in lanes])
else:
lanes = float(lanes)
data['nlanes'] = lanes
data['width'] = lanes * subcatchment_derivation.lane_width
return G
@register_graphfcn
class double_directed(BaseGraphFunction):
def __init__(self):
super().__init__()
self.required_attributes = ['id']
self.adds_attributes = []
def __call__(self,
G: nx.Graph,
**kwargs):
G_new = G.copy()
for u, v, data in G.edges(data=True):
if (v, u) not in G.edges:
reverse_data = data.copy()
reverse_data['id'] = '{0}.reversed'.format(data['id'])
G_new.add_edge(v, u, **reverse_data)
return G_new
from swmmanywhere.prepare_data import download_street
# Download example graph
bbox = (-0.11643,51.50309,-0.11169,51.50549)
G = download_street(bbox)
# Define functions list
functions_list = ['assign_id', 'format_osmnx_lanes', 'double_directed']
functions_list_init = []
# Populate attributes common to all edges in graph
attributes = set(list(G.edges(data=True))[0][2].keys())
for u, v, data in G.edges(data=True):
attributes.intersection(data.keys())
# Add and validate attributes and functions
for function_name in functions_list:
functions_list_init.append(graphfcns[function_name]())
attributes = functions_list_init[-1].add_graphfcn(attributes)
# iterate over functions
params = {'subcatchment_derivation' : parameters.SubcatchmentDerivation()}
for function in functions_list_init:
G = function(G,**params)
# This will break because double_directed needs 'id' which has not yet been
# added
G = download_street(bbox)
functions_list = ['double_directed', 'assign_id', 'format_osmnx_lanes']
attributes = set()
for u, v, data in G.edges(data=True):
attributes.update(data.keys())
for function_name in functions_list:
functions_list_init.append(graphfcns[function_name]())
attributes = functions_list_init[-1].add_graphfcn(attributes)
params = {'subcatchment_derivation' : parameters.SubcatchmentDerivation()}
for function in functions_list_init:
G = function(G,**params) |
That's not exactly what I had in mind. For example, if I were to refactor my BUSN code, here's the base class that I would have defined with all the must-have methods: import functools
from abc import ABC, abstractmethod
from pathlib import Path
import networkx as nx
import geopandas as gpd
import xarray as xr
from typing import Optional, Any
import yaml
Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
yaml_load = functools.partial(yaml.load, Loader=Loader)
class BaseModel(ABC):
def __init__(self, config_path: Path | str):
self.config = self.load_config(config_path)
# Additional initialization code if necessary
@staticmethod
def load_config(config_path: Path | str) -> dict[str, Any]:
config = yaml_load(Path(config_path).read_text())
# Additional config validation code
return config
@abstractmethod
def get_street(self, bbox: tuple[float, float, float, float]) -> nx.DiGraph:
pass
@abstractmethod
def network_cleanup(self, street: nx.DiGraph, merge_dist: int) -> nx.DiGraph:
pass
@abstractmethod
def get_buildings(self, bbox: tuple[float, float, float, float]) -> gpd.GeoDataFrame:
pass
@abstractmethod
def set_network_attrs(self, street: nx.DiGraph, building_footprints: Optional[gpd.GeoDataFrame], cover_mul: float) -> nx.DiGraph:
pass
@abstractmethod
def set_street_topo(self, street: nx.DiGraph, slope_min: float) -> nx.DiGraph:
pass
@abstractmethod
def pipe_attrs(self, street: nx.DiGraph) -> nx.DiGraph:
pass
@abstractmethod
def flow_weight(self, street: nx.DiGraph) -> None:
pass
@abstractmethod
def cover_weight(self, street: nx.DiGraph, cover_da: xr.DataArray) -> None:
pass
@abstractmethod
def road_weight(self, street: nx.DiGraph) -> None:
pass
@abstractmethod
def btw_weight(self, street: nx.DiGraph) -> None:
pass
@abstractmethod
def compute_betweenness(self, street: nx.DiGraph) -> nx.DiGraph:
pass
@abstractmethod
def remove_streets(self, street: nx.DiGraph, drain_levels: int) -> dict[float, nx.DiGraph]:
pass
@abstractmethod
def run(self) -> None:
"""Abstract method for running the model's main workflow."""
# It will use `self.config` to run the operations in desired order
pass Then, I would've defined a |
I see the advantage of @barneydobson implementation using small classes that deal with one particular activity, instead of plain functions, as it was the initial idea. It needs some work to polish some rough edges, but the validation step to ensure the right information is there is definitely useful. I definitely do not understand the advantage that @cheginit implementation of an all-mighty class that do a lot of work with many methods to define (even if we consider inheritance). I do not see how that is more readable, easy to test and debug or extensible by end users. Anyone who wants to try something new will need to hack the core code to tell it to use a new (sub)class, rather than just providing a configuration that will run any registered functions (or classes) in the right order. And such registration can happen in a python module totally external to this tool, so no need to change the core code. |
-Will likely need refactoring!
Maybe I couldn't explain it well. There's no reason for a user to "hack the core code". If a user wants to change a specific step, they just need to inherit from one of the main classes, let's say class NewBCModel(BCModel):
def set_network_attrs(...):
# new code This approach is common for packages that require this level of customization, such as optimization packages (pymoo) or ML ones like PyTorch. I am not sure why you'd think this is difficult to read, test and debug. Perhaps because I am used to using many such packages, this approach feels more natural and suitable to me. |
Because you need the whole class that do everything (including testing) and there is a somewhat implicit implementation of much of the functionality (defined in the parent classes - I know this is what inheritance is all about, but I find that much harder to follow that a more functional approach). Why putting all possible functionality in a single class when you can have small classes that only need to worry about their own specific functionality? About the customisation, I think we are seeing different use cases - or understanding the use cases differently. I see this tool used either directly from a command line or with a very simple script that reads a configuration file and calls high level functions. In other words, an application. You see it more as, following your example, PyTorch, a library that contains many bits and pieces that you put together to do some customised tasks. Both are totally fine, they are just different and need different solutions. I don't know what @barneydobson had in mind. |
Ok, I see what you mean. The reason I think this class-based approach makes more sense is that we have all these functionalities and there are also different methods of generating the network, each of which uses a subset of these functionalities. So, IMHO, since each method bundles a subset of these functionalities, it's easier to understand and follow the workflow when all required steps are gathered in a class. There is a base class that contains the steps shared between all methods, then there are method-specific classes containing only modified/additional steps. I think, the advantages of this approach don't depend on whether it's mainly used via CLI/config file or scripting. Having said that, since @barneydobson and @dalonsoa will be maintaining the package, using an implementation that you're more comfortable with is fine by me. |
OK thanks so much @cheginit and @dalonsoa for this very helpful discussion. I definitely see the pros and cons of both. I think we are probably all agreed that there is both a need for and benefit of having a 'default loadout' that can be customised by a user for their case. For me I think the practical distinction above is boiling down to whether this default is contained in a default config file, or a default model class - with many implications discussed. One thing that was not made explicit in @cheginit 's approach was the handling of If you are both happy with it - perhaps we can revisit the discussion of default config vs default model class when at #10 , this will also have the benefit of some more tangible examples. |
-Load test data rather than download -test derive_rc -minor edits to fix derive subs
I think all validation and checks should be done when instantiating the class. If you look at the example template class that I wrote, there is a The validation is done by calling the functions or classes that you develop separately in the other PR and don't need to (shouldn't) be part of this base class. |
- Introduce Addresses into parameters - Test
-Add tests -Identify and fix error in interpolate_points_on_raster in geospatial_utilities and its test
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made a collection of comments and suggestions to make the code cleaner and easier to read. I have not enter into discussing the science and specialised packages and formats because, to be honest, you know more about that.
It is a huge PR, so it might deserve another review because I migth have missed something, but so far is looking really good.
Co-authored-by: Diego Alonso Álvarez <[email protected]>
Co-authored-by: Diego Alonso Álvarez <[email protected]>
Co-authored-by: Diego Alonso Álvarez <[email protected]>
Make graphfcn registry a dict
Description
This PR will be used to implement all of the graph functions, addressing #8, part of #19 . Examples have been discussed in #10 and this PR will address the 'model parameters' part of the issue.
Design
I have implemented some functions and parameters to demonstrate how we might do this, but wanted @dalonsoa and @cheginit to agree before I do all of them (as there are many so I don't want to do them and have to change them...). To remind how these functions are expected to be called (example from #10 ):
(We may also have addresses separately in cases such as where a raster or downloaded data is needed, but as mentioned in #10 , it may be sensible to store addresses as parameters too).
A couple of discussion points:
format_osmnx_lanes
). I plan also to add some kind of 'verbosity' setting, so for example, even if the subcatchment shapes are not actually needed for the algortihm, they can be printed during the derive subcatchments function. Part of verbosity could also include printing the graph after every function, this would certainly be useful for debugging.Edits:
register_graphfcn
seems weird, like surely the...
is antithesis to typing, but I couldn't get it to work any other way since the graph functions require all different inputs and also kwargs.Current graph functions to implement