
Graphfcns #31

Merged
merged 46 commits into from
Feb 28, 2024

Conversation

barneydobson
Collaborator

@barneydobson barneydobson commented Jan 26, 2024

Description

This PR will be used to implement all of the graph functions, addressing #8 (part of #19). Examples have been discussed in #10, and this PR will address the 'model parameters' part of that issue.

Design

I have implemented some functions and parameters to demonstrate how we might do this, but I wanted @dalonsoa and @cheginit to agree on the design before I do all of them (there are many, so I don't want to implement them all and then have to change them...). As a reminder, these functions are expected to be called like this (example from #10):

# This kind of behaviour would be in generate_network.py
import networkx as nx

custom_parameters = {'outlet_derivation': {'max_river_length': 50.0}}
params = validate_parameters(custom_parameters)

G = nx.Graph()
function_list = ['double_directed', 'split_long_edges']
for function in function_list:
    assert function in graphfcns, f"Function {function} not registered in graphfcns."
    G = graphfcns[function](G, **params)

(We may also need to handle addresses separately in cases where a raster or downloaded data is needed, but as mentioned in #10, it may be sensible to store addresses as parameters too.)
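For anyone skimming, the `graphfcns` registry used in the snippet above could look something like the following. This is only a hedged sketch of the idea (the actual implementation in this PR may differ), with an illustrative placeholder body:

```python
# Hypothetical sketch of the 'graphfcns' registry pattern; names are illustrative.
graphfcns = {}

def register_graphfcn(func):
    """Register a graph function under its name so it can be looked up from a config."""
    graphfcns[func.__name__] = func
    return func

@register_graphfcn
def double_directed(G, **kwargs):
    # Placeholder body for illustration only
    return G
```

A user (or an external module) can then register their own function and simply list its name in the config.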

A couple of discussion points:

  • I will try to design each function so that it adds or changes as few attributes as possible, though in some cases it doesn't seem sensible to restrict this to only one (e.g., see format_osmnx_lanes). I also plan to add some kind of 'verbosity' setting so that, for example, even if the subcatchment shapes are not actually needed for the algorithm, they can still be printed during the derive-subcatchments function. Verbosity could also include printing the graph after every function, which would certainly be useful for debugging.
  • At the moment, each function's docstring specifies which attributes the graph being operated on needs to have, and which attributes the function will add. I can envisage some clever way to design this so that you can determine a priori whether a list of functions is feasible, based on which functions need which attributes and which functions create them. Any ideas on this front are welcome!
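To make the a priori feasibility idea concrete, here is a minimal sketch (not the PR's implementation; the dict-based function descriptions are purely illustrative) of walking a function list and checking that each function's required attributes are available by the time it runs:

```python
# Illustrative sketch of an a priori feasibility check for a sequence of
# graph functions, given which attributes each requires and adds.
def sequence_is_feasible(functions, starting_attributes):
    """Return True if every function's requirements are met, in order."""
    attributes = set(starting_attributes)
    for fcn in functions:
        if not set(fcn["requires"]).issubset(attributes):
            return False
        attributes |= set(fcn["adds"])
    return True

# Hypothetical descriptions of two graph functions
assign_id = {"requires": ["osmid"], "adds": ["id"]}
double_directed = {"requires": ["id"], "adds": []}

# Feasible: assign_id provides 'id' before double_directed needs it
sequence_is_feasible([assign_id, double_directed], {"osmid"})  # True
# Infeasible: double_directed runs before 'id' exists
sequence_is_feasible([double_directed, assign_id], {"osmid"})  # False
```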

Edits:

  • A smaller point: the typing in register_graphfcn seems weird, as surely the `...` is the antithesis of typing, but I couldn't get it to work any other way, since the graph functions all require different inputs as well as kwargs.

Current graph functions to implement

  • Calculate width
  • Format ID
  • Double directed
  • Split long edges
  • Burn in roads, derive subcatchments and set contributing area
  • Set elevation
  • Set surface slope
  • Set Chahinan angle
  • Calculate weights
  • Identify outlets
  • Find nearest outlets/shortest paths
  • Pipe by pipe? (or possibly this is in a separate hydraulic design module)
  • Implement as graphfcn classes

barneydobson and others added 4 commits January 26, 2024 10:36
- Create graph_utilities.py
- Create tests
- Create parameters
@barneydobson barneydobson self-assigned this Jan 26, 2024
@cheginit
Collaborator

I don't understand the justification for the need to treat graph functions like this. From the development (especially typing) and maintenance perspective, this approach can make debugging difficult and reduce readability. What's the trade-off for this? If you want to apply some functions in a specific sequence, why not just have a wrapper function that does so?

@barneydobson
Collaborator Author

Yes I probably should have motivated it a bit:

  • Keeping functions in a prescribed format makes it easy for a user to switch them out if they want to change the method (e.g., not shortest path but another algorithm, or a new method for subcatchment derivation). They just need to register a new graph function and change the function list in their config.
  • Making bigger changes to the workflow is also possible. E.g., some methods derive topology and hydraulics simultaneously or iteratively; something like this is also easier to accommodate.
  • The anticipated use case is to run the downloaders, preprocessing and a certain number of graph functions only once for a case study, and then vary parameters for the remaining graph functions thousands of times. However, which graph functions need to be run once, and which need to be run many times, depends on which parameters I want to vary. E.g., subcatchment derivation is slow, so I will probably want to run it (and thus all of the graph functions needed to enable it) only once, unless of course my parameters interact with subcatchment derivation.
  • On the debugging: the kwargs argument is ignored in a graph function; it is just there so that they can be called in a loop. Happy for other ideas here (especially re: maintenance), but ultimately the benefits above of flexible function choice and operation order seem to me to outweigh the costs. From a research and methods-comparison perspective, it is very good if others can use this software to implement inherently different kinds of synthetic algorithms.
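The run-once vs. run-many split described above could be sketched roughly as follows. This is a hypothetical illustration only: the function names, the placeholder `dict` "graphs", and the lambda bodies are all assumptions, not the package's API:

```python
# Hedged sketch of splitting the function list into expensive steps run once
# and cheap steps run many times with varying parameters.
graphfcns = {
    "derive_subcatchments": lambda G, **params: dict(G, subs=True),         # slow
    "calculate_weights": lambda G, **params: dict(G, w=params.get("w")),    # cheap
}

run_once = ["derive_subcatchments"]   # expensive, parameter-independent
run_many = ["calculate_weights"]      # cheap, varies with parameters

def preprocess(G, params):
    for name in run_once:
        G = graphfcns[name](G, **params)
    return G

def evaluate(G_pre, params):
    G = G_pre
    for name in run_many:
        G = graphfcns[name](G, **params)
    return G

G_pre = preprocess({}, {})                                # expensive steps once
results = [evaluate(G_pre, {"w": w}) for w in range(3)]   # cheap parameter sweep
```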

Hopefully it is clear?

@cheginit
Collaborator

I see. For this case, I recommend defining a base class with some required methods (which can be enforced using Python's built-in ABC module) that users can inherit from. The class can be used for any changes to the workflow, either replacing or adding steps. For running the workflow, you can either define the __call__ method of the class so that it carries out the operations in the desired sequence, or have a method called, for example, run. The advantage of the __call__ method is that whenever you call the class, it runs the entire workflow. The advantage of the run method is that you can instantiate the class once and run the workflow multiple times as you change its attributes. IMHO, this approach gives you the flexibility that you're looking for without sacrificing readability and maintainability. That said, if you have already implemented the register approach and feel strongly about it, it's fine by me.
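A minimal sketch of the __call__ vs. run distinction described above (names and the trivial step are hypothetical, just to show the shape):

```python
from abc import ABC, abstractmethod

# Illustrative only: a base workflow class where run() executes the steps
# and __call__ simply delegates to run().
class BaseWorkflow(ABC):
    def __init__(self, config):
        self.config = config

    @abstractmethod
    def step_one(self):
        """Subclasses must implement each workflow step."""

    def run(self):
        # Instantiate once, then call run() repeatedly as attributes change
        return self.step_one()

    def __call__(self):
        # Calling the instance executes the whole workflow
        return self.run()

class MyWorkflow(BaseWorkflow):
    def step_one(self):
        return self.config["x"] * 2

wf = MyWorkflow({"x": 21})
wf.run()   # run the workflow explicitly
wf()       # same result via __call__
```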

@barneydobson
Collaborator Author

barneydobson commented Jan 29, 2024

Thanks for the suggestions. The register is implemented, but it was minimal work and is easy to change. I think how this interacts with the config file (#10) is probably the most critical aspect for usability and will be the hardest bit to code (saving this for now). I think a class-based approach could make sense. I will have a look at ABC and see; the pydantic package seems pretty nice for the parameters so far, so I am happy if there is some graph-function, or maybe entire 'workflow', equivalent.

@dalonsoa - any views on this?

Edit:
In particular, a class-based approach would nicely handle the 'adds attributes' and 'requires attributes' elements of a graphfcn. It would be very easy to iterate over a list of functions and determine a priori whether a sequence is feasible.

@barneydobson
Collaborator Author

barneydobson commented Jan 29, 2024

OK here is a testable mockup of how this could work in practice to give everyone a better idea - would be great to hear thoughts:

import networkx as nx
from swmmanywhere import parameters
from abc import ABC, abstractmethod

class BaseGraphFunction(ABC):
    @abstractmethod
    def __init__(self):
        self.required_attributes = []
        self.adds_attributes = []

    @abstractmethod
    def __call__(self, 
                 G: nx.Graph, 
                 **kwargs) -> nx.Graph:
        return G
    
    def validate_requirements(self,
                              attributes: set):
        for attribute in self.required_attributes:
            assert attribute in attributes, "{0} not in attributes".format(
                attribute)

    def add_graphfcn(self,
                    attributes: set):
        self.validate_requirements(attributes)
        return attributes.union(self.adds_attributes)

graphfcns = {}
   
def register_graphfcn(cls):
    graphfcns[cls.__name__] = cls
    return cls

def get_osmid_id(data):
    id_ = data.get('osmid', data.get('id'))
    if isinstance(id_, list):
        id_ = id_[0]
    return id_

@register_graphfcn
class assign_id(BaseGraphFunction):
    def __init__(self):
        super().__init__()
        self.required_attributes = ['osmid']
        self.adds_attributes = ['id']
    def __call__(self, 
                 G: nx.Graph, 
                 **kwargs):
        G = G.copy()
        for u, v, data in G.edges(data=True):
            data['id'] = get_osmid_id(data)
        return G
    
@register_graphfcn
class format_osmnx_lanes(BaseGraphFunction):
    def __init__(self):
        super().__init__()
        self.required_attributes = ['lanes']
        self.adds_attributes = ['width', 'nlanes']
    def __call__(self, 
                G: nx.Graph, 
                subcatchment_derivation: parameters.SubcatchmentDerivation, 
                **kwargs):
        G = G.copy()
        for u, v, data in G.edges(data=True):
            lanes = data.get('lanes',1)
            if isinstance(lanes, list):
                lanes = sum([float(x) for x in lanes])
            else:
                lanes = float(lanes)
            data['nlanes'] = lanes
            data['width'] = lanes * subcatchment_derivation.lane_width
        return G

@register_graphfcn
class double_directed(BaseGraphFunction):
    def __init__(self):
        super().__init__()
        self.required_attributes = ['id']
        self.adds_attributes = []
    def __call__(self, 
                 G: nx.Graph, 
                 **kwargs):
        G_new = G.copy()
        for u, v, data in G.edges(data=True):
            if (v, u) not in G.edges:
                reverse_data = data.copy()
                reverse_data['id'] = '{0}.reversed'.format(data['id'])
                G_new.add_edge(v, u, **reverse_data)
        return G_new


from swmmanywhere.prepare_data import download_street

# Download example graph
bbox = (-0.11643,51.50309,-0.11169,51.50549)
G = download_street(bbox)

# Define functions list
functions_list = ['assign_id', 'format_osmnx_lanes', 'double_directed']
functions_list_init = []
# Populate attributes common to all edges in graph
attributes = set(list(G.edges(data=True))[0][2].keys())
for u, v, data in G.edges(data=True):
    attributes = attributes.intersection(data.keys())

# Add and validate attributes and functions
for function_name in functions_list:
    functions_list_init.append(graphfcns[function_name]())
    attributes = functions_list_init[-1].add_graphfcn(attributes)

# iterate over functions
params = {'subcatchment_derivation' : parameters.SubcatchmentDerivation()}
for function in functions_list_init:
    G = function(G,**params)

# This will break because double_directed needs 'id' which has not yet been 
# added
G = download_street(bbox)

functions_list = ['double_directed', 'assign_id', 'format_osmnx_lanes']
functions_list_init = []
attributes = set(list(G.edges(data=True))[0][2].keys())
for u, v, data in G.edges(data=True):
    attributes = attributes.intersection(data.keys())

for function_name in functions_list:
    functions_list_init.append(graphfcns[function_name]())
    attributes = functions_list_init[-1].add_graphfcn(attributes)

params = {'subcatchment_derivation' : parameters.SubcatchmentDerivation()}
for function in functions_list_init:
    G = function(G,**params)

@cheginit
Collaborator

That's not exactly what I had in mind. For example, if I were to refactor my BUSN code, here's the base class that I would have defined with all the must-have methods:

from __future__ import annotations

import functools
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Optional

import geopandas as gpd
import networkx as nx
import xarray as xr
import yaml


Loader = getattr(yaml, "CSafeLoader", yaml.SafeLoader)
yaml_load = functools.partial(yaml.load, Loader=Loader)

class BaseModel(ABC):
    def __init__(self, config_path: Path | str):
        self.config = self.load_config(config_path)
        # Additional initialization code if necessary

    @staticmethod
    def load_config(config_path: Path | str) -> dict[str, Any]:
        config = yaml_load(Path(config_path).read_text())
        # Additional config validation code
        return config

    @abstractmethod
    def get_street(self, bbox: tuple[float, float, float, float]) -> nx.DiGraph:
        pass

    @abstractmethod
    def network_cleanup(self, street: nx.DiGraph, merge_dist: int) -> nx.DiGraph:
        pass

    @abstractmethod
    def get_buildings(self, bbox: tuple[float, float, float, float]) -> gpd.GeoDataFrame:
        pass

    @abstractmethod
    def set_network_attrs(self, street: nx.DiGraph, building_footprints: Optional[gpd.GeoDataFrame], cover_mul: float) -> nx.DiGraph:
        pass

    @abstractmethod
    def set_street_topo(self, street: nx.DiGraph, slope_min: float) -> nx.DiGraph:
        pass

    @abstractmethod
    def pipe_attrs(self, street: nx.DiGraph) -> nx.DiGraph:
        pass

    @abstractmethod
    def flow_weight(self, street: nx.DiGraph) -> None:
        pass

    @abstractmethod
    def cover_weight(self, street: nx.DiGraph, cover_da: xr.DataArray) -> None:
        pass

    @abstractmethod
    def road_weight(self, street: nx.DiGraph) -> None:
        pass

    @abstractmethod
    def btw_weight(self, street: nx.DiGraph) -> None:
        pass

    @abstractmethod
    def compute_betweenness(self, street: nx.DiGraph) -> nx.DiGraph:
        pass

    @abstractmethod
    def remove_streets(self, street: nx.DiGraph, drain_levels: int) -> dict[float, nx.DiGraph]:
        pass

    @abstractmethod
    def run(self) -> None:
        """Abstract method for running the model's main workflow."""
        # It will use `self.config` to run the operations in desired order
        pass

Then, I would've defined a DefaultModel that inherits from this, with the default methods of my methodology. However, in this case, you can define several classes, one for each method, that inherit from the base class. For example, there can be a class based on the shortest path that you implemented, another one for the betweenness centrality that I implemented, and so on.

@dalonsoa
Collaborator

I see the advantage of @barneydobson's implementation using small classes that deal with one particular activity, instead of plain functions, as was the initial idea. It needs some work to polish some rough edges, but the validation step to ensure the right information is there is definitely useful.

I definitely do not understand the advantage of @cheginit's implementation of an all-mighty class that does a lot of work, with many methods to define (even if we consider inheritance). I do not see how that is more readable, easier to test and debug, or more extensible by end users. Anyone who wants to try something new will need to hack the core code to tell it to use a new (sub)class, rather than just providing a configuration that will run any registered functions (or classes) in the right order. And such registration can happen in a Python module totally external to this tool, so there is no need to change the core code.

Dobson added 2 commits January 30, 2024 12:36
@cheginit
Collaborator

Maybe I couldn't explain it well. There's no reason for a user to "hack the core code". If a user wants to change a specific step, they just need to inherit from one of the main classes, let's say BCModel, and then change that specific function. For example, if a user would like to modify the set_network_attrs step, they simply need to do:

class NewBCModel(BCModel):
    def set_network_attrs(...):
        # new code

This approach is common for packages that require this level of customization, such as optimization packages (pymoo) or ML ones like PyTorch.

I am not sure why you'd think this is difficult to read, test and debug. Perhaps because I am used to using many such packages, this approach feels more natural and suitable to me.

@dalonsoa
Collaborator

Because you need the whole class that does everything (including for testing), and there is a somewhat implicit implementation of much of the functionality (defined in the parent classes - I know this is what inheritance is all about, but I find that much harder to follow than a more functional approach). Why put all possible functionality in a single class when you can have small classes that only need to worry about their own specific functionality?

About the customisation, I think we are seeing different use cases - or understanding the use cases differently. I see this tool used either directly from a command line or with a very simple script that reads a configuration file and calls high level functions. In other words, an application. You see it more as, following your example, PyTorch, a library that contains many bits and pieces that you put together to do some customised tasks. Both are totally fine, they are just different and need different solutions. I don't know what @barneydobson had in mind.

@cheginit
Collaborator

Ok, I see what you mean. The reason I think this class-based approach makes more sense is that we have all these functionalities and there are also different methods of generating the network, each of which uses a subset of these functionalities. So, IMHO, since each method bundles a subset of these functionalities, it's easier to understand and follow the workflow when all required steps are gathered in a class. There is a base class that contains the steps shared between all methods, then there are method-specific classes containing only modified/additional steps.

I think, the advantages of this approach don't depend on whether it's mainly used via CLI/config file or scripting. Having said that, since @barneydobson and @dalonsoa will be maintaining the package, using an implementation that you're more comfortable with is fine by me.

@barneydobson
Collaborator Author

barneydobson commented Jan 31, 2024

OK thanks so much @cheginit and @dalonsoa for this very helpful discussion. I definitely see the pros and cons of both. I think we are probably all agreed that there is both a need for and benefit of having a 'default loadout' that can be customised by a user for their case. For me I think the practical distinction above is boiling down to whether this default is contained in a default config file, or a default model class - with many implications discussed.

One thing that was not made explicit in @cheginit's approach was the handling of graphfcn attribute dependencies and validation. I suppose treating the graphfcns as classes is still probably the most general way to handle this. Are these graphfcns defined within (or attached to) a model_class, or instantiated within a function in generate_network as defined by the config file? I don't know, but I think from the perspective of this PR, the graphfcns are written the same either way.

If you are both happy with it - perhaps we can revisit the discussion of default config vs default model class when at #10 , this will also have the benefit of some more tangible examples.

@barneydobson barneydobson mentioned this pull request Jan 31, 2024
Dobson added 3 commits January 31, 2024 12:53
- Load test data rather than download
- Test derive_rc
- Minor edits to fix derive subs
@cheginit
Collaborator

One thing that was not made explicit in @cheginit's approach was the handling of graphfcn attribute dependencies and validation.

I think all validation and checks should be done when instantiating the class. If you look at the example template class that I wrote, there is a load_config method with a comment about validations. Since all the other methods in the class use self.config to get the attributes, if you validate and check the state of the required attributes while loading the config, then the method calls will run without any issues.

The validation is done by calling the functions or classes that you develop separately in the other PR and don't need to (shouldn't) be part of this base class.
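The validate-at-instantiation idea could be sketched like this. This is a hedged illustration only: the required keys and class name are assumptions, not the package's actual config schema:

```python
# Illustrative sketch: validate the config once, at instantiation, so that
# subsequent method calls can rely on self.config being well-formed.
REQUIRED_KEYS = {"bbox", "function_list"}  # hypothetical required keys

class Model:
    def __init__(self, config):
        self.config = self.load_config(config)

    @staticmethod
    def load_config(config):
        missing = REQUIRED_KEYS - config.keys()
        if missing:
            raise ValueError(f"Config missing keys: {sorted(missing)}")
        return config

model = Model({"bbox": (0, 0, 1, 1), "function_list": ["assign_id"]})
# Model({}) would raise ValueError at instantiation, before any method runs
```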

Dobson added 6 commits January 31, 2024 15:28
- Introduce Addresses into parameters
- Test
- Add tests
- Identify and fix error in interpolate_points_on_raster in geospatial_utilities and its test
Collaborator

@dalonsoa dalonsoa left a comment

I've made a collection of comments and suggestions to make the code cleaner and easier to read. I have not entered into discussing the science and the specialised packages and formats because, to be honest, you know more about that.

It is a huge PR, so it might deserve another review because I might have missed something, but so far it is looking really good.

Review comments were left on:
swmmanywhere/geospatial_utilities.py
swmmanywhere/graph_utilities.py
@barneydobson barneydobson merged commit e9284f3 into main Feb 28, 2024
7 of 10 checks passed
@barneydobson barneydobson deleted the graphfcns branch February 28, 2024 12:04