Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Compute Grid Centerpoint using Welzl's algorithm #811

Merged
merged 60 commits into from
Oct 7, 2024
Merged

Conversation

rajeeja
Copy link
Contributor

@rajeeja rajeeja commented Jun 11, 2024

No description provided.

@rajeeja rajeeja changed the title DRAFT: Compute Grid Centerpoint using Welzl's algorithm Compute Grid Centerpoint using Welzl's algorithm Jun 26, 2024
@rajeeja rajeeja requested a review from philipc2 June 26, 2024 23:10
uxarray/grid/coordinates.py Outdated Show resolved Hide resolved
uxarray/grid/coordinates.py Show resolved Hide resolved
uxarray/grid/coordinates.py Outdated Show resolved Hide resolved
uxarray/grid/coordinates.py Outdated Show resolved Hide resolved
@philipc2
Copy link
Member

philipc2 commented Jul 2, 2024

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

@philipc2 philipc2 linked an issue Jul 2, 2024 that may be closed by this pull request
philipc2 and others added 4 commits July 3, 2024 12:08
…dependency (use with arcs and arcs use coordinates). o Remove new routine in favor of using the existing angle b/w vectors to calculate distance.
@rajeeja
Copy link
Contributor Author

rajeeja commented Jul 9, 2024

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.

Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

@philipc2
Copy link
Member

philipc2 commented Aug 1, 2024

I'm not a big fan of the _ctrpt approach to the properties. These values are still, for example, face_lon and face_lat, they simply use a different algorithm to compute them.
Maybe we should consider the following design?

  • Have the default face_xyz or face_latlon values be either what was parsed from a dataset OR the existing Cartesian averaging
  • Introduce a grid-level Grid.populate_face_coordinates() function (similar to the internal ones that we have to allow the user to re-populate or set the desired algorithm they'd like to use for the construction.

This would make the workflow look something like the following:

# get the value of face_lon without needing to specify an algorithm, will use either the stored value or cart avg
uxgrid.face_lon

# I want to explicitly set the algorithm to be Welzl
uxgrid.populate_face_coordinates(method='welzl')

# value will now be populated using your approach
uxgrid.face_lon

# I want to re-populate again using cartesiain averaging
uxgrid.populate_face_coordinates(method='cartesian average')

# value will now be populated using artesian average
uxgrid.face_lon

This allows us to not need to define any new properties and to better stick to the UGRID conventions. What do you think?

During my testing and sometimes in testing the face geometry both centerpoint and centroid might be needed. When working with a mesh I wanted to check how much did one deviate from the other and if one or the other made more sense.

We might be able to get both with the way you propose also, but with two calls to populate one with either options, having both available to the grid object at once might be better.

We can get another name for ctrpt, I don't like it also:)

My main concern with breaking up the different types of coordinates in separate attributes is that it'll add extra overhead for us to ensure that the coordinates we read match the ones that we want to store, not to mention needing to redefine / extent the UGRID conventions further past what we've already done. Even with this (and say some other method down the line), this could end up looking like:

  • Grid.face_lon
  • Grid.face_lon_centerpoint
  • Grid.face_lon_some_other_defenition

Consider the case where two UGRID (or any other format) grid files are loaded into UXarray. If we move forward with a split attribute approach, we'd need to ensure that the coordinates we are reading either go into face_lat/lon or face_lat/lon_centerpoints. There's also no easy way to determine what method each dataset used to compute the centroids at the loading step without parsing for any specific attributes in the file (if they exist), since this is not outlined in the UGRID conventions.

I'm still in favor of keeping face_lon and face_lat and general variables for storing some coordinate that represents the center/midpoint/centroid etc. of the face. This does limit us to only storing one type of "center" coordinate at a time, but ensures that we don't restrict us to strictly defining the type of definition for the center.

@paullric @rljacob Is there ever a sceneiro where we would want to have more than one definition of a "center" coordinate attached to a grid at a time?

Copy link
Member

@erogluorhan erogluorhan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me. The only thing remaining was the reduced code coverage, but apparently codecov can't track test cases written for the njit-decorated functions. After we figure a path forward with that, I am happy to approve this.

uxarray/grid/grid.py Outdated Show resolved Hide resolved
uxarray/grid/utils.py Outdated Show resolved Hide resolved
@rajeeja
Copy link
Contributor Author

rajeeja commented Sep 16, 2024

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

@erogluorhan
Copy link
Member

erogluorhan commented Sep 17, 2024

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

@rajeeja
Copy link
Contributor Author

rajeeja commented Sep 17, 2024

Once, we resolve this circular dependency issue. I will disable NUMBA to check codecov - it might increase the percentage coverage issue

How about doing this in such a way: In the beginning of each indidvual case that tests a njit-decorated function, disable numba, in the end, enable it back? This helps us notice right away that whenever we see this kind of disable-enable pairs, that is a case for testing a njit-decorated function.

I removed all the numba stuff on my local and the coverage (85% total and 95% for coordinates.py - see pic below) is considerably higher for sha: dba53dc8 which is before I introduced circular dependency.

Screenshot 2024-09-17 at 17 37 01

benchmarks/quad_hexagon.py Outdated Show resolved Hide resolved
benchmarks/mpas_ocean.py Outdated Show resolved Hide resolved
benchmarks/mpas_ocean.py Outdated Show resolved Hide resolved
uxarray/grid/coordinates.py Outdated Show resolved Hide resolved
uxarray/grid/utils.py Outdated Show resolved Hide resolved
@UXARRAY UXARRAY deleted a comment from github-actions bot Oct 7, 2024
Copy link

github-actions bot commented Oct 7, 2024

ASV Benchmarking

Benchmark Comparison Results

Benchmarks that have improved:

Change Before [bca3b8c] After [066bb6b] Ratio Benchmark (Parameter)
failed 6.67±0.1ms n/a mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('120km')
failed 3.50±0.03ms n/a mpas_ocean.ConstructFaceLatLon.time_cartesian_averaging('480km')
failed 3.60±0.01s n/a mpas_ocean.ConstructFaceLatLon.time_welzl('120km')
failed 226±2ms n/a mpas_ocean.ConstructFaceLatLon.time_welzl('480km')
- 955±20ns 832±9ns 0.87 mpas_ocean.ConstructTreeStructures.time_kd_tree('120km')
- 340±30ns 285±1ns 0.84 mpas_ocean.ConstructTreeStructures.time_kd_tree('480km')
- 466M 407M 0.87 mpas_ocean.Integrate.peakmem_integrate('480km')

Benchmarks that have stayed the same:

Change Before [bca3b8c] After [066bb6b] Ratio Benchmark (Parameter)
442M 438M 0.99 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
442M 442M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
445M 445M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
443M 443M 1 face_bounds.FaceBounds.peakmem_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
1.59±0.01s 1.59±0s 1 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/mpas/QU/oQU480.231010.nc'))
228±0.05ms 225±1ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/scrip/outCSne8/outCSne8.nc'))
2.03±0.01s 2.07±0.01s 1.02 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/geoflow-small/grid.nc'))
7.78±0.07ms 7.68±0.07ms 0.99 face_bounds.FaceBounds.time_face_bounds(PosixPath('/home/runner/work/uxarray/uxarray/test/meshfiles/ugrid/quad-hexagon/grid.nc'))
3.02±0.04s 2.99±0.02s 0.99 import.Imports.timeraw_import_uxarray
675±20μs 654±8μs 0.97 mpas_ocean.CheckNorm.time_check_norm('120km')
442±5μs 429±4μs 0.97 mpas_ocean.CheckNorm.time_check_norm('480km')
636±10ms 645±2ms 1.01 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('120km')
41.6±0.4ms 41.2±0.2ms 0.99 mpas_ocean.ConnectivityConstruction.time_face_face_connectivity('480km')
1.84±0.1ms 1.93±0.1ms 1.05 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('120km')
499±20μs 497±10μs 1 mpas_ocean.ConnectivityConstruction.time_n_nodes_per_face('480km')
1.26±0μs 1.32±0.02μs 1.05 mpas_ocean.ConstructTreeStructures.time_ball_tree('120km')
332±5ns 306±4ns 0.92 mpas_ocean.ConstructTreeStructures.time_ball_tree('480km')
1.04±0.01s 1.06±0s 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', False)
54.8±0.5ms 55.8±0.6ms 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('120km', True)
78.7±0.2ms 80.3±1ms 1.02 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', False)
5.10±0.09ms 5.07±0.1ms 0.99 mpas_ocean.GeoDataFrame.time_to_geodataframe('480km', True)
318M 318M 1 mpas_ocean.Gradient.peakmem_gradient('120km')
295M 295M 1 mpas_ocean.Gradient.peakmem_gradient('480km')
2.69±0.02ms 2.66±0.03ms 0.99 mpas_ocean.Gradient.time_gradient('120km')
295±4μs 286±0.5μs 0.97 mpas_ocean.Gradient.time_gradient('480km')
239±7μs 251±7μs 1.05 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('120km')
121±0.7μs 121±0.6μs 1 mpas_ocean.HoleEdgeIndices.time_construct_hole_edge_indices('480km')
425M 424M 1 mpas_ocean.Integrate.peakmem_integrate('120km')
175±2ms 177±5ms 1.01 mpas_ocean.Integrate.time_integrate('120km')
11.8±0.1ms 12.9±0.1ms 1.1 mpas_ocean.Integrate.time_integrate('480km')
347±5ms 344±5ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'exclude')
339±3ms 342±4ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'include')
343±3ms 342±2ms 1 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('120km', 'split')
22.6±0.2ms 22.8±0.3ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'exclude')
22.6±0.4ms 22.9±0.3ms 1.01 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'include')
23.1±0.5ms 22.9±0.4ms 0.99 mpas_ocean.MatplotlibConversion.time_dataarray_to_polycollection('480km', 'split')
54.5±0.1ms 54.8±0.3ms 1.01 mpas_ocean.RemapDownsample.time_inverse_distance_weighted_remapping
44.2±0.04ms 44.2±0.01ms 1 mpas_ocean.RemapDownsample.time_nearest_neighbor_remapping
360±2ms 360±0.5ms 1 mpas_ocean.RemapUpsample.time_inverse_distance_weighted_remapping
263±0.2ms 265±1ms 1.01 mpas_ocean.RemapUpsample.time_nearest_neighbor_remapping
295M 291M 0.99 quad_hexagon.QuadHexagon.peakmem_open_dataset
291M 291M 1 quad_hexagon.QuadHexagon.peakmem_open_grid
6.49±0.04ms 6.56±0.06ms 1.01 quad_hexagon.QuadHexagon.time_open_dataset
5.55±0.01ms 5.66±0.06ms 1.02 quad_hexagon.QuadHexagon.time_open_grid

Copy link
Member

@philipc2 philipc2 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great work!

@philipc2 philipc2 merged commit 93f7575 into main Oct 7, 2024
18 of 20 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
run-benchmark Run ASV benchmark workflow
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Welzl's algorithm for "face centerpoint"
6 participants