CPU vs GPU methods for points in polygon #1090

epifanio · 2023-04-24T16:03:02Z


Hi,

I am working on a point-in-polygon problem and came across the cuspatial point-in-polygon implementation, which I compared with other solutions found on the web. I am reporting my findings in an SO answer, but I thought it was also worth mentioning here.

For comparing methods, I coded the CPU algorithms using numba and reproduced some results with synthetic data - code follows:

from time import time
import numba
import numpy as np
from numba import jit, njit
import cudf
import cuspatial

is_inside_sm, the fastest method on the CPU:

@jit(nopython=True)
def is_inside_sm(polygon, point):
    length = len(polygon) - 1
    dy2 = point[1] - polygon[0][1]
    intersections = 0
    ii = 0
    jj = 1

    while ii < length:
        dy = dy2
        dy2 = point[1] - polygon[jj][1]

        # consider only lines which are not completely above/below/right from the point
        if dy * dy2 <= 0.0 and (
            point[0] >= polygon[ii][0] or point[0] >= polygon[jj][0]
        ):

            # non-horizontal line
            if dy < 0 or dy2 < 0:
                F = dy * (polygon[jj][0] - polygon[ii][0]) / (dy - dy2) + polygon[ii][0]

                if (
                    point[0] > F
                ):  # if line is left from the point - the ray moving towards the left, will intersect it
                    intersections += 1
                elif point[0] == F:  # point on line
                    return 2

            # point on upper peak (dy2=dx2=0) or horizontal line (dy=dy2=0 and dx*dx2<=0)
            elif dy2 == 0 and (
                point[0] == polygon[jj][0]
                or (
                    dy == 0
                    and (point[0] - polygon[ii][0]) * (point[0] - polygon[jj][0]) <= 0
                )
            ):
                return 2

        ii = jj
        jj += 1

    # print 'intersections =', intersections
    return intersections & 1


@njit(parallel=True)
def is_inside_sm_parallel(points, polygon):
    ln = len(points)
    D = np.empty(ln, dtype=numba.boolean)
    for i in numba.prange(ln):
        D[i] = is_inside_sm(polygon, points[i])
    return D
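
For readers who want the idea behind the crossing count without the edge-case handling, here is a minimal plain-Python sketch of the even-odd rule. It is not the is_inside_sm function above: the name crossing_parity is made up for illustration, it does not return 2 for points on the boundary, and it is far slower than the numba version.

```python
def crossing_parity(polygon, point):
    """Return True if `point` is inside `polygon` by the even-odd rule.

    `polygon` is a sequence of (x, y) vertices; the closing edge back to the
    first vertex is implied. Points exactly on an edge get no special
    treatment here, unlike is_inside_sm, which returns 2 for them.
    """
    px, py = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge straddle the horizontal line through the point?
        if (y1 > py) != (y2 > py):
            # x coordinate where the edge crosses that horizontal line.
            x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            # Count only crossings to the left of the point (leftward ray).
            if x_cross < px:
                inside = not inside
    return inside


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
print(crossing_parity(square, (0.5, 0.5)))  # True
print(crossing_parity(square, (1.5, 0.5)))  # False
```

An odd number of crossings means the point is inside, which is what `intersections & 1` computes in is_inside_sm.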

Comparison:

np.random.seed(2)

time_parallelpointinpolygon = []
time_mpltPath = []
time_ray_tracing_numpy_numba = []
time_is_inside_sm_parallel = []
time_is_inside_postgis_parallel = []
time_is_inside_rapids = []

n_points = []

for i in range(1, 1000000002, 100000000):
    n_points.append(i)

    lenpoly = 1000
    polygon = [
        [np.sin(x) + 0.5, np.cos(x) + 0.5] for x in np.linspace(0, 2 * np.pi, lenpoly)
    ]
    polygon = np.array(polygon)
    N = i
    points = np.random.uniform(-1.5, 1.5, size=(N, 2))


    # Method 1 CPU
    start_time = time()
    inside4 = is_inside_sm_parallel(points, polygon)
    time_is_inside_sm_parallel.append(time() - start_time)


    # Method 2 GPU
    x_pnt = points[:, 0]
    y_pnt = points[:, 1]

    x_poly = polygon[:, 0]
    y_poly = polygon[:, 1]
    start_time = time()
    result = cuspatial.point_in_polygon(
        x_pnt,
        y_pnt,
        cudf.Series([0], index=["geom"]),
        cudf.Series([0], name="r_pos", dtype="int32"),
        x_poly,
        y_poly,
    )
    time_is_inside_rapids.append(time() - start_time)

import matplotlib.pyplot as plt

plt.plot(n_points, time_is_inside_sm_parallel, label="is_inside_sm_parallel")
plt.plot(n_points, time_is_inside_rapids, label="time_is_inside_rapids")

plt.xlabel("N points")
plt.ylabel("time (sec)")
plt.legend(loc="best")
plt.show()

which led to the following results (time in seconds):

time_is_inside_sm_parallel: 19.110478401184082

time_is_inside_rapids: 45.32019829750061

Hardware specs:

GPU: RTX 3090 Ti
CPU: R9 5950X @ 4.85 GHz all cores

Is the "is_inside_sm" method portable to GPU? Would cuspatial benefit from it?
I understand that the cuspatial method offers much more flexibility and can perhaps handle complex polygonal shapes, but for a simple scenario is_inside_sm seems more convenient, as it doesn't require a complex environment and dedicated hardware. I wonder how this compares with the quadtree-based method in cuspatial. Would someone like to extend this example with synthetic data to include it?

zhangjianting · 2023-04-24T16:20:25Z


cuspatial PIP test should not take 45s for testing 1B points in a single polygon with 1000 vertices. Python overhead and CPU->GPU transfer are likely the dominating factors.

0 replies

jarmak-nv · 2023-04-24T16:24:59Z

Collaborator

Hi @epifanio! Appreciate you taking the time to do this!

A few things come to my mind here:

- cuSpatial stores all of our coordinates in FP64 format, which on hardware like the 3090 Ti has significantly lower perf (1/64th on the 3090 Ti) than FP32 or FP16. It would be interesting to test on a V100 or similar to see the difference.
  - We're looking into allowing users to lower precision to accelerate their use cases when FP64 is unnecessary.
- Agree it would be valuable to test using our quadtree implementation - it's our recommended approach for the fastest implementation (just a little bit slower to get up and running).
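
To get a feel for what lowering precision would cost on data like the benchmark's (coordinates in [-1.5, 1.5]), here is a small standard-library sketch that rounds FP64 values to FP32 and prints the absolute error. The helper name to_float32 is made up for illustration, and nothing here reflects how cuSpatial would expose lower precision.

```python
import struct


def to_float32(x):
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


coords = [-1.5, 0.1, 0.5 + 1e-9, 1.2345678901234]
for c in coords:
    c32 = to_float32(c)
    print(f"{c!r:>20} -> {c32!r:<22} abs error {abs(c - c32):.2e}")
```

For magnitudes below 2, the rounding error stays under roughly 6e-8, so FP32 only changes the answer for points that lie within about that distance of a polygon edge.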

To @zhangjianting's point, curious if consecutive runs of the cuSpatial PIP lower the time (should help show kernel launch overhead).
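
One way to check this is to time the same call several times and compare the first run with the later ones. The sketch below uses only the standard library; time_repeated and the stand-in workload are hypothetical and simply mimic a call with a one-time setup cost (such as JIT compilation or GPU context creation).

```python
from time import perf_counter


def time_repeated(fn, repeats=3):
    """Call fn() `repeats` times and return the wall time of each call."""
    timings = []
    for _ in range(repeats):
        start = perf_counter()
        fn()
        timings.append(perf_counter() - start)
    return timings


# Stand-in workload with a deliberately expensive first call.
_cache = {}


def workload():
    if "table" not in _cache:
        _cache["table"] = [i * i for i in range(200_000)]  # one-time setup
    return sum(_cache["table"][:1000])


timings = time_repeated(workload, repeats=3)
print("first call:", timings[0])
print("later calls:", timings[1:])
```

In the benchmark, fn could be something like `lambda: cuspatial.point_in_polygon(points, polygons)`. The same applies to the CPU side, since an `@njit` function is compiled on its first call.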

In general, right now we're really focused on increasing the reach of cuSpatial, then we can implement further accelerations, but regardless this is valuable and we really appreciate the feedback. I hope you find the time to test using the quadtree.

Thanks!

2 replies

thomcom Apr 24, 2023
Collaborator

Allowing FP32 might be a pretty big boon toward getting cuspatial out there. I'm not too sad to lose against a Ryzen 5950 because they're awesome, but if we could run the 3090 in 32 bit mode then I think we'd retain a healthy lead.

thomcom Apr 24, 2023
Collaborator

What @epifanio has in his machine is pretty close to "consumer state-of-the-art" and the 3090 Ti seems to be really underperforming. Crossing my fingers that the issue has more to do with his offsets buffer than his GPU, will be testing.

thomcom · 2023-04-24T16:37:45Z

Collaborator

Thanks for looking at cuspatial @epifanio, your results are interesting to say the least. Can you tell me which version of cuspatial you're using?

2 replies

thomcom Apr 24, 2023
Collaborator

Also, thanks for your great reproducer example. I'm currently updating it to the latest version and having a gander at your performance results.

epifanio Apr 24, 2023
Author

Built from a branch on my fork: https://github.com/epifanio/cuspatial/tree/gdal_362

cuspatial.__version__

>>> '0+untagged.1758.gc856fa0'

thomcom · 2023-04-24T17:01:02Z

Collaborator

Sounds like @jarmak-nv is right about the architectural limitations, @epifanio. Here are my modifications to bring it up to the latest API spec. I only iterate three times, but on my 3.4 GHz Xeon with 6 cores and 12 threads, running a GV100 GPU (32 GB), cuspatial was about 10x faster than your awesome numba kernel:

np.random.seed(2)

time_parallelpointinpolygon = []
time_mpltPath = []
time_ray_tracing_numpy_numba = []
time_is_inside_sm_parallel = []
time_is_inside_postgis_parallel = []
time_is_inside_rapids = []

n_points = []

for i in range(1, 200_000_002, 100_000_000):
    n_points.append(i)

    lenpoly = 1000
    polygon = [
        [np.sin(x) + 0.5, np.cos(x) + 0.5] for x in np.linspace(0, 2 * np.pi, lenpoly)
    ]
    polygon = np.array(polygon)
    N = i
    points = np.random.uniform(-1.5, 1.5, size=(N, 2))


    # Method 1 CPU
    start_time = time()
    inside4 = is_inside_sm_parallel(points, polygon)
    time_is_inside_sm_parallel.append(time() - start_time)


    # Method 2 GPU
    x_pnt = points[:, 0]
    y_pnt = points[:, 1]
    points = cuspatial.core.geoseries.GeoSeries.from_points_xy(points.flatten())

    x_poly = polygon[:, 0]
    y_poly = polygon[:, 1]
    polygons = cuspatial.core.geoseries.GeoSeries.from_polygons_xy(
        polygon.flatten(), [0, 1000], [0, 1], [0, 1]
    )
    # The older-API call, kept for reference:
    # result = cuspatial.point_in_polygon(
    #     x_pnt,
    #     y_pnt,
    #     cudf.Series([0], index=["geom"]),
    #     cudf.Series([0], name="r_pos", dtype="int32"),
    #     x_poly,
    #     y_poly,
    # )
    print(len(points))
    print(len(polygons))
    start_time = time()
    result = cuspatial.point_in_polygon(points, polygons)
    time_is_inside_rapids.append(time() - start_time)

time_is_inside_rapids

[0.43572211265563965, 1.0868034362792969, 2.0259852409362793]

time_is_inside_sm_parallel

[0.722322940826416, 9.562219381332397, 20.059306144714355]
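
The speedup at each size can be read off these two lists directly. A short sketch using the timings exactly as reported above:

```python
time_is_inside_rapids = [0.43572211265563965, 1.0868034362792969, 2.0259852409362793]
time_is_inside_sm_parallel = [0.722322940826416, 9.562219381332397, 20.059306144714355]
n_points = list(range(1, 200_000_002, 100_000_000))  # same sizes as the loop

# CPU time divided by GPU time for each problem size.
speedups = [
    cpu / gpu for cpu, gpu in zip(time_is_inside_sm_parallel, time_is_inside_rapids)
]
for n, s in zip(n_points, speedups):
    print(f"N = {n:>11,d}: GPU is {s:.1f}x faster than CPU")
```

This gives roughly 1.7x, 8.8x and 9.9x. The first entry is for N = 1, so it mostly measures fixed overhead rather than throughput.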

2 replies

epifanio Apr 24, 2023
Author

@thomcom cuspatial.core.geoseries.GeoSeries.from_points_xy is what is available in the new API, right?

thomcom Apr 24, 2023
Collaborator

Yes, in order to get a GeoSeries containing Points objects to pass to the API you'll need .from_points_xy and also .from_polygons_xy as I gave in my response.
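
For anyone adapting the call to their own data, here is a plain-Python sketch of how the inputs used above (`polygon.flatten()`, `[0, 1000]`, `[0, 1]`, `[0, 1]`) can be built. It assumes the three lists are ring, part and geometry offsets in that order, which is my reading of the call; the helper names are made up for illustration and are not part of cuspatial.

```python
def interleave_xy(points):
    """Flatten [(x0, y0), (x1, y1), ...] into [x0, y0, x1, y1, ...]."""
    flat = []
    for x, y in points:
        flat.extend((x, y))
    return flat


def offsets_from_lengths(lengths):
    """Turn per-item lengths into cumulative offsets starting at 0."""
    offsets = [0]
    for n in lengths:
        offsets.append(offsets[-1] + n)
    return offsets


ring = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 0.0)]  # closed triangle
polygons_xy = interleave_xy(ring)
ring_offset = offsets_from_lengths([len(ring)])  # one ring of 4 vertices
part_offset = offsets_from_lengths([1])          # one part holding one ring
geometry_offset = offsets_from_lengths([1])      # one geometry holding one part
print(polygons_xy)
print(ring_offset, part_offset, geometry_offset)  # [0, 4] [0, 1] [0, 1]
```

With a single 1000-vertex ring, the same helpers produce `[0, 1000]`, `[0, 1]` and `[0, 1]`, matching the values passed to `.from_polygons_xy` above.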

thomcom · 2023-04-24T17:03:57Z

Collaborator

I'm looking more closely at your code here:

    result = cuspatial.point_in_polygon(
        x_pnt,
        y_pnt,
        cudf.Series([0], index=["geom"]),
        cudf.Series([0], name="r_pos", dtype="int32"),
        x_poly,
        y_poly,
    )

This uses the older API. I believe it should read:

    result = cuspatial.point_in_polygon(
        x_pnt,
        y_pnt,
        cudf.Series([0, 1], index=["geom"]),
        cudf.Series([0, 1000], name="r_pos", dtype="int32"),
        x_poly,
        y_poly,
    )

Passing the original offsets would definitely produce some undefined behavior.

Looking at the docs, my guess is you're using 23.02, which I should be able to roll back to in order to test whether passing in the wrong geometry and ring offsets produces very slow behavior. I'm surprised it works as such.

10 replies

epifanio Apr 24, 2023
Author

@thomcom, I may have an issue running the code with the new API on the new build. Do I need to clean up the memory after each iteration?

It works until 10E8 points, the following is from a copy/paste of your code:

When I attempt to run with range(1, 1000_000_002, 1000_000_000), the (Jupyter) kernel hangs and then dies; it was able to handle 10E9 before.

epifanio Apr 24, 2023
Author

running the following in a single cell:

import numpy as np
import cuspatial

lenpoly = 1000
polygon = [
    [np.sin(x) + 0.5, np.cos(x) + 0.5] for x in np.linspace(0, 2 * np.pi, lenpoly)
]
polygon = np.array(polygon)
N = 1000000001
points = np.random.uniform(-1.5, 1.5, size=(N, 2))


x_pnt = points[:, 0]
y_pnt = points[:, 1]
points = cuspatial.core.geoseries.GeoSeries.from_points_xy(points.flatten())

x_poly = polygon[:, 0]
y_poly = polygon[:, 1]
polygons = cuspatial.core.geoseries.GeoSeries.from_polygons_xy(
    polygon.flatten(), [0, 1000], [0, 1], [0, 1]
)

result = cuspatial.point_in_polygon(points, polygons)

leads to the following traceback:

https://gist.github.com/epifanio/61a8a31b524f7811f0493be9f7a094be
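
As a rough sanity check on memory, independent of whatever the traceback shows, the size of the coordinate buffer alone can be estimated. The sketch below is plain arithmetic; coord_bytes is a made-up helper, and it ignores any temporaries that cuspatial or cudf allocate.

```python
def coord_bytes(n_points, itemsize=8, dims=2):
    """Bytes needed to store n_points coordinates of `dims` components each."""
    return n_points * dims * itemsize


N = 1_000_000_001
GiB = 1024**3
fp64 = coord_bytes(N)              # 8-byte floats, as used in this test
fp32 = coord_bytes(N, itemsize=4)  # what 4-byte floats would need
print(f"FP64 points: {fp64 / GiB:.1f} GiB")
print(f"FP32 points: {fp32 / GiB:.1f} GiB")
```

That is about 14.9 GiB for the FP64 points alone, against the 24 GB of an RTX 3090 Ti. On the host, `points.flatten()` returns a copy, so two such arrays exist at once while the GeoSeries is being built.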

epifanio Apr 24, 2023
Author

I reverted the Jupyter kernel to use the old Docker image, and a test confirmed that the memory issue occurs only with the new build.

harrism Apr 24, 2023
Collaborator

If there is an issue, can you file it as a github issue? Thanks!

epifanio Apr 25, 2023
Author

Quoting @harrism: "If there is an issue, can you file it as a github issue? Thanks!"

Filed as #1092

harrism · 2023-04-24T21:43:05Z

Collaborator

@epifanio I found the SO answer you mentioned but didn't link.

https://stackoverflow.com/a/48760556/749748

1 reply

epifanio Apr 25, 2023
Author

Sorry, I forgot to paste the link! I will edit my answer with the right syntax for the offsets, still using the old API.
