Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support dask-geopandas #114

Open
davidbrochart opened this issue Jul 29, 2022 · 3 comments
Open

Support dask-geopandas #114

davidbrochart opened this issue Jul 29, 2022 · 3 comments
Labels
proposal Idea for a new feature.

Comments

@davidbrochart
Copy link

davidbrochart commented Jul 29, 2022

@snowman2 mentioned using dask-geopandas to rasterize in chunks using Dask. I just tried and it doesn't seem to work out of the box. Here is an example:

from functools import partial

import geopandas
import dask_geopandas
from rasterio.enums import MergeAlg
from geocube.api.core import make_geocube
from geocube.rasterize import rasterize_image

# download: wget https://raw.githubusercontent.com/drei01/geojson-world-cities/master/cities.geojson

df = geopandas.read_file(
    "cities.geojson",
    crs="epsg:4326",
)

ddf = dask_geopandas.from_geopandas(df, npartitions=4)
ddf["mask"] = 1

geo_grid = make_geocube(
    vector_data=ddf,
    resolution=(-0.01, 0.01),
    measurements=["mask"],
    rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
    fill=0,
)
Traceback
---------------------------------------------------------------------------
VectorDataError                           Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 geo_grid = make_geocube(
      2     vector_data=ddf,
      3     resolution=(-0.05, 0.05),
      4     measurements=['mask'],
      5     rasterize_function=partial(rasterize_image, merge_alg=MergeAlg.add, all_touched=True),
      6     fill=0,
      7 )
      8 geo_grid

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/api/core.py:99, in make_geocube(vector_data, measurements, datetime_measurements, output_crs, resolution, align, geom, like, fill, group_by, interpolate_na_method, categorical_enums, rasterize_function)
     35 """
     36 Rasterize vector data into an ``xarray`` object.  Each attribute will be a data
     37 variable in the :class:`xarray.Dataset`.
   (...)
     95 
     96 """
     97 geobox_maker = GeoBoxMaker(output_crs, resolution, align, geom, like)
---> 99 return VectorToCube(
    100     vector_data=vector_data,
    101     geobox_maker=geobox_maker,
    102     fill=fill,
    103     categorical_enums=categorical_enums,
    104 ).make_geocube(
    105     measurements=measurements,
    106     datetime_measurements=datetime_measurements,
    107     group_by=group_by,
    108     interpolate_na_method=interpolate_na_method,
    109     rasterize_function=rasterize_function,
    110 )

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/vector_to_cube.py:79, in VectorToCube.__init__(self, vector_data, geobox_maker, fill, categorical_enums)
     53 def __init__(
     54     self,
     55     vector_data: Union[str, os.PathLike, geopandas.GeoDataFrame],
   (...)
     58     categorical_enums: Optional[Dict[str, List]],
     59 ):
     60     """
     61     Initialize the GeoCube class.
     62 
   (...)
     77 
     78     """
---> 79     self._vector_data = load_vector_data(vector_data)
     80     self._geobox = geobox_maker.from_vector(self._vector_data)
     81     self._grid_coords = affine_to_coords(
     82         self._geobox.affine, self._geobox.width, self._geobox.height
     83     )

File ~/mambaforge/envs/xarray_leaflet/lib/python3.10/site-packages/geocube/geo_utils/geobox.py:73, in load_vector_data(vector_data)
     71     raise VectorDataError("Empty GeoDataFrame.")
     72 if "geometry" not in vector_data.columns:
---> 73     raise VectorDataError(
     74         "'geometry' column missing. Columns in file: "
     75         f"{vector_data.columns.values.tolist()}"
     76     )
     78 # make sure projection is set
     79 if not vector_data.crs:

VectorDataError: 'geometry' column missing. Columns in file: [0, 1, 2]
@davidbrochart davidbrochart added the proposal Idea for a new feature. label Jul 29, 2022
@snowman2
Copy link
Member

snowman2 commented Aug 2, 2022

Sounds like a good idea. A MR adding support for dask-geopandas is welcome.

@jessjaco
Copy link

I question whether this is the best approach as my (admittedly very limited) understanding is that a dask-geopandas dataframe is chunked along an index rather than into pieces of equivalent (spatial) size, so a single large feature could blow things up.

@gtmaskall
Copy link

I think that's a fair concern, @jessjaco . I suspect an important preliminary step would be to perform a spatial_shuffle on the dask-geopandas GeoDataframe. I think it's not so much an issue of single large features, but more how your features are distributed over the larger region. Of course a single large feature could still trip you up, but perhaps that's then down to the user to manage as only they would know how they'd want to subdivide their features.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
proposal Idea for a new feature.
Projects
None yet
Development

No branches or pull requests

4 participants