Community Meeting Notes Archive
The archive of Community Meeting Notes. See the most recent and tentative agenda for the next meeting on hackmd.
- 2022-09-01
- 2022-07-28
- 2022-06-02
- 2022-04-28
- 2022-03-31
- 2022-01-27
- 2021-12-02
- 2021-09-30
- 2021-08-05
- 2021-05-27
- 2021-05-24 GSOC coordination
- 2021-03-25
- 2021-01-14
- 2020-08-20: A second meeting!
- 2020-05-07: A first meeting!
Attendees: Martin Fleischmann, Brendan Ward, Joris van den Bossche, Matt Richards
- Shapely 2.0, pygeos and a strategy to phase-out different geometry engines
- https://github.com/geopandas/geopandas/pull/2275#issuecomment-1206347135
- Decision: release as 0.12 with Shapely 2.0 support once 2.0b1 is released; can merge PR right after updating to latest Shapely
- warn during import if Shapely 2.0 (full release, not beta) and pygeos are both detected as installed; provide instructions on how to set the env var
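A minimal sketch of the import-time check described above, using only the standard library. The helper names are hypothetical, and the variable name `USE_PYGEOS` is an assumption here since the notes do not name the env var; the version parsing is deliberately simple and treats any non-numeric component (such as `2.0b1`) as a pre-release.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version


def installed_version(name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def is_full_shapely_2(version_string):
    """True for a final Shapely >= 2.0 release, False for 1.x or pre-releases."""
    if version_string is None:
        return False
    parts = version_string.split(".")
    if not parts[0].isdigit() or int(parts[0]) < 2:
        return False
    # Pre-releases such as "2.0b1" or "2.0a1" carry letters in a component.
    return all(part.isdigit() for part in parts)


def should_warn(shapely_version, pygeos_version):
    """Warn only when a full Shapely 2.x release and pygeos are both present."""
    return is_full_shapely_2(shapely_version) and pygeos_version is not None


def warn_if_both_installed():
    if should_warn(installed_version("shapely"), installed_version("pygeos")):
        warnings.warn(
            "Shapely 2.0 and PyGEOS are both installed. Set the environment "
            "variable USE_PYGEOS=0 before importing to use Shapely 2.0.",
            UserWarning,
            stacklevel=2,
        )
```

Keeping the decision in `should_warn` separate from the package lookup makes the rule easy to test without installing either library.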
- Requirements for NumFOCUS fiscal sponsorship:
- application questionnaire: https://docs.google.com/document/d/1axg8QK_onmUemh2ylsNvaGToM9NNvCAlguRkU2S8hcs/edit
- (contact Brendan for write privileges in short term)
- application deadline Oct 15, notification Nov 30
- comprehensive model vs grantor-grantee model
- Joris can check re: other NumFOCUS projects
- May need to follow up with NumFOCUS to clarify
- Need 5 signatories
- Roadmap
- GeoPandas 1.0:
- only Shapely 2.0, no PyGEOS / Shapely 1.8x shims
- feature parity with shapely, at least for easy ones that are element-wise operations
- preparation of geometries; need good automatic behavior
- are there any API changes we want to make / remove?
- re-enable documentation page in main repo?
- Also project-level roadmap:
- I/O, scaling, S2 geographic
- link to Geopandas roadmap from higher-level roadmap
- inspiration:
- Governance
- "NumFOCUS requires fiscally sponsored projects to have an explicit governance structure listed publicly on the project website or documentation."
- initial draft: https://docs.google.com/document/d/1N2BbZe1PujL69req9wAA0_xUb_EPoOSoWxt2CuTpilg/edit?usp=sharing
- (contact Brendan for write privileges in short term)
- inspiration:
- need to define:
- location: add to community repo, in order to apply to all GeoPandas subprojects? Or within main repo docs?
- See Jupyter model: dedicated repo for governance, associated documentation page in main repo
- Put into dedicated repo
- project roles and associated mechanics: steering council, lead developer, etc
- Steering council:
- example: for PySAL there are 20 devs and 3 serve on the council on annual basis; self-nominations and voting each year
- need to decide period of membership on steering council
- process for updating governance docs
- will need to be updated after acceptance by NumFOCUS:
- application questionnaire: https://docs.google.com/document/d/1axg8QK_onmUemh2ylsNvaGToM9NNvCAlguRkU2S8hcs/edit
- Code of conduct
- PSF has group that can receive code of conduct reports
- May also be able to get NumFOCUS to do this: https://numfocus.org/code-of-conduct
- General desire to have reports go to people that are not also on the steering council
Attendees: Martin Fleischmann, Brendan Ward, Thomson Comer
- Preserving sindex in copies of GeoDataFrame (https://github.com/geopandas/geopandas/issues/2510)
- in favor of the idea but may be very complex
- with pygeos / shapely 2.0 it is very fast to build the index, but slower with RTree; checking against original may be more computationally expensive than just building new tree
- need to time it to be sure
- Discontinued Windows wheels (https://github.com/geopandas/geopandas/issues/2465)
- state is unclear; there is an archived version as of June 26, 2022 at: https://www.lfd.uci.edu/~gohlke/pythonlibs/
- Fiona is working toward having Windows wheel builds same way as pyogrio
- Projecting to WGS84 when outputting GeoJSON
- old PR: https://github.com/geopandas/geopandas/pull/416
- may need to document that we are following the 2008 spec (no automatic reprojection) rather than RFC7946 (2016 spec).
- Ideally would reproject because of latest / formal spec, but GDAL does not do by default
- Fiona / pyogrio always
- For `to_json`, add a flag that by default does what GDAL does (no automatic reprojection) with initial default of `None` (what is in PR); maybe flip this over a deprecation period to automatic
- Path forward: work on getting existing PR merged, document inconsistency vs Fiona / Pyogrio
- M1 / Arm64 installation issues via pip
- seems to require uninstalling then building binary deps (pygeos, shapely, pyogrio) from source; do we want to note this in the docs?
- Installation all works fine for Martin using mamba-forge / conda-forge; these are what we should document
- Pip: generally no wheels available for M1 (pygeos has them)
- https://github.com/geopandas/geopandas/issues/1816#issuecomment-1003093329
- modernizing setup.py => pyproject.toml per PEP 621
- cuSpatial:
- Thomson joined to represent project
- wants to have API parity with GeoPandas
- Working on geoarrow spec
- Working on I/O between cuSpatial and GeoPandas using geoarrow
- For feature parity, will need to rewrite core GEOS functionality for CUDA
- Working on implementing DE-9IM
- then projective transforms (similar to pyproj)
- team is largely focused on C++ header only libraries
- Working on defining I/O between geopandas / GEOS objects and arrow for use in CUDA
- also wanting to support dask-geopandas integration with cuSpatial
- adding `poetry`-specific sections to `pyproject.toml`?
- consider adding these in pyogrio as a test for folks that use poetry
- S2 NumFOCUS grant
- Follow up with Benoît re: schedule for this
- Next NumFOCUS grant cycle(s)
- next proposal submission deadline: September 2, 2022
- Joris may have a contact to work on a NumFOCUS grant
- shapely 2.0 update
- getting close to 2.0a1 release
- using quadtree to help with partitioning of geometries for dask-geopandas
- already a python implementation of quadtree
- could expose a CAPI in GEOS against quadtree
- helps because you know the rectangular geometries of the quads, but to do this you need to process all the geometries at once
- could maybe do as a 2 pass operation; create the quadtree structure then have workers populate geometries into it
- look at the cuSpatial quadtree implementation for inspiration
- pyogrio update:
- fix conda-forge for 0.4.1 release; some issues building, Martin will follow up with conda-forge folks
- next meeting schedule?
- Martin will work with Joris to set next meeting date
Attendees: Martin Fleischmann, Joris van den Bossche, Brendan Ward
- GeoPandas 0.10.3 (aim for next couple of days? Nope: superseded by below)
- Can include just GeoParquet 0.4.0, regression fixes
- Regression: https://github.com/geopandas/geopandas/issues/2282
- GeoPandas 0.11 (aim for next couple of days)
- One option, cut release from main (w/o pyogrio yet):
- Or try to include pyogrio and other larger changes still in progress; but will take longer
- Make Fiona truly optional: https://github.com/geopandas/geopandas/pull/2427
- S2 NumFOCUS grant
- See writeup of 6/1/2022 discussion at: https://hackmd.io/FiNFZ6wkQCufZyyxBewFYA?both
- Still need to figure out how we want to do bindings (e.g., using pybind11)
- GSoC update
- no projects selected this year
- GeoPython 2022
- dask-geopandas workshop (2 hours)
- GeoPandas talk
- FOSS4G
- state of GeoPandas talk
- NumFOCUS small development grants (https://numfocus.org/programs/small-development-grants):
- round 2 closes June 5
- round 3 closes Sept 2:
- check in on this in July dev meeting
- maybe put in for pyogrio GDAL Arrow I/O
- Ecosystem updates:
- pyogrio
- Update PR to integrate pyogrio: https://github.com/geopandas/geopandas/pull/2225
- Goal for 0.4.0 is to make sure that pyogrio passes all Geopandas I/O tests
- Need to handle I/O for datetimes:
- Bug fix for reading datetime: https://github.com/geopandas/pyogrio/pull/111
- For conversion to pandas we need nanosecond resolution
- Look into adding datetime write support: https://github.com/geopandas/pyogrio/issues/58
- need to rename `layer_geometry_type` => `geometry_type` for consistency with everywhere else we use it
- Merge: https://github.com/geopandas/pyogrio/pull/110
- Try to figure out what is causing failure on Linux: https://github.com/geopandas/pyogrio/pull/104
- Once vcpkg is updated to GDAL 3.5 / Cmake build we need to update our pinned version
- shapely
- STRtree API: query() returns indices, query_items() returns custom items, query_geoms() returns geometries; add deprecation warning to 1.8 branch
- Adding new GEOS 3.11 functionality:
- Consider adding more PRs soon
- GDAL:
- GeoParquet / GeoArrow support landed in 3.5.0
- Arrow I/O interface coming in GDAL C API; pyogrio opt-in if GDAL new enough and pyarrow available
- GeoParquet spec:
- 0.4.0 support now in place
Attendees: Martin Fleischmann, Joris van den Bossche, Levi John Wolf, Matt Richards, Pieter Roggemans, Brendan Ward
- Ecosystem updates:
- pyogrio:
- mixed geometries
- https://github.com/geopandas/pyogrio/pull/75
- Fiona always uses "Unknown" for mixed pluralities, `ogr2ogr` generally preserves existing type
- by default (allow user to opt-out), try to write (promote if needed), fall back to "unknown"
- maybe change the `geometry_type` parameter to instead allow toggling promotion as needed (maybe don't need promote="always"; have a method on GeoDataFrame instead?)
- maybe `promote_to_multi` where `True` = always, `False` = don't do it, `None` = do it automatically
- Consider promoting truly mixed geometries to lowest common representation (e.g., Points + Lines => GeometryCollection)
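The tri-state `promote_to_multi` idea above could be sketched roughly as follows. This is only an illustration on geometry type names, not pyogrio's implementation; `resolve_layer_type` and `SINGLE_TO_MULTI` are hypothetical names, and truly mixed input simply falls back to "Unknown" rather than being promoted to a GeometryCollection.

```python
SINGLE_TO_MULTI = {
    "Point": "MultiPoint",
    "LineString": "MultiLineString",
    "Polygon": "MultiPolygon",
}


def resolve_layer_type(geometry_types, promote_to_multi=None):
    """Pick one layer geometry type for a collection of geometry type names.

    promote_to_multi: True = always promote singles to their Multi form,
    False = never promote, None = promote only when a single type and its
    Multi counterpart are mixed.
    """
    types = set(geometry_types)
    if promote_to_multi is True:
        types = {SINGLE_TO_MULTI.get(t, t) for t in types}
    elif promote_to_multi is None:
        for single, multi in SINGLE_TO_MULTI.items():
            if single in types and multi in types:
                types.discard(single)
    if len(types) == 1:
        return next(iter(types))
    # Truly mixed (or empty) input: fall back to the generic type.
    return "Unknown"
```

With the default `None`, a layer of Polygons and MultiPolygons resolves to "MultiPolygon", while the same input with `promote_to_multi=False` resolves to "Unknown".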
- wheel building (GEOS)
- 0.4 release
- Issues with GeoPandas integration tests:
- lots of GeoJSON files used for testing, but these don't match because pyogrio sets correct integer types
- GDAL uses int32 by default whereas Fiona is using int64 by default
- Missing features:
- write support for datetime
- allow write support for `None` geometry values
- GeoParquet
- 0.2 requirement for winding order rolled back in 0.3
- CRS now optional and by default assumed to be OGC:CRS84
- Could do a GeoPandas patch release for matching 0.1 spec; maybe do so for 0.3 but still ambiguous default re: missing CRS
- Shapely 2.0
- STRtree changes are merged
- a few aliases still need to be updated
- GSoC update
- selection process ongoing (note that we cannot publish the decision yet)
- for future GSoC, may get better response if we have more specific tasks (similar in focus to PySAL); GeoPandas roadmap might help too
- try harder to get potential GSoC contributors to contribute something small early on while they are in the pre-proposal timeframe
- NumFOCUS SDG update
- S2 spherical geometry pilot grant received!
- GeoPandas 0.11
- would be nice to get pyogrio 0.4 in
- if possible, get random sampling PR merged
- given limited capacity, not try to pull in too many other features
- Next community meeting:
- consider moving to first week of June; aim to dedicate it to roadmap
- cancel following one at end of June?
(attending: Joris van den Bossche, Martin Fleischmann, Brendan Ward)
- Ecosystem updates
- pyogrio 0.4 release
- release without arm64
- aim for release next week
- should be able to finish the integration into GeoPandas
- shapely 2.0
- xyz services keeps getting updated with new services every 1-2 months
- dask-geopandas 0.1.0 released
- GDAL: initial parquet / arrow driver support: https://github.com/OSGeo/gdal/pull/5477
- GeoPandas roadmap
- moved from last meeting (2x)
- Some notes about this in the first meeting notes: https://github.com/geopandas/community/wiki/Community-Meeting-Notes-Archive#2020-05-07-a-first-meeting
- GSoC update
- Applications open on April 4, closes on April 19
- NumFOCUS SDG update
- Benoit submitted proposal for Google S2 spherical geometry pilot
- GeoPandas 0.11 release
- can refresh pure-Python shapefile I/O PR and include in release (would be a different engine)
- make pyogrio opt-in via engine keyword if installed; make default if installed in 0.12
- need to decide if `index_right` being renamed to right dataframe index name from spatial join should make it into release PR but want to get in the other performance updates
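One way to picture the opt-in engine plan above is a small resolver function. This is a sketch under assumptions: `resolve_engine`, `pyogrio_is_default`, and `available` are hypothetical names, and falling back to whichever engine is installed when the preferred one is missing is a design choice made here, not something the notes decide.

```python
from importlib.util import find_spec


def resolve_engine(engine=None, pyogrio_is_default=False, available=None):
    """Return the name of the I/O engine to use.

    engine: explicit user choice ("fiona" or "pyogrio"), or None.
    pyogrio_is_default: False models the opt-in stage; True models the later
    stage where pyogrio is preferred whenever it is installed.
    available: optional set of installed engines (detected when omitted).
    """
    if available is None:
        available = {name for name in ("fiona", "pyogrio") if find_spec(name)}
    if engine is not None:
        if engine not in available:
            raise ImportError(f"engine {engine!r} requested but not installed")
        return engine
    order = ("pyogrio", "fiona") if pyogrio_is_default else ("fiona", "pyogrio")
    for name in order:
        if name in available:
            return name
    raise ImportError("no I/O engine installed; install fiona or pyogrio")
```

Flipping `pyogrio_is_default` models the later release where pyogrio becomes the default when installed, without changing what an explicit `engine=` argument does.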
- growing maintenance backlog
- need more help reviewing PRs
- Plan to have monthly GeoPandas meeting
- Next meeting (2022/04/28) will be at 20:00 UTC
- consider having every other meeting at 17:00 UTC
- Dask-geopandas meeting times and frequency
- stop until next round of GSoC
- Website
- right now sphinx is both documentation and GeoPandas homepage
- may want to consider splitting out GeoPandas documentation from homepage, so that can update homepage without having to push a release
- put docs at docs.geopandas.org
- may want to check w/ NumFOCUS if they can manage the domain
- NumFOCUS
- Need to investigate what it takes to become a fiscally sponsored project
- Need roadmap and governance (but we have code of conduct)
(attending: Martin Fleischmann, Brendan Ward, Thomas Statham, Matt Richards, Levi Wolf, Joris Van den Bossche, Alan Snow)
- Community call time
- Matt is in UTC+10, Brendan UTC-8, Alan is UTC-6, Joris is UTC+1, Martin and Levi are UTC
- Shall we consider different time or switching between them periodically?
- Next time will try later UTC time (20:00 UTC?)
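To compare candidate meeting times across the offsets listed above, a small helper can convert a UTC hour to each attendee's local clock time. This is a rough sketch: it uses the fixed offsets from the notes, ignores daylight saving changes, and the calendar date inside the function is arbitrary.

```python
from datetime import datetime, timedelta, timezone

OFFSETS = {"Matt": 10, "Brendan": -8, "Alan": -6, "Joris": 1, "Martin": 0, "Levi": 0}


def local_times(hour_utc, offsets=OFFSETS):
    """Map each attendee to (local 'HH:MM', day shift relative to the UTC date)."""
    # The calendar date is arbitrary; only the clock time and day shift matter.
    start = datetime(2022, 1, 27, hour_utc, tzinfo=timezone.utc)
    result = {}
    for name, offset in offsets.items():
        local = start.astimezone(timezone(timedelta(hours=offset)))
        result[name] = (local.strftime("%H:%M"), (local.date() - start.date()).days)
    return result
```

For example, `local_times(20)["Matt"]` gives `("06:00", 1)`, i.e. 06:00 on the following day, while `local_times(17)["Matt"]` gives `("03:00", 1)`.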
- Ecosystem updates
- GeoPandas
- Shapely 2.0
- pygeos is now merged into `master`; this breaks GeoPandas but PR underway
- Dask-geopandas
- GeoArrow spec
- Pyogrio (need another Conda release)
- GDAL / GeoArrow bridge; use GeoArrow as transport between GDAL/OGR and numpy arrays instead of WKB
- XYZ services: also have made recent updates
- NumFOCUS SDG (S2?)
- Round 1
- Call for Proposals Announcement: February 4, 2022
- Proposal Submission Deadline: March 4, 2022
- Committee Selection Deadline: March 18, 2022
- Notification to Applicants Deadline: April 15, 2022
- Ideas:
- seed funding to start bindings to Google S2 (need to follow up with Benoit); if still interested schedule follow up meeting to work out proposal details
- https://github.com/geopandas/community/issues/10
- GSoC
- New flexible format: contributors determine if they are short or long format
- https://developers.google.com/open-source/gsoc/timeline
- will work with NumFOCUS like last time
- Need to start drafting list of ideas; discuss at next meeting
- Consider including S2 as a project
- pure python I/O (mostly on geopackage side)
- make mapping better
- see last year's ideas: https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2021
- Complete outstanding tasks from GSoC 2021
- need notebook demonstrating new stuff (Thomas has ready now, just needs review)
- other minor tasks; see dask-geopandas issues
- GeoPython 2022
- Basel Switzerland June 13-15 (hybrid; might have in-person component)
- Talk submission end of Feb, workshops end of March
- Might be good to have a talk on state of GeoPandas and ecosystem
- Might be good to have workshop on dask-geopandas
- FOSS4G
- Firenze Italy, Aug 22-28
- Deadlines: Talks/papers: end of Feb
- Let's follow up offline with Martin, Joris, Levi
- GeoPandas 0.11 release timeline
- should there be one more release before shapely 2.0 support?
- Issues around GeoDataFrame constructor / active geometry columns
- 0 geometry columns -> no GeoDataFrame
- >=1 geometry columns -> keep GeoDataFrame, even if active is not present
- better handle the case of the active geometry column being present
- better handle crs on the geodataframe
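The constructor rules listed above can be written down as a small decision function. This is only a sketch of the rule, with plain strings standing in for column dtypes; `resolve_frame` and its arguments are hypothetical names, and leaving the active geometry unset when it is missing is one possible reading of the notes.

```python
def resolve_frame(column_kinds, active_geometry="geometry"):
    """Return (frame_type, active_column) for a mapping of column name -> kind.

    column_kinds uses the string "geometry" to mark geometry-typed columns.
    """
    geometry_columns = [
        name for name, kind in column_kinds.items() if kind == "geometry"
    ]
    if not geometry_columns:
        # No geometry columns left: fall back to a plain DataFrame.
        return "DataFrame", None
    if active_geometry in geometry_columns:
        return "GeoDataFrame", active_geometry
    # Geometry columns exist but the active one is gone: keep the
    # GeoDataFrame and leave the active geometry unset.
    return "GeoDataFrame", None
```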
- dask-geopandas 0.1 release timeline
- want to include spatial shuffle, documentation updates, Hilbert distance with numpy
- `read_file` via pyogrio would also be nice to include and nearly ready; Joris will look at this again soon
- Hilbert distance: do we want to have this in GEOS C API and just include in Shapely 2.0? Unlikely to be faster having this as a scalar ufunc against GEOS
- TODO: open issue upstream at GEOS
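For readers unfamiliar with Hilbert distance, here is a scalar, pure-Python sketch of the standard grid-to-curve conversion. It is not the dask-geopandas implementation (which the notes say uses numpy); `hilbert_distance` and `scale_to_grid` are hypothetical names, and coordinates must first be scaled to integer grid cells.

```python
def hilbert_distance(x, y, order=16):
    """Distance along a Hilbert curve of the given order for integer cell (x, y)."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x, y = n - 1 - x, n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def scale_to_grid(value, lower, upper, order=16):
    """Map a coordinate in [lower, upper] to an integer cell in [0, 2**order - 1]."""
    if upper == lower:
        return 0
    return int((value - lower) / (upper - lower) * ((1 << order) - 1))
```

Sorting geometries by the Hilbert distance of, say, their bounding-box midpoints tends to keep spatially close features close together in the ordering, which is what makes it useful for partitioning.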
- regression in `dissolve` operation
- dask renames columns in intermediate aggregation results then names them back; this creates a new GeoDataFrame with no geometry, which then fails in subsequent step
- possible dask-geopandas funding from the GDSL
- may be opportunity to fund someone on dask-geopandas
- GeoPandas roadmap
- moved from last meeting
- Some notes about this in the first meeting notes: https://github.com/geopandas/community/wiki/Community-Meeting-Notes-Archive#2020-05-07-a-first-meeting
(attending: Martin Fleischmann, Joris Van den Bossche, Brendan Ward, Benoit Bovy, Alan Snow, Jan Simbera)
- Ecosystem updates
- GeoPandas:
- pyogrio engine (https://github.com/geopandas/geopandas/pull/2225)
- longer term may want to do a hard switch from Fiona to pyogrio; some problems if both are installed via pip (conda is OK)
- may also want to make the backends optional, and install pure python support for Shapefile / geopackage by default (or leave all as optional)
- may want to look into xarray engine loading model
- need to figure out how to build wheels for pyogrio
- Shapely 2.0 / pygeos
- The merge is finally happening! (this will also mean that Shapely main branch is temporarily not working with GeoPandas)
- push new feature development to shapely instead of pygeos
- pygeos 0.12 release coming soon
- once pygeos is fully integrated into Shapely and stable, then archive pygeos; will remove pygeos opt dep. from geopandas by geopandas 1.0
- Dask-GeoPandas
- much of the core functionality, mostly working on spatial partitions
- GeoArrow specification
- First draft: https://github.com/geopandas/geo-arrow-spec/pull/12
- `__geo_arrow_interface__` in Python?
- goal is to have arrow native way to store geometries instead of WKB for storage, uses compact storage of coordinates
- approach already used by cuSpatial for copying spatial data to GPU
- already some of the basic functionality in pygeos / Shapely 2.0 (get rings, coordinates, etc); requires multiple steps, but already faster than WKB conversion.
- goal is to have one function that does this conversion
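To illustrate what "compact storage of coordinates" can look like compared with per-geometry WKB blobs, here is a generic offsets-plus-flat-coordinates sketch for linestrings. This is not the GeoArrow draft layout itself (see the linked PR for that); the function names are hypothetical and plain Python lists stand in for Arrow arrays.

```python
def encode_linestrings(lines):
    """Pack a list of linestrings into flat coordinate arrays plus offsets.

    Each linestring is a list of (x, y) tuples. Line i occupies
    xs[offsets[i]:offsets[i + 1]] and the matching slice of ys.
    """
    xs, ys, offsets = [], [], [0]
    for line in lines:
        for x, y in line:
            xs.append(float(x))
            ys.append(float(y))
        offsets.append(len(xs))
    return xs, ys, offsets


def decode_linestrings(xs, ys, offsets):
    """Rebuild the list of linestrings from the flat arrays."""
    return [
        list(zip(xs[start:stop], ys[start:stop]))
        for start, stop in zip(offsets[:-1], offsets[1:])
    ]
```

Because all coordinates sit in contiguous arrays, they can be copied or handed to another library in one step instead of parsing each geometry separately.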
- Expansion of the team
- Martin's time is restricted in the following months leading to long response times on issues and PRs
- consider using triage approach
- need to formalize approach for adding new committers
- S2 geometry engine
- see this thread for context
- https://github.com/benbovy/pys2index
- overview from Benoit:
- lightweight wrapper for S2 point index with API similar to `scipy.spatial.cKDTree`
- performance in benchmarks so far is quite fast
- would like to have vectorized wrappers for S2
- S2 appears to be actively maintained and about to get additional functionality soon
- used python-xtensor and pybind11 to work with S2 and numpy arrays
- Two possible approaches to integrate:
- own way to store geometries specific to backend engine; convert geometries on the fly to S2 objects as part of specific operation (e.g., predicate)
- R library converts on the fly to GEOS or S2 as needed
- More info: https://r-spatial.github.io/sf/articles/sf7.html
- Wrapper classes:
- pygeos uses Python C extension wrapper for GEOS geometries so that GEOS objects are managed according to Python object lifecycles
- related issue
- consider putting in a request for NumFOCUS small development grant to start building out some of this support (next cycle may open early 2022): https://numfocus.org/programs/small-development-grants
- Formalise and publish a roadmap
- Some notes about this in the first meeting notes: https://github.com/geopandas/community/wiki/Community-Meeting-Notes-Archive#2020-05-07-a-first-meeting
- 2022 meetings schedule
- keep current cycle: last Thurs of uneven months, same UTC time
(attending: Martin Fleischmann, Joris Van den Bossche, Levi Wolf, Thomas Louf, Brendan Ward, Daniel Alejandro Mesejo-Leon, Imanol)
- 0.10 release
- #2076
- `sindex.nearest` vs `sindex.nearest_all` API
- #1977 comment
- decision: `sindex.nearest` takes parameters `return_all=True/False`, `max_distance=None/float value`, `return_distance=True/False`
- use pygeos.nearest_all under the hood unless parameters are such that nearest will suffice
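The meaning of the three parameters in the decision above can be shown with a brute-force stand-in on plain (x, y) points. This is an illustrative sketch only, not the GeoPandas or pygeos implementation: the function name and return shape are assumptions, and a real spatial index would avoid the full distance scan.

```python
from math import hypot, inf


def nearest(points, tree_points, return_all=True, max_distance=None,
            return_distance=False):
    """Brute-force stand-in for a spatial-index nearest query on (x, y) points.

    Returns (input_index, tree_index) pairs, plus distances if requested.
    """
    pairs, distances = [], []
    for i, (px, py) in enumerate(points):
        dists = [hypot(px - tx, py - ty) for tx, ty in tree_points]
        best = min(dists, default=inf)
        if best == inf:
            continue  # empty tree: nothing to match
        if max_distance is not None and best > max_distance:
            continue  # nothing close enough to this input point
        for j, d in enumerate(dists):
            if d == best:
                pairs.append((i, j))
                distances.append(d)
                if not return_all:
                    break  # keep only the first of any equidistant ties
    return (pairs, distances) if return_distance else pairs
```

With `return_all=True`, an input point that is equidistant from two tree points yields two pairs; with `return_all=False` only the first is kept.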
- sjoin/overlay/clip as methods (https://github.com/geopandas/geopandas/issues/2141)
- decision: discuss more on the issue
- deprecations (https://github.com/geopandas/geopandas/pull/2100)
- decision: merge after 0.10
- approved / partially approved PRs tagged to 0.10:
- Add id_as_index argument to GeoDataFrame.from_features
- decision: not ready for 0.10
- ENH: expose points_from_xy as a GeoSeries method
- decision: ready for merge
- BUG: Fix multipoint clipping
- decision: Joris will review after meeting
- let's release this evening?
- decision: no.
- geopandas.org domain
- any updates regarding ownership?
- need to find someone who can get a direct response from Kelsey
- geopandas/benchmarks repo (or benchmark-data)
- for macro benchmarks that don't fit into the ASV benchmarks
- use issues to nominate datasets to use for benchmarks
- GADM polygons often offer very good variety in terms of points-per-polygon
- railways, municipalities..
- EPSRC grant call
- grant proposal is coming along; due Oct 14
- integrating pygeos and geopandas-dask philosophies into core geopandas plus integration with other libraries
- need
- letter of support from Tom Augspurger (w/ planetary computer) (if anyone has better contact info than his gmail, send to [email protected])
- 2 page resume/CV for @jorisvandenbossche & @martinfleis
- proposal text:
- Sub-project status updates:
- dask-geopandas
- once shuffle is in place will make next release (0.1) and publicize more
- already in planetary computer docker images
- had a Google Summer of Code project on this focused on spatial partitioning methods
- pygeos / Shapely 2.0
- GEOS 3.10 release coming soon; will have some things that we'd like to add
- Shapely 1.8 is ready with all deprecations in place, just needs to be reviewed / released
- will migrate pygeos into shapely after 1.8 is out
- need to coordinate with Sean Gillies re: committers / admin rights
- pyogrio
- after geopandas 0.10, add an engine keyword to read_file / to_file to use
- create issue for this (-> who?)
- longer term (geopandas 1.0) aim to have Fiona replaced by pyogrio
- we can also expose the pyogrio helper functions in geopandas (e.g., `list_layers()`)
- xyzservices (https://github.com/geopandas/xyzservices)
- contextily
(attending: Martin Fleischmann, Levi Wolf, Stefanie Lumnitz, Tom Augspurger, Brendan Ward, Joris Van den Bossche, Thomas Statham)
- Microsoft / planetary-computer & tabular data
- https://planetarycomputer.microsoft.com/catalog
- based around STAC; supports Zarr / NetCDF. Working on expanding to tabular support
- wanting to refine recommendations for representing tabular data
- use Parquet format
- need to finalize the implementation of geo-arrow-spec in Geopandas
- metadata is mostly done
- storage currently uses WKB (will always support this as a fallback), planning to revisit this to optimize using Arrow data structures
- would like GDAL to support this as well; longer term want to use the Arrow C data API (both for file formats as well as transport after reading those to downstream libs like Geopandas)
- See Cloud Data Warehouse Geospatial Interoperability
- this is just getting off the ground
- NumFOCUS dask-geopandas IO project
- see proposal
- Parquet support is mostly complete
- Feather dataset (https://github.com/geopandas/dask-geopandas/pull/91/)
- Plan is to:
- read bounds from file (already implemented)
- use methods in `dask-geopandas` to determine partitions (via Hilbert curve distance, etc)
- then read underlying features into those partitions
- Timeline is next ~3 months
- funding
- https://www.ukri.org/opportunity/software-for-research-communities/?utm_medium=email&utm_source=govdelivery
- limited to UK research staff
- plan is to expand on work in `dask-geopandas` plus other foundational work around spatial indexes, topologies, top-K nearest geometries
- expression of interest due in a month
- letters of recommendation in late Sept.
- will post public request for comment about the work proposed here
- needs to be driven by community demand
- xyzservices release
- https://github.com/geopandas/xyzservices
- takes contextily providers (metadata for tile providers) and puts into dedicated package
- goal is centralized package to be used within the ecosystem
- contextily will be updated soon to use this
- pushing more broadly within ecosystem; other packages starting to use or expressed interest
- installation as `geopandas` and `geopandas-base` to either get minimal dependencies or most dependencies
- https://github.com/geopandas/geopandas/issues/1313
- https://github.com/geopandas/geopandas/issues/1261
- now a `geopandas-base` on conda-forge
- need to decide what to do about `pip` installs
- if make it leaner (remove fiona, rtree) will make it much easier to install
- use install options, e.g., `[full]` to add the others
- may want to consider `pyproj` as optional dep and lazy load
- `pygeos` has good support with `pip` (getting better very soon with CI wheel builds)
- API for tools
- `gdf.sjoin(other)` vs `geopandas.sjoin(left, right)` #1984
- need to define rules for what is a method vs a function
- method approach is more common for pandas
- for `dask-geopandas` method is preferred
- `clip` is potentially problematic since supported by pandas, but since it is a numeric method not applicable to geometries anyway (currently fails), probably OK to make `clip` here support only geometry implementation
- duplicate vs deprecate functional approach?
- in favor of deprecation, though a bit annoying for community since functions are widely used
- short term can pass through functional to method approach to limit duplication of code
- start teaching around the method approach
- next release aim to have method approach, release after mark functional ones as deprecated
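The "pass through functional to method" plan above can be sketched with a toy class. `Frame` is a hypothetical stand-in for a GeoDataFrame, the join logic is a placeholder, and the `warn` flag is only a way to show the later deprecation stage in the same example.

```python
import warnings


class Frame:
    """Minimal stand-in for a GeoDataFrame; holds a list of rows."""

    def __init__(self, rows):
        self.rows = list(rows)

    def sjoin(self, other):
        # Placeholder join logic: pair every left row with every right row.
        return Frame((left, right) for left in self.rows for right in other.rows)


def sjoin(left, right, warn=False):
    """Function form that forwards to the method, so the logic lives in one place."""
    if warn:
        # Later stage of the plan: flag the function form as deprecated.
        warnings.warn(
            "sjoin(left, right) is deprecated; use left.sjoin(right)",
            FutureWarning,
            stacklevel=2,
        )
    return left.sjoin(right)
```

Because the function only delegates, both spellings stay in sync during the transition and the function can later be removed without touching the implementation.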
- API of matrix binary operations
- https://github.com/geopandas/geopandas/pull/1674
- We now have an implementation based on sparse matrix which works really well for all the use cases
- API:
- always return sparse array (use `sparse` package as optional dependency)
- basic support for sparse in dask (using scipy.sparse) but lots of things not yet in place
- have a keyword for sparse backend `scipy.sparse` or pydata `sparse`?
- single method (predicate is a parameter) vs one method per predicate (which could use the former internally)
- could also have `predicate_matrix` for everything, and also expose `intersects_matrix` since this is most likely used
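A rough sketch of the "single method plus a convenience wrapper" shape discussed above, returning only the coordinates of matching pairs (COO style). Bounding boxes stand in for geometries here, the function bodies are illustrative rather than the code in the linked PR, and the brute-force double loop would be replaced by a spatial index in practice.

```python
def bbox_intersects(a, b):
    """True if boxes (minx, miny, maxx, maxy) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


PREDICATES = {"intersects": bbox_intersects}


def predicate_matrix(left, right, predicate="intersects"):
    """Return (rows, cols, shape): coordinates of True cells only (COO style)."""
    test = PREDICATES[predicate]
    rows, cols = [], []
    for i, a in enumerate(left):
        for j, b in enumerate(right):
            if test(a, b):
                rows.append(i)
                cols.append(j)
    return rows, cols, (len(left), len(right))


def intersects_matrix(left, right):
    """Convenience wrapper for the most common predicate."""
    return predicate_matrix(left, right, predicate="intersects")
```

The `rows`, `cols`, and `shape` values map directly onto the inputs of a coordinate-format sparse matrix such as `scipy.sparse.coo_matrix`, so the choice of sparse backend can stay a thin final step.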
- PyData Global
- possible talk on updates in Geopandas
- type hints
- lots of outstanding PRs -> start with reviewing the geoseries.py file
- testing
- may not want to do for next release
- for internal functions, aim to have strict types
- 0.10 release target
- add `explore` as a highlight in this release
- Shapely 2.0
- slowly moving forward
- STRtree discussion resolved
- need to have a shapely 1.8 release first
- branch in shapely using pygeos is ready to merge into master
- numpy warnings -> also ignore in geopandas (TODO Joris)
- functions in pygeos as methods on GeoSeries
- https://github.com/geopandas/geopandas/issues/2010
- in favor of doing this; hard requirement on pygeos is fine
(attending Martin, Joris, Stefanie, Thomas, Brendan)
- dask-geopandas
- Dask Summit workshop debrief
- Google Summer of Code
- We have one project on dask-geopandas development
- Logistics:
- smaller meetings every week, aim for Thurs 4-5 PM UTC; Martin will setup meetings
- every 2 months a larger GeoPandas meeting
- use Github issues, PRs, GeoPandas gitter, dask-geopandas gitter
- Martin is admin point of contact
- Blog posts from GSOC: these to get linked into NumFOCUS blog
- Goals:
- spatial partitioning
- explore writing out to Parquet?
- need to figure out partitioning methods, e.g., Hilbert curve
- probably want to implement a couple methods: Hilbert, maybe a gridded approach
- first identify some of the options:
- simple grid
- known regions (can do spatial clustering for getting more or less homogeneous sized partitions)
- hilbert curve
- quadtree: might work well, not exposed yet in GEOS C API / pygeos
- strtree: don't have access to nodes / leaves via GEOS C API / pygeos
- storage of partitions
- right now just polygons as a geoseries
- spatial indexing
- also want to make sure this gets done
- only place this is currently used is for writing to Parquet and `cx` coordinate indexer
- good starter PR: simple predicates: `intersects`; check for overlap with partition first, before checking geometries within partition
- feedback to rejected projects?
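The starter-PR idea above (test the partition first, then the geometries inside it) can be sketched with bounding boxes standing in for both partition extents and geometries. The data layout and function names are hypothetical; in dask-geopandas the partitions would be dask chunks and the inner test a real geometry predicate.

```python
def boxes_overlap(a, b):
    """True if boxes (minx, miny, maxx, maxy) share at least one point."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def intersects_with_pruning(partitions, query_box):
    """Find (partition_id, row) hits, skipping partitions that cannot match.

    partitions maps an id to (partition_bounds, [feature_box, ...]).
    Returns the hits and the number of partitions actually scanned.
    """
    hits, checked = [], 0
    for pid, (bounds, features) in partitions.items():
        if not boxes_overlap(bounds, query_box):
            continue  # cheap partition-level test rules out the whole chunk
        checked += 1
        for row, box in enumerate(features):
            if boxes_overlap(box, query_box):
                hits.append((pid, row))
    return hits, checked
```

The `checked` count makes the benefit visible: with well-separated spatial partitions, most chunks are never opened for a localized query.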
- NumFOCUS SDG
- Joris wants to apply for SDG to work on `dask-geopandas`
- Focus more on I/O
- Read large dataset, have `dask-geopandas` figure out partitioning to files
- Read index and bounding boxes into memory to drive the partitioning, then use the partition bounding boxes or lists of indexes to query out chunks of data
- Optimize parquet: store coordinates instead of WKB
- Feather support? Right now using the dask support for Parquet, not available for Feather in dask; Joris has a prototype Feather file reader for dask
- Convert GDAL directly to Arrow memory format instead of WKB
- maybe do directly in GDAL
- try first in `pyogrio`
- GeoPandas Blog
- Shall we create GeoPandas blog? We can follow pandas model with an aggregator.
- https://pandas.pydata.org/community/blog/
- Joris to reach out to Kelsey re: domain name for geopandas
- API of matrix binary operations
- https://github.com/geopandas/geopandas/pull/1674
- We now have an implementation based on sparse matrix which works really well for all the use cases
- Qs:
- API
- which sparse backend? `scipy.sparse` or pydata `sparse`?
- Martin is planning to base the implementation around sparse approach
- Discuss next time
- API for interactive plotting
- https://github.com/geopandas/geopandas/issues/1904
- We want pluggable interactive plotting backends. How to do it smoothly?
- interest from some of the plotting backends
- don't really want global config for plot method
- want to keep usage of static and interactive plotting separate, don't clobber the static implementation by using interactive plotting; keep these in separate methods
- add another method: `explore` / `view` for interactive maps
- datashader option to HVplot:
- works quite well for large data
- Joris follow up with them: can instance check be expanded to include geopandas geodataframes (via `dask-geopandas`), not just spatialpandas frames
- Community calls
- we have a shared Google Calendar for GeoPandas-related events
- meetings are set to 17:00 UTC every two months (last Thursday)
- xyzservices
- new package under geopandas umbrella
- formerly `contextily.providers`
- https://github.com/geopandas/xyzservices
- planning to have available before next release of geopandas
- will have 2 JSON formats:
- pretty version that includes metadata
- compiled / compressed version that is actually used in code; plan is to create it via a GitHub Action
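The two planned JSON formats could be produced from the same data. Below is a minimal sketch using the standard-library `json` module; the provider entry and its keys are made up for illustration and are not the actual xyzservices schema.

```python
import json

# Hypothetical provider entry; the keys are illustrative only.
providers = {
    "ExampleProvider": {
        "url": "https://tiles.example.com/{z}/{x}/{y}.png",
        "max_zoom": 19,
        "attribution": "(C) Example contributors",
    }
}

# Pretty version: indented and key-sorted, easy to read and to diff.
pretty = json.dumps(providers, indent=4, sort_keys=True)

# Compact version: no optional whitespace, smaller to ship with the package.
compact = json.dumps(providers, separators=(",", ":"), sort_keys=True)

# Both encodings carry exactly the same data.
assert json.loads(pretty) == json.loads(compact)
```

Generating the compact file from the pretty one in CI keeps a single source of truth, which fits the plan to build it automatically.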
-
Ecosystem update
- cuSpatial should fully support geopandas-cuspatial dataframe conversion in the next release
-
Shapely 2.0
- Joris planning to do more on this in June
- main blocking issue is the discussion around STRtree
- Differences in minimum rotated rectangle between Shapely's pure-Python method and the method in GEOS
- Follow up with GEOS team about differences
- OpenCV method is the same as Shapely's
- Also a method in PostGIS - is it the same?
-
Pyogrio
- Brendan: transfer to GeoPandas org
- Other
- Weekly meetings
- Use public channels for discussion / questions (github issues, gitter channel, (specific? -> make a dask-geopandas channel))
- Single Point of Contact (more for administrative questions)
- Martin
- Blog: on NumFOCUS & personal site is fine, no need for GeoPandas branded one
(attending: Martin, Joris, James, Brendan, Sangarshanan, Levi)
-
Google Summer of Code
- We have submitted 3 project ideas
- Pure Python IO
- Plotting enhancements
- `dask-geopandas`
- https://github.com/geopandas/geopandas/wiki/Google-Summer-of-Code-2021
- Students should get in touch now and submit proposals within weeks
- students will start applying next Monday
- We need to select students between mid-April and mid-May
- Should we advertise it more? Prospect on possible students?
- TODO: Post on Twitter again (done)
- PySAL: primarily recruits from own students; ~1/2 have been affiliated that way
-
Community repository
- we have a new geopandas/community repo
- if not package-specific or not specific to code (governance, code of conduct), post to this repo
- if specific to GeoPandas, post issues to the GeoPandas repo instead
- use for announcing meetings or proposals (workshops, funding)
- how should we efficiently use it?
- https://github.com/geopandas/community
- TODO: post issue for how to get funding for GeoPandas features or ideas list for potential future grants
-
Community calls
- shall we switch to some predictable schedule? (Bi-)Monthly?
- start with bimonthly on last Thursday of each month
- TODO: post schedule to community repo
- archive prior call notes to community repo; keep markdown doc for latest meeting
-
dask-geopandas
- repository moved to GeoPandas org
- https://github.com/geopandas/dask-geopandas
- Dask-Summit workshop proposal
- In May: https://summit.dask.org/
- submitted proposal around scaling GeoPandas vector operations
- Could have a presentation about current status of dask-geopandas
- Some discussion around spatial partitioning
- Look for ways to collaborate with spatial pandas
- Would be good to do visualization of bigger data
- TODO: add issue in community repo for ideas for this workshop
- First alpha released on PyPI, still needs conda-forge
- Martin: will add to conda-forge
- Biggest needs: spatial index and overlap operations
-
User-friendly API of matrix binary operations
- would be nice to have "`intersects_matrix`" in 0.10
- We should agree on the API design; implementation should be straightforward based on `query_bulk`
- https://github.com/geopandas/geopandas/pull/1674
- returning a list is maybe not particularly useful
- might be good to have a few example use cases
- does any polygon in input intersect any in right dataframe
- which of them in left dataframe intersects any in right dataframe
- how many intersect
- use outer strategy with sparse argument
- currently don't depend on scipy; makes it harder to use sparse option
- can keep sparse as an optional argument; fall back to full matrix
- another alternative is to use xarray and pydata sparse backend (optional dependencies)
- could just return dense pandas table of left and right indices
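The example use cases listed above can all be answered from pairs of (left, right) positions, which is the kind of output a bulk spatial-index query produces. The sketch below uses plain Python with axis-aligned boxes standing in for geometries; `intersecting_pairs` is a hypothetical brute-force helper and not the GeoPandas API.

```python
from collections import Counter

def boxes_intersect(a, b):
    """True if two (minx, miny, maxx, maxy) boxes touch or overlap."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def intersecting_pairs(left, right):
    """Brute-force stand-in for a bulk index query: (left_idx, right_idx) pairs."""
    return [
        (i, j)
        for i, a in enumerate(left)
        for j, b in enumerate(right)
        if boxes_intersect(a, b)
    ]

# Hypothetical boxes standing in for the left and right dataframes.
left = [(0, 0, 2, 2), (5, 5, 6, 6), (10, 10, 11, 11)]
right = [(1, 1, 3, 3), (1.5, 0, 4, 1), (5.5, 5.5, 7, 7)]

pairs = intersecting_pairs(left, right)

# Which left geometries intersect any right geometry?
hits = sorted({i for i, _ in pairs})

# Does each left geometry intersect anything? (one boolean per left row)
intersects_any = [i in hits for i in range(len(left))]

# How many right geometries does each left geometry intersect?
counts = Counter(i for i, _ in pairs)
n_intersections = [counts.get(i, 0) for i in range(len(left))]
```

The `pairs` list is effectively the coordinate form of a sparse boolean matrix, so a sparse or dense matrix could be built from it when one is requested.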
-
Interactive plotting
- the existing tools are not as friendly as we thought
- folium-based implementation of `GeoDataFrame.view()` mirroring the language of `plot()`
- https://github.com/martinfleis/geopandas-view
- should it be embedded in GeoPandas? Or as an affiliated project under GeoPandas repo?
- @sangarshanan is willing to help maintain it
- status: most of the stuff supported for static plotting in matplotlib is now supported against folium
- considerations for API:
- plotting backend provider
- namespacing folium / interactive methods to prevent collision with static plotting
- over some threshold do not want to plot in folium
- might be good to look at how `sf` in R handles translation to backend providers
- implementation of backend can be outside GeoPandas; might be easier to have this directly in GeoPandas in order to allow it as a default (not a lot of code)
- will do a bit more work to polish then migrate into GeoPandas
-
contextily providers module
- there is an idea to convert contextily providers module to a separate package
- both contextily and `view()` could be using it + others
- https://github.com/geopandas/contextily/issues/153 and partially https://github.com/geopandas/contextily/issues/172
-
Ecosystem update
- pygeos/shapely2.0
- Current blocker: STRtree design (https://github.com/Toblerity/Shapely/pull/1064, https://github.com/Toblerity/Shapely/pull/1094)
- Shapely 1.8 release in prep for the transition; will raise deprecation warnings
- After 1.8, move pygeos code into Shapely; will need to coordinate with pygeos
- pyogrio
- Windows support?
- Do we need something similar to `fiona.Env`?
-
geopandas.org
- we still don't have access to the domain to point it to RTD
- Joris will ping Kelsey J.
- also need to have ownership on PyPI; need to be able to add others
- conda forge:
- anyone can help maintain this
- currently Joris, James, Filipe
-
NumFOCUS small grants
- do we want to apply for something in the near future?
- anyone has capacity?
- next round likely before summer
- open issue on community repo
-
User Survey Review
- Let's see what people think
- https://github.com/geopandas/geopandas-user-surveys/pull/1
- make private repo to store private responses
- Some points:
- interactive plotting: more examples
- performance is a consistently mentioned issue
-
Core dev team organisation
- Have official list of people?
- Mailing list
- Org like https://github.com/dask/community/
- Expanding the team?
- governance questions
- code of conduct
- mediation
- violations of CoC
- adding developers/removing (retiring?) developers
-
NumFOCUS fiscal sponsorship
-
Documentation
- Status of Martin's work
- https://github.com/geopandas/geopandas/pull/1759, https://github.com/geopandas/geopandas/pull/1757
-
geopandas-base
- having an option to depend on shapely only
- pure-Python I/O, no CRS
-
IO
-
pyogrio integration
- Discuss integration plan for testing I/O using `pyogrio` instead of `fiona` (seeing about 10-16x speedups)
- try to package up on conda forge
- non-GDAL IO
- pygpkg
- pyshp
- GSOC application focusing on non-GDAL IO @martin
-
GSOC
- think about participating in GSOC 21
- https://opensource.googleblog.com/2020/10/google-summer-of-code-2021-is-bringing.html
- Python GPKG IO project?
-
pyrosm
- Henrikki is looking for a home for pyrosm (yes to us)
-
GeoPandas paper
- REGION OA (no APC) journal
- https://openjournals.wu.ac.at/ojs/index.php/region/index
-
Ecosystem update
- pygeos/shapely2.0
- dask-geopandas
-
0.9 release
-
NumFOCUS Documentation project
- I'd like to update you on current development and discuss a bit further steps to decide on priorities and time frame.
- context: https://github.com/geopandas/geopandas/issues/1564
- Martin provided an update on the latest direction in documentation work in https://github.com/geopandas/geopandas/issues/1564
- some examples will move to user guide where they are using the core functions
- for the examples gallery, may use nbsphinx instead of sphinx-gallery
- Will bulk up installation instructions to help alleviate many of the complaints around installation issues
- will add a longer-term roadmap within the docs
- Going forward, Martin will add examples incrementally but will try to get this reviewed as a larger PR
- New Advanced Guide will include more advanced topics like using spatial index and vectorization
- Will need to add redirects from important pages from existing readthedocs pages to the new documentation structure
-
Select final logo
- https://github.com/geopandas/geopandas/issues/1405
- Let's make the final decision!
- Go with the one with highest votes
- This will go into a separate PR with all the versions and source files
- Add a page to documentation with the logo and specific colors used
- Share logo back to NumFOCUS
- TODO: update the logo on twitter, etc
-
GitHub Sponsors
- We may consider using GitHub Sponsor button. Someone recently asked how to support GeoPandas and I was not sure if there is any possibility of a direct (financial) support, apart from donating to NumFOCUS.
- In order to have NumFOCUS accept $ on behalf of GeoPandas, may need to become a fiscally-sponsored project instead of just an affiliated project; Joris will check into this
- For GitHub Sponsor have seen examples of sponsoring individuals; will need to see what it would take to sponsor the larger project
-
GeoPandas usage / promotion
- Would like to feature groups that use GeoPandas as part of their work, maybe on GeoPandas blog (if there was one)
- Blog: would like to do this outside Sphinx
-
GeoPandas domain
- Joris will follow up with Kelsey
- Also request PyPI access from Kelsey
-
Packaging automation
- Can use GitHub Actions to publish packages to PyPi / Conda
- Can derive this from Pydata project
-
Social media
- Twitter
- Joris is currently maintaining this
- Martin can help with this; Joris will share access
- Example that came up on twitter from COVID-19 dashboards around showing density of points, maybe by hexagon; might want to add something like this as an example in the docs
-
GeoPandas academic paper
- Geographical Analysis journal is having a special issue on Open Source Software for Spatial Analysis, edited by Luc Anselin and Serge Rey (both PySAL). We had a small exchange about the possibility of writing a paper about GeoPandas (which is long overdue I'd say) with Joris and Serge on twitter: https://twitter.com/jorisvdbossche/status/1282208649335779328 I feel that this would be great thing to do, although it naturally takes time to write a proper paper.
- Special issue will require more background documentation & contextualization; not just a description about the project
- Need to position it into the wider ecosystem; directly address how it has advanced spatial analysis in Python
- Could start brainstorming / collecting ideas
- Martin will make a google doc
- Martin will check to see if there is sponsorship from the university for making this open access
- Full fee is $3,000 US
- If we don't go for this, make sure to go after a different publication that allows open access
-
GeoPandas Survey
- Discuss plan to finish up and post GeoPandas survey: https://docs.google.com/document/d/1caityqUUfgAN2u9VUJN78mTyS3fMYZI-ZvgtfLfio9A/edit?usp=sharing
- Martin will add the GDPR compliance
- Use Google Forms to release this
- Can be individual owner
- Martin can create the form
- Timeframe:
- Would like to launch as soon as possible, aim for sometime in Sept.
-
GeoPandas 0.9 roadmap
- If we want to release 0.9 in December (we discussed switching to 6-month release cycle), we could discuss what do we want to (ideally) include.
- Binary predicates change - https://gist.github.com/martinfleis/abc7cdbf9f9266bf9ed369080eec7cea
- proposal is to build this on the output of query bulk
- people normally interested in 2 questions: does my polygon intersect any in the other data frame (not just same line), which polygons from right data frame are intersected with the one on the left
- `sf` (in R) doesn't return a series; it returns matrices (sparse / dense)
- could have a function that gives more direct access to the sindex bulk query
- general agreement about keeping the existing predicate behavior as is, but adding a new set of methods on GeoSeries to add the cross / matrix oriented approach
- Martin will add a new issue for this with notebook example
- spatial index
- do we want to expose interface to multiple spatial index or abstract base class that can wrap other spatial index implementations
- can revise the issue based on discussion but don't target for 0.9
- revisit once pygeos / shapely 2.0 integration is complete and no longer optional; STRtree will be default as part of that
- Brendan will try to get outstanding pygeos issue to add other predicates to STRtree in for next pygeos version:
- Upcoming pygeos features in next release: mostly around multithreading, adding support for Z values to coordinate ops
- geodetic distance / area calculations
- it was tricky to write these to be performant, dealing with wrap around the poles
- there is project to extract out the S2 ideas into a general purpose library
- Create an example out of this work and put in documentation
- Create an issue about adapting ideas from `sf`
- Aim for supporting different spatial backends (e.g., `S2`) after 1.0
- Look into some of the other backends
- cuSpatial:
- want to support interoperability, not sure about supporting different underlying geometry providers / backends
- Longer term, maybe consider making GDAL / Fiona optional (e.g., read data from Parquet...)
- vectorized snap
- e.g., make larger linestring out of 2 disconnected segments
- in GEOS overlay refactor, this will include a precision-based snap
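For the geodetic distance item in the list above, a great-circle distance on a sphere is the simplest starting point. The following is a minimal sketch of the haversine formula using only the standard library; the spherical Earth radius is an approximation, the `haversine` helper is written for illustration, and this is not how `sf` or S2 compute distances.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; a spherical approximation

def haversine(lon1, lat1, lon2, lat2, radius=EARTH_RADIUS_M):
    """Great-circle distance in meters between two lon/lat points (degrees)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    # Only sin^2 of half the longitude difference is used, so a difference
    # of 359 degrees gives the same result as a difference of 1 degree.
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    # Clamp to guard against rounding slightly above 1 for antipodal points.
    return 2 * radius * math.asin(math.sqrt(min(1.0, a)))

# Crossing the antimeridian: 179.5E to 179.5W is 1 degree apart, not 359.
d_wrap = haversine(179.5, 0.0, -179.5, 0.0)
d_plain = haversine(0.0, 0.0, 1.0, 0.0)

# At a pole, every longitude is the same point.
d_pole = haversine(0.0, 90.0, 180.0, 90.0)
```

Planar distance on lon/lat coordinates would get both of these cases wrong, which is part of why wrap-around handling makes a performant implementation tricky.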
-
Future NumFOCUS grants
- I am not aware of the schedule of future funding rounds, but we should be prepared (if anyone has capacity).
- Normally should be 3rd round for this year, but haven't heard yet
-
dask-geopandas
- Discuss the current state and future of `dask-geopandas`.
- Big work items underway:
- I/O methods: Joris adding Parquet support from geopandas
- making use of spatial partitioning
-
NumFOCUS
- Small development grants ideas:
- better documentation
- better integration / leveraging spatial indexes for operations
- small improvements to topological operations (relates operations); elementwise vs all-pairwise
-
Logo
- https://github.com/geopandas/geopandas/issues/1405/
- Joris: check with pandas
- Try different color, otherwise go with it!
-
Lowering barriers to effective engagement / involving community
- reviewing PR bottlenecks
- time of core maintainers
- huge PRs, can we suggest folks make smaller PRs?
-
Maintenance bottlenecks
-
Roadmap (1.0?)
- Shapely 2.0 / pygeos speed-ups
- API for topological operations
- IO
- parquet/feather
- faster GDAL
- databases
- consistent API
- Integrating raster operations
- zonal stats is problematic for large data
- geodetic distance etc (geography)
- visualization
- maybe geoplot becomes an affiliate like contextily
- residentmario may not have time anymore for maintenance
- Vectorized snap feature to other feature
-
Do something like http://xarray.pydata.org/en/stable/roadmap.html
- Open an issue for this
-
places to ask questions vs. filing an issue? document.
-
Documentation
- notebooks/examples
-
Installation issues