Support geospatial data types during cdc replication #19519

Open
darkcofy opened this issue Nov 21, 2024 · 8 comments

Comments


darkcofy commented Nov 21, 2024

Is your feature request related to a problem? Please describe.

We use geometric data types such as point, polygon, and lat/long in our systems. Currently, tables that contain these types can't be ingested using the select * statement in RisingWave, because the entire table is rejected automatically.

Describe the solution you'd like

Support ingesting geospatial data types from the source DB into RisingWave and sinking them to a downstream system like Iceberg for further processing.

Describe alternatives you've considered

Currently there is no alternative, as RisingWave offers no workaround.

Additional context

Geospatial functions aren't as important as the ability to stream this data into the system and ingest it into the lakehouse. Further processing can be done downstream by other engines like DuckDB.

github-actions bot added this to the release-2.2 milestone Nov 21, 2024
Member

fuyufjh commented Nov 22, 2024

How about ingesting these geo data types as strings?

As far as I know, Iceberg or Parquet doesn't have official support for geospatial data yet, so I am wondering what format will be used in the downstream Iceberg in your case.

Full support (#16829) requires more effort, so it would be nice if there's a quick workaround.

Author

darkcofy commented Nov 22, 2024

Hey, so we currently use Debezium, and Debezium automatically converts such values into a WKB JSON object, which works perfectly fine with Iceberg tables. We have this running in production right now with zero issues. Something similar would be amazing.

Strings would work as well; maybe we can translate them back elsewhere?

Author

darkcofy commented Nov 22, 2024

https://debezium.io/blog/2018/01/25/debezium-0-7-2-released/

The geometry columns are exported as structs, as mentioned here, so they should be easy enough to sink?
https://debezium.io/documentation/reference/stable/connectors/mysql.html#mysql-spatial-types
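
For illustration, here is a minimal Python sketch of handling such a struct on the consumer side. It assumes the Debezium JSON converter is in use, so the `wkb` bytes field arrives base64-encoded and `srid` may be absent or null. The helper names (`decode_debezium_geometry`, `parse_wkb_point`) and the sample payload are made up for this example and are not part of Debezium or RisingWave.

```python
import base64
import struct


def decode_debezium_geometry(value):
    """Turn a Geometry struct (as a dict) into a (wkb_bytes, srid) pair.

    Assumption: `wkb` is a base64 string, as produced by a JSON converter.
    """
    wkb = base64.b64decode(value["wkb"])
    srid = value.get("srid")  # optional field, may be None
    return wkb, srid


def parse_wkb_point(wkb):
    """Parse a 2D WKB point and return (x, y)."""
    # Byte 0 is the byte-order flag: 1 = little-endian, 0 = big-endian.
    endian = "<" if wkb[0] == 1 else ">"
    (geom_type,) = struct.unpack_from(endian + "I", wkb, 1)
    if geom_type != 1:
        raise ValueError("only WKB type 1 (Point) is handled in this sketch")
    x, y = struct.unpack_from(endian + "dd", wkb, 5)
    return x, y


# Hypothetical sample: POINT(1 2) with SRID 4326, built locally for the demo.
sample_wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
sample = {"wkb": base64.b64encode(sample_wkb).decode("ascii"), "srid": 4326}

wkb, srid = decode_debezium_geometry(sample)
print(parse_wkb_point(wkb), srid)
```

The point is only that the payload is an opaque WKB blob plus an integer, so a sink can pass it through untouched and leave parsing to downstream engines.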

Member

fuyufjh commented Nov 26, 2024

I see. It sounds good to ingest the io.debezium.data.geometry.Geometry as a struct<wkb bytea, srid int> or jsonb in RisingWave (better than varchar, I think).

What do you think? cc. @StrikeW

Member

fuyufjh commented Nov 26, 2024

By the way, we rely on sea-schema to backfill data from the upstream database. It doesn't support PostGIS's data types (geometry/geography) yet. source code link.

Update: Created an issue SeaQL/sea-schema#140

Author

darkcofy commented Nov 26, 2024

Is this only for the select * syntax? If so, we're happy to define the schema manually for the geospatial tables.

Also, our case is specifically MySQL only, and it looks like they have implemented the MySQL geospatial types.

Contributor

StrikeW commented Nov 26, 2024

> I see. It sounds good to ingest the io.debezium.data.geometry.Geometry as a struct<wkb bytea, srid int> or jsonb in RisingWave (better than varchar, I think).
>
> What do you think? cc. @StrikeW

The challenge lies in the backfill part: we may need to decode the bytes to struct<wkb bytea, srid int>.
https://github.com/blackbeam/rust_mysql_common/blob/e9566e9536ddf0892cb9e2180a5751a4b2fc4379/src/value/mod.rs#L418
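
A rough sketch of that decoding step, written in Python for brevity rather than in Rust. It assumes the raw bytes follow MySQL's internal geometry storage format, that is, a 4-byte little-endian SRID followed by the WKB value. `split_mysql_geometry` is a hypothetical helper, not an API of rust_mysql_common or RisingWave.

```python
import struct


def split_mysql_geometry(raw):
    """Split raw geometry bytes into the struct<wkb, srid> shape.

    Assumption: `raw` is a 4-byte little-endian SRID followed by WKB.
    """
    # 4 bytes of SRID plus at least a WKB header (1 byte order + 4 type).
    if len(raw) < 9:
        raise ValueError("geometry value too short")
    (srid,) = struct.unpack_from("<I", raw, 0)
    return {"wkb": raw[4:], "srid": srid}


# Hypothetical sample: POINT(1 2) with SRID 4326, built locally for the demo.
point_wkb = struct.pack("<BIdd", 1, 1, 1.0, 2.0)
raw = struct.pack("<I", 4326) + point_wkb

decoded = split_mysql_geometry(raw)
print(decoded["srid"], decoded["wkb"].hex())
```

Under that assumption the backfill path would yield the same two fields as the Debezium struct, so both paths could share one column type.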

Member

fuyufjh commented Nov 26, 2024

> The challenge lies in the backfill part, we may need to decode the bytes to struct<wkb bytea, srid int>.
> https://github.com/blackbeam/rust_mysql_common/blob/e9566e9536ddf0892cb9e2180a5751a4b2fc4379/src/value/mod.rs#L418

True. The good news is that we are not blocked by them.

fuyufjh added the "good first issue" label Nov 26, 2024