Consider to support JDBC Catalog for RisingWave’s Iceberg sink #10830

Closed
manifest opened this issue Jul 7, 2023 · 12 comments

manifest commented Jul 7, 2023

Is your feature request related to a problem? Please describe.

Currently, RisingWave’s Iceberg sink supports only the filesystem catalog. That makes the sink unsuitable for production use, because the filesystem catalog is highly inefficient.

Iceberg supports multiple catalog implementations: JDBC, Hive Metastore, custom REST service, Nessie, and AWS Glue.

The JDBC catalog is the most general, vendor-agnostic implementation, and it allows arbitrary configuration through the database connection URL. In our case, we use PostgreSQL.

Describe the solution you'd like

I'd like to be able to configure RisingWave’s Iceberg sink to use JDBC Catalog.

Describe alternatives you've considered

In our case, the only (efficient) option is the JDBC Catalog.

Additional context

An example of a Spark configuration for our setup (PostgreSQL as a JDBC Catalog):

spark-sql \
    --packages "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-spark-extensions-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-aws:1.3.0,software.amazon.awssdk:bundle:2.20.83,software.amazon.awssdk:url-connection-client:2.20.83,org.postgresql:postgresql:42.6.0" \
    --conf spark.sql.extensions="org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
    --conf spark.sql.catalog.demo="org.apache.iceberg.spark.SparkCatalog" \
    --conf spark.sql.catalog.demo.catalog-impl="org.apache.iceberg.jdbc.JdbcCatalog" \
    --conf spark.sql.catalog.demo.uri="jdbc:postgresql://localhost:5432/iceberg" \
    --conf spark.sql.catalog.demo.jdbc.user="iceberg" \
    --conf spark.sql.catalog.demo.jdbc.password="" \
    --conf spark.sql.catalog.demo.jdbc.verifyServerCertificate=true \
    --conf spark.sql.catalog.demo.jdbc.useSSL="true" \
    --conf spark.sql.catalog.demo.io-impl="org.apache.iceberg.aws.s3.S3FileIO" \
    --conf spark.sql.catalog.demo.warehouse=s3://path/to/warehouse/ \
    --conf spark.sql.catalog.demo.s3.endpoint="${AWS_ENDPOINT}" \
    --conf spark.sql.defaultCatalog="demo"

github-actions bot added this to the release-0.20 milestone Jul 7, 2023

Member

fuyufjh commented Jul 10, 2023

Thanks for the feedback! We'll investigate it soon.

@liurenjie1024
Contributor

We now support the REST catalog, which can be used to access all kinds of catalogs.

liurenjie1024 closed this as not planned Nov 8, 2023

Author

manifest commented Nov 8, 2023

Hey @liurenjie1024, could you please elaborate on why you decided not to support the JDBC catalog and, as a consequence, the use of PostgreSQL as a metadata store?

@liurenjie1024
Contributor

> Hey @liurenjie1024, could you please elaborate on why you decided not to support the JDBC catalog and, as a consequence, the use of PostgreSQL as a metadata store?

Hi @manifest, the REST catalog is the community's recommended way of decoupling the catalog client from the different catalog implementations. It doesn't prevent you from using the JDBC catalog; it acts as a proxy in front of different catalogs. Our system is written in Rust, and this is the ideal approach for RisingWave to interact with different catalogs.
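
As a rough illustration of the proxy arrangement described above, a Spark client like the one in the issue description could point at a REST catalog service instead of loading JdbcCatalog itself, leaving the PostgreSQL connection to that service. This is only a sketch under assumptions: the `http://localhost:8181` endpoint and the `REST_URI` and `RUN` variables are hypothetical and do not come from this thread, and the remaining settings are copied from the Spark example above, minus the `jdbc.*` options and the PostgreSQL driver.

```shell
#!/bin/sh
# Sketch only: Spark talking to an Iceberg REST catalog service, which could
# itself be backed by a JDBC catalog on PostgreSQL.
# ASSUMPTION: a REST catalog service is reachable at REST_URI (hypothetical default).
REST_URI="${REST_URI:-http://localhost:8181}"

# Same settings as the JDBC example, except that the catalog type is "rest"
# and the client no longer carries the jdbc.* options or the PostgreSQL driver.
set -- \
  --packages "org.apache.iceberg:iceberg-spark-runtime-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-spark-extensions-3.4_2.12:1.3.0,org.apache.iceberg:iceberg-aws:1.3.0,software.amazon.awssdk:bundle:2.20.83,software.amazon.awssdk:url-connection-client:2.20.83" \
  --conf "spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions" \
  --conf "spark.sql.catalog.demo=org.apache.iceberg.spark.SparkCatalog" \
  --conf "spark.sql.catalog.demo.type=rest" \
  --conf "spark.sql.catalog.demo.uri=${REST_URI}" \
  --conf "spark.sql.catalog.demo.io-impl=org.apache.iceberg.aws.s3.S3FileIO" \
  --conf "spark.sql.catalog.demo.warehouse=s3://path/to/warehouse/" \
  --conf "spark.sql.catalog.demo.s3.endpoint=${AWS_ENDPOINT}" \
  --conf "spark.sql.defaultCatalog=demo"

# Dry run by default: print one argument per line for review.
# Set RUN=1 to actually invoke spark-sql with these arguments.
if [ "${RUN:-0}" = "1" ]; then
  spark-sql "$@"
else
  printf '%s\n' spark-sql "$@"
fi
```

By default the script only prints the arguments; running it with `RUN=1` would invoke `spark-sql`. Whether a given REST service can use PostgreSQL as its backend depends on that service, which is the point debated in the rest of this thread.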

Contributor

liurenjie1024 commented Nov 8, 2023

You can refer to these:

  1. https://iceberg.apache.org/concepts/catalog/
  2. https://github.com/tabular-io/iceberg-rest-image

cc @manifest

Our tutorials and examples are under construction.

Author

manifest commented Nov 8, 2023

I'm familiar with the REST catalog option. It is only worth using when support for multiple catalog backends is required.

In our case, we have no legacy systems and target only PostgreSQL. I'm not aware of any available REST catalog supporting PostgreSQL that can be used out of the box and is ready for production use.

It doesn't make much sense to develop and maintain a custom REST service just to glue PostgreSQL with RisingWave. There are simpler options available.

All of that rules RisingWave out for production use cases similar to ours.

@liurenjie1024

Contributor

liurenjie1024 commented Nov 8, 2023

You can use this as an out-of-the-box REST catalog server:

https://github.com/tabular-io/iceberg-rest-image

Author

manifest commented Nov 8, 2023

> You can use this as an out-of-the-box REST catalog server.

It looks more like a hello-world example of how to develop a REST catalog service than something ready for production use.

Author

manifest commented Nov 8, 2023

Ok. I just wanted to understand the reasoning behind dropping this feature request. Thanks.

@liurenjie1024
Contributor

> Ok. I just wanted to understand the reasoning behind dropping this feature request. Thanks.

Are you interested in deploying RisingWave? How about joining our community Slack to discuss your requirements further?

@liurenjie1024
Contributor

It doesn't have to be dropped forever, and the priority is determined by user requirements. CC @manifest

Author

manifest commented Nov 8, 2023

> Are you interested in deploying RisingWave? How about joining our community Slack to discuss your requirements further?

We would definitely consider it. It depends on how well RisingWave fits our use cases.

> How about joining our community Slack

This issue was discussed here.
