Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(sink): add config auto_create for BigQuery #17393

Merged
merged 7 commits into from
Jun 27, 2024

Conversation

jetjinser
Copy link
Contributor

I hereby agree to the terms of the RisingWave Labs, Inc. Contributor License Agreement.

What's changed and what's your intention?

resolve #15586.

Checklist

  • I have written necessary rustdoc comments
  • I have added necessary unit tests and integration tests
  • I have added test labels as necessary. See details.
  • I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features Sqlsmith: Sql feature generation #7934).
  • My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future).
  • All checks passed in ./risedev check (or alias, ./risedev c)
  • My PR changes performance-critical code. (Please run macro/micro-benchmarks and show the results.)
  • My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details)

Documentation

  • My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users)

Release note

New config bigquery.auto_create_table or auto_create_table, defaults to false.
If enabled, a new table will be automatically created in BigQuery when the specified table is not found.

The datatype mapping from RisingWave to BigQuery:

RisingWave BigQuery
Boolean Bool
Int16, Int32, Int64, Serial Integer
Float32 BigQuery not support real
Float64 Float
Decimal Numeric
Date Date
Varchar String
Time Time
Timestamp Datetime
Timestamptz timestamp
Interval BigQuery not support Interval^1
Struct Struct (alias Record)
List x repeated mode x
Bytea Bytes
Jsonb Json
Int256 BigQuery not support Int256

^1: Pre-GA: https://cloud.google.com/bigquery/docs/reference/standard-sql/data-types#interval_type

@jetjinser jetjinser added the user-facing-changes Contains changes that are visible to users label Jun 21, 2024
_ => {}
}
}

let mut rs = client
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When auto_create_table=true, it seems that there is no need to query the schema again

DataType::Time => TableFieldSchema::time(&rw_field.name),
DataType::Timestamp => TableFieldSchema::date_time(&rw_field.name),
DataType::Timestamptz => TableFieldSchema::timestamp(&rw_field.name),
DataType::Interval => {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems to be supported, If it is not possible, please also set what is written below as not supported

Copy link
Contributor Author

@jetjinser jetjinser Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should indeed be unsupported for now: https://cloud.google.com/bigquery/docs/schemas#standard_sql_data_types. (no interval here, in pre-ga)
What needs to be added in the error message?

Copy link
Contributor

@xxhZs xxhZs Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tested it before and it works, you can retest it, if it doesn't work, please disable interval in the code elsewhere
doc is in https://cloud.google.com/bigquery/docs/write-api?hl=zh-cn

Copy link
Contributor Author

@jetjinser jetjinser Jun 21, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does support it, but it is not a stable feature yet.
In addition, the sdk FieldType have no Interval. It's used to construct Table that table().create() would use.
I think we can let it keep the unsupported error? Panic when validate failed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot to change it in get_string_and_check_support_from_datatype?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

About Interval?
Unlike the case of map_field, which is used to create a BigQuery table, we can avoid actively using unstable type Interval; the return value of get_string_and_check_support_from_datatype is used to validate types of BigQuery table columns. If a column is indeed an Interval type, I don't think we have any reason to refuse?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it, thanks for the explanation. Would you mind adding the explanations here into the code comment?

@jetjinser jetjinser force-pushed the jinser/bigquery-auto-create-table branch from 8b8bac2 to 72b4250 Compare June 25, 2024 08:59
@jetjinser jetjinser requested a review from xxhZs June 25, 2024 09:12
Copy link
Contributor

@xxhZs xxhZs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the test runs properly, lgtm

@jetjinser jetjinser changed the title feat(sink): add config auto_create_table for BigQuery feat(sink): add config auto_create for BigQuery Jun 27, 2024
@jetjinser jetjinser force-pushed the jinser/bigquery-auto-create-table branch from c418061 to bbdc227 Compare June 27, 2024 06:40
@jetjinser jetjinser added this pull request to the merge queue Jun 27, 2024
Merged via the queue into main with commit 18f9980 Jun 27, 2024
29 of 30 checks passed
@jetjinser jetjinser deleted the jinser/bigquery-auto-create-table branch June 27, 2024 09:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component/connector type/feature user-facing-changes Contains changes that are visible to users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

auto.create for BigQuery sink
4 participants