Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(source): prefer SpecificParserConfig over SourceStruct #12602

Merged
merged 4 commits into from
Oct 21, 2023

Merge branch 'main' into refactor-source-struct-format-encode

7cc8e0c
Select commit
Loading
Failed to load commit list.
Merged

refactor(source): prefer SpecificParserConfig over SourceStruct #12602

Merge branch 'main' into refactor-source-struct-format-encode
7cc8e0c
Select commit
Loading
Failed to load commit list.
Task list completed / task-list-completed Started 2023-10-22 04:14:34 ago

3 / 7 tasks completed

4 tasks still to be completed

Details

Required Tasks

Task Status
consolidate enum ProtocolProperties and enum SourceFormat into a single enum ConnectorFormat (or derived similar to encode below when needed). Incomplete
consolidate enum EncodingProperties and enum SourceEncode into enum ConnectorEncode with derived EnumDiscriminants. Incomplete
extract_source_struct: PbStreamSourceInfo -> SourceStruct Incomplete
SpecificParserConfig::new: (SourceStruct, PbStreamSourceInfo, HashMap<String, String>) -> Self Incomplete
SpecificParserConfig::new: (PbStreamSourceInfo, HashMap<String, String>) -> Self Incomplete
FsSourceDesc does not use source_struct. It already contains SpecificParserConfig inside FsConnectorSource. Incomplete
DatagenEventGenerator only supports json today. When it supports more encodings later, it would need the properties and having only enum discriminants is not enough. Incomplete
handle_alter_source_column rejects avro or protobuf based on discriminant alone. It also rejects json with schema registry. It is unclear whether we need discriminant alone or the full properties. This usage has not been changed by this PR. Incomplete
I have written necessary rustdoc comments Completed
I have added necessary unit tests and integration tests Completed
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934). Incomplete
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future). Incomplete
All checks passed in ./risedev check (or alias, ./risedev c) Completed
My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details) Incomplete
My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users) Incomplete
Before this PR, the relevant code path allows certain create source/table to succeed, but the source parser constantly throws errors that are suppressed. Incomplete
After this PR, such create source/table would fail with the corresponding source parser error. Incomplete
(Before this PR) call extract_avro_table_schema, which pretends to be (plain, avro). Incomplete
(After this PR) call extract_avro_table_schema, which uses (upsert, avro) from passed source_info. Incomplete
(Question above) call extract_upsert_avro_table_schema. Incomplete
When key schema is defined, it ensures key columns are subset of value columns. Incomplete
When key schema is not defined, it adds an _rw_key column in addition to value columns. Incomplete
This extra behavior is not intended to be followed by the upsert-avro-with-user-defined-parimay-key-in-sql case. So the 3rd option is not what we want. Incomplete
On upsert, it requires use of schema registry, and always fetches both key and value schema[1]. Incomplete
On plain, it allows both schema registry and file/https/s3 URLs[2]. It only fetches value schema. Incomplete
Note that when using schema registry with key schema in schema registry, there is essentially no difference. Incomplete
The frontend call is our focus here, which either (1) pretends to be (plain, avro) or (2) uses (upsert, avro) from passed source_info. Incomplete
The compute source executor call has always been using (upsert, avro) from passed source_info. Incomplete