refactor(source): prefer SpecificParserConfig
over SourceStruct
#12602
Task list completed / task-list-completed
Started
2023-10-22 04:14:34
ago
3 / 7 tasks completed
4 tasks still to be completed
Details
Required Tasks
Task | Status |
---|---|
consolidate enum ProtocolProperties and enum SourceFormat into a single enum ConnectorFormat (or derived similar to encode below when needed). |
Incomplete |
consolidate enum EncodingProperties and enum SourceEncode into enum ConnectorEncode with derived EnumDiscriminants . |
Incomplete |
extract_source_struct: PbStreamSourceInfo -> SourceStruct |
Incomplete |
SpecificParserConfig::new: (SourceStruct, PbStreamSourceInfo, HashMap<String, String>) -> Self |
Incomplete |
SpecificParserConfig::new: (PbStreamSourceInfo, HashMap<String, String>) -> Self |
Incomplete |
FsSourceDesc does not use source_struct . It already contains SpecificParserConfig inside FsConnectorSource . |
Incomplete |
DatagenEventGenerator only supports json today. When it supports more encodings later, it would need the properties and having only enum discriminants is not enough. |
Incomplete |
handle_alter_source_column rejects avro or protobuf based on discriminant alone. It also rejects json with schema registry. It is unclear whether we need discriminant alone or the full properties. This usage has not been changed by this PR. |
Incomplete |
I have written necessary rustdoc comments | Completed |
I have added necessary unit tests and integration tests | Completed |
I have added fuzzing tests or opened an issue to track them. (Optional, recommended for new SQL features #7934). | Incomplete |
My PR contains breaking changes. (If it deprecates some features, please create a tracking issue to remove them in the future). | Incomplete |
All checks passed in ./risedev check (or alias, ./risedev c ) |
Completed |
My PR contains critical fixes that are necessary to be merged into the latest release. (Please check out the details) | Incomplete |
My PR needs documentation updates. (Please use the Release note section below to summarize the impact on users) | Incomplete |
Before this PR, the relevant code path allows certain create source/table to succeed, but the source parser constantly throws errors that are suppressed. |
Incomplete |
After this PR, such create source/table would fail with the corresponding source parser error. |
Incomplete |
(Before this PR) call extract_avro_table_schema , which pretends to be (plain, avro) . |
Incomplete |
(After this PR) call extract_avro_table_schema , which uses (upsert, avro) from passed source_info . |
Incomplete |
(Question above) call extract_upsert_avro_table_schema . |
Incomplete |
When key schema is defined, it ensures key columns are subset of value columns. | Incomplete |
When key schema is not defined, it adds an _rw_key column in addition to value columns. |
Incomplete |
This extra behavior is not intended to be followed by the upsert-avro-with-user-defined-parimay-key-in-sql case. So the 3rd option is not what we want. | Incomplete |
On upsert , it requires use of schema registry, and always fetches both key and value schema[1]. |
Incomplete |
On plain , it allows both schema registry and file/https/s3 URLs[2]. It only fetches value schema. |
Incomplete |
Note that when using schema registry with key schema in schema registry, there is essentially no difference. | Incomplete |
The frontend call is our focus here, which either (1) pretends to be (plain, avro) or (2) uses (upsert, avro) from passed source_info . |
Incomplete |
The compute source executor call has always been using (upsert, avro) from passed source_info . |
Incomplete |
Loading