feat(sink): jsonb column to avro record/map/union type #16941
@maingoh In your specific case, could you try converting the loosely-typed
(Note the function As for a general case, note that Does this address your issue? |
Thank you, I managed to sink a simple struct as an avro record, but I can't find a way to generate a type that is convertible to an avro map. Is a jsonb object directly convertible to a map? I actually have a joined table with a name and some other columns. I would like to build a I would also need a way to union different types, as avro allows it. For example, we have a jsonb column which can hold different types. I feel it does not exist (yet?); what would be the best way to sink such values?
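To make the record-versus-map distinction concrete, here is a minimal sketch in Python (not RisingWave SQL) of the two avro schemas one could write for the same JSON object. The record name `Person` and the fields `name` and `age` are hypothetical and only serve as an illustration.

```python
import json

# Hypothetical JSON object, as it might appear in a jsonb column.
value = {"name": "alice", "age": 30}

# Avro record: a fixed set of named fields, each with its own type.
record_schema = {
    "type": "record",
    "name": "Person",
    "fields": [
        {"name": "name", "type": "string"},
        {"name": "age", "type": "long"},
    ],
}

# Avro map: arbitrary string keys, but all values share one schema.
# Values of mixed types therefore need a union as the value schema.
map_schema = {"type": "map", "values": ["string", "long"]}

print(json.dumps(record_schema))
print(json.dumps(map_schema))
```

A record suits objects whose keys are known up front, while a map suits objects whose keys vary from row to row.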
Good to know! I actually didn't need to use |
You are right, it is not available in RisingWave yet. Although
There is no native
To clarify:
|
Is it really possible to store bytes in a jsonb field? JSON itself only supports strings (bytes are usually stored as base64-encoded strings). So to me it sounds quite natural to convert it natively to
I agree that a union type is not really needed as a risingwave type. However, the conditional check on types is something that RW could support natively for most types; doing it in SQL is not very user friendly. If the jsonb column is compatible with the avro format, why not try to serialize it without requiring explicit casts? And having a way to default to null the incompatible ones like in the example with There could be a parameter on the sink side to treat jsonb columns as avro ones while sinking. This way it would still be stored as unstructured JSONB in RW (allowing a single json field to have different types) but would be converted only in the sink, using this mapping:
All other, more specific types (dates, bytes, smallint, float) would need an explicit conversion in the query itself. But since most JSON columns only have the types above, it would save a lot of boilerplate not to have to handle this on the user side. Similarly, on the source side, RW would pick the best native type if possible and otherwise fall back to JSONB. It could be an optional parameter |
Here is another mapping that would fit my use case https://materialize.com/docs/sql/create-sink/kafka/#avro:
|
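The sink-side idea proposed above (convert the common JSON types automatically and default incompatible values to null) can be sketched in Python as follows. The mapping tables from the original comments are not reproduced here, so the type mapping below is an assumption, and the helper names `avro_type_of` and `coerce_or_null` are hypothetical. This is not how RisingWave behaves today.

```python
from typing import Any, Optional, Set


def avro_type_of(value: Any) -> str:
    """Return an avro type name for a decoded JSON value (assumed mapping)."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool before int: bool subclasses int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "map"
    raise TypeError(f"not a JSON value: {value!r}")


def coerce_or_null(value: Any, allowed: Set[str]) -> Optional[Any]:
    """Keep the value if its type is a branch of the target union, else emit null."""
    return value if avro_type_of(value) in allowed else None


# Target avro union for a hypothetical column: ["null", "long", "string"]
branches = {"null", "long", "string"}
print(coerce_or_null(42, branches))
print(coerce_or_null([1, 2], branches))
```

The more specific types (dates, bytes, smallint, float) are deliberately left out, matching the suggestion that those would still need an explicit conversion in the query.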
My major concern above is about adding a native union type, or sinking arbitrary jsonb as an avro union. There was one core difference I did not mention, which may have led to some confusion: RisingWave does not automatically generate and register the avro schema to the schema registry. Instead, it accepts an existing one read from the schema registry, so it needs to do the following validation before seeing any concrete records:
If we were to generate the avro schema, we could generate a Does this address your concern? |
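To illustrate why a schema read from the registry is hard to validate against a jsonb column up front, here is a minimal Python sketch that checks a single decoded JSON value against an avro schema. It is not RisingWave's validation logic; the function name `conforms` is hypothetical and only a small subset of avro types is covered. The point is that, for schema-less JSON, conformance can only be decided per value, not per column.

```python
from typing import Any

# Checks for the primitive avro types covered by this sketch.
PRIMITIVES = {
    "null": lambda v: v is None,
    "boolean": lambda v: isinstance(v, bool),
    "long": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "double": lambda v: isinstance(v, float),
    "string": lambda v: isinstance(v, str),
}


def conforms(value: Any, schema: Any) -> bool:
    """Return True if a decoded JSON value fits the given avro schema."""
    if isinstance(schema, list):  # union: any branch may match
        return any(conforms(value, branch) for branch in schema)
    kind = schema if isinstance(schema, str) else schema["type"]
    if kind in PRIMITIVES:
        return PRIMITIVES[kind](value)
    if kind == "array":
        return isinstance(value, list) and all(
            conforms(v, schema["items"]) for v in value
        )
    if kind == "map":
        return isinstance(value, dict) and all(
            conforms(v, schema["values"]) for v in value.values()
        )
    if kind == "record":
        fields = schema["fields"]
        return (
            isinstance(value, dict)
            and set(value) == {f["name"] for f in fields}
            and all(conforms(value[f["name"]], f["type"]) for f in fields)
        )
    raise ValueError(f"unsupported schema in this sketch: {schema!r}")


schema = ["null", {"type": "map", "values": "long"}]
print(conforms({"a": 1}, schema))
print(conforms({"a": "x"}, schema))
```

Two rows of the same jsonb column can give different answers here, which is exactly what a check performed before seeing any records cannot account for.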
If 1. and 3. land quickly, I feel handling unions is less of a priority. In any case, a native union type feels like too much, and the use case can be handled using jsonb. I have not seen any database supporting it (though I didn't spend much time looking).
In this case I would say:
|
@xiangjinwu do you have an approximate idea of when 1. and 3. would be available? |
I think I am facing a similar issue. I am trying to sink jsonb values from a materialized view into an avro schema in kafka, with map and record types. I am getting the following errors:
Or
Originally posted by @maingoh in #11699 (comment)