Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support serial type (rowid) in sink #16009

Closed
xxchan opened this issue Mar 28, 2024 · 5 comments · Fixed by #16969
Closed

Support serial type (rowid) in sink #16009

xxchan opened this issue Mar 28, 2024 · 5 comments · Fixed by #16969

Comments

@xxchan
Copy link
Member

xxchan commented Mar 28, 2024

ERROR rw-streaming risingwave_stream::task::stream_manager: actor exit with error actor_id=6 error=Executor error: Sink error: Encode error: datum_to_json_object: unsupported data type: field name: "row_id1", logical type: Serial, physical type: Serial(Serial(395969369987592192))

One use case is that we can specify rowid in sink primary key, so that internal buffering in sink executor can be avoided for lower latency.

Similar to #15008, but we might need implicit cast.

@github-actions github-actions bot added this to the release-1.8 milestone Mar 28, 2024
@shanicky shanicky self-assigned this Apr 8, 2024
@shanicky shanicky modified the milestones: release-1.8, release-1.10 May 8, 2024
@fuyufjh
Copy link
Member

fuyufjh commented May 23, 2024

Any progress? @xxchan @shanicky

@shanicky
Copy link
Contributor

Currently, we have supported int64 in protobuf format directly as a native number, but we haven't supported it in JSON and Avro formats. Because we need to use serial as a key and JSON's handling of large numbers cannot guarantee precision, we need to convert it to a string type. cc @fuyufjh Is this approach okay? Should we convert it directly to a string as "12345", or convert it to hexadecimal as "0x12345", or perhaps add a prefix like "rw-serial:12345"? Which option would have the lowest overhead for us in terms of interpretation?

Avro doesn't have this issue, so we can use native int64.

@fuyufjh
Copy link
Member

fuyufjh commented May 28, 2024

Agree that the serial value needs to be converted to string. This because our row_id is almost 100% out of the MAX_SAFE_INTEGER so that a normal downstream JSON parser may lose precision.

Either plain string "12345" or hexadecimal "0x12345" looks good to me.

@xiangjinwu
Copy link
Contributor

One benefit of fixed-length 0x00012345 over 12345 is that consumer can order by it without casting it back to bigints, which can be easily forgotten. This is treating serial like an opaque type.

If that is not a concern, or we already display serial in decimal in some other places, using a consistent decimal representation could make crossing referencing much easier.

@shanicky shanicky linked a pull request May 29, 2024 that will close this issue
4 tasks
@shanicky
Copy link
Contributor

I submitted a PR trying to use padded hexadecimal strings in the JSON. It looks ok.

{"_row_id":"0x05cd07777f434000","v":1}
{"_row_id":"0x05cd07758bc9c000","v":2}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants