External binary representation support (SEND/RECEIVE for custom types) #887

yrashk · 2022-11-24T20:25:37Z

Problem: text type representation is not always efficient

For this reason, Postgres allows types to have an external binary representation. Also, some clients insist on using binary representation.

Solution: introduce SendRecvFuncs trait and sendrecvfuncs attribute

These are used to specify how external binary representation encoding is accomplished.
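
To illustrate the idea, here is a minimal sketch of a send/receive pair for a custom type. The trait name `BinarySendRecv`, its method signatures, and the `Point` type are hypothetical stand-ins; the actual `SendRecvFuncs` signature from this PR is not shown in the thread and may differ. Big-endian encoding is chosen because Postgres's own send functions conventionally use network byte order.

```rust
// Hypothetical stand-in for the PR's trait; the real `SendRecvFuncs`
// signature is not shown in this thread and may differ.
trait BinarySendRecv: Sized {
    /// Encode the value into its external binary form.
    fn send(&self) -> Vec<u8>;
    /// Decode a value from its external binary form; None if the input is malformed.
    fn recv(buf: &[u8]) -> Option<Self>;
}

/// Illustrative custom type (an assumption, not part of the PR).
#[derive(Debug, PartialEq, Clone, Copy)]
struct Point {
    x: i32,
    y: i32,
}

impl BinarySendRecv for Point {
    fn send(&self) -> Vec<u8> {
        // Two 4-byte big-endian integers, x first, then y.
        let mut out = Vec::with_capacity(8);
        out.extend_from_slice(&self.x.to_be_bytes());
        out.extend_from_slice(&self.y.to_be_bytes());
        out
    }

    fn recv(buf: &[u8]) -> Option<Self> {
        // Reject anything that is not exactly 8 bytes long.
        if buf.len() != 8 {
            return None;
        }
        let x = i32::from_be_bytes([buf[0], buf[1], buf[2], buf[3]]);
        let y = i32::from_be_bytes([buf[4], buf[5], buf[6], buf[7]]);
        Some(Point { x, y })
    }
}

/// Encodes a point and decodes it again.
fn round_trip(p: Point) -> Option<Point> {
    Point::recv(&p.send())
}
```

A fixed 8-byte encoding like this is considerably more compact to parse than a text form such as `(1,-2)`, which is the efficiency argument made in the problem statement.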

workingjubilee · 2022-11-27T00:25:42Z

Several traits all reusing the same pattern suggests we should be attempting to create a more generic interface instead of copypasta.

yrashk · 2022-11-27T00:36:25Z

I am not against having a "more generic" interface, but since we don't have any specific proposal for one, can we work on getting this one in, since it's of very practical use as is? Even under a feature gate, if so desired.

I've spent a good amount of time getting it done following the current approach found in the library (in/out type of traits), and it feels that saying "we should now find a common interface" is an unfair treatment of the PR. So I would propose to get it into a mergeable shape first and get it in. We can work on generalizing interfaces for these things as a follow-up. What do you think?

workingjubilee · 2022-11-27T20:16:48Z

My concern is not necessarily a blocking concern, just something of note.

yrashk · 2022-11-27T23:48:39Z

As discussed in private, I understand the concern about the superficial similarity of SendRecvFuncs to PgVarlenaInOutFuncs. That being said, while they are very similar shape-wise, their distinction lies in the specificity of such a trait (Send/Recv are intended explicitly for SEND/RECEIVE functions of the type).

PgVarlenaInOutFuncs is not suitable for some types (where Copy is not feasible), and thus I didn't consider adopting that trait (or breaking its API). I understand that you have doubts about whether Copy is required and whether it can be relaxed to Clone.

You suggested that perhaps in most cases users will not need different serialization (between varlena and send/receive). While I tend to agree, I don't think this should mean we should not make it possible to specify a send/receive-specific implementation.

I feel like this can be a very productive discussion that can help us design the next iteration of API and make this part of pgx much smoother. I'd love to be a part of the conversation.

My only concern is making external binary format (as a feature) depend on the timeline of this discussion. I suggest getting it done first (with the caveat of some potentially changing APIs; but we're pre 1.0 so we can afford some refining) and then, once there's a good consensus, transition to a better design.

eeeebbbbrrrr · 2022-11-28T14:23:40Z

> Several traits all reusing the same pattern suggests we should be attempting to create a more generic interface instead of copypasta.

Yes

> My concern is not necessarily a blocking concern, just something of note.

I think it probably should be. We're at the point where a better approach is necessary.

I've been thinking about this over the holiday and I'd like to take a stab at making all of this better. I have some ideas, but don't quite yet have the time to focus on it.

yrashk · 2022-11-28T14:47:55Z

If you don't have time for this, perhaps it's a case of "better is the enemy of good," and we should consider something like this PR to provide the functionality first, and improve it later?

eeeebbbbrrrr · 2022-11-28T14:54:49Z

I don't quite have the time yet. I will soon and I'm not all that excited about adding some code to paper over design flaws now, just to remove it all in say, a week.

workingjubilee · 2022-11-29T06:41:47Z

In practice, almost all implementations and users of PgVarlenaInOutFuncs are doing it via macro code, so I am not really worried about breaking users of it who manually touch it.

workingjubilee

There are actually two changesets here, and we can accept one if it takes a slightly different shape.

workingjubilee · 2022-11-29T21:42:04Z

pgx/src/stringinfo.rs

+    /// Reads a range of bytes, modifying the underlying cursor to reflect what was read
+    ///
+    /// Returns None if the underlying remaining binary is smaller than requested with the range.
+    ///
+    /// Ranges can start from an offset, resulting in skipped information.
+    ///
+    /// Most common use-case for this is reading the underlying data in full (`read(..)`)
+    pub fn read<R: RangeBounds<usize>>(&mut self, range: R) -> Option<&[u8]> {
+        use std::ffi::c_int;
+        let remaining = unsafe { (*self.sid).len - (*self.sid).cursor } as usize;
+        let start = match range.start_bound() {
+            Bound::Included(bound) => *bound,
+            Bound::Excluded(bound) => *bound + 1,
+            Bound::Unbounded => 0,
+        };
+        let end = match range.end_bound() {
+            Bound::Included(bound) => *bound,
+            Bound::Excluded(bound) => *bound - 1,
+            Bound::Unbounded => remaining,
+        };
+        let total = end - start;
+
+        if total > remaining {
+            return None;
+        }
+
+        // safe:  self.sid will never be null
+        Some(unsafe {
+            if (*self.sid).data.is_null() {
+                &[]
+            } else {
+                (*self.sid).cursor += start as c_int;
+                let result = std::slice::from_raw_parts(
+                    (*self.sid).data.add((*self.sid).cursor as usize) as *const u8,
+                    total,
+                );
+                (*self.sid).cursor += total as c_int;
+                result
+            }
+        })
+    }


We should probably just impl std::io::Read using similar underlying code, which can be extracted into another PR. str implements a similar .get(..) but it doesn't modify cursors. I wouldn't mind seeing this function with a different name.

yrashk · 2022-12-02

I can totally do this, but I think it would make the most sense to do it after #903.
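
The `std::io::Read` suggestion above could look roughly like the following sketch. `ByteCursor` is a hypothetical, safe stand-in for a StringInfo-like buffer (owned bytes plus a cursor); it is not pgx's `StringInfo`, and the real implementation would have to work against the raw `StringInfoData` pointer instead.

```rust
use std::io::{self, Read};

/// Hypothetical stand-in for a StringInfo-like buffer: owned bytes plus a read cursor.
/// Invariant: `cursor <= data.len()`.
struct ByteCursor {
    data: Vec<u8>,
    cursor: usize,
}

impl ByteCursor {
    fn new(data: Vec<u8>) -> Self {
        ByteCursor { data, cursor: 0 }
    }

    /// Bytes that have not been consumed yet.
    fn remaining(&self) -> &[u8] {
        &self.data[self.cursor..]
    }
}

impl Read for ByteCursor {
    /// Copies as many unread bytes as fit into `buf` and advances the cursor.
    /// Returns Ok(0) once the buffer is exhausted, as the `Read` contract expects.
    fn read(&mut self, buf: &mut [u8]) -> io::Result<usize> {
        let src = self.remaining();
        let n = src.len().min(buf.len());
        buf[..n].copy_from_slice(&src[..n]);
        self.cursor += n;
        Ok(n)
    }
}

/// Reads a two-byte head, then drains whatever is left.
fn demo() -> io::Result<([u8; 2], Vec<u8>)> {
    let mut cur = ByteCursor::new(vec![1, 2, 3, 4, 5]);
    let mut head = [0u8; 2];
    cur.read_exact(&mut head)?;
    let mut rest = Vec::new();
    cur.read_to_end(&mut rest)?;
    Ok((head, rest))
}
```

Implementing `Read` gives callers `read_exact`, `read_to_end`, and the rest of the standard adapters for free, and it sidesteps the question of how range bounds should be interpreted relative to the cursor.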

workingjubilee · 2022-11-29T21:45:53Z

pgx-tests/src/tests/stringinfo_tests.rs

+/*
+Portions Copyright 2019-2021 ZomboDB, LLC.
+Portions Copyright 2021-2022 Technology Concepts & Design, Inc. <[email protected]>
+
+All rights reserved.
+
+Use of this source code is governed by the MIT license that can be found in the LICENSE file.
+*/
+
+#[cfg(any(test, feature = "pg_test"))]
+#[pgx::pg_schema]
+mod tests {
+    #[allow(unused_imports)]
+    use crate as pgx_tests;
+
+    use pgx::*;
+
+    #[pg_test]
+    fn test_string_info_read_full() {
+        let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
+        assert_eq!(string_info.read(..), Some(&[1, 2, 3, 4, 5][..]));
+        assert_eq!(string_info.read(..), Some(&[][..]));
+        assert_eq!(string_info.read(..=1), None);
+    }
+
+    #[pg_test]
+    fn test_string_info_read_offset() {
+        let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
+        assert_eq!(string_info.read(1..), Some(&[2, 3, 4, 5][..]));
+        assert_eq!(string_info.read(..), Some(&[][..]));
+    }
+
+    #[pg_test]
+    fn test_string_info_read_cap() {
+        let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
+        assert_eq!(string_info.read(..=1), Some(&[1][..]));
+        assert_eq!(string_info.read(1..=2), Some(&[3][..]));
+        assert_eq!(string_info.read(..), Some(&[4, 5][..]));
+    }
+}


These tests, of course, can also go into the other PR.

This is not convenient, especially because it requires a forked version of `cargo-pgx`.

Solution: backport pgcentralfoundation/pgrx#887 into pg_crdt

pgcentralfoundation/pgrx#887 has stalled and it is unknown when and if it'll make it into the mainline. The approach may change.

etylermoss · 2024-03-01T22:13:55Z

Any update on getting this merged for #1364?

yrashk force-pushed the external-binary-representation branch from 0564fc4 to 0b5a248 on November 24, 2022 20:27

yrashk force-pushed the external-binary-representation branch from 0b5a248 to 91d0d68 on November 24, 2022 20:45

yrashk mentioned this pull request Nov 25, 2022

Problem: requires a patched version of pgx supabase/pg_crdt#1

Closed

1 task

workingjubilee requested changes Nov 29, 2022

workingjubilee added frozen-until-release and removed frozen-until-release labels Nov 30, 2022

This was referenced Oct 18, 2023

feat: Implement python binding tensorchord/pgvecto.rs#58

Closed

feat: Support binary representation (SEND/RECEIVE for Vector type) tensorchord/pgvecto.rs#100

Closed

workingjubilee mentioned this pull request Nov 1, 2023

Full binary receive/send support #1364

Open
