Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

External binary representation support (SEND/RECEIVE for custom types) #887

Open
wants to merge 1 commit into
base: develop
Choose a base branch
from

Conversation

yrashk
Copy link
Contributor

@yrashk yrashk commented Nov 24, 2022

Problem: text type representation is not always efficient

For this reason, Postgres allows types to have an external binary representation. Also, some clients insist on using binary representation.

Solution: introduce SendRecvFuncs trait and sendrecvfuncs attribute

These are used to specify how external binary representation encoding is accomplished.

@yrashk yrashk force-pushed the external-binary-representation branch from 0564fc4 to 0b5a248 Compare November 24, 2022 20:27
For this reason, Postgres allows types to have an external binary
representation. Also, some clients insist on using binary
representation.

Solution: introduce SendRecvFuncs trait and `sendrecvfuncs` attribute

These are used to specify how external binary representation encoding is
accomplished.
@workingjubilee
Copy link
Member

Several traits all reusing the same pattern suggests we should be attempting to create a more generic interface instead of copypasta.

@yrashk
Copy link
Contributor Author

yrashk commented Nov 27, 2022

I am not against having a "more generic" interface, but since we don't have any specific proposal for one, can we work on getting this one since it's of very practical use as is? Even if under a feature gate if so desired.

I've spent a good amount of time getting it done following the current approach found in the library (in/out type of traits), and it feels that saying "we should now find a common interface" is an unfair treatment of the PR. So I would propose to get it into a mergeable shape first and get it in. We can work on generalizing interfaces for these things as a follow-up. What do you think?

@workingjubilee
Copy link
Member

My concern is not necessarily a blocking concern, just something of note.

@yrashk
Copy link
Contributor Author

yrashk commented Nov 27, 2022

As discussed in private, I understand the concern of SendRecvFuncs superficial similarity to PgVarlenaInOutFuncs. That being said, while they are very similar shape-wise, their distinction lies in the specificity of such a trait (Send/Recv are intended explicitly for SEND/RECEIVE functions of the type).

PgVarlenaInOutFuncs is not suitable for some types (where Copy is not feasible), and thus I didn't consider adopting that trait (or breaking its API). I understand that you have doubts about whether Copy is required and whether it can be relaxed to Clone.

You suggested that perhaps in most cases users will not need different serialization (between varlena and send/receive). While I tend to agree, I don't think this should mean we should not make it possible to specify a send/receive-specific implementation.

I feel like this can be a very productive discussion that can help us design the next iteration of API and make this part of pgx much smoother. I'd love to be a part of the conversation.

My only concern is making external binary format (as a feature) depend on the timeline of this discussion. I suggest getting it done first (with the caveat of some potentially changing APIs; but we're pre 1.0 so we can afford some refining) and then, once there's a good consensus, transition to a better design.

@eeeebbbbrrrr
Copy link
Contributor

Several traits all reusing the same pattern suggests we should be attempting to create a more generic interface instead of copypasta.

Yes

My concern is not necessarily a blocking concern, just something of note.

I think it probably should be. We're at the point where a better approach is necessary.

I've been thinking about this over the holiday and I'd like to take a stab at making all of this better. I have some ideas, but don't quite yet have the time to focus on it.

@yrashk
Copy link
Contributor Author

yrashk commented Nov 28, 2022

If you don't have time for this, perhaps it's a case of "better is the enemy of good," and we should consider something like this PR to provide the functionality first, and improve it later?

@eeeebbbbrrrr
Copy link
Contributor

I don't quite have the time yet. I will soon and I'm not all that excited about adding some code to paper over design flaws now, just to remove it all in say, a week.

@workingjubilee
Copy link
Member

In practice, almost all implementations and users of PgVarlenaInOutFuncs are doing it via macro code, so I am not really worried about breaking users of it who manually touch it.

Copy link
Member

@workingjubilee workingjubilee left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are actually two changesets here, and we can accept one if it takes a slightly different shape.

Comment on lines +194 to +234
/// Reads a range of bytes, modifying the underlying cursor to reflect what was read
///
/// Returns None if the underlying remaining binary is smaller than requested with the range.
///
/// Ranges can start from an offset, resulting in skipped information.
///
/// Most common use-case for this is reading the underlying data in full (`read(..)`)
pub fn read<R: RangeBounds<usize>>(&mut self, range: R) -> Option<&[u8]> {
use std::ffi::c_int;
let remaining = unsafe { (*self.sid).len - (*self.sid).cursor } as usize;
let start = match range.start_bound() {
Bound::Included(bound) => *bound,
Bound::Excluded(bound) => *bound + 1,
Bound::Unbounded => 0,
};
let end = match range.end_bound() {
Bound::Included(bound) => *bound,
Bound::Excluded(bound) => *bound - 1,
Bound::Unbounded => remaining,
};
let total = end - start;

if total > remaining {
return None;
}

// safe: self.sid will never be null
Some(unsafe {
if (*self.sid).data.is_null() {
&[]
} else {
(*self.sid).cursor += start as c_int;
let result = std::slice::from_raw_parts(
(*self.sid).data.add((*self.sid).cursor as usize) as *const u8,
total,
);
(*self.sid).cursor += total as c_int;
result
}
})
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should probably just impl std::io::Read using similar underlying code, which can be extracted into another PR. str implements a similar .get(..) but it doesn't modify cursors. I wouldn't mind seeing this function with a different name.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can totally do this, but I think it would make most sense to do this after #903

Comment on lines +1 to +40
/*
Portions Copyright 2019-2021 ZomboDB, LLC.
Portions Copyright 2021-2022 Technology Concepts & Design, Inc. <[email protected]>

All rights reserved.

Use of this source code is governed by the MIT license that can be found in the LICENSE file.
*/

#[cfg(any(test, feature = "pg_test"))]
#[pgx::pg_schema]
mod tests {
#[allow(unused_imports)]
use crate as pgx_tests;

use pgx::*;

#[pg_test]
fn test_string_info_read_full() {
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
assert_eq!(string_info.read(..), Some(&[1, 2, 3, 4, 5][..]));
assert_eq!(string_info.read(..), Some(&[][..]));
assert_eq!(string_info.read(..=1), None);
}

#[pg_test]
fn test_string_info_read_offset() {
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
assert_eq!(string_info.read(1..), Some(&[2, 3, 4, 5][..]));
assert_eq!(string_info.read(..), Some(&[][..]));
}

#[pg_test]
fn test_string_info_read_cap() {
let mut string_info = StringInfo::from(vec![1, 2, 3, 4, 5]);
assert_eq!(string_info.read(..=1), Some(&[1][..]));
assert_eq!(string_info.read(1..=2), Some(&[3][..]));
assert_eq!(string_info.read(..), Some(&[4, 5][..]));
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These tests, of course, can also go in to the other PR.

yrashk added a commit to supabase/pg_crdt that referenced this pull request Dec 1, 2022
This is not convenient, especially because it requires a forked version
of `cargo-pgx`

Solution: backport pgcentralfoundation/pgrx#887 into pg_crdt

pgcentralfoundation/pgrx#887 has stalled and it is unknown when
and if it'll make it into the mainline. The approach may change.
@etylermoss
Copy link

Any update on getting this merged for #1364?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants