# Leveraging vectored I/O during serialization (#687)
Since you are talking about optimisations, I fear I would want some benchmarks on this. Something that uses criterion to serialise a request and compares that against a hand-written solution? What would be a "bad" request where this matters a lot?
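To make the comparison concrete, here is a rough, std-only sketch of the kind of measurement meant here (criterion would be the proper tool for real statistics). Strategy A copies the header and the list payload into one fresh buffer before writing; strategy B hands both slices to the kernel via `write_vectored` without copying. The 8-byte header and all names are invented for illustration:

```rust
use std::io::{IoSlice, Write};
use std::time::Instant;

/// Strategy A: build one contiguous buffer (extra allocation + copy), then write it.
fn serialize_copying(header: &[u8; 8], payload: &[u8], out: &mut impl Write) -> std::io::Result<()> {
    let mut buf = Vec::with_capacity(header.len() + payload.len());
    buf.extend_from_slice(header);
    buf.extend_from_slice(payload);
    out.write_all(&buf)
}

/// Strategy B: pass both slices in one vectored call, no intermediate copy.
fn serialize_vectored(header: &[u8; 8], payload: &[u8], out: &mut impl Write) -> std::io::Result<()> {
    let bufs = [IoSlice::new(header), IoSlice::new(payload)];
    // Note: a robust version must loop on partial writes; omitted for brevity.
    out.write_vectored(&bufs).map(|_| ())
}

fn main() -> std::io::Result<()> {
    let header = [0u8; 8];
    let payload = vec![0xabu8; 4 << 20]; // 4 MiB list, e.g. a big image

    let mut sink = std::io::sink();
    let t = Instant::now();
    for _ in 0..100 {
        serialize_copying(&header, &payload, &mut sink)?;
    }
    println!("copying:  {:?}", t.elapsed());

    let t = Instant::now();
    for _ in 0..100 {
        serialize_vectored(&header, &payload, &mut sink)?;
    }
    println!("vectored: {:?}", t.elapsed());
    Ok(())
}
```

Both strategies produce identical bytes on the wire; the difference is purely the temporary allocation and copy, which is exactly what a benchmark would need to isolate.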
---

What would be a request where this matters a lot? My "go to example" for "requests causing problems with lots of data" would be `x11rb/x11rb-protocol/src/protocol/xproto.rs` line 17028 (in e6a7071) and `x11rb/x11rb-protocol/src/protocol/xproto.rs` line 17076 (in e6a7071).

But this: `x11rb/x11rb-protocol/src/protocol/xproto.rs` lines 17127 to 17136 (in e6a7071). So, I bet it would be quite simple to write code that does better than this.

GetImage likely is a bad example for this kind of experiment. I just looked through …

Another thing that came to my mind while looking through the XML: this proposal means that we now need …

Anyway, what I really wanted to point at is https://bugs.freedesktop.org/show_bug.cgi?id=23403 and https://gitlab.freedesktop.org/xorg/lib/libxcb/-/issues/36. I think your proposal does not have this problem, but I just wanted to point out that even libxcb did not manage to get (the C equivalent of) the …
This is one thing that I always liked about x11rb: I can be quite confident that we do not have any "the compiler decided to screw us up" problems, since we do all the parsing/serialising "by hand" (which of course means more code that basically does not do anything...). So, yeah... dunno what to do, but I'd definitely hope for some strong benchmark results indicating that such a change is worthwhile. After all, we are doing I/O, and that should be a lot slower than any of the temporary memory allocations on the way there (except for that useless copy in the current …).

---
Yeah, I definitely think we should get benchmarks up and running first. I just wanted to discuss what a potential trait design could look like.

---
Sure! My previous attempts / thoughts can be found in #656 and #657. Quote from #657:
> As always, FDs make everything complicated. ;-)

(In case the above is not clear: for a request with FDs, the request struct contains the FD. Since FDs are neither …)

Now that I have written this, I wonder whether one could move the problem to the …

```rust
impl Request for RequestWithFDs { stuff }
impl Request for &RequestWithoutFDs { stuff }
```

By implementing the trait for a reference to a struct, one can work around the "consumes the struct" problem. However.... I don't think the lifetimes would work out, would they?
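For what it's worth, a minimal sketch (all names invented) suggests the reference trick does compile, at least as long as the serialized output is owned rather than borrowed from `self`:

```rust
// Invented stand-in for the request type discussed above.
struct RequestWithoutFds {
    foo: u32,
}

trait Request {
    // Consumes `self`, like the trait design in question.
    fn serialize(self) -> Vec<u8>;
}

// Implementing the trait for a *reference* means "consuming self"
// only consumes the borrow, not the struct itself.
impl Request for &RequestWithoutFds {
    fn serialize(self) -> Vec<u8> {
        self.foo.to_ne_bytes().to_vec()
    }
}

fn main() {
    let req = RequestWithoutFds { foo: 42 };
    let bytes = (&req).serialize();
    // `req` is still usable here; only the reference was moved.
    assert_eq!(bytes, req.foo.to_ne_bytes().to_vec());
}
```

The lifetime question only bites if `serialize` returns something that borrows from `self`; returning an owned `Vec<u8>` sidesteps it entirely.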
---

With my previous suggestion of my …

```rust
pub trait IntoSerializable {
    type Target: Serializable;
    fn into_serializable(self, fds: &mut Vec<RawFdContainer>) -> Self::Target;
}
```

Where the caller would move the file descriptors into …

---
So, for your example from above, that would mean:

```rust
struct FooRequest<'input> {
    foo: u32,
    bar: u32,
    fd: RawFdContainer,
    my_list: Cow<'input, [OtherThing]>,
}

struct FooRequestSer<'input> {
    body: FooBody,
    my_list: Cow<'input, [OtherThing]>,
}

impl<'input> IntoSerializable for FooRequest<'input> {
    type Target = FooRequestSer<'input>;
    fn into_serializable(self, fds: &mut Vec<RawFdContainer>) -> Self::Target {
        fds.push(self.fd);
        Self::Target {
            body: FooBody {
                foo: self.foo,
                bar: self.bar,
            },
            my_list: self.my_list,
        }
    }
}

// Note: bytemuck::Pod requires Copy, hence the extra derives.
#[derive(Clone, Copy, bytemuck::Pod, bytemuck::Zeroable)]
#[repr(C)]
struct FooBody {
    foo: u32,
    bar: u32,
}
```

Right? So... first we have …

This could possibly end up in a lot of code, but might work, I guess. If you want, you can experiment some more with this. A self-contained, compilable example would be nice, I guess. I wonder how well the compiler would manage to optimise all those "uselessly move data around" away... Anyway, gotta go.
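In that spirit, here is one attempt at a self-contained, compilable version of the sketch above, using std-only stand-ins: `RawFdContainer` is a dummy wrapper (deliberately not `Clone`/`Copy`, like a real FD), `u32` stands in for `OtherThing`, and `FooBody` is serialized by hand instead of via bytemuck:

```rust
use std::borrow::Cow;

// Dummy stand-in for x11rb's RawFdContainer; not Clone/Copy, like a real FD.
struct RawFdContainer(#[allow(dead_code)] i32);

trait Serializable {
    fn serialize(&self) -> Vec<u8>;
}

trait IntoSerializable {
    type Target: Serializable;
    fn into_serializable(self, fds: &mut Vec<RawFdContainer>) -> Self::Target;
}

struct FooRequest<'input> {
    foo: u32,
    bar: u32,
    fd: RawFdContainer,
    my_list: Cow<'input, [u32]>, // u32 stands in for OtherThing
}

#[repr(C)]
struct FooBody {
    foo: u32,
    bar: u32,
}

struct FooRequestSer<'input> {
    body: FooBody,
    my_list: Cow<'input, [u32]>,
}

impl<'input> IntoSerializable for FooRequest<'input> {
    type Target = FooRequestSer<'input>;
    fn into_serializable(self, fds: &mut Vec<RawFdContainer>) -> Self::Target {
        // The FD is moved out into the caller's vector, so the remaining
        // struct is plain data that can be serialized freely.
        fds.push(self.fd);
        FooRequestSer {
            body: FooBody { foo: self.foo, bar: self.bar },
            my_list: self.my_list,
        }
    }
}

impl Serializable for FooRequestSer<'_> {
    fn serialize(&self) -> Vec<u8> {
        let mut out = Vec::new();
        out.extend_from_slice(&self.body.foo.to_ne_bytes());
        out.extend_from_slice(&self.body.bar.to_ne_bytes());
        for x in self.my_list.iter() {
            out.extend_from_slice(&x.to_ne_bytes());
        }
        out
    }
}

fn main() {
    let req = FooRequest {
        foo: 1,
        bar: 2,
        fd: RawFdContainer(3),
        my_list: Cow::Owned(vec![4, 5]),
    };
    let mut fds = Vec::new();
    let ser = req.into_serializable(&mut fds);
    assert_eq!(fds.len(), 1);
    assert_eq!(ser.serialize().len(), 16); // 2 body fields + 2 list entries, 4 bytes each
}
```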
I actually did some experiments with this on Godbolt. Anything less than a pointer in size is usually a no-op, and anything more turns into a …

---
Another potential benefit I noticed while analyzing libcairo's source code: vectored I/O can be used to effectively crop an image without using an intermediate buffer: https://github.com/freedesktop/cairo/blob/master/src/cairo-xcb-connection-core.c#L146-L224
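A sketch of that trick, assuming a tightly packed one-byte-per-pixel image (the function name and layout are invented): build one `IoSlice` per cropped row pointing directly into the source image, so no staging buffer is ever allocated:

```rust
use std::io::{IoSlice, Write};

/// Write the `w` x `h` sub-rectangle at (x, y) of a `stride`-bytes-per-row
/// image directly from the source buffer: one IoSlice per cropped row.
fn write_cropped(
    image: &[u8],
    stride: usize,
    x: usize,
    y: usize,
    w: usize,
    h: usize,
    out: &mut impl Write,
) -> std::io::Result<()> {
    let rows: Vec<IoSlice<'_>> = (0..h)
        .map(|row| {
            let start = (y + row) * stride + x;
            IoSlice::new(&image[start..start + w])
        })
        .collect();
    // A robust version would loop on partial writes (`write_all_vectored`
    // is still unstable); this sketch assumes the writer takes everything.
    out.write_vectored(&rows).map(|_| ())
}

fn main() -> std::io::Result<()> {
    // 4x4 one-byte-per-pixel image with pixel values 0..=15.
    let image: Vec<u8> = (0u8..16).collect();
    let mut out = Vec::new();
    // Crop the central 2x2 rectangle.
    write_cropped(&image, 4, 1, 1, 2, 2, &mut out)?;
    assert_eq!(out, [5, 6, 9, 10]);
    Ok(())
}
```

For a `PutImage`-style request this would let the kernel gather the cropped rows straight from the client's image buffer in a single syscall.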
---

Right now, the `Serialize` trait looks something like this: …

Meanwhile, the `Request` trait has a method like this (ignoring the existence of fds): …

In practice, this usually means that for the request, the "body" (i.e. the non-list part of the request) is serialized into one vector, and the remaining lists are serialized into a `Vec<u8>` each. While this is good, it means that we're spending time serializing and buffering our data, which is a problem that vectored writes were designed to solve.

In `breadx`, I was experimenting with a trait design like this: …

Where `num_slices` tells the user how many `IoSlice`s the object will use during serialization, and `serialize` writes those slices into the given buffer. In my experiments I also added a `with_slices` method that takes a callback with the mutable `IoSlice`s. By default it used the `Vec` implementation above, but it could be overridden to use an array and completely avoid heap memory.

The main issue with the above, which I discovered during experimentation, is that requests end up having several fields, and dedicating an I/O slice to each one ends up needing dozens of I/O slices for memory that's mostly contiguous. However, my solution to that was to use bytemuck to convert raw structures to bytes, like so: …
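To illustrate that idea (not x11rb's actual code): a `#[repr(C)]` "body" struct of plain-old-data fields is viewed as one contiguous byte slice, so the entire fixed-size part of a request needs only a single `IoSlice`. The field names are invented; here the cast uses raw `unsafe` to stay dependency-free, while bytemuck's `Pod` derive plus `bytes_of` gives the same result with the safety conditions checked at compile time:

```rust
// Fixed-size, padding-free request body; with bytemuck this would also
// derive Pod and Zeroable so the cast below is verified at compile time.
#[repr(C)]
struct FooBody {
    opcode: u8,
    pad: u8,
    length: u16,
    foo: u32,
}

// Equivalent of bytemuck::bytes_of: view the struct as its raw bytes.
// Sound here because FooBody is repr(C) with no padding and no invalid
// bit patterns; bytemuck proves exactly these properties via `Pod`.
fn body_bytes(body: &FooBody) -> &[u8] {
    unsafe {
        std::slice::from_raw_parts(
            (body as *const FooBody).cast::<u8>(),
            std::mem::size_of::<FooBody>(),
        )
    }
}

fn main() {
    let body = FooBody { opcode: 72, pad: 0, length: 6, foo: 0xdead_beef };
    let bytes = body_bytes(&body);
    // One contiguous slice covers the whole fixed-size part of the request.
    assert_eq!(bytes.len(), 8);
    println!("{bytes:?}");
}
```

This is essentially the safe-Rust counterpart of libxcb's pointer casting: the non-list fields collapse into one slice, and only the variable-length lists need additional `IoSlice`s.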
What are your thoughts on this? It would certainly be no small amount of work, and I'm not sure it would be worth it, but this is pretty close to what libxcb does through unsafe pointer casting.