Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize waker size for requirements #40

Draft
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

simonwuelker
Copy link
Contributor

@simonwuelker simonwuelker commented May 9, 2024

This is the result of me tinkering around with the idea described in #4, I would like your approval/disapproval on the basic idea before i write documentation etc.

TLDR; It's possible to reduce the size of a ReceiverWaker if it is known that it will only be used in blocking or non-blocking contexts beforehand. This is currently being decided using crate features, but it would be nice if it was possible to specify this for each channel individually.

Design Idea

This PR uses two traits (SyncWaker / AsyncWaker) to expose different methods on Receiver based on what waker is being used.

There are four different wakers, which are equivalent to the four possible feature combinations right now:

Waker implements SyncWaker implements AsyncWaker Methods on Receiver<T, W> Size
DummyWaker No No try_recv only 0
thread::Thread Yes No All blocking methods 8
task::Waker No Yes All non-blocking methods 16
GenericWaker Yes Yes All 16

To avoid breaking changes, a DefaultWaker is also exported, which equates to the "most permissive" of the available wakers based on the enabled features and, as the name implies, is used by default.

The main design issue currently is how to adjust the crate API so the used waker can be specified.
Currently, channels are being constructed using

fn channel<T>() -> (Sender<T>, Receiver<T>)

Ideally this would be possible:

fn channel<T, W = DefaultWaker>() -> Sender<T, W>, Receiver<T, W>)
where W: Waker

but default values for generic parameters on functions are not allowed.

So the "next best" option would be to either export more constructor functions:

// "Old" API
fn channel<T>() -> (Sender<T, DefaultWaker>, Receiver<T, DefaultWaker>)

// Enable these based on features
fn wakerless_channel<T>() -> (Sender<T, DummyWaker>, Receiver<T, DummyWaker>)
fn strictly_sync_channel<T>() -> (Sender<T, thread::Thread>, Receiver<T, thread::Thread>)
fn strictly_async_channel<T>() -> (Sender<T, task::Waker>, Receiver<T, task::Waker>)

or to move these functions onto the Channel type itself, so the waker type is already clear.

@simonwuelker simonwuelker marked this pull request as draft May 9, 2024 21:32
@simonwuelker simonwuelker force-pushed the optimized-waker branch 2 times, most recently from 4c9e0f1 to 35222da Compare May 9, 2024 21:37
@faern
Copy link
Owner

faern commented May 23, 2024

Hi! Thanks for wanting to contribute. This looks cool and all, but I have some concerns.

To avoid breaking changes, a DefaultWaker is also exported, which equates to the "most permissive" of the available wakers based on the enabled features and, as the name implies, is used by default.

Features need to be additive only. I'm not sure if changing the type of DefaultWaker when the feature set changes is going to cause trouble anywhere 🤔 Technically you are replacing a type with another. But on the other hand, it should change from and to the same thing for every possible observer.

Since I wrote the issue I have maybe changed opinion slightly. This change is pretty complex, and does it really save a lot of space? For this to be worth the extra complexity someone must use this in a way where they have tiny messages, and a huge amount of channels. Like with almost any optimization, I think it would be nice if we had any concrete proof this change is justified by real world usage/benchmarks.

@faern
Copy link
Owner

faern commented May 24, 2024

It's possible that this has small CPU benefits as well due to each waker being simpler than "support all waking methods". However it's really hard to measure. It's hard because you can only perform a single wake event per channel (by design), and allocation and other things related to creating and destroying each channel probably outweigh the waking part by a lot. But yeah, if we can't prove a performance win, then it's probably not worth the complexity anyway, since the same applies for production use, not just benchmarks.

@simonwuelker
Copy link
Contributor Author

Yeah, those are fair concerns (:

I'm also very unsure whether this is worth the effort! (that's part of the reason this is a draft pr)
To address some of your comments:

I'm not sure if changing the type of DefaultWaker when the feature set changes is going to cause trouble anywhere

Hmm, unless someone relies on the result of std::any::type_name (or similar) this shouldn't cause issues. The DefaultWaker only changes to expose more methods as more features are added. But you're not wrong, this is kinda brittle. Especially once you start to consider auto traits!

Like with almost any optimization, I think it would be nice if we had any concrete proof this change is justified by real world usage/benchmarks. [...]
It's possible that this has small CPU benefits as well due to each waker being simpler than "support all waking methods".

I can't imagine there are any measurable performance wins here - at most you're saving one match, but even that might be optimized away when the functions are inlined... Performance was not the motivation here. It just feels clunky to carry around space for a task::Waker when you know that the channel will only be used in sync contexts. Because this is currently controlled with cargo features, either all your channels support async or none do.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants