Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Explore whether cloning context can be made more efficient #380

Open
akoshelev opened this issue Jan 12, 2023 · 8 comments
Open

Explore whether cloning context can be made more efficient #380

akoshelev opened this issue Jan 12, 2023 · 8 comments

Comments

@akoshelev
Copy link
Collaborator

akoshelev commented Jan 12, 2023

Our software design for IPA require protocol contexts to be cheaply cloneable because we clone them a lot.

pub struct SemiHonestContext<'a, F: Field> {
    inner: Arc<ContextInner<'a>>,
    step: Step,
}

Making steps cloning inexpensive is tracked in #139, this issue focuses on improvements that can be made on protocol contexts, specifically on ContextInner clone. It looks innocent (atomic load/store on each clone) but when doing it often it may lead to false sharing issues on CPU caches. There may be a better way to do this if we look at the structure of the ContextInner struct:

pub(super) struct ContextInner<'a> {
    pub role: Role,
    pub prss: &'a PrssEndpoint,
    pub gateway: &'a Gateway,
}

There is a role bit and two references that never change no matter how many times we clone the context. There might be an opportunity to cache those values (not references) in one place. Root-level future that is created to run protocol may be a good place to do that and Tokio crate provides some support for it: https://docs.rs/tokio/latest/tokio/task/struct.LocalKey.html.

10000 foot view

This is how query processing may look like

  • Helpers receive a request to process new query. They dynamically allocate roles for each of them and set up PRSS.
  • Each helper stores role, PRSS endpoint and Gateway inside task local data cell (tokio localkey)
  • Query processing is launched by spawning a tokio task (or any other runtime task). In case of tokio it would look like
   tokio::spawn(|| async {
        CONTEXT_INNER.scope((prss, gateway, role), async move {
              let context = SemiHonestContext::new("step");  // no need to pass PRSS or Gateway anymore      
         }).await;
   });

   impl SemiHonestContext {
          pub fn prss(&self) -> &PrssEndpoint {
              CONTEXT_INNER.with(|inner| inner.0) // not sure if that works fine
          }
   }

This design does not require changing protocols code, everything is contained within Infra layer

Complications

  • Our plan is to run IPA software on Cloudflare Workers that do not support Tokio: https://github.com/cloudflare/workers-rs. LocalKey implementation however does not depend on tokio runtime so there may be no issues
  • This change will require changes in our test infrastructure. Every test needs to set up per-task data before running any computation. TestWorld::contexts() function usage becomes problematic, but luckily most of our tests use Runner interface that can easily hide that compexity inside.
@andyleiserson
Copy link
Collaborator

For the malicious case, there's also some data in ContextInner that isn't static -- upgrade_ctx and accumulator. (And I discovered today that upgrades are not just for inputs, they're also used in solved_bits). We may need to revisit the handling of these.

@akoshelev
Copy link
Collaborator Author

For the malicious case, there's also some data in ContextInner that isn't static

that's right, we may decide to continue cloning these or take the similar approach to store context, accumulator and r inside the task data as well.

   MALICIOUS.scope((context, acc, r), async move {
        // all futures spawned inside this block will have access to malicious context
   })

@andyleiserson
Copy link
Collaborator

Another thought... fortunately the malicious validator accumulator is linear, so we can replace the current synchronized accumulator with separate per-thread accumulators.

@martinthomson
Copy link
Member

Wouldn't that constrain all helpers into using the same thread allocation policy?

@andyleiserson
Copy link
Collaborator

Wouldn't that constrain all helpers into using the same thread allocation policy?

I don't think so, but maybe I'm missing something. The idea is that it doesn't matter what order the terms are added, or via what combinations of intermediate subterms. All that matters is that by the end you've added all the same terms.

I think we're already in a situation where the order of addition can be different across helpers depending on how futures get scheduled and who wins contention for the lock -- the difference is that right now there is a total ordering on each helper, but if we switch to per-thread accumulators, then there won't be.

(If contention for this lock is an issue, then we ought to be able to see that once we start profiling the malicious protocol. So I wouldn't propose actually working on this unless we confirm the theory.)

@martinthomson
Copy link
Member

I thought that your proposal was to shard the checksum, meaning that you would have N different checksums. You are only proposing to split the accumulator into several pieces, which you later join once all are done. That would work very neatly, I think, provided that you can be sure that you got everything.

@akoshelev
Copy link
Collaborator Author

There was also an idea floating around at that time to use atomics for counting terms. They also have this property of being eventually consistent

@akoshelev
Copy link
Collaborator Author

Just another point how painful it is to deal with lifetimes inside implementations of Context trait. @andyleiserson and myself are constantly running into rust-lang/rust#100013 whenever we work with functions that accept C: Context and return a future.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants