Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add CLI command to run hybrid query #1472

Merged

Conversation

eriktaubeneck
Copy link
Member

turns out a bunch of these functions aren't actually OprfIpa specific. I did one small "hack" to convert HybridQueryParams into IpaQueryConfig (it's just a super set and the extra fields are unused at this point) to avoid copying and pasting the entire run_query_and_validate function.

This is mostly complete, but we'll need to swap in HTTPS clients instead of HTTP, so we're blocked on that, cc @akoshelev

@akoshelev
Copy link
Collaborator

not sure what you mean by HTTPS clients - we do have them and they fully support HTTPS

@akoshelev
Copy link
Collaborator

on the integration test side, #1473 should provide a way to write tests that leverage HTTPS

Copy link

codecov bot commented Dec 4, 2024

Codecov Report

Attention: Patch coverage is 9.23913% with 167 lines in your changes missing coverage. Please review.

Project coverage is 93.49%. Comparing base (d8b73fb) to head (4d19d4e).
Report is 5 commits behind head on main.

Files with missing lines Patch % Lines
ipa-core/src/cli/playbook/hybrid.rs 0.00% 84 Missing ⚠️
ipa-core/src/bin/report_collector.rs 13.54% 83 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1472      +/-   ##
==========================================
- Coverage   93.84%   93.49%   -0.35%     
==========================================
  Files         235      236       +1     
  Lines       42453    42627     +174     
==========================================
+ Hits        39839    39855      +16     
- Misses       2614     2772     +158     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@eriktaubeneck eriktaubeneck marked this pull request as ready for review December 4, 2024 19:46
Comment on lines +41 to +54
let leader_clients = &clients[0];
try_join_all(
inputs
.into_iter()
.zip(leader_clients)
.map(|(input_stream, client)| {
client.query_input(QueryInput {
query_id,
input_stream,
})
}),
)
.await
.unwrap();
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@akoshelev I ended up needing to create a new function here that could take Vec<[IpaHttpClient<Helper>; 3]>. right now it submits everything to the leader, but this seems like the natural place to do the round robin submission.

@@ -329,6 +358,94 @@ fn write_ipa_output_file(
Ok(())
}

fn write_hybrid_output_file(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks oddly familiar, is it possible to re-use the same code and avoid copy paste here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same old problem with Oprf and Hybrid. we can either spend time supporting both or just delete the other when we remove Oprf.

query_type,
};

let query_id = helper_clients[0][0]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we first do a query status to make sure something else isn't running?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe it panics with "query already running" already, so that's effectively the same, no?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really. The system does't have something like distributed transactions. Think of the following scenario; helper 1 gets the create_query and sends prepare_query to 2 and 3. Say helper 2 return ok but helper 3 says its running something return error. We don't "rollback" helper2 so it's left awaiting inputs.

This problem should be worse now with more states spread around.

Copy link
Member Author

@eriktaubeneck eriktaubeneck Dec 4, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair, and we have the "kill" endpoint to help clean up that state that sometimes happens. I'm inclined to try and get this merged as is (which is the same as Oprf Ipa, but with shards), and punt on making this better once we can actually run a query. For the immediate testing needs, we'll be restarting the binaries regularly anyways.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't think we have a sharded kill, but we can easily restart the k8s cluster.

I you think adding a query_status is going to take much time then I agree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

certainly something we should add, but we likely want to do that across all shards, so let's punt for now.

@eriktaubeneck
Copy link
Member Author

only check fail is code patch, which is coming in #1476. merging

@eriktaubeneck eriktaubeneck merged commit 26ee498 into private-attribution:main Dec 5, 2024
11 of 12 checks passed
@eriktaubeneck eriktaubeneck deleted the hybrid-rc-cli branch December 5, 2024 06:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants