Enhancement: Secure Data Transmission for all_reduce in TDX-based Distributed ML Training #61

Open · antchainmappic opened this issue on Apr 2, 2024 · 0 comments
Dear oneCCL Team,

We are reaching out to request an enhancement to Intel oneCCL targeting secure data transmission for distributed machine learning (ML) training workloads. Specifically, we are looking for built-in encryption within oneCCL's all_reduce operation, which is critical for secure gradient sharing across nodes equipped with Intel Trust Domain Extensions (TDX).

Use Case:
Our ML training workflows utilize PyTorch’s Distributed Data Parallel (DDP) running on a cluster of TDX-enabled nodes. While TDX provides a robust isolated execution environment, ensuring data security during all_reduce operations between TDX machines is essential for maintaining the confidentiality of sensitive gradient information.
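
For context, a minimal sketch of the workflow in question follows. It assumes the oneccl_bindings_for_pytorch package (from intel/torch-ccl), which registers the "ccl" backend with torch.distributed; the model, tensor sizes, and single-process defaults are illustrative only.

```python
# Minimal sketch of our DDP setup over the oneCCL "ccl" backend.
# Assumes oneccl_bindings_for_pytorch is installed; rank/world size normally
# come from the launcher (mpirun/torchrun), with 1-process defaults here.
import os
import torch
import torch.distributed as dist
import oneccl_bindings_for_pytorch  # noqa: F401  (registers the "ccl" backend)
from torch.nn.parallel import DistributedDataParallel as DDP

os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
os.environ.setdefault("RANK", "0")
os.environ.setdefault("WORLD_SIZE", "1")

dist.init_process_group(backend="ccl")  # collectives are routed through oneCCL

model = torch.nn.Linear(1024, 1024)
ddp_model = DDP(model)

# Each backward() triggers all_reduce on gradient buckets; this is the
# node-to-node traffic we would like oneCCL to be able to encrypt.
loss = ddp_model(torch.randn(8, 1024)).sum()
loss.backward()

dist.destroy_process_group()
```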

Requirement:
The feature should enable encryption (preferably conforming to standard protocols such as TLS) for data payloads communicated across nodes during all_reduce. The goal is to ensure that in-flight data is protected, complementing the in-use protections that TDX already provides within each node.

Justification:
Guards against the interception of sensitive data during distributed training
Transparently fortifies existing ML workflows without altering user code
Helps maintain the security posture promised by TDX throughout the data lifecycle

We understand that performance is critical, so we suggest exposing this as an optional toggle, letting users enable secure transmission only when their workload demands it.
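
To illustrate the kind of opt-in interface we have in mind, a hypothetical configuration might look like the sketch below. Every variable name and path here is purely illustrative; none of them are existing oneCCL settings.

```python
# Hypothetical opt-in knobs, set before init_process_group(). All names are
# illustrative suggestions only; they are NOT existing oneCCL environment
# variables.
import os

os.environ["CCL_SECURE_TRANSPORT"] = "tls"                     # hypothetical: encrypt collective traffic
os.environ["CCL_SECURE_TRANSPORT_CERT"] = "/etc/ccl/node.crt"  # hypothetical: per-node certificate
os.environ["CCL_SECURE_TRANSPORT_KEY"] = "/etc/ccl/node.key"   # hypothetical: private key
os.environ["CCL_SECURE_TRANSPORT_CA"] = "/etc/ccl/ca.crt"      # hypothetical: trusted CA bundle

# With the toggle unset, behaviour and performance would be unchanged; with it
# set, all_reduce payloads would be encrypted on the wire by oneCCL itself, so
# DDP user code needs no modification.
```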

Looking forward to your thoughts on this proposal. Thanks for your commitment to advancing collective communications.

Best regards
