
Trouble using torch-ccl with the mlx provider #67

Open
mwheinz opened this issue Jun 13, 2024 · 1 comment

Comments

@mwheinz

mwheinz commented Jun 13, 2024

We've had success using torch-ccl with ResNet and other AI workloads to test libfabric over psm3, but when we try to use libmlx-fi.so, torch-ccl does not seem to see it, even after the provider has been copied into the provider directory.

Is this a known limitation of torch-ccl? Is there a makefile we need to modify?

TIA.

@ddkalamk
Contributor

@mwheinz torch-ccl doesn't work with the mlx provider. I think the issue is that oneCCL needs thread-multiple capability to use multiple workers, and the MLX provider doesn't support it, so it fails at the init call itself.
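
For anyone who wants to check this on their own system, here is a rough diagnostic sketch, not a fix. It assumes the libfabric `fi_info` utility is on `PATH` and uses the libfabric variables `FI_PROVIDER` and `FI_PROVIDER_PATH` plus the oneCCL variables `CCL_ATL_TRANSPORT`, `CCL_WORKER_COUNT`, and `CCL_LOG_LEVEL`. The helper names `build_env` and `provider_visible` are made up for illustration. Whether a single worker is enough to get past the init failure with mlx is an untested assumption; the debug log should at least show where init stops.

```python
import os
import shutil
import subprocess


def build_env(provider, provider_path=None, worker_count=1, base=None):
    """Return a copy of the environment with libfabric/oneCCL settings applied.

    Hypothetical helper: the variable names are real, the values are guesses.
    """
    env = dict(os.environ if base is None else base)
    env["FI_PROVIDER"] = provider                # ask libfabric for this provider only
    if provider_path:
        env["FI_PROVIDER_PATH"] = provider_path  # directory holding the provider .so
    env["CCL_ATL_TRANSPORT"] = "ofi"             # make oneCCL go through libfabric
    env["CCL_WORKER_COUNT"] = str(worker_count)  # assumption: fewer workers may matter
    env["CCL_LOG_LEVEL"] = "debug"               # verbose oneCCL init logging
    return env


def provider_visible(provider, env):
    """Return True/False based on `fi_info -p`, or None if fi_info is missing."""
    fi_info = shutil.which("fi_info")
    if fi_info is None:
        return None
    result = subprocess.run(
        [fi_info, "-p", provider], env=env, capture_output=True, text=True
    )
    return result.returncode == 0 and "provider" in result.stdout


if __name__ == "__main__":
    env = build_env("mlx", provider_path=os.environ.get("FI_PROVIDER_PATH"))
    print("libfabric sees mlx:", provider_visible("mlx", env))
```

If `fi_info` itself cannot list mlx, the problem is in the libfabric setup rather than in torch-ccl. If it can, launching the training job with the same environment and reading the oneCCL debug output should show whether init fails as described above.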
