Making Streaming Dataset framework agnostic: Removing PyTorch dependency #551
Comments
Decoupling from PyTorch would be a hell of a project! We enthusiastically welcome your contributions. Let me list some objections that come to mind offhand -- what do you make of them?
@knighton thanks for your comment and support.
I’ll keep you updated on the same! Thanks!
Appreciate the updates. I would recommend just reading our
Experimental PR to remove dependency on torch dist:
@knighton Wow! That was fast!
🚀 Feature Request
Hey MosaicML team! Thank you so much for this awesome project! I was wondering whether there are any plans to make this framework agnostic, i.e. to remove the dependency on PyTorch.
Motivation
The general idea of StreamingDataset is very useful, and I believe the ML community in general will be more thrilled if we decouple it from PyTorch.
Implementation
Here are my thoughts on how we can go about this:
IterableDataset) which can be very easily re-implemented here.
CuPy project comes to the rescue. We can have seamless interoperability between CuPy, JAX, TensorFlow and PyTorch tensors via the dl_pack (DLPack) API with no copies. And most of the functions in the distributed.py file have similar implementations in CuPy's distributed API.
StreamingDataLoader we can have this as an optional install if installing with the PyTorch backend.
CuPy instead of PyTorch we can keep this framework neutral and also have zero-copy interoperability between JAX, TF and Torch.
Additional context
If made framework agnostic:
tf.data pipelines which work well with JAX and TensorFlow.
keras.utils.Sequence this way we can also use it with Keras 3, which is compatible with TF/JAX/PyTorch backends.
Also, I will be happy to extend my support on the same if you guys think this is a potential future direction!
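To make the IterableDataset point from the implementation notes a bit more concrete, here is a minimal sketch of a torch-free iterable dataset that uses only the Python standard library. The class names AgnosticIterableDataset and ShardedListDataset, and the rank / world_size parameters, are hypothetical and chosen purely for illustration; they are not part of the streaming package. The round-robin split is an assumed strategy, not necessarily how StreamingDataset partitions samples.

```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


class AgnosticIterableDataset:
    """Hypothetical stand-in for an iterable dataset base class (no torch import).

    Subclasses only need to implement __iter__.
    """

    def __iter__(self) -> Iterator:
        raise NotImplementedError


class ShardedListDataset(AgnosticIterableDataset):
    """Yields only the samples assigned to one rank out of world_size ranks."""

    def __init__(self, samples: Sequence[T], rank: int = 0, world_size: int = 1) -> None:
        if not 0 <= rank < world_size:
            raise ValueError("rank must satisfy 0 <= rank < world_size")
        self.samples = samples
        self.rank = rank
        self.world_size = world_size

    def __iter__(self) -> Iterator[T]:
        # Assumed round-robin split: rank r takes samples r, r + world_size, ...
        for index in range(self.rank, len(self.samples), self.world_size):
            yield self.samples[index]


if __name__ == "__main__":
    data = list(range(10))
    # Each rank sees a disjoint slice; together they cover every sample once.
    for rank in range(3):
        print(rank, list(ShardedListDataset(data, rank=rank, world_size=3)))
```

Since the class is a plain Python iterable, a PyTorch DataLoader, a tf.data generator, or a bare for loop could in principle consume it; the rank and world size would have to come from whichever distributed backend is in use.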