Ray is an open source framework for building distributed and parallel applications. It was developed by the RISELab at UC Berkeley and is now maintained by the Ray team at Anyscale.
With Ray, you can easily scale your Python applications to run on a cluster of machines, using a simple and intuitive API. Ray provides a number of useful features, including distributed computing, parallel processing, and fault tolerance.
para-dist-workspace (Distributed ML Workspace) is a documented workspace for Deep Learning, Climate Modelling, Molecular Dynamics, and other parallel and distributed applications. It provides a step-by-step procedure for setting up and running the Ray distributed framework.
Check out the usage section for further information, including how to install the project.
!!! note
    This documentation is under active development.
- Distributed computing: Ray allows you to distribute your computations across a cluster of machines, enabling you to scale your applications to handle large datasets and complex workloads.
- Parallel processing: Ray makes it easy to parallelize your code, so that you can take advantage of multiple CPUs or GPUs to speed up your computations.
- Fault tolerance: Ray provides fault tolerance out of the box, so that your applications can continue running even if one or more machines in your cluster fail.
- Easy to use: Ray's API is designed to be simple and intuitive, so that you can focus on writing your application logic instead of worrying about the details of distributed computing.
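As a minimal sketch of the first two points, the snippet below runs tasks in parallel with Ray's task API; the `slow_square` function and the task count are illustrative assumptions, not part of this workspace:

```python
import time

import ray

ray.init()  # start (or connect to) a local Ray instance

@ray.remote
def slow_square(x):
    """A deliberately slow function, used only to illustrate parallelism."""
    time.sleep(1)
    return x * x

# Launch eight tasks; Ray schedules them in parallel across available CPUs.
futures = [slow_square.remote(i) for i in range(8)]

# ray.get blocks until all results are ready (~1s total instead of ~8s serially).
print(ray.get(futures))
```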
To get started with the distributed environment, follow these steps:
1. View our installation documentation and run the scripts to set up your environment.
2. View our examples and run the scripts to get started with the Ray distributed framework.
3. If you get stuck on an error, consult the troubleshooting documentation before re-running the scripts.
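Once the installation scripts finish, a quick way to confirm that Ray works is to start it locally and inspect the resources it detects; this is a generic check, not one of the workspace scripts:

```python
import ray

ray.init()  # start a single-node Ray instance on this machine

# Shows the CPUs, GPUs, and memory Ray has registered on this node.
print(ray.cluster_resources())

ray.shutdown()
```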
!!! warning
    The scripts are written for a specific environment, so you may need to adapt them to your own setup.
!!! danger
    Most of the problems you may face while running the Ray distributed framework across multiple nodes are caused by the firewall, so you have to configure (or disable) the firewall before the nodes can communicate.
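As a sketch of what multi-node communication looks like once the relevant ports are reachable (by default, Ray's head node uses port 6379 for cluster traffic and 10001 for Ray Client connections), a driver on another machine can connect like this; the head-node address is a placeholder:

```python
import ray

# Connect from a remote machine via Ray Client; requires port 10001
# to be reachable on the head node ("head-node-ip" is a placeholder).
ray.init(address="ray://head-node-ip:10001")

print(ray.cluster_resources())  # should now report resources from all nodes
```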
- Ubuntu 20.04 LTS
- Python 3.10.8
- Ray 2.3.0
- Ray Lightning 0.3.0
- PyTorch 1.13.1
- NVIDIA GeForce Driver 515.63
- CUDA 11.7
- PyTorch Lightning 1.6.0
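A quick way to confirm that your installed versions match this list is to print them from Python (assuming the packages above are already installed):

```python
import pytorch_lightning
import ray
import torch

print("Ray:", ray.__version__)                              # expect 2.3.0
print("PyTorch:", torch.__version__)                        # expect 1.13.1
print("PyTorch Lightning:", pytorch_lightning.__version__)  # expect 1.6.0
```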
!!! note
    PyTorch binaries ship with their own CUDA runtime, so installing PyTorch also installs the CUDA libraries PyTorch itself needs. Since the `PyTorch 2.0` release, the pip wheels pull this runtime in as NVIDIA pip packages by default. The bundled runtime does not include the full CUDA toolkit, so if you need CUDA programming outside of PyTorch you must install the CUDA toolkit manually.
    More information about matching CUDA versions can be found [here](https://pytorch.org/get-started/previous-versions/).
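To see which CUDA runtime your PyTorch build was compiled against (which may differ from any system-wide CUDA toolkit), you can check:

```python
import torch

print(torch.version.cuda)         # CUDA runtime bundled with PyTorch, e.g. "11.7"
print(torch.cuda.is_available())  # True if a compatible GPU and driver are found
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. the GeForce GPU listed above
```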
Ray is a powerful framework for building distributed and parallel applications, and its simple, intuitive API keeps the complexity of distributed computing out of your way. With the documentation provided here, you can set up and run the Ray distributed framework with minimal effort. We hope you find our Distributed ML Workspace useful and that it helps you build your own distributed applications.