Ccl/2.5 (#226)
* update oneccl to 2021.14 (#224)

Use oneccl from https://github.com/oneapi-src/oneCCL/releases/tag/2021.14 - 3afa1bb7936f57683a2503c34b29c0daca6a9cc

* update readme (#223)

* restore known issue
Chao1Han authored Dec 13, 2024
1 parent a756fd4 commit bce5012
Showing 2 changed files with 20 additions and 17 deletions.
35 changes: 19 additions & 16 deletions README.md
@@ -36,6 +36,7 @@ We recommend using Anaconda as Python package management system. The followings
| `torch` | `oneccl_bindings_for_pytorch` |
| :-------------------------------------------------------------: | :-----------------------------------------------------------------------: |
| `master` | `master` |
| [v2.5.0](https://github.com/pytorch/pytorch/tree/v2.5.0) | [ccl_torch2.5.0](https://github.com/intel/torch-ccl/tree/ccl_torch2.5.0+xpu) |
| [v2.3.1](https://github.com/pytorch/pytorch/tree/v2.3.1) | [ccl_torch2.3.100](https://github.com/intel/torch-ccl/tree/ccl_torch2.3.100+xpu) |
| [v2.1.0](https://github.com/pytorch/pytorch/tree/v2.1.0) | [ccl_torch2.1.400](https://github.com/intel/torch-ccl/tree/ccl_torch2.1.400+xpu) |
| [v2.1.0](https://github.com/pytorch/pytorch/tree/v2.1.0) | [ccl_torch2.1.300](https://github.com/intel/torch-ccl/tree/ccl_torch2.1.300+xpu) |
@@ -59,7 +60,7 @@ The usage details can be found in the README of corresponding branch.

- Python 3.8 or later and a C++17 compiler

- PyTorch v2.3.1
- PyTorch v2.5.1

## Build Option List

@@ -93,6 +94,7 @@ The following launch options are supported in Intel® oneCCL Bindings for PyTorc

```bash
git clone https://github.com/intel/torch-ccl.git && cd torch-ccl
git checkout ccl_torch2.5.0+xpu
git submodule sync
git submodule update --init --recursive
```
@@ -114,22 +116,23 @@ The following launch options are supported in Intel® oneCCL Bindings for PyTorc

Wheel files are available for the following Python versions. Please always use the latest release to get started.

| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 |
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: | :---------: |
| 2.3.100 | | |||||
| 2.1.400 | | |||||
| 2.1.300 | | |||||
| 2.1.200 | | |||||
| 2.1.100 | | |||||
| 2.0.100 | | |||||
| 1.13 | ||||| |
| 1.12.100 | ||||| |
| 1.12.0 | ||||| |
| 1.11.0 | ||||| |
| 1.10.0 ||||| | |
| Extension Version | Python 3.6 | Python 3.7 | Python 3.8 | Python 3.9 | Python 3.10 | Python 3.11 | Python 3.12 |
| :---------------: | :--------: | :--------: | :--------: | :--------: | :---------: | :---------: | :---------: |
| 2.5.1 | | | |||||
| 2.3.100 | | ||||| |
| 2.1.400 | | ||||| |
| 2.1.300 | | ||||| |
| 2.1.200 | | ||||| |
| 2.1.100 | | ||||| |
| 2.0.100 | | ||||| |
| 1.13 | ||||| | |
| 1.12.100 | ||||| | |
| 1.12.0 | ||||| | |
| 1.11.0 | ||||| | |
| 1.10.0 ||||| | | |

```bash
python -m pip install oneccl_bind_pt==2.3.100 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
python -m pip install oneccl_bind_pt==2.5.0 --extra-index-url https://pytorch-extension.intel.com/release-whl/stable/xpu/us/
```

**Note:** If you encounter connection issues, set a proxy or change the URL to https://pytorch-extension.intel.com/release-whl/stable/xpu/cn/.
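
After installation, it can help to confirm which wheel version actually landed in the environment. The following is a minimal sketch, assuming the installed distribution is registered under the same `oneccl_bind_pt` name used in the pip command; `installed_version` is a hypothetical helper for illustration and is not part of the bindings.

```python
from importlib import metadata


def installed_version(package="oneccl_bind_pt"):
    """Return the installed version string of `package`, or None if it is absent.

    The default distribution name is an assumption taken from the pip command.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string, or None when the wheel is not installed.
    print(installed_version())
```

If the printed version does not match the PyTorch release in use, the compatibility table earlier in the README is the place to check which branch or wheel is expected.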
@@ -270,7 +273,7 @@ mpirun -n 2 -l python profiling.py

## Known Issues

For point-to-point communication, calling dist.send/recv directly after initializing the process group in the launch script triggers a runtime error. In the current implementation, all ranks of the group are expected to participate in this call to create communicators, whereas dist.send/recv involves only a pair of ranks. As a result, dist.send/recv should be used only after a collective call, which ensures that all ranks participate. A solution that supports calling dist.send/recv directly after initializing the process group is still under investigation.
For point-to-point communication, calling dist.send/recv directly after initializing the process group in the launch script triggers a runtime error. In the current implementation, all ranks of the group are expected to participate in this call to create communicators, whereas dist.send/recv involves only a pair of ranks. As a result, dist.send/recv should be used only after a collective call, which ensures that all ranks participate.
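
The workaround described above can be sketched as follows: issue one collective (here an `all_reduce` on a dummy tensor) so that every rank takes part in communicator creation, and only then call `dist.send`/`dist.recv`. This is an illustrative sketch, not the project's reference script. The helper name `p2p_after_collective`, the dummy warm-up tensor, the `PMI_RANK`/`PMI_SIZE` environment variables, and the default address and port are assumptions; the `torch` and `dist` modules are passed in as arguments only to keep the helper easy to exercise in isolation.

```python
import os


def p2p_after_collective(dist, torch, rank, device="cpu"):
    """Run one collective before send/recv so all ranks join communicator setup."""
    # Collective call: every rank participates, unlike send/recv.
    warmup = torch.zeros(1, device=device)
    dist.all_reduce(warmup)

    # Point-to-point exchange between rank 0 and rank 1 only.
    payload = torch.full((4,), float(rank), device=device)
    if rank == 0:
        dist.send(payload, dst=1)
    elif rank == 1:
        dist.recv(payload, src=0)
    return payload


def main():
    import torch
    import torch.distributed as dist
    import oneccl_bindings_for_pytorch  # noqa: F401  (registers the "ccl" backend)

    # Assumed defaults for a single-node run; adjust for a real cluster.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    # Assumption: the MPI launcher exports PMI_RANK and PMI_SIZE.
    rank = int(os.environ["PMI_RANK"])
    world_size = int(os.environ["PMI_SIZE"])

    dist.init_process_group(backend="ccl", rank=rank, world_size=world_size)
    p2p_after_collective(dist, torch, rank)
    dist.destroy_process_group()


# Only start the distributed run when launched under MPI (PMI_RANK is set).
if __name__ == "__main__" and "PMI_RANK" in os.environ:
    main()
```

The sketch needs at least two ranks, for example launched with `mpirun -n 2 -l python <script>.py`. Any collective that involves all ranks should serve the same purpose as the `all_reduce` used here.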

## License

2 changes: 1 addition & 1 deletion third_party/oneCCL
