[Feature Request] Implement distributed.py in the c++ #15061

Open · dmakoviichuk-tt opened this issue Nov 14, 2024 · 4 comments
Labels: feature-request (External feature request), forge, P1

Comments

@dmakoviichuk-tt (Contributor)

Is your feature request related to a problem? Please describe.
We have a lot of classes implemented in distributed.py that we need in C++.
But PyTorch is not available in C++, so another library (xtensor) should be used instead.
We also have to_torch/from_torch functions in Python, but there are no convenient C++ equivalents of them.

Describe the solution you'd like

  1. Introduce xtensor in ttnn. We already have it in tt-train: https://github.com/tenstorrent/tt-metal/blob/main/tt-train/cmake/dependencies.cmake#L57
     The easiest way to add it is via CPM:

     ```cmake
     CPMAddPackage(NAME xtl GITHUB_REPOSITORY xtensor-stack/xtl GIT_TAG 0.7.7 OPTIONS "XTL_ENABLE_TESTS OFF")
     CPMAddPackage(NAME xtensor GITHUB_REPOSITORY xtensor-stack/xtensor GIT_TAG 0.25.0 OPTIONS "XTENSOR_ENABLE_TESTS OFF")
     ```

  2. Implement to_vector/from_vector/from_view and to_xtensor/from_xtensor in C++ (a sketch of what these could look like follows this list). Please take a look at the reference signature: `tt::tt_metal::Tensor from_vector<float, DataType::BFLOAT16>(…)`. We need to make sure that all types are supported; my reference didn't add support for 4-8 bit types.
  3. Implement all sharding and replicating strategies described in distributed.py.
  4. Reuse the C++ implementations in Python.
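
For illustration, here is a minimal, self-contained sketch of what the helpers in (2) could look like. `HostTensor` is a toy stand-in for `tt::tt_metal::Tensor`, and every name and signature below is an assumption, not the eventual ttnn interface:

```cpp
// Toy sketch only: HostTensor stands in for tt::tt_metal::Tensor, and every
// name/signature below is an assumption, not the eventual ttnn interface.
#include <cstddef>
#include <stdexcept>
#include <utility>
#include <vector>

#include <xtensor/xadapt.hpp>
#include <xtensor/xarray.hpp>

struct HostTensor {
    std::vector<float> data;             // flat row-major storage
    std::vector<std::size_t> shape;
};

// from_vector: take ownership of a flat buffer, validating it against a shape.
inline HostTensor from_vector(std::vector<float> data, std::vector<std::size_t> shape) {
    std::size_t volume = 1;
    for (std::size_t dim : shape) volume *= dim;
    if (volume != data.size()) throw std::invalid_argument("shape/volume mismatch");
    return HostTensor{std::move(data), std::move(shape)};
}

// to_xtensor: expose the tensor's storage to xtensor for host-side math.
inline xt::xarray<float> to_xtensor(const HostTensor& t) {
    return xt::adapt(t.data, t.shape);
}

// from_xtensor: the reverse conversion.
inline HostTensor from_xtensor(const xt::xarray<float>& a) {
    std::vector<float> data(a.begin(), a.end());
    std::vector<std::size_t> shape(a.shape().begin(), a.shape().end());
    return from_vector(std::move(data), std::move(shape));
}
```

The real helpers would additionally be parameterized on the element type / DataType pair, as in the `from_vector<float, DataType::BFLOAT16>` reference above.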

It seems that xtensor can support our custom bfloat16 type, so it might be useful in some cases. A quick probe of that idea is sketched below.
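
As a toy probe of that claim, the `bfloat16` struct here is a simplified stand-in for tt-metal's type (truncating conversion, no rounding), not the real implementation:

```cpp
// Probing the claim above: a toy bfloat16 (NOT tt-metal's real type) stored
// in an xt::xarray. Conversion truncates the mantissa, for brevity.
#include <cstdint>
#include <cstring>

#include <xtensor/xarray.hpp>

struct bfloat16 {
    std::uint16_t bits = 0;
    bfloat16() = default;
    explicit bfloat16(float f) {
        std::uint32_t u;
        std::memcpy(&u, &f, sizeof u);
        bits = static_cast<std::uint16_t>(u >> 16);  // keep top 16 bits
    }
    float to_float() const {
        std::uint32_t u = static_cast<std::uint32_t>(bits) << 16;
        float f;
        std::memcpy(&f, &u, sizeof f);
        return f;
    }
};

int main() {
    // xtensor only needs the element type to be regular for plain storage and
    // indexing; arithmetic here is done by converting through float.
    xt::xarray<bfloat16> a = {bfloat16(1.5f), bfloat16(2.5f)};
    float sum = a(0).to_float() + a(1).to_float();  // 4.0f, exactly representable
    return sum == 4.0f ? 0 : 1;
}
```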

Describe alternatives you've considered
I've considered implementing a few hacks in tt-train.

@staylorTT

@cfjchu Do we have any updates on this issue? Either an ETA or other status from your team?

@omilyutin-tt (Contributor)

@staylorTT we are looking for a simpler multi-device sharding API than what is currently provided in distributed.py; in parallel, I'm adding creation functions to/from vector/view, plus support for xtensor.

If the multi-device sharding is a blocker for you, would it work to provide from_vector / from_xarray plus an extension similar to what we have in distributed.py? I think we can get that in within a week or two, but I'm afraid that fleshing out a ttnn-native API might take more time, as there are some unknowns.

@nsmithtt (Contributor) commented Dec 4, 2024

@omilyutin-tt I think that API is workable for us. Ideally, if this eventually becomes a TTNN API, it would accept a tensor type and return a (multi-device sharded) tensor type. It also seems straightforward to achieve this by bouncing a host-storage tensor through std::vector and then calling the proposed API, as sketched below. Anyway, we'll work with what we can get in the short term, thank you for looking into this :)
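
To make that concrete, here is a self-contained sketch of the bounce; `HostTensor`, `MeshTensor`, and `from_vector_sharded` are hypothetical stand-ins for the proposed API, not real ttnn types:

```cpp
// Sketch of the "bounce through std::vector" workaround. HostTensor,
// MeshTensor, and from_vector_sharded are hypothetical stand-ins for the
// proposed API, not real ttnn code.
#include <cstddef>
#include <vector>

struct HostTensor {
    std::vector<float> data;             // flat row-major storage
    std::vector<std::size_t> shape;
};

struct MeshTensor {
    std::vector<HostTensor> shards;      // one shard per device
};

// Proposed API (assumed): shard a flat host buffer evenly across num_devices
// along the outermost dimension. Assumes shape[0] % num_devices == 0.
inline MeshTensor from_vector_sharded(const std::vector<float>& data,
                                      std::vector<std::size_t> shape,
                                      std::size_t num_devices) {
    const std::size_t rows = shape.at(0);
    const std::size_t row_elems = data.size() / rows;
    const std::size_t rows_per_dev = rows / num_devices;
    MeshTensor out;
    for (std::size_t d = 0; d < num_devices; ++d) {
        auto first = data.begin() + static_cast<std::ptrdiff_t>(d * rows_per_dev * row_elems);
        auto last = first + static_cast<std::ptrdiff_t>(rows_per_dev * row_elems);
        std::vector<std::size_t> shard_shape = shape;
        shard_shape[0] = rows_per_dev;
        out.shards.push_back(HostTensor{{first, last}, std::move(shard_shape)});
    }
    return out;
}

// The bounce itself: host-storage tensor -> std::vector -> proposed API.
inline MeshTensor shard_host_tensor(const HostTensor& t, std::size_t num_devices) {
    return from_vector_sharded(t.data, t.shape, num_devices);
}
```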

omilyutin-tt added a commit that referenced this issue Dec 9, 2024
### Ticket
#15061 

### What's changed
* Refactor `DistributedTensorConfig` into its own header
* Use typed `struct` to represent `MeshShape` and `MeshOffset`

### Checklist
- [X] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12210236362)
- [X] New/Existing tests provide coverage for changes
omilyutin-tt added a commit that referenced this issue Dec 17, 2024
…++ ttnn tensors (#15886)

### Ticket
#15755

### Problem description
Multi-device tensor distribution currently works through
`distributed.py`, which relies on PyTorch libraries to perform sharding
/ concatenation.

### What's changed
* Add xtensor to ttnn.
* Lower facilities from tt-train down to ttnn. In particular: `chunk`,
`concatenate` functions along with some conversion utils, and the
relevant tests.
* Add `distributed_tensor.hpp` header with the multi-device distribution
APIs.

**In follow up PRs:**
* Support bf4 / bf8 and other formats in `from_vector` / `to_vector` and
other overloads.
* Support outputting a tilized tensor.
* Migrate functionality from `pytensor.cpp` to using the new APIs.

### Checklist
- [x] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12333746639/job/34427015707)
(failure in clang-tidy in unrelated tt-train directory)
- [X] [code analysis
run](https://github.com/tenstorrent/tt-metal/actions/runs/12360844971)
- [x] [T3K unit + frequent + model reg
tests](https://github.com/tenstorrent/tt-metal/actions/runs/12360656141)
- same breakage on main.
- [X] New/Existing tests provide coverage for changes
@omilyutin-tt (Contributor)

#15886 adds the initial support for distributing a tensor across devices; rough usage is sketched below. I'm working on a couple of follow-ups to support more data types, handle tilized layouts, and make some performance optimizations. Please report any issues you encounter!
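
For reference, usage looks roughly like the following; the names follow the #15886 PR description and the `distributed_tensor.hpp` header it adds, but the exact signatures are assumptions and may have changed since:

```cpp
// Rough usage sketch of the APIs from #15886. Names follow the PR's
// distributed_tensor.hpp; exact signatures are assumptions and may differ.
#include <ttnn/distributed/distributed_tensor.hpp>

void shard_and_gather(ttnn::MeshDevice& mesh_device,
                      const tt::tt_metal::Tensor& host_tensor) {
    using namespace ttnn::distributed;

    // Split the host tensor along dim 0, one chunk per device in the mesh.
    auto mapper = shard_tensor_to_mesh_mapper(mesh_device, /*dim=*/0);
    tt::tt_metal::Tensor sharded = distribute_tensor(host_tensor, *mapper);

    // The inverse: concatenate per-device shards back along dim 0.
    auto composer = concat_mesh_to_tensor_composer(/*dim=*/0);
    tt::tt_metal::Tensor gathered = aggregate_tensor(sharded, *composer);
    (void)gathered;
}
```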

omilyutin-tt added a commit that referenced this issue Dec 24, 2024
…rmats (#16105)

### Ticket
#15061

### Problem description
`to_vector` / `from_vector` don't support some of the special cases,
which prevents more widespread adoption (in particular, distributing
tensors across a mesh of devices).

### What's changed
* Support tilized layouts.
* Support bf4 / bf8 data types with auto-padding.
* Extended `chunk` / `concat` support for the added types.

### Next steps
* Optimize certain operations on-device, such as tilization, whenever
possible.
* Perform auto-padding in tilized layouts / when using sharding.
* Switch pytensor logic to the `from_vector` API.

### Checklist
- [X] [Post commit CI
passes](https://github.com/tenstorrent/tt-metal/actions/runs/12422597810)
- [X] New/Existing tests provide coverage for changes

---------

Co-authored-by: Oleg Milyutin <[email protected]>
arikTT pushed a commit that referenced this issue Dec 27, 2024