Vectorize local shard retrieval #5826

jonb377 · 2023-11-20T09:18:09Z

To capitalize on the improvements in #5824 and #5825, moving tensor shards to CPU should be batched. This change does the following:

Makes _get_local_shards and _get_local_shard_replica_and_indices operate on lists of tensors instead of individual tensors.
Updates _sharded_cpu_state_dict to use the batched method across all sharded tensors in the state_dict.
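
As a rough illustration of the batching pattern described above (not the actual torch_xla code), the sketch below compares one transfer call per sharded tensor with a single call over all sharded tensors in a state_dict. Every name in it (FakeShardedTensor, TransferCounter, per_tensor_state_dict, batched_state_dict) is a hypothetical stand-in, and the "device" only counts calls; it assumes that each call carries a fixed overhead, which is what batching would amortize.

```python
from typing import Any, Dict, List, Sequence


class FakeShardedTensor:
    """Hypothetical stand-in for a device tensor split into local shards."""

    def __init__(self, shards: Sequence[list]):
        self.shards = list(shards)


class TransferCounter:
    """Simulated device that counts how many transfer calls are made."""

    def __init__(self) -> None:
        self.calls = 0

    def to_cpu(self, tensors: List[FakeShardedTensor]) -> List[List[list]]:
        # One call returns the local shards of every tensor in the batch.
        self.calls += 1
        return [list(t.shards) for t in tensors]


def per_tensor_state_dict(state_dict: Dict[str, Any],
                          device: TransferCounter) -> Dict[str, Any]:
    """Unbatched variant: one transfer call for each sharded tensor."""
    out = {}
    for name, value in state_dict.items():
        if isinstance(value, FakeShardedTensor):
            out[name] = device.to_cpu([value])[0]
        else:
            out[name] = value
    return out


def batched_state_dict(state_dict: Dict[str, Any],
                       device: TransferCounter) -> Dict[str, Any]:
    """Batched variant: collect all sharded tensors, transfer them once."""
    names = [n for n, v in state_dict.items()
             if isinstance(v, FakeShardedTensor)]
    shards = device.to_cpu([state_dict[n] for n in names]) if names else []
    out = dict(state_dict)  # non-sharded entries pass through unchanged
    out.update(zip(names, shards))
    return out
```

Both variants produce the same CPU-side dictionary; the batched one issues a single transfer call regardless of how many sharded tensors the state_dict holds.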

With all three changes applied (#5824, #5825, and this one), the time spent transferring the state_dict to CPU for a 2B-parameter model decreases from >10s to 3.4s, which unblocks training much more quickly.

yeounoh · 2023-11-23T19:25:14Z

torch_xla/csrc/init_python_bindings.cpp

-          XLATensorPtr xtensor = bridge::GetXlaTensor(input);
+  m.def(
+      "_get_local_shard_replica_and_indices",
+      [](const std::vector<at::Tensor>& input)


Nit: rename input to inputs or input_tensors.

yeounoh

LGTM

jonb377 requested a review from yeounoh November 20, 2023 09:18

jonb377 self-assigned this Nov 20, 2023

jonb377 force-pushed the jonbolin/copy-pool branch from b9d8a7c to 8992668 Compare November 20, 2023 12:06

jonb377 force-pushed the jonbolin/vectorize-local-shards branch from 8d0b5bf to bad9356 Compare November 20, 2023 12:07

yeounoh reviewed Nov 23, 2023

yeounoh approved these changes Nov 23, 2023

jonb377 force-pushed the jonbolin/copy-pool branch from 8992668 to fa65980 Compare November 29, 2023 21:47

jonb377 force-pushed the jonbolin/vectorize-local-shards branch from bad9356 to 7ecc38d Compare November 29, 2023 21:59

jonb377 force-pushed the jonbolin/copy-pool branch from fa65980 to c8f7315 Compare November 30, 2023 23:21

jonb377 force-pushed the jonbolin/vectorize-local-shards branch from 7ecc38d to 48e9943 Compare November 30, 2023 23:21

Base automatically changed from jonbolin/copy-pool to master December 1, 2023 01:55

jonb377 added 2 commits December 1, 2023 01:56

Vectorize local shard retrieval

4cc2afe

Rename input

b673c07

jonb377 force-pushed the jonbolin/vectorize-local-shards branch from 48e9943 to b673c07 Compare December 1, 2023 01:57

jonb377 merged commit c919973 into master Dec 1, 2023
19 checks passed

jonb377 deleted the jonbolin/vectorize-local-shards branch December 1, 2023 18:22

ManfeiBai pushed a commit to ManfeiBai/PyTorchXLA that referenced this pull request Dec 1, 2023

Vectorize local shard retrieval (pytorch#5826)

9740734

chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023

Vectorize local shard retrieval (pytorch#5826)

3313bca

golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024

Vectorize local shard retrieval (#5826)

b67c4dc

bhavya01 pushed a commit that referenced this pull request Apr 22, 2024

Vectorize local shard retrieval (#5826)

9d129ba
