[LoweringContext] Support an optimized parameter mapping for SPMD #8460

rpsilva-aws · 2024-12-05T23:49:12Z

The existing parameter mapping for the lowering context is not well suited to SPMD. For large models, it creates a large synchronous bottleneck because all device data is transferred to the host. The cause is the ReplicateShardedData computation run for each parameter, which gathers and reassembles every sharded tensor across multiple devices. This is by design: the mapping is expected to collect all parameters regardless of where they are allocated.
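To make the cost concrete, here is a toy model of the existing behaviour in plain Python. It does not use torch_xla; `ShardedTensor`, `TransferLog`, `replicate_sharded_data` and `eager_parameter_mapping` are hypothetical names that only mimic the idea of gathering every shard to the host before a parameter appears in the mapping. The element counter stands in for device-to-host transfer volume.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ShardedTensor:
    """Hypothetical stand-in for an SPMD tensor split across devices."""
    shards: List[List[float]]  # one list of values per device


@dataclass
class TransferLog:
    """Counts elements copied device-to-host (a proxy for transfer cost)."""
    elements: int = 0


def replicate_sharded_data(t: ShardedTensor, log: TransferLog) -> List[float]:
    # Gather every shard to the host and reassemble the full tensor.
    # Every element of every shard is copied before the call returns.
    full: List[float] = []
    for shard in t.shards:
        full.extend(shard)
        log.elements += len(shard)
    return full


def eager_parameter_mapping(
    params: List[ShardedTensor], log: TransferLog
) -> Dict[int, List[float]]:
    # Every parameter, whatever its placement, is replicated to the host
    # before it is added to the mapping.
    return {i: replicate_sharded_data(p, log) for i, p in enumerate(params)}


log = TransferLog()
params = [ShardedTensor([[1.0, 2.0], [3.0, 4.0]]), ShardedTensor([[5.0], [6.0]])]
mapping = eager_parameter_mapping(params, log)
print(mapping[0])    # [1.0, 2.0, 3.0, 4.0]
print(log.elements)  # 6
```

In this model the transfer volume grows with the total size of all parameters, which is why the cost becomes a bottleneck for large sharded models.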

This PR introduces a new mapping that does not invoke the sharded replication and instead holds references to the device data. This is sufficient, and preferable, in most cases where the user only needs the valid parameters (those for which tensor_parameter_id does not return -1; an id of -1 marks a "fake" parameter).
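The reference-based approach can be sketched in the same toy style. This is not the torch_xla implementation or its Python API: `DeviceData` and `reference_parameter_mapping` are hypothetical, and the `tensor_parameter_id` argument is a plain callable standing in for the lowering context's method of that name. The sketch shows the two properties described above: entries with id -1 are skipped, and the mapping stores the original handles rather than host copies.

```python
from typing import Callable, Dict, List


class DeviceData:
    """Hypothetical handle to data that stays resident on the devices."""

    def __init__(self, name: str) -> None:
        self.name = name


def reference_parameter_mapping(
    tensors: List[DeviceData],
    tensor_parameter_id: Callable[[DeviceData], int],
) -> Dict[int, DeviceData]:
    # Keep only valid parameters: an id of -1 marks a "fake" parameter.
    # Values are the original handles, not host copies, so nothing is
    # gathered or transferred while the mapping is built.
    mapping: Dict[int, DeviceData] = {}
    for t in tensors:
        pid = tensor_parameter_id(t)
        if pid != -1:
            mapping[pid] = t
    return mapping


# Hypothetical id assignment: "b" is not a parameter of the lowered graph.
ids = {"a": 0, "b": -1, "c": 1}
tensors = [DeviceData("a"), DeviceData("b"), DeviceData("c")]
m = reference_parameter_mapping(tensors, lambda t: ids[t.name])
print(sorted(m))           # [0, 1]
print(m[0] is tensors[0])  # True
```

Returning references keeps the cost proportional to the number of parameters rather than their size; a caller that really needs host values can still fetch them explicitly for the specific parameters it cares about.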

rpsilva-aws · 2024-12-05T23:49:43Z

Reopened from #8453 with the merge commit cleaned up.

rpsilva-aws mentioned this pull request Dec 5, 2024

[LoweringContext] Support an optimized parameter mapping for SPMD #8453

Closed

tengyifei self-requested a review December 5, 2024 23:51

tengyifei added the tpuci label Dec 5, 2024

tengyifei marked this pull request as ready for review December 5, 2024 23:52

tengyifei approved these changes Dec 5, 2024

[LoweringContext] Support an optimized parameter mapping for SPMD

9858577

rpsilva-aws force-pushed the rpsilva_lc_mapping_v3 branch from 8fd7ac7 to 9858577 December 5, 2024 23:53

tengyifei merged commit 5d11f66 into pytorch:master Dec 7, 2024
12 checks passed

rpsilva-aws deleted the rpsilva_lc_mapping_v3 branch December 9, 2024 19:03
