[SPMD] support DTensor API integration #5776

Merged
merged 6 commits into from
Nov 13, 2023

Contributor

@yeounoh yeounoh commented Nov 7, 2023

This is a follow-up to pytorch/pytorch#92909, DTensor & XLA integration. In this PR, we:

  • move the PyTorch/XLA SPMD API to torch_xla/distributed/spmd
  • introduce the XLA DTensor API in torch_xla/distributed/spmd so that the DTensor implementation can access it.

@yeounoh yeounoh self-assigned this Nov 7, 2023
@yeounoh yeounoh marked this pull request as draft November 7, 2023 23:20
@yeounoh yeounoh added the DO_NOT_MERGE_YET For PRs which cannot be merged, despite tests passing label Nov 7, 2023
@yeounoh yeounoh force-pushed the dtensor_integration_support branch 7 times, most recently from 6cef706 to ccb2c4c Compare November 8, 2023 22:21
@yeounoh yeounoh removed the DO_NOT_MERGE_YET For PRs which cannot be merged, despite tests passing label Nov 8, 2023
@yeounoh yeounoh marked this pull request as ready for review November 8, 2023 22:28
Contributor Author

yeounoh commented Nov 8, 2023

Tested locally. @JackCaoG @jonb377 @alanwaketan I am moving/grouping the SPMD API under torch_xla/experimental/spmd. Later, when we graduate to Beta, we can move the spmd package out of experimental.

@yeounoh yeounoh force-pushed the dtensor_integration_support branch 5 times, most recently from e60ce71 to 19fd03a Compare November 9, 2023 17:35
Contributor Author

yeounoh commented Nov 9, 2023

Tested locally. @JackCaoG @jonb377 @alanwaketan I am moving/grouping the SPMD API under torch_xla/experimental/spmd. Later, when we graduate to Beta, we can move the spmd package out of experimental.

Had a chat with @JackCaoG; we will move the package out of experimental and into distributed from Beta.


log = logging.getLogger(__name__)

TORCH_XLA_INITIALIZED = False
Collaborator

I don't quite understand this TORCH_XLA_INITIALIZED flag

Contributor Author

Ah, we don't need this anymore, since this code now lives in torch_xla.

) -> None:
    if TORCH_XLA_INITIALIZED:
        # TODO(yeounoh) replace this with xr.use_spmd() when we deprecate the flag.
        os.environ["XLA_USE_SPMD"] = "1"
Collaborator

why not just call xr.use_spmd() now?

Contributor Author

There could be a case where the caller has already initialized some tensors before entering the API. We may want to support both, at least in the implementation. I will leave it as a TODO here.
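To make the two options in this thread concrete, here is a minimal sketch of a helper that can enable SPMD either through `xr.use_spmd()` or through the `XLA_USE_SPMD` environment flag. The helper name `enable_spmd` and its `prefer_runtime_api` parameter are hypothetical and not part of torch_xla; the thread does not settle which path is appropriate when tensors already exist, so this only illustrates keeping both available.

```python
import os


def enable_spmd(prefer_runtime_api: bool = False) -> str:
    """Turn on SPMD mode and report which mechanism was used.

    Hypothetical helper for illustration only; not part of torch_xla.
    Returns "runtime" if xr.use_spmd() was called, otherwise "env".
    """
    if prefer_runtime_api:
        try:
            # Only importable where torch_xla is installed.
            import torch_xla.runtime as xr
        except ImportError:
            xr = None
        if xr is not None and hasattr(xr, "use_spmd"):
            xr.use_spmd()
            return "runtime"
    # Flag-based path, as in the snippet under review.
    os.environ["XLA_USE_SPMD"] = "1"
    return "env"
```

Calling `enable_spmd()` with the default argument takes the flag-based path, which mirrors what the reviewed code does today.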

Collaborator

@JackCaoG JackCaoG left a comment

I didn't read the whole thing too carefully, since it looks like most of the changes just move code around. Is that true?

Collaborator

@jonb377 jonb377 left a comment

LGTM, thanks Yeounoh!

docs/spmd.md Outdated
@@ -46,8 +46,8 @@ import numpy as np
 import torch
 import torch_xla.core.xla_model as xm
 import torch_xla.runtime as xr
-import torch_xla.experimental.xla_sharding as xs
-from torch_xla.experimental.xla_sharding import Mesh
+import torch_xla.distributed.spmd.xla_sharding as xs
Collaborator

What do you think about making the import path just import torch_xla.distributed.spmd as xs? Not sure if xs is still the right abbreviation in that case (maybe we can think of it as xla spmd), but it would make the spmd package the single entrypoint in userspace.

Contributor Author

Good point, let's do that.

@yeounoh yeounoh force-pushed the dtensor_integration_support branch 5 times, most recently from be48d90 to c0cdd77 Compare November 13, 2023 17:54
@yeounoh yeounoh force-pushed the dtensor_integration_support branch from c7b779a to 794cbc7 Compare November 13, 2023 20:24
@yeounoh yeounoh merged commit e60428d into master Nov 13, 2023
1 check failed
Collaborator

@alanwaketan alanwaketan left a comment

Moving forward, should we use dtensor directly even within our codebase?

@jonb377 do you want to update HF to reflect this change? We can maybe wait a couple of days in case this gets reverted :).

@JackCaoG
Collaborator

@alanwaketan for HF, my major concern is backward compatibility. I think we should at least wait until the 2.2 release is out before updating.

@alanwaketan
Collaborator

@alanwaketan for HF, my major concern is backward compatibility. I think we should at least wait until the 2.2 release is out before updating.

Sounds good to me.
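For downstream code that has to run against releases from both before and after this move, one possible pattern is to try the new import path first and fall back to the old one. The sketch below is an assumption about how a caller might handle this, not something the PR prescribes; `import_first` is a hypothetical helper, and the two module paths are the ones quoted in this conversation.

```python
import importlib
from types import ModuleType
from typing import Optional, Sequence


def import_first(candidates: Sequence[str]) -> Optional[ModuleType]:
    """Return the first importable module from `candidates`, or None.

    Hypothetical helper for illustration; not part of torch_xla.
    """
    for name in candidates:
        try:
            return importlib.import_module(name)
        except ImportError:
            # Covers ModuleNotFoundError too; try the next candidate.
            continue
    return None


# New location first, then the pre-move location shown in the docs diff.
# `xs` is None when torch_xla is not installed at all.
xs = import_first([
    "torch_xla.distributed.spmd",
    "torch_xla.experimental.xla_sharding",
])
```

The PR's commit list mentions support for backward compatibility of the old imports, so a fallback like this may be unnecessary on the torch_xla side; it mainly matters for callers pinned to an older release.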

mbzomowski pushed a commit to mbzomowski-test-org/xla that referenced this pull request Nov 16, 2023
* [SPMD] move SPMD package to torch_xla/experimental/spmd, introduce shadow xla DTensor API.

* support backward compatibility of the old imports

* Move spmd out of experimental

* Update spmd.md for distributed/spmd
zpcore pushed a commit that referenced this pull request Nov 21, 2023
lsy323 pushed a commit to lsy323/xla that referenced this pull request Nov 28, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
ManfeiBai pushed a commit that referenced this pull request Nov 29, 2023
chunnienc pushed a commit to chunnienc/xla that referenced this pull request Dec 14, 2023
golechwierowicz pushed a commit that referenced this pull request Jan 12, 2024
pytorchmergebot pushed a commit to pytorch/pytorch that referenced this pull request Mar 7, 2024
bhavya01 pushed a commit that referenced this pull request Apr 22, 2024
4 participants