
Conditionally set default TPU settings in __init__.py #5696

Merged
merged 3 commits into master on Oct 10, 2023

Conversation

@will-cromar (Collaborator) commented on Oct 10, 2023

Replace _set_missing_env with standard library call.

Only set TPU_MEGACORE on TPU v4. Otherwise, use the default value.
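
For context, the standard-library call in question is os.environ.setdefault, visible in the diff below. A minimal before/after sketch; the body of the old _set_missing_env helper is an assumed reconstruction for illustration, not code copied from the repo:

```python
import os

# Assumed shape of the old custom helper (illustrative reconstruction,
# not copied from the repo):
def _set_missing_env(name: str, value: str) -> None:
    if name not in os.environ:
        os.environ[name] = value

# The standard-library equivalent used in this PR: setdefault only writes
# the variable when it is absent, so a user-provided value always wins.
os.environ.setdefault('TPU_VISIBLE_CHIPS', '0')
```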

@@ -266,6 +266,9 @@ def configure_topology(local_rank: int,
os.environ.setdefault(xenv.TPU_VISIBLE_CHIPS, str(local_rank))
os.environ.setdefault(xenv.TPU_PROCESS_PORT, str(ports[local_rank]))

if version() == 4:
Collaborator:

Is it guaranteed that this will always be called, and before any modeling code involving the TPU?

@will-cromar (Collaborator, Author):

Actually, this will only be called for certain when using torchrun or xmp.spawn. Good catch.

It makes the most logical sense to me to set these up during client initialization, but that happens in C++ and all of the TPU utilities are in Python. I'll make this setting conditional in __init__.py for now.
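
A minimal sketch of what that conditional in __init__.py could look like; the hardcoded version() stand-in and the 'megacore_dense' value are assumptions for illustration, not necessarily the merged code:

```python
import os

def version() -> int:
    # Stand-in for the real TPU generation check (the diff above guards on
    # a version() helper); hardcoded here so the sketch is runnable.
    return 4

# Only TPU v4 gets an explicit megacore default. Other generations fall
# through to the runtime's built-in default, and setdefault never
# overwrites a value the user has already exported.
if version() == 4:
    os.environ.setdefault('TPU_MEGACORE', 'megacore_dense')  # value assumed
```

Because __init__.py runs on import, the default would apply even when a user launches without torchrun or xmp.spawn.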

@will-cromar changed the title from "Set TPU_MEGACORE in configure_topology" to "Conditionally set default TPU settings in __init__.py" on Oct 10, 2023
@alanwaketan (Collaborator) left a comment:

LGTM. Thanks, Will.

@will-cromar merged commit a98656d into master on Oct 10, 2023
17 checks passed
The following commits later referenced this pull request:

- zpcore pushed a commit (Oct 19, 2023)
- ghpvnist pushed a commit to ghpvnist/xla (Oct 31, 2023)
- mbzomowski pushed a commit to mbzomowski-test-org/xla (Nov 16, 2023)
- chunnienc pushed a commit to chunnienc/xla (Dec 14, 2023)
- golechwierowicz pushed a commit (Jan 12, 2024)
- bhavya01 pushed a commit (Apr 22, 2024)

Each carried the same three-commit history from this PR:

* Set TPU_MEGACORE in configure_topology
* remove
* Move back to __init__.py