Conditionally set default TPU settings in __init__.py
#5696
Conversation
torch_xla/_internal/tpu.py
Outdated
@@ -266,6 +266,9 @@ def configure_topology(local_rank: int,
   os.environ.setdefault(xenv.TPU_VISIBLE_CHIPS, str(local_rank))
   os.environ.setdefault(xenv.TPU_PROCESS_PORT, str(ports[local_rank]))

+  if version() == 4:
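The hunk above relies on `os.environ.setdefault`, which only writes a value when the variable is not already present, so an explicitly exported setting always wins over the default. A minimal sketch of that behavior (the variable name below is illustrative, not one of the real TPU environment variables):

```python
import os

# Clear the variable so the demonstration starts from a known state.
os.environ.pop("DEMO_TPU_SETTING", None)

# setdefault assigns only when the key is absent and returns the
# value now stored under the key.
first = os.environ.setdefault("DEMO_TPU_SETTING", "default")

# A second call with a different value is a no-op: the existing
# value is kept and returned.
second = os.environ.setdefault("DEMO_TPU_SETTING", "override")
```

This is why the pattern is safe to run unconditionally at startup: users who have already configured their environment are never overridden.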
Is it guaranteed that this will always be called, and called before any modeling code involving the TPU?
Actually, this will only get called for sure when using torchrun or xmp.spawn. Good catch.
It makes the most logical sense to me to set these up during client initialization, but that happens in C++ and all of the TPU utilities are in Python. I'll make this setting conditional in __init__.py for now.
__init__.py
LGTM. Thanks, Will.
* Set TPU_MEGACORE in configure_topology
* remove
* Move back to __init__.py
Replace _set_missing_env with a standard library call. Only set TPU_MEGACORE on TPU v4; otherwise, use the default value.
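The resulting module-level logic can be sketched as below. This is a hedged illustration, not the actual __init__.py: the `version()` stub stands in for torch_xla's TPU version probe in torch_xla/_internal/tpu.py, and the "megacore_dense" value is an assumed example default.

```python
import os

def version():
    # Hypothetical stand-in for the real TPU version helper in
    # torch_xla/_internal/tpu.py; pretend we detected a TPU v4 host.
    return 4

# Clear the variable so the example is deterministic.
os.environ.pop("TPU_MEGACORE", None)

# Only default TPU_MEGACORE on TPU v4. On other generations nothing
# is set, so the runtime's own default value applies. setdefault
# (the standard library call replacing _set_missing_env) still lets
# an explicitly exported value take precedence.
if version() == 4:
    os.environ.setdefault("TPU_MEGACORE", "megacore_dense")
```

Doing this at import time, rather than inside configure_topology, means the setting is applied even when the process was not launched via torchrun or xmp.spawn.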