Busy wait utils in dist #3396

Merged: 8 commits merged into mosaicml:dev on Jun 14, 2024


@dakinggg (Contributor) commented Jun 12, 2024

What does this PR do?

This PR adds utils for busy waiting on a node using a signal file lock. The main difference from the current approach is that the signal file name has a randomly generated identifier appended to it. This better supports use cases where multiple runs share a file system and could otherwise interfere with each other's signal files. Future PRs will replace the signal file uses in Composer (and, after release, in Foundry) with these utils; this PR starts with just the util implementation to keep PRs small.

What issue(s) does this change relate to?

Related to mosaicml/llm-foundry#1253

Before submitting

  • Have you read the contributor guidelines?
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@dakinggg marked this pull request as ready for review June 12, 2024 17:18

@mvpatel2000 (Contributor) left a comment

can you update calls in composer to use these helper fns?

Review thread on composer/utils/dist.py (outdated, resolved)
@dakinggg requested a review from @mvpatel2000 June 12, 2024 22:04

@mvpatel2000 (Contributor) left a comment

LGTM. Minor comment

Review thread on composer/utils/dist.py (outdated, resolved)
@dakinggg disabled auto-merge June 13, 2024 17:09
@dakinggg merged commit e494f9b into mosaicml:dev Jun 14, 2024
17 checks passed
mvpatel2000 pushed a commit to mvpatel2000/composer that referenced this pull request Jul 21, 2024
mvpatel2000 pushed a commit that referenced this pull request Jul 21, 2024