Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enforce compute cluster version #1601

Closed
wants to merge 8 commits into from
Closed

Conversation

KuuCi
Copy link
Contributor

@KuuCi KuuCi commented Oct 18, 2024

To resolve:
https://databricks.slack.com/archives/C06RXC1NPQC/p1727371779924089

Essentially, we check if the compute runtime version is higher than our databricks-connect version to determine whether to use 'dbconnect' or not, but we don't check if the compute runtime version is too high. There is no formula to determine which version is too high, so we should just enforce the customer using a computer that has the same major runtime as the databricks-connect version

Test:
Error should be thrown:
image
with the following compute:
image

But should work:

@KuuCi KuuCi requested a review from a team as a code owner October 18, 2024 21:47
@KuuCi KuuCi marked this pull request as draft October 18, 2024 21:49
Copy link
Collaborator

@dakinggg dakinggg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this check potentially too restrictive? do we actually need an exact match? or we just don't know?

@@ -451,6 +451,7 @@ def convert_dataset_hf_from_args(
ValueError: If the output directory already contains the requested splits
ValueError: If `concat_tokens` is set but `tokenizer` is not
"""
os.environ['WORLD_SIZE'] = '1'
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

are these changes supposed to be included?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually going to close this pr, during testing seems like every version I tested worked (strangely), so perhaps old bug was actually transient, and was solved not by the version change but simply restarting the cluster. Will revisit if it comes up again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Original motivation is we don't know what works and what gets deprecated

@KuuCi
Copy link
Contributor Author

KuuCi commented Oct 18, 2024

Not needed as all versions currently work... see above comment.

@KuuCi KuuCi closed this Oct 18, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants