[docs] troubleshooting guide #2133

Merged (9 commits) on Nov 13, 2023

MKhalusova
Contributor

This PR adds a "Troubleshooting guide" to consolidate all the tips and tools that can be helpful when troubleshooting issues on one page.

What the page includes:

  1. Logging. That’s one of the first steps in troubleshooting. Essentially, the existing logging content (except the API reference) has moved into the troubleshooting guide, with some style updates.
  2. Hanging code and timeout errors: addresses mismatched tensor shapes and introduces the debug mode. The debug.md page was removed.
  3. CUDA out of memory (“How to avoid CUDA out of memory” moved here)
  4. Non-reproducible results between device setups: briefly covers possible reasons and refers to the existing doc.
  5. Performance issues on different GPUs: notes on the limitations of using two GPUs
  6. Ask for help: where to get further help

A centralized guide can make it easier to troubleshoot common issues than searching through multiple docs.
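
To illustrate the idea behind item 3 (avoiding CUDA out of memory), here is a minimal pure-Python sketch of retrying a training function with a halved batch size whenever an out-of-memory error is raised. Accelerate provides a `find_executable_batch_size` utility in `accelerate.utils` for this purpose; the sketch below is not that implementation. The names `retry_with_smaller_batch` and `train` are hypothetical, and the OOM error is simulated so the example runs without a GPU.

```python
import functools


def retry_with_smaller_batch(starting_batch_size=128):
    """Hypothetical helper: rerun `fn` with a halved batch size after an OOM error."""

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            batch_size = starting_batch_size
            while batch_size > 0:
                try:
                    # The wrapped function receives the batch size as its first argument.
                    return fn(batch_size, *args, **kwargs)
                except RuntimeError as err:
                    # Assumption: an OOM shows up as a RuntimeError whose message
                    # contains "out of memory"; any other error is re-raised.
                    if "out of memory" not in str(err).lower():
                        raise
                    batch_size //= 2
            raise RuntimeError("No executable batch size found, reached zero.")

        return wrapper

    return decorator


@retry_with_smaller_batch(starting_batch_size=64)
def train(batch_size):
    # Stand-in for a real training loop: pretend anything above 16 does not fit.
    if batch_size > 16:
        raise RuntimeError("CUDA out of memory (simulated)")
    return batch_size
```

Calling `train()` tries 64, then 32, and succeeds at 16. In a real loop, the model and optimizer would need to be re-created inside the decorated function so that memory from the failed attempt can be released.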

HuggingFaceDocBuilderDev commented Nov 8, 2023

The documentation is not available anymore as the PR was closed or merged.

Comment on lines 52 to 53
- local: usage_guides/troubleshooting
  title: Troubleshooting guide
Collaborator

I'm now wondering if this may have more value in the Getting Started or Tutorials area. Alternatively, put it at the top of the table of contents for this section.

Contributor Author

I'm OK with moving it into Tutorials. Out of all the suggestions, it feels like the most fitting.

Collaborator

@muellerzr muellerzr left a comment

Thanks! I did a first pass through these; overall they're great!

Collaborator

@muellerzr muellerzr left a comment

Thanks for centralizing this all into a core doc!

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks Maria, this is a great addition and hopefully will help many users to come. I left a few small comments.

Also wondering if this would be a good place to mention that kernel issue @muellerzr, maybe in the "Hanging code and timeout errors" section.

@MKhalusova
Contributor Author

> Also wondering if this would be a good place to mention that kernel issue @muellerzr, maybe in the "Hanging code and timeout errors" section.

Could you please provide some context about the kernel issue you mentioned?

@BenjaminBossan
Member

> Could you please provide some context about the kernel issue you mentioned?

Users sometimes report hanging processes when using older Linux kernels, e.g. in #1929. I thought this could fit here, but it's not a must-have and can be added later.
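
As a rough illustration of how a user could check whether they are affected, here is a minimal sketch that parses the running kernel's release string and compares it to a minimum version. The helper names and the `ASSUMED_MINIMUM` threshold are hypothetical placeholders, not values taken from this thread or from #1929.

```python
import platform
import re


def parse_kernel_version(release):
    """Extract (major, minor) from a release string such as '5.4.0-150-generic'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def kernel_is_older_than(release, minimum):
    """Return True if `release` parses to a version below `minimum`, e.g. (5, 5)."""
    version = parse_kernel_version(release)
    return version is not None and version < minimum


# Illustrative threshold only (an assumption); substitute the version that
# applies to your setup.
ASSUMED_MINIMUM = (5, 5)

if platform.system() == "Linux" and kernel_is_older_than(platform.release(), ASSUMED_MINIMUM):
    print(f"Kernel {platform.release()} is older than {ASSUMED_MINIMUM}; hangs may be kernel-related.")
```

Release strings that cannot be parsed are treated as "not older", so the check never raises a false alarm on unusual platforms.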

@MKhalusova
Contributor Author

I have addressed the feedback and added:

a) a small section on early stopping (re: #1940)
b) a small section on the Linux kernel issue

Member

@BenjaminBossan BenjaminBossan left a comment

Thanks for the additions. LGTM.

Collaborator

@muellerzr muellerzr left a comment

Beautiful! 🤩 Thanks!

@MKhalusova MKhalusova merged commit 2b53a90 into huggingface:main Nov 13, 2023
2 checks passed
4 participants