Add how to use multiple streams in loaders in yaml #831
1. [FAQ: How many GPUs do I need to train a LLM?](#howmandygpus)
1. [FAQ: Optimizing Performance](#optimizingperformance)
- [LLM Pretraining and Finetuning](#llm-pretraining-and-finetuning)
- [Table of Contents](#table-of-contents)
I don't think we need the self-reference to the table of contents here.
```yaml
train_loader:
  name: finetuning
```
The finetuning dataloader doesn't actually support streams. Not for any technical reason, we just haven't updated it. This example should be in the pretraining data section and use the text dataloader.
@cli99 mind finishing this PR real quick?
Done in another PR
Pull request was closed
Users ask for an example of how to use multiple streams (https://github.com/mosaicml/streaming?tab=readme-ov-file#seamless-data-mixing) through a YAML file.
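
For illustration, a minimal sketch of what such a config might look like, following the review feedback to use the text dataloader rather than the finetuning one. The stream names, paths, and proportions are hypothetical. The `streams` key and the per-stream fields (`remote`, `local`, `split`, `proportion`) are assumed to be passed through to the `streaming.Stream` constructor, and should be checked against the current text dataloader before use.

```yaml
train_loader:
  name: text                      # text (pretraining) dataloader, per the review comment
  dataset:
    streams:
      # Hypothetical stream names and paths; replace with your own datasets.
      web_text:
        remote: s3://my-bucket/web-text   # assumed remote location
        local: /tmp/streams/web-text      # assumed local cache directory
        split: train
        proportion: 0.7                   # assumed: share of samples drawn from this stream
      code:
        remote: s3://my-bucket/code
        local: /tmp/streams/code
        split: train
        proportion: 0.3
    shuffle: true
    max_seq_len: 2048
  drop_last: true
  num_workers: 8
```

Each entry under `streams` is intended to describe one dataset to mix. Giving each stream its own `local` directory keeps the cached shards of different streams from colliding.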