Add how to use multiple streams in loaders in yaml #831

Closed
cli99 wants to merge 6 commits

@cli99 cli99 commented Jan 2, 2024

Users have asked for an example of using multiple streams (https://github.com/mosaicml/streaming?tab=readme-ov-file#seamless-data-mixing) through a YAML file.

@cli99 cli99 requested a review from dakinggg January 2, 2024 23:37
@cli99 cli99 enabled auto-merge (squash) January 3, 2024 17:57
@cli99 cli99 requested review from vchiley and mvpatel2000 January 4, 2024 17:23
1. [FAQ: How many GPUs do I need to train a LLM?](#howmandygpus)
1. [FAQ: Optimizing Performance](#optimizingperformance)
- [LLM Pretraining and Finetuning](#llm-pretraining-and-finetuning)
- [Table of Contents](#table-of-contents)
I don't think we need the self reference to the table of contents here


```yaml
train_loader:
  name: finetuning
```
The finetuning dataloader doesn't actually support streams. That's not for any technical reason; we just haven't updated it. This example should go in the pretraining data section and use the text dataloader.
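
For reference, a minimal sketch of what a multi-stream example for the text dataloader might look like. It assumes that `dataset.streams` is a mapping from a stream name to the arguments of a Streaming `Stream` (`remote`, `local`, `split`, `proportion`). The stream names, bucket URLs, local paths, proportions, and the other loader values are hypothetical placeholders, not values taken from this PR.

```yaml
# Sketch only: stream names, paths, bucket URLs, and proportions are hypothetical.
train_loader:
  name: text  # the text (pretraining) dataloader, as suggested in the review
  dataset:
    streams:
      # Assumption: each entry is passed through as the arguments of one Stream.
      c4:
        remote: s3://my-bucket/c4/       # hypothetical remote location
        local: /tmp/dataset/c4/          # hypothetical local cache directory
        split: train
        proportion: 0.8                  # share of samples drawn from this stream
      markdown:
        remote: s3://my-bucket/markdown/ # hypothetical remote location
        local: /tmp/dataset/markdown/    # hypothetical local cache directory
        split: train
        proportion: 0.2
    shuffle: true
    max_seq_len: 2048                    # placeholder value
  drop_last: true
  num_workers: 8
```

Under these assumptions, about 80% of the training samples would come from the `c4` stream and about 20% from the `markdown` stream.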

@cli99 mind finishing this PR real quick?

dakinggg commented Apr 4, 2024

Done in another PR

@dakinggg dakinggg closed this Apr 4, 2024
auto-merge was automatically disabled April 4, 2024 21:35

Pull request was closed
