Add how to use multiple streams in loaders in yaml #831

cli99 · 2024-01-02T23:37:05Z

Users have asked for an example of how to use multiple streams (https://github.com/mosaicml/streaming?tab=readme-ov-file#seamless-data-mixing) through a YAML file.

dakinggg · 2024-01-04T18:43:52Z

scripts/train/README.md

-1. [FAQ: How many GPUs do I need to train a LLM?](#howmandygpus)
-1. [FAQ: Optimizing Performance](#optimizingperformance)
+- [LLM Pretraining and Finetuning](#llm-pretraining-and-finetuning)
+      - [Table of Contents](#table-of-contents)


I don't think we need the self-reference to the table of contents here.

dakinggg · 2024-01-04T18:45:18Z

scripts/train/README.md

+
+```yaml
+train_loader:
+    name: finetuning


The finetuning dataloader doesn't actually support streams. That's not for any technical reason; we just haven't updated it. This example should go in the pretraining data section and use the text dataloader.
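For reference, a minimal sketch of what such an example might look like with the text dataloader. It assumes the `dataset` section accepts a `streams` mapping whose entries are passed through to the streaming library's `Stream` (so keys like `remote`, `local`, `split`, and `proportion` apply). The stream names, bucket paths, cache directories, and numeric values are hypothetical placeholders, not taken from this PR.

```yaml
# Sketch only: stream names, paths, and values below are hypothetical.
train_loader:
    name: text
    dataset:
        # Assumption: each entry under `streams` is forwarded to streaming.Stream.
        streams:
            web_text:
                remote: s3://my-bucket/web-text
                local: /tmp/streaming/web-text
                split: train
                proportion: 0.7  # roughly 70% of samples drawn from this stream
            code:
                remote: s3://my-bucket/code
                local: /tmp/streaming/code
                split: train
                proportion: 0.3  # roughly 30% of samples drawn from this stream
        shuffle: true
        max_seq_len: 2048
    drop_last: true
    num_workers: 8
```

Giving each stream its own `local` directory keeps the downloaded shards from different sources separate.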

@cli99 mind finishing this PR real quick?

dakinggg · 2024-04-04T21:35:58Z

Done in another PR

cli99 added 2 commits January 2, 2024 15:29

add how to use multiple streams in loaders in yaml

c7b4c8f

fix format

98165b9

cli99 requested a review from dakinggg January 2, 2024 23:37

Merge branch 'main' into update-readme-mds

be5c2c3

cli99 requested review from jacobfulano and abhi-mosaic January 3, 2024 17:57

cli99 enabled auto-merge (squash) January 3, 2024 17:57

cli99 requested review from vchiley and mvpatel2000 January 4, 2024 17:23

cli99 and others added 2 commits January 4, 2024 09:23

Merge branch 'main' into update-readme-mds

11166ab

Merge branch 'main' into update-readme-mds

a53364a

dakinggg requested changes Jan 4, 2024

Merge branch 'main' into update-readme-mds

d8502da

dakinggg closed this Apr 4, 2024

auto-merge was automatically disabled April 4, 2024 21:35
Pull request was closed
