Commit

Update comment
irenedea committed May 25, 2024
1 parent a6d54b7 commit e1afb46
Showing 1 changed file with 1 addition and 1 deletion.
2 changes: 1 addition & 1 deletion scripts/data_prep/convert_text_to_mds.py
@@ -94,7 +94,7 @@ def __iter__(self) -> Iterable[Dict[str, bytes]]:
             # Add the EOS token to the buffer to separate files.
             buffer += self.eos_tokens

-        # Finish up the last of the tokens.
+        # Yield any remaining samples of size max_length.
         while len(buffer) >= self.max_length:
             concat_sample = buffer[:self.max_length]
             buffer = buffer[self.max_length:] if self.should_wrap else []
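
To illustrate what the reworded comment describes, here is a minimal standalone sketch of the buffering loop. It is not the code from convert_text_to_mds.py: the function name `concat_tokens` and its plain-list inputs are hypothetical, and the surrounding loop over documents is an assumption based only on the lines visible in the hunk. Only `eos_tokens`, `max_length`, `should_wrap`, and the final `while` loop come from the diff.

```python
from typing import Iterable, Iterator, List


def concat_tokens(
    docs: Iterable[List[int]],
    eos_tokens: List[int],
    max_length: int,
    should_wrap: bool,
) -> Iterator[List[int]]:
    """Hypothetical helper: concatenate token lists into fixed-size samples."""
    buffer: List[int] = []
    for tokens in docs:
        buffer += tokens
        # Emit full samples as soon as the buffer is long enough.
        while len(buffer) >= max_length:
            concat_sample = buffer[:max_length]
            # Without wrapping, the overflow past max_length is discarded.
            buffer = buffer[max_length:] if should_wrap else []
            yield concat_sample
        # Add the EOS token to the buffer to separate documents.
        buffer += eos_tokens

    # Yield any remaining samples of size max_length.
    while len(buffer) >= max_length:
        concat_sample = buffer[:max_length]
        buffer = buffer[max_length:] if should_wrap else []
        yield concat_sample


wrapped = list(concat_tokens([[1, 2, 3]], eos_tokens=[0], max_length=2, should_wrap=True))
unwrapped = list(concat_tokens([[1, 2, 3, 4, 5, 6]], eos_tokens=[0], max_length=2, should_wrap=False))
print(wrapped)    # [[1, 2], [3, 0]]
print(unwrapped)  # [[1, 2]]
```

In the first call the trailing EOS token completes one more full sample, and the final loop is what yields it. Anything shorter than `max_length` left in the buffer at that point is dropped, which is why "remaining samples of size max_length" is a more accurate description than "the last of the tokens".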