Commit

Check for EOS
manuelburger committed Jun 30, 2024
1 parent ac4e377 commit dcaf79b
Showing 1 changed file with 3 additions and 0 deletions.
3 changes: 3 additions & 0 deletions src/nanotron/data/petagraph_dataset.py
@@ -266,6 +266,9 @@ def generate(self):
 if current_tokens is None:
     current_tokens = new_tokens
 else:
+    # Check the last token of the current sequence
+    # is an EOS token
+    assert current_tokens[-1] == self._eos_token_id
     current_tokens = np.concatenate([current_tokens, new_tokens])

 if len(current_tokens) >= self.maxlen:
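The invariant this commit asserts can be illustrated in isolation. The following is a minimal sketch, not the code in `petagraph_dataset.py`: the helper name `append_sequence`, the `EOS_TOKEN_ID` value, and the sample token values are all hypothetical, and plain Python lists stand in for the NumPy arrays used in the diff so the snippet has no dependencies. It raises an explicit exception rather than using `assert`, which is one possible variation since `assert` statements are skipped when Python runs with `-O`.

```python
EOS_TOKEN_ID = 0  # hypothetical value, chosen only for this illustration


def append_sequence(current_tokens, new_tokens, eos_token_id=EOS_TOKEN_ID):
    """Append new_tokens to the packing buffer, checking the EOS boundary.

    Hypothetical helper mirroring the branch shown in the diff.
    """
    if current_tokens is None:
        # First sequence: nothing to check yet.
        return list(new_tokens)
    # Same invariant as the assertion in the diff: the buffer built so far
    # must end with an EOS token before another sequence is appended.
    if current_tokens[-1] != eos_token_id:
        raise ValueError("buffer does not end with an EOS token")
    return list(current_tokens) + list(new_tokens)


buf = append_sequence(None, [5, 6, EOS_TOKEN_ID])
buf = append_sequence(buf, [7, 8, EOS_TOKEN_ID])
print(buf)
```

With this check in place, a sequence that is missing its trailing EOS token fails at the point of concatenation instead of silently merging two sequences into one.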
