Commit

fix null byte error
Andrew Lapp committed May 15, 2024
1 parent 392288e commit d2066ad
Showing 1 changed file with 6 additions and 0 deletions.
6 changes: 6 additions & 0 deletions outlines/fsm/regex.py
@@ -825,6 +825,12 @@ def reduced_vocabulary(
         )
     )
     for token_tuple, token_ids in vocabulary.items():
+        # numpy doesn't track null bytes in arrays
+        # np.fromiter('\x00' ...) results in an empty string [""]
+        # https://github.com/numpy/numpy/issues/26275
+        if token_tuple == "\x00":
+            continue
+
         token_tuple_np = np.fromiter(token_tuple, dtype=np.dtype("U2"))
         token_ids_np = np.fromiter(token_ids, dtype=np.dtype("int64"))
         vocabulary_nb.append((token_tuple_np, token_ids_np))
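
The behavior this commit works around can be reproduced in isolation. The following is a minimal sketch, not the code from outlines/fsm/regex.py: `sample_vocabulary`, `null_roundtrip`, and `converted` are hypothetical names chosen for illustration, and the vocabulary contents are made up. It first converts a lone null byte with the same `np.fromiter` call used in the diff, then applies the same skip guard while converting a small vocabulary.

```python
import numpy as np

# Hypothetical sample vocabulary (token string -> list of token ids);
# the real vocabulary in outlines is built from a tokenizer.
sample_vocabulary = {"\x00": [0], "a": [1], "b": [2, 3]}

# Convert a lone null byte the same way the diff converts tokens.
# Per the comment in the commit, the stored element reads back as "".
null_array = np.fromiter("\x00", dtype=np.dtype("U2"))
null_roundtrip = str(null_array[0])

# Mirror the guard from the diff: skip the null-byte token before
# converting the remaining tokens and their ids to numpy arrays.
converted = []
for token, token_ids in sample_vocabulary.items():
    if token == "\x00":
        continue
    token_np = np.fromiter(token, dtype=np.dtype("U2"))
    ids_np = np.fromiter(token_ids, dtype=np.dtype("int64"))
    converted.append((token_np, ids_np))

print(repr(null_roundtrip))
print(len(converted))
```

Skipping the token is the simplest workaround, at the cost of dropping the null-byte token from the reduced vocabulary entirely; whether that matters for a given tokenizer is outside what this commit states.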
