I'm trying to encode a 1.7 GB txt file for training purposes. After starting the encode process from cmd, I could see in Task Manager that resources were being drained, but after ~30 minutes everything went back to idle while the console output never moved past "reading files 0%". From what I can tell the GPU is working too, since cudart64_101.dll loads.
System specs:
GTX 970
i5-8400
8 GB RAM + NVMe SSD
Please help; scraping this much text was hard.
Later Edit:
A second try eventually produced this error:
Traceback (most recent call last):
  File "encode.py", line 31, in <module>
    main()
  File "encode.py", line 25, in main
    chunks = load_dataset(enc, args.in_text, args.combine, encoding=args.encoding)
  File "C:\_stash\openAI\gpt-2\src\load_dataset.py", line 35, in load_dataset
    tokens = np.stack(enc.encode(raw_text))
  File "C:\_stash\openAI\gpt-2\src\encoder.py", line 100, in encode
    bpe_tokens.extend(self.encoder[bpe_token] for bpe_token in self.bpe(token).split(' '))
MemoryError
Later Later Edit:
Encoding the folder of individual text files, without merging them into a single file first, worked fine. Presumably each file is tokenized and stacked separately, so no single enc.encode() call has to hold the tokens for the whole 1.7 GB in memory at once.
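For anyone hitting the same wall, here is a minimal sketch of the split step that makes the folder workaround possible. The script name, paths, chunk size, and helper functions are my own illustration, not part of the gpt-2 repo; it just cuts a big UTF-8 file into ~50 MB pieces on line boundaries so each piece can be encoded on its own.

# split_corpus.py -- hypothetical helper, not part of the gpt-2 repo.
# Splits a large UTF-8 text file into numbered chunk files of roughly
# chunk_bytes each, breaking only at line boundaries.
import os

def split_text_file(src_path, out_dir, chunk_bytes=50 * 1024 * 1024):
    os.makedirs(out_dir, exist_ok=True)
    part, size, lines = 0, 0, []
    with open(src_path, encoding="utf-8") as src:
        for line in src:
            lines.append(line)
            size += len(line.encode("utf-8"))
            if size >= chunk_bytes:
                # Flush the current chunk and start a new one.
                _write_chunk(out_dir, part, lines)
                part, size, lines = part + 1, 0, []
    if lines:
        _write_chunk(out_dir, part, lines)

def _write_chunk(out_dir, part, lines):
    path = os.path.join(out_dir, "part_%04d.txt" % part)
    with open(path, "w", encoding="utf-8") as dst:
        dst.writelines(lines)

if __name__ == "__main__":
    # Example: python split_corpus.py, then point encode.py at corpus_chunks/
    split_text_file("corpus.txt", "corpus_chunks")

After splitting, pointing encode.py at the output folder instead of the single merged file kept peak memory to one chunk's worth of tokens at a time.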