Feature request
Implement a centralized, reusable utility or module for binary file operations that can be shared across modules. This utility should:
Standardize reading and writing of headers, binary data, and indices.
Integrate modularly with existing components such as EmbeddedStreamData.
Reduce code duplication while improving readability and maintainability.
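A minimal sketch of what such a utility could look like follows. It assumes a simple file layout of a fixed-size header, then the raw data section, then an index of (offset, length) pairs. The names `BinaryFileIO` and `Header`, the header fields, and the index encoding are all hypothetical and do not describe the project's actual file format. The point is only that the layout is defined in one place and every reader and writer goes through it.

```python
from __future__ import annotations

import struct
from dataclasses import dataclass
from typing import Iterable

# Hypothetical on-disk layout (an assumption for illustration only):
#   header: 8-byte data-section length + 4-byte token size, little-endian
#   data:   raw bytes of the data section
#   index:  sequence of (offset, length) pairs, 8 bytes each, little-endian
HEADER_FORMAT = "<QI"
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes, no padding with "<"
INDEX_ENTRY_FORMAT = "<QQ"
INDEX_ENTRY_SIZE = struct.calcsize(INDEX_ENTRY_FORMAT)  # 16 bytes


@dataclass(frozen=True)
class Header:
    data_section_length: int  # size of the data section in bytes
    token_size_in_bytes: int  # width of one token in the data section


class BinaryFileIO:
    """Hypothetical shared helper that owns the header | data | index layout."""

    @staticmethod
    def write(path, data: bytes, index: Iterable[tuple[int, int]], token_size_in_bytes: int) -> None:
        with open(path, "wb") as f:
            f.write(struct.pack(HEADER_FORMAT, len(data), token_size_in_bytes))
            f.write(data)
            for offset, length in index:
                f.write(struct.pack(INDEX_ENTRY_FORMAT, offset, length))

    @staticmethod
    def read_header(path) -> Header:
        with open(path, "rb") as f:
            raw = f.read(HEADER_SIZE)
        if len(raw) != HEADER_SIZE:
            raise ValueError(f"{path}: file too short to contain a header")
        return Header(*struct.unpack(HEADER_FORMAT, raw))

    @staticmethod
    def read_data(path) -> bytes:
        header = BinaryFileIO.read_header(path)
        with open(path, "rb") as f:
            f.seek(HEADER_SIZE)
            return f.read(header.data_section_length)

    @staticmethod
    def read_entry(path, offset: int, length: int) -> bytes:
        # offset is relative to the start of the data section
        with open(path, "rb") as f:
            f.seek(HEADER_SIZE + offset)
            return f.read(length)

    @staticmethod
    def read_index(path) -> list[tuple[int, int]]:
        header = BinaryFileIO.read_header(path)
        with open(path, "rb") as f:
            f.seek(HEADER_SIZE + header.data_section_length)
            raw = f.read()
        if len(raw) % INDEX_ENTRY_SIZE != 0:
            raise ValueError(f"{path}: index section is not a whole number of entries")
        return list(struct.iter_unpack(INDEX_ENTRY_FORMAT, raw))
```

Under this design, a component like EmbeddedStreamData could call `read_header` and `read_index` instead of parsing bytes itself, so a change to the layout would only touch this one class.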
Motivation
Code for reading and writing binary files is currently duplicated across several modules and functions, including:
EmbeddedStreamData
PackedDataGenerator
LargeFileLinesReader
shuffle_tokenized_data()
This redundancy increases maintenance overhead and the risk of inconsistencies. For example, header reading, index writing, and binary stream handling are each implemented several times in different forms, which invites bugs and inefficiencies.