Chapter 5 - Finetune - Disk occupation space problem #194

Open
jupiterMJM opened this issue Oct 24, 2024 · 0 comments

@jupiterMJM

Hi everyone!
I ran into a small problem while working through the course chapter at https://huggingface.co/learn/audio-course/chapter5/fine-tuning
I understood all the commands, but I get an error when running this one:

```python
# prepare_dataset is defined earlier in the course chapter
common_voice = common_voice.map(
    prepare_dataset, remove_columns=common_voice.column_names["train"], num_proc=1
)
```

This command transforms each audio sample into log-mel spectrogram features.
I get the following error, which is quite self-explanatory: OSError: [Errno 28] No space left on device
After some investigation, I noticed that temporary files are created in my folder when I launch this command. Here is what I think happens: the .map function transforms every audio sample into a log-mel spectrogram and stores the result somewhere (on disk, since there isn't enough RAM). However, these temporary files can grow to several hundred GB!
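
To get a feel for why the files grow so large, here is a rough back-of-the-envelope estimate. It is a sketch under assumptions that are not stated in the issue: a Whisper checkpoint with 80 mel bins (e.g. whisper-small; whisper-large-v3 uses 128), inputs padded or truncated to 30 s with a 10 ms hop (3000 frames), features stored as float32, and Arrow overhead ignored. The 40 kB average clip size and the 100,000-example split are purely hypothetical figures.

```python
# Back-of-the-envelope estimate of the disk space used by precomputed
# Whisper log-mel features. ASSUMPTIONS: 80 mel bins, 30 s padding with a
# 10 ms hop (3000 frames), float32 storage, Arrow overhead ignored.

N_MELS = 80             # mel bins (128 for whisper-large-v3)
N_FRAMES = 3000         # 30 s / 10 ms hop
BYTES_PER_FLOAT32 = 4


def feature_bytes_per_example(n_mels: int = N_MELS, n_frames: int = N_FRAMES) -> int:
    """Bytes for one example's log-mel matrix (n_mels x n_frames float32)."""
    return n_mels * n_frames * BYTES_PER_FLOAT32


def estimated_cache_gb(num_examples: int, n_mels: int = N_MELS) -> float:
    """Approximate size in GB (1e9 bytes) of the mapped features for a split."""
    return num_examples * feature_bytes_per_example(n_mels) / 1e9


def expansion_ratio(avg_compressed_clip_bytes: float, n_mels: int = N_MELS) -> float:
    """How many times larger the features are than the compressed source clip."""
    return feature_bytes_per_example(n_mels) / avg_compressed_clip_bytes


print(feature_bytes_per_example())   # 960000 bytes, about 0.96 MB per example
print(estimated_cache_gb(100_000))   # 96.0 GB for a hypothetical 100k-example split
print(expansion_ratio(40_000))       # 24.0 for a hypothetical 40 kB compressed clip
```

Because every clip is padded to the same 30 s window, the feature size is fixed per example, so under these assumptions the space-after-map to dataset-size ratio is largest for datasets made of short, heavily compressed clips.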

So here are my questions:

  • Is there a way to change how the log-mel transformation is done, so that the features are not all created at once but computed in batches, only when they are needed?
  • If not, can someone tell me the ratio of the disk space occupied after the .map call to the size of the original dataset?

Thanks a lot!
