You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
I am not sure what the expected behaviour is and whether its my mistake, an error in the course or of the utilised dataset but I noticed the following in Chapter 1 - Preprocessing:
When I follow the course and try to execute
# use librosa to get example's duration from the audio file
new_column = [librosa.get_duration(path=x) for x in minds["path"]]
it will fail because the path, or x in the code snippet, looks something like /storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav (which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).
Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the 'path' value in the dataset with the local path on my machine?
Alternatively, one could change the function call of librosa.get_duration(path=x) and pass the audio array and the sampling_rate instead, e.g.
new_column = [librosa.get_duration(y=x["array"], sr=x["sampling_rate"]) for x in minds["audio"]]
The text was updated successfully, but these errors were encountered:
Hey
I am not sure what the expected behaviour is and whether its my mistake, an error in the course or of the utilised dataset but I noticed the following in Chapter 1 - Preprocessing:
When I follow the course and try to execute
it will fail because the path, or
x
in the code snippet, looks something like/storage/hf-datasets-cache/all/datasets/27907695716030-config-parquet-and-info-PolyAI-minds14-941a5af2/downloads/extracted/a87e442545495cdb67dfdcbc9d4f35d234c9f8e471449b2db58d7c81b62f001a/en-AU~PAY_BILL/response_4.wav
(which is the exact content as provided by the unmodified dataset as can be seen on the datasets page but does not exist on my machine).Do I use the load_dataset function in a wrong way? Do I have to specify a path to explicitly save or cache the data somewhere? Is there a way that will automatically replace the
'path'
value in the dataset with the local path on my machine?Alternatively, one could change the function call of
librosa.get_duration(path=x)
and pass the audio array and the sampling_rate instead, e.g.The text was updated successfully, but these errors were encountered: