-
You can use `Dataset.from_generator`:

    from datasets import Dataset

    def gen_from_pa_dataset(pa_dataset):
        # Yield one row (a dict) at a time from each Arrow record batch
        for pa_batch in pa_dataset.to_batches():
            yield from pa_batch.to_pylist()

    ds = Dataset.from_generator(gen_from_pa_dataset, gen_kwargs={"pa_dataset": train_ds})

But this use case is too specific to have a dedicated packaged builder, at least for now.
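For plain CSV files, the same generator idea can also carry the label inferred from the directory name. The following is only a minimal sketch, not a built-in feature: it assumes a hypothetical layout `<root>/<split>/<label>/*.csv`, and the function name `gen_labeled_rows` and the `"label"` column name are made up for illustration.

```python
import csv
from pathlib import Path


def gen_labeled_rows(root, split):
    """Yield CSV rows from <root>/<split>/<label>/*.csv with a "label" field added."""
    split_dir = Path(root) / split
    # Assumed layout: each sub-directory name of the split is the label.
    for label_dir in sorted(p for p in split_dir.iterdir() if p.is_dir()):
        for csv_path in sorted(label_dir.glob("*.csv")):
            with csv_path.open(newline="") as f:
                for row in csv.DictReader(f):
                    # CSV values stay as strings; cast them as needed.
                    yield {**row, "label": label_dir.name}
```

Each split could then be built with something like `Dataset.from_generator(gen_labeled_rows, gen_kwargs={"root": "data", "split": "train"})` (where `"data"` is a placeholder path) and the results collected into a `DatasetDict`, which gives back the `train`/`test`/`validation` separation by hand.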
-
Hi!
Is it possible to infer the label from the directory structure for tabular datasets with many files (in either parquet or csv format)?
I see that it's possible for audio and image datasets, but I haven't managed to do the same for tabular ones.
Also, I tried with `pyarrow` tabular datasets and it works, but then of course you don't have the automatic `train`, `test`, `validation` identification.

Thanks a lot!