-
`to_pandas` materializes the whole table. You should project the column on the PyArrow dataset (`to_pyarrow_dataset()`) and only then collect.
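Something like the following is a minimal sketch of that idea, assuming the table has a `file` column as in the question. `unique_values` and `list_existing_files` are hypothetical helper names made up for this reply, not part of deltalake; the only library calls used are `DeltaTable.to_pyarrow_dataset()`, `Dataset.to_table(columns=...)`, `Table.column()`, `ChunkedArray.unique()` and `to_pylist()`.

```python
def unique_values(dataset, column):
    """Return the distinct values of one column, requesting only that column.

    Assumption: `dataset` behaves like a pyarrow.dataset.Dataset, i.e. it has
    a to_table(columns=...) method that returns a pyarrow Table.
    """
    # Ask the scanner for a single column instead of the full schema.
    table = dataset.to_table(columns=[column])
    # Deduplicate in Arrow, then convert to a plain Python list.
    return table.column(column).unique().to_pylist()


def list_existing_files(table_uri, storage_options=None, column="file"):
    """Hypothetical wrapper: open the Delta table and collect one column."""
    # Imported lazily so unique_values() can be used without deltalake installed.
    from deltalake import DeltaTable

    dataset = DeltaTable(table_uri, storage_options=storage_options).to_pyarrow_dataset()
    return unique_values(dataset, column)
```

With the names from the question this would be called as `existing_files = list_existing_files(Destination, storage_options)`. Whether it actually lowers peak memory probably depends on the deltalake and pyarrow versions in use, so it is worth measuring on your table.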
-
this looks a bit too verbose :)
-
I am using this to read a list of files from a Delta table, but based on memory usage, it seems the reader loads the whole table first. Is there a better way to read only a specific column?
existing_files = DeltaTable(Destination, storage_options=storage_options).to_pandas(columns=["file"])["file"].unique().tolist()