Read files with polars #244

Andilun · 2023-09-01T10:45:14Z

Changes

The PseudoData.from_file function now uses polars when reading Parquet files, which should improve read performance for that format.

This resolves issue #227.

Reason for not using polars for 'csv' files

I have opted not to use Polars for reading CSV files because we would then have to handle a conversion between Polars 'dtypes' and Pandas 'dtype'.

Currently, all the file formats we support reading can either have data types specified with the dtype keyword argument or derive the schema from the file. We should avoid requiring users to define data types differently for CSV files.

dtypes vs dtype

pandas.read_csv dtype: dtype or dict of {Hashable: dtype}
polars.read_csv dtypes: Mapping[str, PolarsDataType] | Sequence[PolarsDataType] | None = None
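
To make the mismatch concrete, here is a minimal sketch of the kind of translation layer we would have to maintain if CSV went through polars. Everything in it is an assumption for illustration: `PANDAS_TO_POLARS_NAME`, `polars_type_names` and `read_csv_with_polars` are hypothetical, and the lookup table is deliberately incomplete.

```python
from typing import Any, Dict

# Hypothetical, incomplete lookup from pandas dtype strings to the
# names of the corresponding polars data types.
PANDAS_TO_POLARS_NAME: Dict[str, str] = {
    "string": "Utf8",
    "int64": "Int64",
    "float64": "Float64",
    "bool": "Boolean",
}


def polars_type_names(pandas_dtypes: Dict[str, str]) -> Dict[str, str]:
    """Translate {column: pandas dtype string} into {column: polars type name}."""
    unknown = sorted(
        dt for dt in set(pandas_dtypes.values()) if dt not in PANDAS_TO_POLARS_NAME
    )
    if unknown:
        raise ValueError(f"No polars equivalent registered for: {unknown}")
    return {col: PANDAS_TO_POLARS_NAME[dt] for col, dt in pandas_dtypes.items()}


def read_csv_with_polars(path: str, pandas_dtypes: Dict[str, str]) -> Any:
    """Read a CSV with polars while accepting pandas-style dtype strings."""
    import polars as pl  # lazy import; assumes polars is installed

    overrides = {
        col: getattr(pl, name)
        for col, name in polars_type_names(pandas_dtypes).items()
    }
    # `dtypes` is the keyword quoted in this PR; newer polars releases
    # name the same argument `schema_overrides`.
    return pl.read_csv(path, dtypes=overrides)
```

Any pandas dtype missing from the table (for example "category") raises an error in this sketch, which is the sort of edge case users would otherwise run into only for CSV files.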

QOL changes:

Add example of how to override service url and token

src/dapla_pseudo/v1/supported_file_format.py

mmwinther

Looks good 👍

Andilun added 2 commits September 1, 2023 11:53

Read csv and parquet with polars

64db6b0

Add example of how to ovveride service url and token

73a4841

Andilun requested a review from a team as a code owner September 1, 2023 10:45

Andilun added 2 commits September 1, 2023 14:34

Revert to reading csv with pandas and fix tests

ceb63f6

Fix typing issue for PARQUET enum

ffb286d

bjornandre reviewed Sep 1, 2023

src/dapla_pseudo/v1/supported_file_format.py

mmwinther reviewed Sep 4, 2023

src/dapla_pseudo/v1/supported_file_format.py

Map filetypes to their corresponding reader functions

3b91a44

mmwinther approved these changes Sep 13, 2023

Andilun merged commit db8db61 into main Sep 13, 2023
13 checks passed

Andilun deleted the read-files-with-polars branch September 13, 2023 11:13

Andilun mentioned this pull request Sep 13, 2023

Use polars for file IO where possible #227

Closed
