Read files with polars #244

Merged
merged 5 commits into from
Sep 13, 2023


@Andilun Andilun commented Sep 1, 2023

Changes

The PseudoData.from_file function now uses polars when reading Parquet files. This should improve read performance for that format.

This resolves issue #227.
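
A minimal sketch of the dispatch described above, not the code merged in this PR. The helper names `pick_reader` and `read_as_pandas` are hypothetical, and the sketch assumes the result is always handed back as a pandas DataFrame, with CSV standing in for the formats that remain on pandas.

```python
from pathlib import Path


def pick_reader(path: str) -> str:
    """Return which library a from_file-style loader would use for this path."""
    suffix = Path(path).suffix.lower()
    # Parquet goes through polars; every other format stays on pandas.
    return "polars" if suffix == ".parquet" else "pandas"


def read_as_pandas(path: str, **kwargs):
    """Hypothetical helper: read a file and always return a pandas DataFrame."""
    if pick_reader(path) == "polars":
        import polars as pl  # imported lazily so only Parquet reads need polars

        # The schema comes from the Parquet file itself, so no dtype is needed.
        # DataFrame.to_pandas() requires pyarrow to be installed.
        return pl.read_parquet(path).to_pandas()

    import pandas as pd

    # CSV stands in for the pandas-side formats; dtype= and other keyword
    # arguments pass straight through to pandas.
    return pd.read_csv(path, **kwargs)
```

Keeping the conversion to pandas at the boundary means callers see the same return type regardless of which library did the reading.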

Reason for not using polars for 'csv' files

I have opted not to use polars for reading CSV files because it would require us to handle a conversion between polars 'dtypes' and pandas 'dtype'.

Currently, all the file formats we support reading can either have data types specified with the dtype keyword argument or derive the schema from the file. We should avoid requiring users to define data types differently for CSV files.

dtypes vs dtype

pandas.read_csv dtype: dtype or dict of {Hashable: dtype}
polars.read_csv dtypes: Mapping[str, PolarsDataType] | Sequence[PolarsDataType] | None = None
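
To illustrate the kind of conversion layer that using polars for CSV would have required, here is a rough sketch. The table `PANDAS_TO_POLARS` and both helper functions are hypothetical and cover only a handful of types; nothing like this is part of the PR.

```python
# Hypothetical translation table: pandas dtype string -> polars dtype name.
PANDAS_TO_POLARS = {
    "int64": "Int64",
    "float64": "Float64",
    "bool": "Boolean",
    "string": "Utf8",
}


def polars_name_for(pandas_dtype: str) -> str:
    """Look up the polars dtype name for a pandas dtype string."""
    try:
        return PANDAS_TO_POLARS[pandas_dtype]
    except KeyError:
        raise ValueError(
            f"no polars equivalent registered for {pandas_dtype!r}"
        ) from None


def to_polars_dtypes(pandas_dtypes: dict) -> dict:
    """Translate a pandas-style {column: dtype string} mapping for polars."""
    import polars as pl  # imported lazily; only needed for the final lookup

    # Resolve each name to the actual polars dtype class, e.g. pl.Int64.
    return {
        column: getattr(pl, polars_name_for(dtype))
        for column, dtype in pandas_dtypes.items()
    }
```

Every pandas dtype a user might pass would need an entry in such a table, which is the maintenance burden avoided by leaving CSV on pandas.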

QOL changes:

@Andilun Andilun requested a review from a team as a code owner September 1, 2023 10:45

@mmwinther mmwinther left a comment


Looks good 👍

@Andilun Andilun merged commit db8db61 into main Sep 13, 2023
13 checks passed
@Andilun Andilun deleted the read-files-with-polars branch September 13, 2023 11:13
Development

Successfully merging this pull request may close these issues.

Use polars for file IO where possible
3 participants