Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Feature/Addition] parsons.databases.database.DatabaseCreateStatement.detect_data_type() should detect more data types than str, float, int #942

Open
austinweisgrau opened this issue Dec 4, 2023 · 7 comments
Labels
enhancement Impact - something should be added to or changed about Parsons that isn't causing a current breakage medium priority Priority - this doesn't need to be addressed immediately, but will broadly impact Parsons users

Comments

@austinweisgrau
Copy link
Collaborator

parsons.databases.database.DatabaseCreateStatement.detect_data_type() is only able to currently detect data types str, float, and int. As a result, downstream methods (e.g. parsons.databases.redshift.rs_create_table.RedshiftCreateTable.data_type()) that generate DDL statements to create database tables are only able to infer data types for database tables for these 3 types.

It should be possible to infer types for more data types, including datetimes, dates, and booleans.

Detailed Description

Context

When using parsons.Redshift.copy() to upload a parsons Table into a Redshift database, parsons should be able to infer data types for most column types and pass those into Redshift accurately. However, the current implementation only accomplishes this for string, float, and integer types. All other types are coerced into strings. There should be enough information available in a parsons table to detect and accurately represent other types, including datetimes, dates, and booleans.

Possible Implementation

postgres, mysql, and redshift database connectors all use the underlying detect_data_type() method, so whatever change is made here will need to be compatible with all 3 downstream connectors and SQL generators.

Priority

@austinweisgrau austinweisgrau added the enhancement Impact - something should be added to or changed about Parsons that isn't causing a current breakage label Dec 4, 2023
@Jason94
Copy link
Collaborator

Jason94 commented Dec 4, 2023

Yes, I can confirm this is really annoying for boolean columns. I have to use bool_col = 'True' as a workaround.

@austinweisgrau
Copy link
Collaborator Author

Figured out that parsing BOOLs works if parsons.databases.database.constants.DO_PARSE_BOOLS is set to True. There doesn't seem to be any documentation about why this defaults to False or anything.

@austinweisgrau
Copy link
Collaborator Author

austinweisgrau commented Dec 4, 2023

The feature was added 2 years ago with 766cfae and it seems to have been turned off by default for "reverse compatibility," which doesn't really make sense to me 🤔

@Jason94
Copy link
Collaborator

Jason94 commented Dec 5, 2023

Oh dear, they couldn't have even made that an argument to the function or something? This would definitely be a breaking change, but a really good one... Maybe we can target flipping the default for the next major release? (I know Shauna is cutting one right now.)

@austinweisgrau
Copy link
Collaborator Author

Ya I made the PR #943 on the major-release branch

@Jason94
Copy link
Collaborator

Jason94 commented Dec 11, 2023

Are we ready to close this issue, @austinweisgrau ?

@austinweisgrau
Copy link
Collaborator Author

I think it would still be nice to have support for loading datetime and date type objects to databases where possible!

@shaunagm shaunagm added the medium priority Priority - this doesn't need to be addressed immediately, but will broadly impact Parsons users label Dec 19, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Impact - something should be added to or changed about Parsons that isn't causing a current breakage medium priority Priority - this doesn't need to be addressed immediately, but will broadly impact Parsons users
Projects
None yet
Development

No branches or pull requests

3 participants