-
Notifications
You must be signed in to change notification settings - Fork 139
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Handle exception for invalid column names containing . #102
base: master
Are you sure you want to change the base?
Conversation
""" | ||
:param column: column to which constraint is to be applied | ||
""" | ||
if "." in column: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is quite strict, perhaps simply enclosing the field with ticks will do, like f"`column.name`"
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@herminio-iovio passing through ticks also does not work and returns the same error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Use dot with column name in Spark/SQL is quite legitimate, basically to fetch values from fields that are nested. Deequ/PyDeequ as a library does not seem to be the right place to handle this since it's hard to guess customer's intent. For example, this might break some existing use-cases where people actually want to profile/check nested column types.
I think it's up to the caller or other higher level library / service to make the call (e.g. escape with "`" or rename)
hi @chenliu0831 thanks for the review |
@sidharthbolar right I think I understand the purpose, but column with dot is a valid usage for structured column (JSON type). Do you mind add a test to verify if that works? Looks like you actually want a better Pythonic error (than the vanilla Spark error) when such column with dot does not exist? If so, how about add a helper to check column existence instead? |
Issue #, if available:
#99
Description of changes:
Columns with dot in their name are throwing an Analysis Exception
Based on analysis done , the exception is being thrown by the core scala deeque library
Have proposed to handle this exception to notify the caller that a column name with "." should not be passed
the exception handled is added to all check methods and implemented via a static method
Local Tests were passing and have added an additional UT test_invalidColumnException to verify the behaviour
*version = "1.0.1"
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.