feat: add some SparkLikeLazyFrame methods #1633
@@ -173,16 +176,87 @@ def sort(

         flat_by = flatten([*flatten([by]), *more_by])
         if isinstance(descending, bool):
-            descending = [descending]
+            descending = [descending] * len(flat_by)
This cost me some sanity to figure out. For a good 5 minutes I was like "does pyspark not know how sorting works?"
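For readers skimming the thread, the `sort` fix amounts to broadcasting a scalar `descending` flag to one entry per sort column. A minimal standalone sketch of that idea follows; `normalize_descending` is a hypothetical helper name used only for illustration, and the length check is an added assumption, not something shown in the hunk.

```python
from __future__ import annotations

from typing import Sequence


def normalize_descending(
    by: Sequence[str], descending: bool | Sequence[bool]
) -> list[bool]:
    """Return one descending flag per sort column (hypothetical helper)."""
    if isinstance(descending, bool):
        # A single flag applies to every column, so repeat it len(by) times.
        return [descending] * len(by)
    flags = list(descending)
    if len(flags) != len(by):
        # Assumption: mismatched lengths are treated as a caller error.
        msg = f"expected {len(by)} descending flags, got {len(flags)}"
        raise ValueError(msg)
    return flags
```

With the pre-fix behaviour, a scalar flag became a one-element list regardless of how many columns were passed, so only the first column would have had a direction attached.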
cool, thanks!
for tests, maybe after this one we should start adding it to constructor and xfailing things?
We have a lot of expressions that would fail. I would say to wait a little bit more for now.
I am not sure whether the failing test should be considered an edge case, or whether it is worth creating a temporary name-mapping layer/step to have it run successfully. It would definitely add a fair bit of complexity.
thanks! i think a loud error like
is fine, i think it's ok to assert that for pyspark it raises
thanks @FBruzzesi !
What type of PR is this? (check all applicable)
Related issues
Should we start tracking pyspark methods?
Checklist
If you have comments or can explain your changes, please do so below
It adds:
- drop_nulls
- rename
- unique
- join

And fixes:
- sort
TODO
There is one test failing because of how I implemented join to coalesce columns. The test is `test_left_join_overlapping_column`, which tries to map a column called `d` to one called `antananarivo`. That name already exists, so the existing column would in turn have to be renamed to `antananarivo_right`, yet this fails.
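The clash described here can be illustrated without Spark. The sketch below is not the PR's implementation: `check_right_key_rename` is a hypothetical helper, and it assumes the join coalesces keys by renaming each right-hand key to its left-hand counterpart. It raises loudly when that target name is already taken on the right-hand side, which is the kind of behaviour suggested in the review; the error type and message are placeholders.

```python
from __future__ import annotations

from typing import Sequence


def check_right_key_rename(
    right_columns: Sequence[str],
    left_on: Sequence[str],
    right_on: Sequence[str],
) -> None:
    """Raise if renaming a right key to its left key would clash (hypothetical)."""
    for left_key, right_key in zip(left_on, right_on):
        # Renaming right_key -> left_key is only safe when left_key is not
        # already a different column of the right-hand frame.
        if left_key != right_key and left_key in right_columns:
            msg = (
                f"cannot rename join key {right_key!r} to {left_key!r}: "
                f"the right-hand frame already has a column {left_key!r}"
            )
            raise ValueError(msg)


# The situation from the failing test: `d` is joined onto `antananarivo`,
# but the right-hand frame also contains `antananarivo`.
try:
    check_right_key_rename(["antananarivo", "d"], ["antananarivo"], ["d"])
except ValueError as exc:
    print(exc)
```

A test could then assert that the pyspark path raises for this input instead of comparing results.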