feat: add some `SparkLikeLazyFrame` methods #1633

FBruzzesi · 2024-12-20T21:29:28Z

What type of PR is this? (check all applicable)

Related issues

Should we start tracking pyspark methods?

Checklist

Code follows style guide (ruff)
Tests added
Documented the changes

If you have comments or can explain your changes, please do so below

It adds:

- `drop_nulls`
- `rename`
- `unique`
- `join`

It also fixes `sort`.

TODO

One test is failing because of how I implemented `join` to coalesce columns. The test is `test_left_join_overlapping_column`: it tries to map a column called `d` onto one called `antananarivo`, which already exists, so the existing `antananarivo` would have to be renamed to `antananarivo_right`. Yet it fails to do so.
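The collision can be modelled without Spark. The sketch below is a hypothetical illustration, not the PR's implementation: it treats a frame as a plain list of column names and assumes the right-hand renames are applied one after another, each touching every column that currently carries the old name. Under those assumptions, mapping `d` onto `antananarivo` first leaves two `antananarivo` columns, and the follow-up rename then turns both into `antananarivo_right`. `rename_sequentially` and `duplicated` are illustrative names only.

```python
from __future__ import annotations


def rename_sequentially(columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Apply each old -> new rename in turn (hypothetical model of chained renames).

    Every rename hits *all* columns currently named `old`, so an earlier
    rename can create a name that a later rename then captures as well.
    """
    for old, new in mapping.items():
        columns = [new if col == old else col for col in columns]
    return columns


def duplicated(columns: list[str]) -> set[str]:
    """Return the column names that occur more than once."""
    seen: set[str] = set()
    dups: set[str] = set()
    for col in columns:
        if col in seen:
            dups.add(col)
        else:
            seen.add(col)
    return dups


# Assumed right-hand columns and renames, based on the test description above:
# `d` is mapped onto `antananarivo`, which should itself become `antananarivo_right`.
right_columns = ["antananarivo", "d"]
mapping = {"d": "antananarivo", "antananarivo": "antananarivo_right"}

renamed = rename_sequentially(right_columns, mapping)
print(renamed)              # ['antananarivo_right', 'antananarivo_right']
print(duplicated(renamed))  # {'antananarivo_right'}
```

Swapping the order of the two renames avoids the duplicate in this toy model, which is exactly what makes order-dependent renaming fragile.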

FBruzzesi · 2024-12-20T21:32:17Z

narwhals/_spark_like/dataframe.py

```diff
@@ -173,16 +176,87 @@ def sort(
         flat_by = flatten([*flatten([by]), *more_by])
         if isinstance(descending, bool):
-            descending = [descending]
+            descending = [descending] * len(flat_by)
```


This cost me some sanity to figure out. For a good 5 minutes I was thinking "does pyspark not know how sorting works?"
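A plausible reason the one-element list misbehaves: if the sort directions are later paired with the sort keys via `zip`, Python's `zip` stops at the shorter input, so every key after the first is silently dropped instead of raising an error. The Spark-free sketch below models only that pairing step; `flatten` is a simplified, hypothetical stand-in for the helper used in the diff, and `sort_spec_before`/`sort_spec_after` are illustrative names.

```python
from __future__ import annotations


def flatten(items: list) -> list:
    """Hypothetical stand-in: expand one level of list/tuple nesting."""
    out: list = []
    for item in items:
        if isinstance(item, (list, tuple)):
            out.extend(item)
        else:
            out.append(item)
    return out


def sort_spec_before(by, *more_by, descending=False):
    flat_by = flatten([*flatten([by]), *more_by])
    if isinstance(descending, bool):
        descending = [descending]  # a single flag, however many keys there are
    # zip stops at the shorter input: only the first key survives
    return list(zip(flat_by, descending))


def sort_spec_after(by, *more_by, descending=False):
    flat_by = flatten([*flatten([by]), *more_by])
    if isinstance(descending, bool):
        descending = [descending] * len(flat_by)  # broadcast: one flag per key
    return list(zip(flat_by, descending))


print(sort_spec_before("a", "b", descending=True))  # [('a', True)]
print(sort_spec_after("a", "b", descending=True))   # [('a', True), ('b', True)]
```

Broadcasting the boolean to `len(flat_by)` keeps every key in the sort, while an explicit per-key list passes through unchanged.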

MarcoGorelli

cool, thanks!

for tests, maybe after this one we should start adding it to `constructor` and xfailing things?

FBruzzesi · 2024-12-21T07:59:34Z

> for tests, maybe after this one we should start adding it to constructor and xfailing things?

We have a lot of expressions that would fail 🙃 I would say we wait a little longer for now.

FBruzzesi · 2024-12-26T18:43:41Z

I am not sure whether the failing test should be considered an edge case, or whether it is worth creating a temporary name-mapping layer/step to make it pass 🤔 It would definitely add a fair bit of complexity.
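For illustration only, here is roughly what such a temporary name-mapping step could look like (a sketch under assumptions, not something this PR implements): every column to be renamed is first moved to a fresh placeholder name, and only then moved to its final name, so no intermediate name can clash. The `__tmp_` prefix and both function names are hypothetical.

```python
from __future__ import annotations

import itertools


def temporary_names(existing: list[str], n: int, prefix: str = "__tmp_") -> list[str]:
    """Return n names that clash neither with `existing` nor with each other."""
    taken = set(existing)
    names: list[str] = []
    for i in itertools.count():
        if len(names) == n:
            break
        candidate = f"{prefix}{i}"
        if candidate not in taken:
            names.append(candidate)
            taken.add(candidate)
    return names


def rename_via_temporaries(columns: list[str], mapping: dict[str, str]) -> list[str]:
    """Rename in two passes so that no intermediate name can collide.

    Pass 1 moves every source column to a unique temporary name;
    pass 2 moves each temporary name to its final target.
    """
    sources = [col for col in columns if col in mapping]
    temps = temporary_names(columns + list(mapping.values()), len(sources))
    to_temp = dict(zip(sources, temps))
    to_final = {tmp: mapping[src] for src, tmp in to_temp.items()}
    # Each pass is an ordinary one-at-a-time rename; the temporary names are
    # fresh, so pass 1 never merges columns and pass 2 only touches placeholders.
    for old, new in to_temp.items():
        columns = [new if col == old else col for col in columns]
    for old, new in to_final.items():
        columns = [new if col == old else col for col in columns]
    return columns


mapping = {"d": "antananarivo", "antananarivo": "antananarivo_right"}
print(rename_via_temporaries(["antananarivo", "d"], mapping))
# ['antananarivo_right', 'antananarivo']
```

Even this toy version needs fresh-name generation and two rename passes, which gives a sense of the extra complexity a real `join` implementation would carry.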

MarcoGorelli · 2024-12-28T16:00:12Z

thanks! i think a loud error like

E               pyspark.errors.exceptions.captured.AnalysisException: [COLUMN_ALREADY_EXISTS] The column `antananarivo_right` already exists. Consider to choose another name or rename the existing column.

is fine. I think it's ok to assert that, for pyspark, it raises.

MarcoGorelli

thanks @FBruzzesi !

feat: add some spark-like frame methods

d2d299c

FBruzzesi commented Dec 20, 2024

tests/spark_like_test.py (resolved)

MarcoGorelli reviewed Dec 20, 2024

narwhals/_spark_like/dataframe.py (outdated, resolved)

MarcoGorelli reviewed Dec 20, 2024

narwhals/_spark_like/dataframe.py (outdated, resolved)

FBruzzesi added 3 commits December 21, 2024 09:03

rm with_row_index

d319487

Merge branch 'main' into feat/spark-like-frame-methods

c62d2aa

rm maintain_order warning

812203c

FBruzzesi added 3 commits December 30, 2024 15:30

raise for left join overlapping columns

889948d

rename via select

240d1fc

use create mapping in rename method

09bb0ae
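The commits "rename via select" and "use create mapping in rename method" suggest that `rename` now builds one projection from a name mapping rather than renaming columns one at a time. The sketch below models that idea in plain Python; `rename_select_plan` is a hypothetical name, and the duplicate check is an assumption of this sketch, not necessarily something the PR does. Because every alias is computed from the original names in a single pass, the order of the mapping no longer matters.

```python
from __future__ import annotations


def rename_select_plan(columns: list[str], mapping: dict[str, str]) -> list[tuple[str, str]]:
    """Return (source, alias) pairs describing a single select-based rename.

    Every alias is derived from the *original* column names, so renames
    cannot chain into one another the way sequential renames can.
    """
    plan = [(col, mapping.get(col, col)) for col in columns]
    aliases = [alias for _, alias in plan]
    # Assumption of this sketch: fail early with a readable message
    # instead of relying on the backend to reject duplicate names.
    if len(set(aliases)) != len(aliases):
        raise ValueError(f"rename would produce duplicate column names: {aliases}")
    return plan


mapping = {"d": "antananarivo", "antananarivo": "antananarivo_right"}
print(rename_select_plan(["antananarivo", "d"], mapping))
# [('antananarivo', 'antananarivo_right'), ('d', 'antananarivo')]
```

With PySpark, a plan like this would typically be applied in one call, e.g. `df.select(*[F.col(src).alias(dst) for src, dst in plan])`, with `F` being `pyspark.sql.functions`.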

FBruzzesi commented Dec 30, 2024

narwhals/_spark_like/dataframe.py (resolved)

MarcoGorelli approved these changes Dec 31, 2024

MarcoGorelli added the enhancement New feature or request label Dec 31, 2024

MarcoGorelli merged commit f1baf90 into main Dec 31, 2024
24 checks passed

MarcoGorelli deleted the feat/spark-like-frame-methods branch December 31, 2024 09:24

lucas-nelson-uiuc mentioned this pull request Dec 31, 2024

feat: add SparkLike* methods to satisfy DataFrame Tutorial #1693 (open, 10 tasks)
