Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add some SparkLikeLazyFrame methods #1633

Merged
merged 7 commits into from
Dec 31, 2024

Conversation

FBruzzesi
Copy link
Member

@FBruzzesi FBruzzesi commented Dec 20, 2024

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Should we start tracking pyspark methods?

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

It adds:

  • drop_nulls
  • rename
  • unique
  • join

And fixes sort

TODO

There is one test failing, because of how I implemented join to coalesce columns. The test is test_left_join_overlapping_column which tries to map a column called d to one called antananarivo, which already exist, and would try to rename such one to antananarivo_right. Yet fails to do so

@@ -173,16 +176,87 @@ def sort(

flat_by = flatten([*flatten([by]), *more_by])
if isinstance(descending, bool):
descending = [descending]
descending = [descending] * len(flat_by)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This took me some mental sanity to figure out. I was like "does pyspark do not know how sorting works?" for a good 5 minutes

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

cool, thanks!

for tests, maybe after this one we should start adding it to constructor and xfailing things?

narwhals/_spark_like/dataframe.py Outdated Show resolved Hide resolved
@FBruzzesi
Copy link
Member Author

for tests, maybe after this one we should start adding it to constructor and xfailing things?

We have a lot of expressions that would fail πŸ™ƒ I would say to wait a little bit more for now

@FBruzzesi
Copy link
Member Author

I am not sure if the failing test should be considered an edge case, or it is worth to create a temporary name mapping layer/step to have it run successfully πŸ€” It would definitly add a fair bit of complexity

@MarcoGorelli
Copy link
Member

thanks! i think a loud error like

E               pyspark.errors.exceptions.captured.AnalysisException: [COLUMN_ALREADY_EXISTS] The column `antananarivo_right` already exists. Consider to choose another name or rename the existing column.

is fine, i think it's ok to assert that for pyspark it raises

Copy link
Member

@MarcoGorelli MarcoGorelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks @FBruzzesi !

@MarcoGorelli MarcoGorelli added the enhancement New feature or request label Dec 31, 2024
@MarcoGorelli MarcoGorelli merged commit f1baf90 into main Dec 31, 2024
24 checks passed
@MarcoGorelli MarcoGorelli deleted the feat/spark-like-frame-methods branch December 31, 2024 09:24
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants