Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(MLOP-1985): optional params #347

Merged
merged 4 commits into from
Nov 13, 2023
Merged

feat(MLOP-1985): optional params #347

merged 4 commits into from
Nov 13, 2023

Conversation

ralphrass
Copy link
Contributor

Why? 📖

The eager_evaluation param forces Spark to apply the currently mapped changes to a DataFrame. When this parameter is set to False, Spark follows its standard behavior of lazy evaluation. Lazy evaluation can improve Spark's performance as it allows Spark to build the best version of the execution plan.
The row_deduplication is not applicable to all scenarios. For instance, when using Delta operations, row deduplication can be done through merge operations.

What? 🔧

  • Add new arguments to enable/disable eager evaluation and row deduplication in Source and FeatureSet classes.

Type of change

Please delete options that are not relevant.

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)
  • This change requires a documentation update
  • Release

How everything was tested? 📏

  • Databricks notebooks

@ralphrass ralphrass self-assigned this Nov 8, 2023
@ralphrass ralphrass requested a review from a team as a code owner November 8, 2023 19:14
@ralphrass ralphrass changed the title feat: optional params feat(MLOP-1985): optional params Nov 8, 2023
"""

def __init__(self, readers: List[Reader], query: str) -> None:
def __init__(
self, readers: List[Reader], query: str = "", eager_evaluation: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why did you add this empty string as default value?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

"""

def __init__(self, readers: List[Reader], query: str) -> None:
def __init__(
self, readers: List[Reader], query: str = "", eager_evaluation: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

"""

def __init__(self, readers: List[Reader], query: str) -> None:
def __init__(
self, readers: List[Reader], query: str = "", eager_evaluation: bool = True,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the docs, we have: "Lazy evaluation can improve Spark's performance as it allows Spark to build the best version of the execution plan.". Looks like that False as default is better, or am I wrong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Or is it True to maintain some compatibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right, it's defaulted True so we can turn it off gradually in our pipelines.

Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@ralphrass ralphrass merged commit 9bcca0e into staging Nov 13, 2023
5 checks passed
ralphrass added a commit that referenced this pull request Nov 14, 2023
ralphrass added a commit that referenced this pull request Nov 14, 2023
* feat(MLOP-1985): optional params (#347)

---------
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants