Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: has_operation #1807

Draft
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

camriddell
Copy link
Contributor

@camriddell camriddell commented Jan 13, 2025

What type of PR is this? (check all applicable)

  • πŸ’Ύ Refactor
  • ✨ Feature
  • πŸ› Bug Fix
  • πŸ”§ Optimization
  • πŸ“ Documentation
  • βœ… Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below

Review Items

  • Does this conflict with Narwhals internal stable/version API?
  • Fix the typing here (specified return type is a shallow proxy to the real behavior)

Wanted to get feedback for an idea I had to implement has_operation. The current API implements the following changes

  • custom metaproperty decorator that behaves similarly to property (only the getter syntax other functionality would need to be implemented or would need to inherit from property itself).
    • enables chained syntax to move from an Expression/Series to its constituent namespaces without requiring an instance
    • nw.Expr.dt.date is valid syntax.
  • add has_operation to check if a given function passed using the above syntax is available for any given backend.

What would be nice for this feature would be if these components were more readily accessible from the narwhals namespace, dynamic imports based on internalized mappings can become tricky to maintain (see narwhals/utils:has_operation locals.backend_mapping) Instead these expressions could be added to the Implementation enum providing more informative metadata (such as import location lookups as well as namespace import names).


Feedback on the concept as a whole or the specific implementation is welcome! If @MarcoGorelli likes this approach, I do believe some additional work is required here to get this working correctly with the internal versioning interface so guidance in this spot would be appreciated.

Here are a few examples to show what this PR contains

from narwhals import has_operation
import narwhals as nw
import polars as pl
import pandas as pd
import pyarrow as pa
import duckdb
import dask.dataframe as dd


print(
    f"{has_operation(pl, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pl, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pl, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pl, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pl, nw.DataFrame.explode)           =  }",  #  True
    f"{has_operation(pl, nw.LazyFrame.drop_nulls)        =  }",  #  True
    "",
    f"{has_operation(pd, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pd, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pd, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pd, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pd, nw.DataFrame.explode)           =  }",  #  True
    f"{has_operation(pd, nw.LazyFrame.drop_nulls)        =  }",  #  False
    "",
    f"{has_operation(pa, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(pa, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(pa, nw.Series.mean)                 =  }",  #  True
    f"{has_operation(pa, nw.Series.str.to_uppercase)     =  }",  #  True
    f"{has_operation(pa, nw.DataFrame.explode)           =  }",  #  False
    f"{has_operation(pa, nw.LazyFrame.drop_nulls)        =  }",  #  False
    "",
    f"{has_operation(duckdb, nw.Expr.mean)               =  }",  #  True
    f"{has_operation(duckdb, nw.Expr.dt.date)            =  }",  #  True
    f"{has_operation(duckdb, nw.Series.mean)             =  }",  #  False
    f"{has_operation(duckdb, nw.Series.str.to_uppercase) =  }",  #  False
    f"{has_operation(duckdb, nw.DataFrame.explode)       =  }",  #  False
    f"{has_operation(duckdb, nw.LazyFrame.drop_nulls)    =  }",  #  False
    "",
    f"{has_operation(dd, nw.Expr.mean)                   =  }",  #  True
    f"{has_operation(dd, nw.Expr.dt.date)                =  }",  #  True
    f"{has_operation(dd, nw.Series.mean)                 =  }",  #  False
    f"{has_operation(dd, nw.Series.str.to_uppercase)     =  }",  #  False
    f"{has_operation(dd, nw.DataFrame.explode)           =  }",  #  False
    f"{has_operation(dd, nw.LazyFrame.drop_nulls)        =  }",  #  True
    sep='\n'
)
has_operation(pl, nw.Expr.mean)                   =  True
has_operation(pl, nw.Expr.dt.date)                =  True
has_operation(pl, nw.Series.mean)                 =  True
has_operation(pl, nw.Series.str.to_uppercase)     =  True
has_operation(pl, nw.DataFrame.explode)           =  True
has_operation(pl, nw.LazyFrame.drop_nulls)        =  True

has_operation(pd, nw.Expr.mean)                   =  True
has_operation(pd, nw.Expr.dt.date)                =  True
has_operation(pd, nw.Series.mean)                 =  True
has_operation(pd, nw.Series.str.to_uppercase)     =  True
has_operation(pd, nw.DataFrame.explode)           =  True
has_operation(pd, nw.LazyFrame.drop_nulls)        =  False

has_operation(pa, nw.Expr.mean)                   =  True
has_operation(pa, nw.Expr.dt.date)                =  True
has_operation(pa, nw.Series.mean)                 =  True
has_operation(pa, nw.Series.str.to_uppercase)     =  True
has_operation(pa, nw.DataFrame.explode)           =  False
has_operation(pa, nw.LazyFrame.drop_nulls)        =  False

has_operation(duckdb, nw.Expr.mean)               =  True
has_operation(duckdb, nw.Expr.dt.date)            =  True
has_operation(duckdb, nw.Series.mean)             =  False
has_operation(duckdb, nw.Series.str.to_uppercase) =  False
has_operation(duckdb, nw.DataFrame.explode)       =  False
has_operation(duckdb, nw.LazyFrame.drop_nulls)    =  False

has_operation(dd, nw.Expr.mean)                   =  True
has_operation(dd, nw.Expr.dt.date)                =  True
has_operation(dd, nw.Series.mean)                 =  False
has_operation(dd, nw.Series.str.to_uppercase)     =  False
has_operation(dd, nw.DataFrame.explode)           =  False
has_operation(dd, nw.LazyFrame.drop_nulls)        =  True

@camriddell camriddell changed the title Feat/has operation feat: has_operation Jan 13, 2025
@MarcoGorelli
Copy link
Member

thanks Cam! i'm out teaching this week so have very limited time, but i'll take a look soon!

@MarcoGorelli
Copy link
Member

thanks for doing this!

would we need to remove blanket notimplementederrors like

def join_asof(
self: Self,
other: Self,
*,
left_on: str | None,
right_on: str | None,
by_left: list[str] | None,
by_right: list[str] | None,
strategy: Literal["backward", "forward", "nearest"],
suffix: str,
) -> Self:
msg = "join_asof is not yet supported on PyArrow tables" # pragma: no cover
raise NotImplementedError(msg)

for this to be accurate?

@camriddell
Copy link
Contributor Author

thanks for doing this!

would we need to remove blanket notimplementederrors like

def join_asof(
self: Self,
other: Self,
*,
left_on: str | None,
right_on: str | None,
by_left: list[str] | None,
by_right: list[str] | None,
strategy: Literal["backward", "forward", "nearest"],
suffix: str,
) -> Self:
msg = "join_asof is not yet supported on PyArrow tables" # pragma: no cover
raise NotImplementedError(msg)

for this to be accurate?

This is correct, though we may be able to detect this type of blanket NotImplementedErrors through the AST- though there will be some large edges that will appear

β‘  AST

from inspect import getsource
from textwrap import dedent
from ast import parse, NodeVisitor

from narwhals._arrow.dataframe import ArrowDataFrame

class Visitor(NodeVisitor):
    def __init__(self):
        self.has_notimplemented = False
        self.has_return = False
        super().__init__()

    def visit_Return(self, node):
        self.has_return = True
    def visit_Raise(self, node):
        if node.exc.func.id == 'NotImplementedError':
            self.has_notimplemented = True


source = dedent(getsource(ArrowDataFrame.join_asof))
tree = parse(source)
v = Visitor()
v.visit(tree)

print(f'{v.has_notimplemented = }')
print(f'{v.has_return         = }')

if v.has_notimplemented and not v.has_return:
    print('This method is likely not implemented')

# v.has_notimplemented = True
# v.has_return         = False
# This method is likely not implemented

β‘‘ Not Implemented Decorator (store on method)

Something more reliable may be to implement a NotImplemented decorator that attaches this metadata either to the method itself or registers it in some globally accessible place for later lookups.

from functools import wraps

def not_implemented(msg):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            raise NotImplementedError(msg)
        wrapper.not_implemented = True # store info on the method itself
        return wrapper
    return decorator


class T:
    @not_implemented('Not available in ...')
    def f(self):
        pass

    def g(self):
        pass

t = T()
# t.f()

print(
    f"{hasattr(T.f, 'not_implemented') = }",
    f"{hasattr(T.g, 'not_implemented') = }",
    sep='\n'
)


T.f()
hasattr(T.f, 'not_implemented') = True
hasattr(T.g, 'not_implemented') = False

Traceback (most recent call last):
  File "/home/cameron/.vim-excerpt", line 32, in <module>
    T.f()
  File "/home/cameron/.vim-excerpt", line 8, in wrapper
    raise NotImplementedError(msg)
NotImplementedError: Not available in ...

β‘’ Not Implemented Decorator (store in registry)

from functools import wraps

def not_implemented(msg):
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            raise NotImplementedError(msg)

        not_implemented.registry.add(wrapper)
        return wrapper

    if not hasattr(not_implemented, 'registry'):
        not_implemented.registry = set()
    return decorator


class T:
    @not_implemented('Not available in ...')
    def f(self):
        pass

    def g(self):
        pass

t = T()

print(
    f'{not_implemented.registry        = }',
    f'{T.f in not_implemented.registry = }',
    f'{T.g in not_implemented.registry = }',
    sep='\n',
    end='\n'*2
)

t.f()
not_implemented.registry        = {<function T.f at 0x76976cc3a200>}
T.f in not_implemented.registry = True
T.g in not_implemented.registry = False

Traceback (most recent call last):
  File "/home/cameron/.vim-excerpt", line 36, in <module>
    t.f()
  File "/home/cameron/.vim-excerpt", line 8, in wrapper
    raise NotImplementedError(msg)
NotImplementedError: Not available in ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants