
feat: support functools.partial handlers #6

Open · wants to merge 1 commit into master from support_partial_handlers
Conversation

Sacrimento

This PR introduces functools.partial usage in order to pass arbitrary arguments to handlers.
Every handler type can easily be wrapped.

The usage can be tested as described in the docs

Handlers can now be wrapped by `functools.partial`,
and take arbitrary arguments.
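For reference, the core idea can be sketched without bigxml at all; the handler and node shapes below are hypothetical stand-ins, not the library's actual API:

```python
from functools import partial

# Hypothetical handler: takes an extra `threshold` argument in addition
# to the node that the parser would normally pass on its own.
def handler(threshold, node):
    if node["speed"] > threshold:
        return node["text"]
    return None

# The parser only ever calls handlers with the node, so the extra
# context is bound up front with functools.partial.
wrapped = partial(handler, 150)

print(wrapped({"speed": 200, "text": "car"}))   # car
print(wrapped({"speed": 100, "text": "bike"}))  # None
```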
@Sacrimento Sacrimento force-pushed the support_partial_handlers branch from 4becf5a to 74581ba Compare July 8, 2023 07:51
@Rogdham
Owner

Rogdham commented Aug 12, 2023

Thank you for the PR! I understand that the use case is to be able to pass some context from the caller of Parser(…).iter_from(…) into the handlers themselves.

At first glance, I feel like using partials for that use case is a hack, but I am unsure whether it is a good or a bad one.

I took the time to think about this (and have improved a few things like typing on a branch of mine).

My main concern is that this would be cumbersome for nested handlers. For example, it would mean doing something like this:

# before
class Handler:
    sub_handler: SubHandler

# after
class Handler:
    def __init__(self, context):
        self.sub_handler = partial(SubHandler, context)

… and if you have a chain of handlers, you would basically have to pass that context from one to the other using partial at every level.
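A minimal, runnable sketch of that chaining cost, with plain classes standing in for bigxml handlers:

```python
from functools import partial

class SubHandler:
    def __init__(self, context, node):
        self.value = f"{context}:{node}"

class Handler:
    def __init__(self, context):
        # Each level must pre-bind the context for the next one,
        # since the parser instantiates sub-handlers on its own.
        self.sub_handler = partial(SubHandler, context)

handler = Handler("ctx")
print(handler.sub_handler("node").value)  # ctx:node
```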


Now, taking a step back: the use case of passing some data down the call chain is quite frequent, but the usual solutions may not be the best fit here. Below is what comes to mind.

1. Use a global variable

Of course this works, but creates concurrency issues if you parse several XML documents in parallel.
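A sketch of option 1 with illustrative names (no bigxml involved):

```python
# Module-level global read by the handler.
speed_threshold = None

def handler(node):
    if node["speed"] > speed_threshold:
        yield node["text"]

def parse(nodes, threshold):
    global speed_threshold
    speed_threshold = threshold  # racy if two parses run concurrently
    for node in nodes:
        yield from handler(node)

print(list(parse([{"speed": 200, "text": "car"}], 150)))  # ['car']
```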

2. Use a contextvar

Just like option 1, except that the concurrency issues only appear if you parse several XML documents in parallel within the same thread (for example, two interleaved parsing generators).
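A sketch of option 2 with contextvars, again with illustrative names:

```python
from contextvars import ContextVar

speed_threshold: ContextVar[int] = ContextVar("speed_threshold")

def handler(node):
    if node["speed"] > speed_threshold.get():
        yield node["text"]

def parse(nodes, threshold):
    token = speed_threshold.set(threshold)
    try:
        for node in nodes:
            yield from handler(node)
    finally:
        speed_threshold.reset(token)

print(list(parse([{"speed": 200, "text": "car"}], 150)))  # ['car']
```

This isolates the value per thread (and per asyncio task), but two parsing generators interleaved within the same context would still overwrite each other's value.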

3. Pass variables as parameters from caller to callee

In regular code, this is maybe the most obvious choice.

def f(*args):
    ...
    g(*args)
    ...

But this is not possible here, because bigxml calls the callee itself, not the user.

However, bigxml could detect a special parameter in the handler signature by its name (say, bigxml_context) and automatically pass it the value that was given in the call to iter_from.

This could look like this:

@xml_handle_element("vehicles", "vehicle")
def handler(node, bigxml_context):
    if int(node.attributes["speed"]) > bigxml_context:
        yield node.text

with open("vehicles.xml", "rb") as stream:
    for vehicle in Parser(stream).iter_from(handler, bigxml_context=150):
        print(vehicle)

4. Have caller and callee be methods of the same class instance (and store the data as an attribute on the instance)

This works if your handler is a function (by turning it into a method), but not if the handler itself is a class.

class Handler:
    def __init__(self, speed_threshold):
        self.speed_threshold = speed_threshold

    @xml_handle_element("vehicles", "vehicle")
    def handler(self, node):
        if int(node.attributes["speed"]) > self.speed_threshold:
            yield node.text

with open("vehicles.xml", "rb") as stream:
    for vehicle in Parser(stream).iter_from(Handler(150)):
        print(vehicle)

But not being able to use class handlers is quite limiting I think.

5. Define the callee inside the caller so that it can access its scope (and use nonlocal for writing)

This works at the cost of re-defining the handlers each time you parse some XML.

def parse_stream(stream, speed_threshold):
    @xml_handle_element("vehicles", "vehicle")
    def handler(node):
        if int(node.attributes["speed"]) > speed_threshold:
            yield node.text

    yield from Parser(stream).iter_from(handler)

with open("vehicles.xml", "rb") as stream:
    for vehicle in parse_stream(stream, 150):
        print(vehicle)

It probably also has some hidden side effects, like making debugging harder.


All in all, I think case 5 is the one I like the most. What do you think?

@Rogdham
Owner

Rogdham commented Aug 13, 2023

Another possible answer is to implement some separation of concerns.

Instead of trying to do everything at the same time, let the handler only extract the relevant pieces of data, and have a post-processing step afterwards deal with what is needed.

This could look like this:

@xml_handle_element("vehicles", "vehicle")
def handler(node):
    yield (node.text, int(node.attributes["speed"]))


def postprocess(items, speed_threshold):
    for vehicle, speed in items:
        if speed > speed_threshold:
            yield vehicle


with open("vehicles.xml", "rb") as stream:
    for vehicle in postprocess(Parser(stream).iter_from(handler), 150):
        print(vehicle)

I feel like this separation-of-concerns design makes the code easier to understand, but there may be a performance cost.
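The same pattern runs fine with a plain iterable standing in for Parser(stream).iter_from(handler); the extract step below is an illustrative stand-in for the handler, not bigxml code:

```python
def extract(nodes):
    # Handler-like step: only pull out the relevant pieces of data.
    for node in nodes:
        yield (node["text"], node["speed"])

def postprocess(items, speed_threshold):
    # The filtering has moved out of the handler into a separate step.
    for vehicle, speed in items:
        if speed > speed_threshold:
            yield vehicle

nodes = [{"text": "car", "speed": 200}, {"text": "bike", "speed": 100}]
print(list(postprocess(extract(nodes), 150)))  # ['car']
```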
