Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add custom serialization check to assert_eq #331

Merged
merged 5 commits into from
Oct 17, 2023

Conversation

rjzamora
Copy link
Member

Closes #323

This PR illustrates one possible way to test that we are not including Expr object instances in the result of Expr.dask. Note that this change helped me find and fix another bug related to #323 (groupby Var).

I didn't think very hard about this solution, so suggestions are welcome.

Note that the tests take about 15% longer on my laptop with this change.

@mrocklin
Copy link
Member

Note that the tests take about 15% longer on my laptop with this change

Would you want to see this merged in with this cost difference? This seems significant to me, and this check seems useful, but not super-important. Do you have thoughts here?

@rjzamora
Copy link
Member Author

Would you want to see this merged in with this cost difference? This seems significant to me, and this check seems useful, but not super-important. Do you have thoughts here?

This pytest runtime regression is the main reason the PR is marked as draft. I would be much happier if we were adding the equivalent of a few tests rather than ~40 new tests. It may make sense to "opt in" to the serialization test in a few places, but I'm still unsure.

@mrocklin
Copy link
Member

Unfortunately I think that you probably want ~40 new tests because you want this tested for every expression today, and also in the future. The assert_eq style checks have historically been quite good at ensuring this kind of hygeine throughout the project (today if you make a new method in dask.dataframe and have badly named keys for example, you get an informative error, even though you didn't write a new test for your new method on this topic). Ideally though we find some way to make these sorts of checks very cheap to run (this is sometimes hard). Maybe it's not possible here though. I don't know.

@phofl
Copy link
Collaborator

phofl commented Oct 12, 2023

How long is the test suite running on your machine? It take 12 seconds on mine, so 15% doesn't really matter at this point

@rjzamora
Copy link
Member Author

Unfortunately I think that you probably want ~40 new tests because you want this tested for every expression today, and also in the future.

This is probably true.

How long is the test suite running on your machine? It take 12 seconds on mine, so 15% doesn't really matter at this point

It's also <20s on mine, so I agree that a couple seconds is not a big deal. I think the consideration here is more-so: "Are we okay with all new tests taking about 15% longer from here on out?" The answer may just be "yes".

dask_expr/tests/_util.py Outdated Show resolved Hide resolved
@rjzamora rjzamora marked this pull request as ready for review October 12, 2023 19:25
@phofl
Copy link
Collaborator

phofl commented Oct 17, 2023

I'd be ok with merging this

@rjzamora
Copy link
Member Author

I'd be ok with merging this

I also agree that the small overhead is worth the mistakes it helps us avoid :)

@phofl phofl merged commit 802a65a into dask:main Oct 17, 2023
4 checks passed
@phofl
Copy link
Collaborator

phofl commented Oct 17, 2023

thx @rjzamora

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Expr instances are serialized as part of the graph
4 participants