Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Map.unionWith is over specialized and not consistent with intersectionWith #984

Open
theobat opened this issue Jan 27, 2024 · 10 comments
Open

Comments

@theobat
Copy link

theobat commented Jan 27, 2024

Hi, I think specializing the Map.unionWith operation to the case where both operands are of the same type Map k a is not a good design.

I think it should be :

unionWith :: (These a b -> c) -> Map k a -> Map k b -> Map k c 

Or, for a less opinionated (but also, worse) signature it could be

unionWith :: (Maybe a -> Maybe b -> c) -> Map k a -> Map k b -> Map k c 

Now I understand this would introduce a pretty significant breaking change which is bad, but there are various options for this, such as deprecating unionWith or renaming it to introduce the proper one ?

@jwaldmann
Copy link
Contributor

jwaldmann commented Jan 27, 2024

  • Please no. This would break a lot of (my) code.
  • Can you give a concrete example where you need this, and argue that you can't use https://hackage.haskell.org/package/containers-0.7/docs/Data-Map-Merge-Strict.html#v:merge
  • Performance: with identical value types (in arguments and result, as it is for current unionWith), consider the case that keys of arguments are from disjoint intervals (e.g., all keys in left arg < all keys in right arg). Then implementation might not need to do a complete traversal. For your suggested "proper" type, it must?

Example for last item: run this as ghci script

import qualified Data.Map.Strict as M

a = M.fromList $ take (10^7) $ zip [   0 :: Int ..] $ repeat ()
b = M.fromList $ take (10^7) $ zip [   1 :: Int ..] $ repeat ()
c = M.fromList $ take (10^7) $ zip [10^7 :: Int ..] $ repeat ()

length $ M.elems a
length $ M.elems b
length $ M.elems c

:set +s

length $ M.elems $ M.union a b -- overlapping

length $ M.elems $ M.union a c -- disjoint

I am seeing

10000001 -- overlapping
(0.92 secs, 1,340,080,824 bytes)
20000000 -- disjoint
(0.28 secs, 1,120,118,728 bytes)

@theobat
Copy link
Author

theobat commented Jan 28, 2024

Please no. This would break a lot of (my) code.

  • It's not like it's likely to be approved, and my message is about the "design", I also mentioned that a breaking change like that is not desirable. But backward compatibility has nothing to do with finding the correct design and the correct API. And once there's a consensus about what "should" be, you can find a path towards the change that keeps everyone happy, or do nothing if the status quo is the consensus.

Can you give a concrete example where you need this, and argue that you can't use https://hackage.haskell.org/package/containers-0.7/docs/Data-Map-Merge-Strict.html#v:merge

  • I mean of course no, this is a function able to perform pretty much any union, difference or intersection that you could think of. I use this function indeed. But my point is about the correct API, and the fact that unionWith signature has several implications which are far from obvious :
    • It only runs the "with" function on the intersection of values from both operands : why ? Because its type signature force this without any motivation or reason.
    • It enforces a stable union, which is a very narrow use case if you compare to all the unions that you can do, and I've never used the unionWith function as is stands today, whereas I've needed the union of distinct maps several times.

Besides, this is the whole point, I'm not comfortable with the fact that in order to have a values modifying union, you have to use 5 lines/words of a complex function instead of a single one. It's simply not a good design of the API for me.

Performance: with identical value types (in arguments and result, as it is for current unionWith), consider the case that keys of arguments are from disjoint intervals (e.g., all keys in left arg < all keys in right arg). Then implementation might not need to do a complete traversal. For your suggested "proper" type, it must?

I'm sympathetic to the performance argument, but this is also something that should not be traded for a very narrow API.
And running unionWith on disjoint key sets as it stands today is a noop, because the "with" function will not be used. Use union instead. Unless of course you want to transform the values, in case you should use the correct API for unions which let you do just that.

@jwaldmann
Copy link
Contributor

performance ... should not be traded for ...

Some programmers may prefer "the" most general API; some applications might require best possible performance.

unionWith on disjoint key sets as it stands today is a noop

no it's not, because the implementation does not know that inputs are disjoint. I wrote "disjoint intervals" because that's a case that I can imagine is easier to detect (log instead of linear). Even after that, it requires work, in re-balancing (but log work, not linear).

My concern is that a generic implementation needs to traverse everything (so, linear again). This can be worked around by reified transformation functions (e.g., preserveMissing) but not with the general type that you suggest?

@theobat
Copy link
Author

theobat commented Jan 29, 2024

Some programmers may prefer "the" most general API; some applications might require best possible performance.

Fine so you agree that one is not intrinsically above the other, so at the very least, both functions should exist. The fact that the "with" function only run on the intersection of maps is very narrow indeed and not documented.
And it seems I'm not the only one puzzled by the current API.
It's very awkward that the optimized narrow function is simple and exposed whereas the general and intuitive one is not. I say intuitive because one cannot say in good faith that a union with a combination function F only running on the intersection is intuitive, I mean it's simply not in the name !

My concern is that a generic implementation needs to traverse everything (so, linear again). This can be worked around by reified transformation functions (e.g., preserveMissing) but not with the general type that you suggest?

Then let's have two functions... ? What's the problem with that ? I expect people will say that it's redundant with the merge, but honestly everything in the API is redundant with the merge function at some point.

@konsumlamm
Copy link
Contributor

I say intuitive because one cannot say in good faith that a union with a combination function F only running on the intersection is intuitive, I mean it's simply not in the name !

I find it very intuitive, what else would the combining function be used for?

@theobat
Copy link
Author

theobat commented Jan 29, 2024

@konsumlamm, ok so, if you had to guess without a type signature what the type signature would be you'd expect union with, to operate the combination function only on the intersection case ? Do you really expect me to believe that ?
Of course once you have the type signature and you think about it for 3 seconds it becomes obivous that it can't be anything else which has nothing to do with it being intuitive...

@jwaldmann
Copy link
Contributor

basic set "theory": in union a b, the two sets a, b might be disjoint, or not. I expect a combing function for union to handle the non-disjoint case, because - what else would it do?

@theobat
Copy link
Author

theobat commented Jan 29, 2024

I made it clear in the first post what it should handle : all the cases, which is not the case here (and this has nothing to do with disjoint or not by the way, excluding performance aspects).

@konsumlamm
Copy link
Contributor

@konsumlamm, ok so, if you had to guess without a type signature what the type signature would be you'd expect union with, to operate the combination function only on the intersection case ? Do you really expect me to believe that ?

Yes, that's exactly what I would expect. I don't care if you believe me, why should I lie to you?

@konsumlamm
Copy link
Contributor

For the record, I'm not saying that I find a function like you're proposing unreasonable. In fact, I think it would be a good addition, although the fact that there is no These type in base or containers (or any other core library) makes it less appealing. I just don't agree that the current unionWith is bad design or unintuitive.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants