-
Notifications
You must be signed in to change notification settings - Fork 696
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Remove CabalParsing
class, specialize to ParsecParser
#10081
Remove CabalParsing
class, specialize to ParsecParser
#10081
Conversation
9c43166
to
4b35d84
Compare
4b35d84
to
b983469
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me.
This is a MAJOR breaking change. If you do this, you could rip-off whole As an author of |
How much of the benefits could you attain by |
2564616
to
9d83792
Compare
Good point! I've updated the patch to get rid of the
This patch demonstrates how you could begin adding I don't disagree that this would be a breaking change, but I'll leave it up to the Cabal maintainers to determine if the overhead is worth the flexibility in parser implementation. |
Then fix GHC. What you do in this patch is manual optimization. If specialization is fragile, that's GHC bug. If specialization and worker wrapper phase each other, that's also a GHC bug. Many have asked for 100% guaranteed monomorphisable abstraction (like C++ templates and Rust traits). GHC should be able to deliver. In C++ and Rust you can write code at higher abstraction level (i.e. against an interface) with guaranteed optimisation behaviour. GHC should be able to do that too. IMO 55 milliseconds per file is acceptable absolute value (IIRC this value is also skewed by few outilers, like huge files like |
9d83792
to
ad1a0aa
Compare
Remove the `CharParsing` and `Parsing` classes, specialize all parsing logic to `Parsec`. Rename the `Distribution.Parsec.Parsec` class to `CabalParsec` to avoid ambiguity with `Text.Parsec.Parsec`.
ad1a0aa
to
71d9ff1
Compare
No way I am taking a decision by myself here 😂 I am aware that it is a breaking change, but I am not convinced of value of the CharParsing machinery. However, voices against its removal should be listened to. TBH I care more about the two removed modules than the performance improvement (which at 30% is pretty impressive!). |
TBH, not-so-long ago I made an MiniSat experiment, and in some cases it's 100x faster, never slower. EDIT: And at that time I wished that
Performance wise, if GHC was clever enough to specialise, then e.g. using https://hackage.haskell.org/package/flatparse one could get another 20% or even 100% speedup. |
Removing code is good, but given that the code is not bitrotted, we need to ensure it's not used and not likely to be used. Could somebody perhaps ask on Haskell discourse? Then we could add a deprecation note to release notes and selected functions. If no reaction from the users in a couple of release cycles, I think it would be fine to remove. After all, if the mechanism is needed in the future, it can be re-added. |
I guess two questions that come to mind are:
I would expect cabal-the-library to provide me with a way to parse cabal files. I would not expect it to provide me with a way to do so while also swapping out the parser system! |
Yes.
It's thinking too narrowly. One can use e.g. And in fact, the
I don't believe that. When the need arises, the half year (at best) delay and all the social/political debate whether it's actually needed will be too much for not blocking issues. An example, #9098 which is a relatively simple change, took nine month to get released. And there were comments like
which are not encouraging. I didn't proposed that just because it would be fun. I needed (and still need and use) that functionality. |
But good code is not monolithic, but modular. You should be able to swap parts. E.g. |
@FinleyMcIlwaine It seems your benchmark is quite specialised to optimising this problem in particular, on a typical workload then the overhead of cabal file parsing is much smaller I imagine? If users won't notice the difference between one or the other then the flexibility of the current approach seems worthwhile. |
@phadej ok, you make good points and I am changing my mind about this.
The code base is large and (as you prove!) not well understood. Things like CharParsing ought to have a proper clearly defined place, otherwise they get lost in the "noise". Repeating what I said earlier, I care less about the performance impact than how much I care about the maintainability of the codebase. The hierarchy is: class Alternative m => Parsing m
class Parsing m => CharParsing m
class (CharParsing m, MonadPlus m, MonadFail m) => CabalParsing m |
I agree with the points above as well. I'll close this. |
@FinleyMcIlwaine perhaps in comparison the 7%/9% seem little, but I still think it would be worthwhile to |
I can open another PR if you think it'd be worthwhile. My only concern is that the necessary combination of |
Overloaded calls including
CabalParsing
dictionaries add a lot of overhead to the.cabal
file parser. As far as I can tell, the only instance ofCabalParsing
ever written is forParsecParser
. Removing the class and specializing references to it toParsecParser
results in a ~30% reduction in parsing times and a ~20% reduction in parsing allocations. I got these numbers by running:On current HEAD and this branch:
As I mentioned on #10080, I'm not sure about the feasibility of this change, but I am proposing it nonetheless.
Since this essentially obsoletes the
CharParsing
andParsing
classes as well, I've removed those too. So all the parsing logic is completely specialized toParsec
. I also renamed theDistribution.Parsec.Parsec
class toDistribution.Parsec.CabalParsec
to avoid ambiguity withText.Parsec
.