Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove CabalParsing class, specialize to ParsecParser #10081

Closed

Conversation

FinleyMcIlwaine
Copy link
Contributor

@FinleyMcIlwaine FinleyMcIlwaine commented Jun 7, 2024

Overloaded calls including CabalParsing dictionaries add a lot of overhead to the .cabal file parser. As far as I can tell, the only instance of CabalParsing ever written is for ParsecParser. Removing the class and specializing references to it to ParsecParser results in a ~30% reduction in parsing times and a ~20% reduction in parsing allocations. I got these numbers by running:

cabal run hackage-tests -- parsec +RTS -s

On current HEAD and this branch:

tag m/s per file total allocations
HEAD 0.55 690,340,703,552
this branch 0.38 550,792,331,272

As I mentioned on #10080, I'm not sure about the feasibility of this change, but I am proposing it nonetheless.

Since this essentially obsoletes the CharParsing and Parsing classes as well, I've removed those too. So all the parsing logic is completely specialized to Parsec. I also renamed the Distribution.Parsec.Parsec class to Distribution.Parsec.CabalParsec to avoid ambiguity with Text.Parsec.

@FinleyMcIlwaine FinleyMcIlwaine force-pushed the finley/step-x-rebased branch 2 times, most recently from 9c43166 to 4b35d84 Compare June 7, 2024 15:56
@FinleyMcIlwaine FinleyMcIlwaine force-pushed the finley/step-x-rebased branch from 4b35d84 to b983469 Compare June 7, 2024 16:12
Copy link
Collaborator

@andreabedini andreabedini left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@phadej
Copy link
Collaborator

phadej commented Jun 13, 2024

This is a MAJOR breaking change.

If you do this, you could rip-off whole CharParsing machinery, which exists so people who don't like parsec don't need to use parsec (e.g. they could use megaparsec to get better error messages - that requires work, but currently it's possible).

As an author of CharParsing I think this change, while has performance benefits, is not acceptable.

@alt-romes
Copy link
Collaborator

How much of the benefits could you attain by SPECIALISEing a lot? Oleg's point is strong.

@FinleyMcIlwaine FinleyMcIlwaine force-pushed the finley/step-x-rebased branch 2 times, most recently from 2564616 to 9d83792 Compare June 16, 2024 05:01
@FinleyMcIlwaine
Copy link
Contributor Author

If you do this, you could rip-off whole CharParsing machinery

Good point! I've updated the patch to get rid of theCharParsing and Parsing classes as well, which speeds things up even a bit more. I've updated the numbers in the PR description.

How much of the benefits could you attain by SPECIALISEing a lot?

This patch demonstrates how you could begin adding SPECIALIZE pragmas to remove some of the overloading. However, the performance improvements aren't as good (~7% reduction in allocations and ~9% reduction in parse times. See my blog post) and the specialization is somewhat fragile, since in some cases we need to delay inlining to ensure the rewrite rules can fire before worker wrapper.

I don't disagree that this would be a breaking change, but I'll leave it up to the Cabal maintainers to determine if the overhead is worth the flexibility in parser implementation.

@phadej
Copy link
Collaborator

phadej commented Jun 16, 2024

This patch demonstrates how you could begin adding SPECIALIZE pragmas to remove some of the overloading. However, the performance improvements aren't as good ... and the specialization is somewhat fragile, since in some cases we need to delay inlining to ensure the rewrite rules can fire before worker wrapper.

Then fix GHC.

What you do in this patch is manual optimization. If specialization is fragile, that's GHC bug. If specialization and worker wrapper phase each other, that's also a GHC bug.

Many have asked for 100% guaranteed monomorphisable abstraction (like C++ templates and Rust traits). GHC should be able to deliver. In C++ and Rust you can write code at higher abstraction level (i.e. against an interface) with guaranteed optimisation behaviour. GHC should be able to do that too.


IMO 55 milliseconds per file is acceptable absolute value (IIRC this value is also skewed by few outilers, like huge files like acme-everything; median would be better). If there is an abstraction cost, it's fine. Hopefully GHC will do better in the future.

@FinleyMcIlwaine FinleyMcIlwaine force-pushed the finley/step-x-rebased branch from 9d83792 to ad1a0aa Compare June 17, 2024 04:22
Remove the `CharParsing` and `Parsing` classes, specialize all parsing logic to
`Parsec`. Rename the `Distribution.Parsec.Parsec` class to `CabalParsec` to
avoid ambiguity with `Text.Parsec.Parsec`.
@andreabedini
Copy link
Collaborator

No way I am taking a decision by myself here 😂

I am aware that it is a breaking change, but I am not convinced of value of the CharParsing machinery. However, voices against its removal should be listened to. TBH I care more about the two removed modules than the performance improvement (which at 30% is pretty impressive!).

@geekosaur
Copy link
Collaborator

I have to say I don't agree with @phadej. It's not on us to support people who want to rewire part of cabal internals. Or should we also revert #9282 because someone might want to replace the solver? I don't see a difference here.

@phadej
Copy link
Collaborator

phadej commented Jun 17, 2024

should we also revert #9282 because someone might want to replace the solver? I

TBH, not-so-long ago I made an MiniSat experiment, and in some cases it's 100x faster, never slower. EDIT: And at that time I wished that cabal-install had a dummy&greedy solver which would pick the first package version satisfying the constraints. That would make feeding cabal-install external solutions (be it my solver results, ot Stackage snapshot) by constuction very fast. I'm sad to learn that flexibility is cut, so I won't hope of adding greedy-solver.


It's not on us to support people who want to rewire part of cabal internals.

Cabal is a library. The more use-cases it supports, the objectively better it is. I do think that the Cabal and cabal-install should in perfect world be split into separate repositories. It feels that many people (incl. some maintainers) do think about Cabal only from a narrow cabal-install perspective.


Performance wise, if GHC was clever enough to specialise, then e.g. using https://hackage.haskell.org/package/flatparse one could get another 20% or even 100% speedup.

@Mikolaj
Copy link
Member

Mikolaj commented Jun 17, 2024

Removing code is good, but given that the code is not bitrotted, we need to ensure it's not used and not likely to be used. Could somebody perhaps ask on Haskell discourse? Then we could add a deprecation note to release notes and selected functions. If no reaction from the users in a couple of release cycles, I think it would be fine to remove. After all, if the mechanism is needed in the future, it can be re-added.

@michaelpj
Copy link
Collaborator

I guess two questions that come to mind are:

  1. Are there any current (or even historical) users of this flexiblity? @phadej did you actually use it somewhere?
  2. Are there any other tools that provide this kind of ability to swap out the parsing system used by its parser?

I would expect cabal-the-library to provide me with a way to parse cabal files. I would not expect it to provide me with a way to do so while also swapping out the parser system!

@phadej
Copy link
Collaborator

phadej commented Jun 17, 2024

@michaelpj

@phadej did you actually use it somewhere?

Yes.

Are there any other tools that provide this kind of ability to swap out the parsing system used by its parser?

It's thinking too narrowly. One can use e.g. megaparsec to parse build-depends like structure. Or use CabalParsing combinators (like parsecCommaNonEmpty) to build parsers for *.cabal like syntax, but not .cabal files. EDIT: or not necessarily even .cabal like files. Parsing package names, version ranges etc. is common task for many tools.

And in fact, the cabal-install does that. And cabal-install could swap out of parsec (as it's not constrained by GHC boot lib requirement), for something faster and with better error reporting. Not that parsing cabal.project files is a bottleneck, or that parsec errors are terrible; but both can be improved.


After all, if the mechanism is needed in the future, it can be re-added.

I don't believe that. When the need arises, the half year (at best) delay and all the social/political debate whether it's actually needed will be too much for not blocking issues.

An example, #9098 which is a relatively simple change, took nine month to get released. And there were comments like

I don't see a reason to introduce such a warning.

which are not encouraging. I didn't proposed that just because it would be fun. I needed (and still need and use) that functionality.

@phadej
Copy link
Collaborator

phadej commented Jun 17, 2024

I would expect cabal-the-library to provide me with a way to parse cabal files. I would not expect it to provide me with a way to do so while also swapping out the parser system!

But good code is not monolithic, but modular. You should be able to swap parts. E.g. cabal-fields can be used to parse the files outer structure, and then use the rest of Cabal provided functionality to parse the "inner structure" (i.e. fields), and maybe there you want to use something else than parsec too. However, I would not advice anyone to re-implement e.g. version range parsing, you get it wrong. The old issues like commercialhaskell/stack#3345 were the motivation for CharParsing machinery. Make the code as general as possible, so people would have as little excuses to not use Cabal(-syntax) as possible.

@mpickering
Copy link
Collaborator

@FinleyMcIlwaine It seems your benchmark is quite specialised to optimising this problem in particular, on a typical workload then the overhead of cabal file parsing is much smaller I imagine?

If users won't notice the difference between one or the other then the flexibility of the current approach seems worthwhile.

@andreabedini
Copy link
Collaborator

@phadej ok, you make good points and I am changing my mind about this.

It feels that many people (incl. some maintainers) do think about Cabal only from a narrow cabal-install perspective.
...
But good code is not monolithic, but modular.

The code base is large and (as you prove!) not well understood. Things like CharParsing ought to have a proper clearly defined place, otherwise they get lost in the "noise". Distribution.Compat.CharParsing didn't really scream the importance you are highlighting now.

Repeating what I said earlier, I care less about the performance impact than how much I care about the maintainability of the codebase.

The hierarchy is:

class Alternative m => Parsing m
class Parsing m => CharParsing m
class (CharParsing m, MonadPlus m, MonadFail m) => CabalParsing m

@FinleyMcIlwaine
Copy link
Contributor Author

I agree with the points above as well. I'll close this.

@alt-romes
Copy link
Collaborator

@FinleyMcIlwaine perhaps in comparison the 7%/9% seem little, but I still think it would be worthwhile to SPECIALISE in a best-effort basis the parser to parsec for those 7-9%.

@FinleyMcIlwaine
Copy link
Contributor Author

I can open another PR if you think it'd be worthwhile. My only concern is that the necessary combination of SPECIALIZE and a phase controlled INLINE pragmas might be a bit fragile and unnecessarily complex for such a small absolute performance boost.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants