Remove `CabalParsing` class, specialize to `ParsecParser` #10081

FinleyMcIlwaine · 2024-06-07T15:07:53Z

Overloaded calls including CabalParsing dictionaries add a lot of overhead to the .cabal file parser. As far as I can tell, the only instance of CabalParsing ever written is for ParsecParser. Removing the class and specializing references to it to ParsecParser results in a ~30% reduction in parsing times and a ~20% reduction in parsing allocations. I got these numbers by running:

cabal run hackage-tests -- parsec +RTS -s

On current HEAD and this branch:

tag	m/s per file	total allocations
HEAD	0.55	690,340,703,552
this branch	0.38	550,792,331,272

As I mentioned on #10080, I'm not sure about the feasibility of this change, but I am proposing it nonetheless.

Since this essentially obsoletes the CharParsing and Parsing classes as well, I've removed those too. So all the parsing logic is completely specialized to Parsec. I also renamed the Distribution.Parsec.Parsec class to Distribution.Parsec.CabalParsec to avoid ambiguity with Text.Parsec.

Patches conform to the coding conventions.
Changelog

andreabedini

Looks good to me.

phadej · 2024-06-13T14:37:57Z

This is a MAJOR breaking change.

If you do this, you could rip-off whole CharParsing machinery, which exists so people who don't like parsec don't need to use parsec (e.g. they could use megaparsec to get better error messages - that requires work, but currently it's possible).

As an author of CharParsing I think this change, while has performance benefits, is not acceptable.

alt-romes · 2024-06-14T15:50:31Z

How much of the benefits could you attain by SPECIALISEing a lot? Oleg's point is strong.

FinleyMcIlwaine · 2024-06-16T05:02:12Z

If you do this, you could rip-off whole CharParsing machinery

Good point! I've updated the patch to get rid of theCharParsing and Parsing classes as well, which speeds things up even a bit more. I've updated the numbers in the PR description.

How much of the benefits could you attain by SPECIALISEing a lot?

This patch demonstrates how you could begin adding SPECIALIZE pragmas to remove some of the overloading. However, the performance improvements aren't as good (~7% reduction in allocations and ~9% reduction in parse times. See my blog post) and the specialization is somewhat fragile, since in some cases we need to delay inlining to ensure the rewrite rules can fire before worker wrapper.

I don't disagree that this would be a breaking change, but I'll leave it up to the Cabal maintainers to determine if the overhead is worth the flexibility in parser implementation.

phadej · 2024-06-16T14:36:58Z

This patch demonstrates how you could begin adding SPECIALIZE pragmas to remove some of the overloading. However, the performance improvements aren't as good ... and the specialization is somewhat fragile, since in some cases we need to delay inlining to ensure the rewrite rules can fire before worker wrapper.

Then fix GHC.

What you do in this patch is manual optimization. If specialization is fragile, that's GHC bug. If specialization and worker wrapper phase each other, that's also a GHC bug.

Many have asked for 100% guaranteed monomorphisable abstraction (like C++ templates and Rust traits). GHC should be able to deliver. In C++ and Rust you can write code at higher abstraction level (i.e. against an interface) with guaranteed optimisation behaviour. GHC should be able to do that too.

IMO 55 milliseconds per file is acceptable absolute value (IIRC this value is also skewed by few outilers, like huge files like acme-everything; median would be better). If there is an abstraction cost, it's fine. Hopefully GHC will do better in the future.

Remove the `CharParsing` and `Parsing` classes, specialize all parsing logic to `Parsec`. Rename the `Distribution.Parsec.Parsec` class to `CabalParsec` to avoid ambiguity with `Text.Parsec.Parsec`.

andreabedini · 2024-06-17T07:03:15Z

No way I am taking a decision by myself here 😂

I am aware that it is a breaking change, but I am not convinced of value of the CharParsing machinery. However, voices against its removal should be listened to. TBH I care more about the two removed modules than the performance improvement (which at 30% is pretty impressive!).

geekosaur · 2024-06-17T07:43:28Z

I have to say I don't agree with @phadej. It's not on us to support people who want to rewire part of cabal internals. Or should we also revert #9282 because someone might want to replace the solver? I don't see a difference here.

phadej · 2024-06-17T08:46:39Z

should we also revert #9282 because someone might want to replace the solver? I

TBH, not-so-long ago I made an MiniSat experiment, and in some cases it's 100x faster, never slower. EDIT: And at that time I wished that cabal-install had a dummy&greedy solver which would pick the first package version satisfying the constraints. That would make feeding cabal-install external solutions (be it my solver results, ot Stackage snapshot) by constuction very fast. I'm sad to learn that flexibility is cut, so I won't hope of adding greedy-solver.

It's not on us to support people who want to rewire part of cabal internals.

Cabal is a library. The more use-cases it supports, the objectively better it is. I do think that the Cabal and cabal-install should in perfect world be split into separate repositories. It feels that many people (incl. some maintainers) do think about Cabal only from a narrow cabal-install perspective.

Performance wise, if GHC was clever enough to specialise, then e.g. using https://hackage.haskell.org/package/flatparse one could get another 20% or even 100% speedup.

Mikolaj · 2024-06-17T09:04:47Z

Removing code is good, but given that the code is not bitrotted, we need to ensure it's not used and not likely to be used. Could somebody perhaps ask on Haskell discourse? Then we could add a deprecation note to release notes and selected functions. If no reaction from the users in a couple of release cycles, I think it would be fine to remove. After all, if the mechanism is needed in the future, it can be re-added.

michaelpj · 2024-06-17T09:35:15Z

I guess two questions that come to mind are:

Are there any current (or even historical) users of this flexiblity? @phadej did you actually use it somewhere?
Are there any other tools that provide this kind of ability to swap out the parsing system used by its parser?

I would expect cabal-the-library to provide me with a way to parse cabal files. I would not expect it to provide me with a way to do so while also swapping out the parser system!

phadej · 2024-06-17T10:19:12Z

@michaelpj

@phadej did you actually use it somewhere?

Yes.

Are there any other tools that provide this kind of ability to swap out the parsing system used by its parser?

It's thinking too narrowly. One can use e.g. megaparsec to parse build-depends like structure. Or use CabalParsing combinators (like parsecCommaNonEmpty) to build parsers for *.cabal like syntax, but not .cabal files. EDIT: or not necessarily even .cabal like files. Parsing package names, version ranges etc. is common task for many tools.

And in fact, the cabal-install does that. And cabal-install could swap out of parsec (as it's not constrained by GHC boot lib requirement), for something faster and with better error reporting. Not that parsing cabal.project files is a bottleneck, or that parsec errors are terrible; but both can be improved.

After all, if the mechanism is needed in the future, it can be re-added.

I don't believe that. When the need arises, the half year (at best) delay and all the social/political debate whether it's actually needed will be too much for not blocking issues.

An example, #9098 which is a relatively simple change, took nine month to get released. And there were comments like

I don't see a reason to introduce such a warning.

which are not encouraging. I didn't proposed that just because it would be fun. I needed (and still need and use) that functionality.

phadej · 2024-06-17T10:36:31Z

I would expect cabal-the-library to provide me with a way to parse cabal files. I would not expect it to provide me with a way to do so while also swapping out the parser system!

But good code is not monolithic, but modular. You should be able to swap parts. E.g. cabal-fields can be used to parse the files outer structure, and then use the rest of Cabal provided functionality to parse the "inner structure" (i.e. fields), and maybe there you want to use something else than parsec too. However, I would not advice anyone to re-implement e.g. version range parsing, you get it wrong. The old issues like commercialhaskell/stack#3345 were the motivation for CharParsing machinery. Make the code as general as possible, so people would have as little excuses to not use Cabal(-syntax) as possible.

mpickering · 2024-06-17T13:19:25Z

@FinleyMcIlwaine It seems your benchmark is quite specialised to optimising this problem in particular, on a typical workload then the overhead of cabal file parsing is much smaller I imagine?

If users won't notice the difference between one or the other then the flexibility of the current approach seems worthwhile.

andreabedini · 2024-06-17T15:34:59Z

@phadej ok, you make good points and I am changing my mind about this.

It feels that many people (incl. some maintainers) do think about Cabal only from a narrow cabal-install perspective.
...
But good code is not monolithic, but modular.

The code base is large and (as you prove!) not well understood. Things like CharParsing ought to have a proper clearly defined place, otherwise they get lost in the "noise". Distribution.Compat.CharParsing didn't really scream the importance you are highlighting now.

Repeating what I said earlier, I care less about the performance impact than how much I care about the maintainability of the codebase.

The hierarchy is:

class Alternative m => Parsing m
class Parsing m => CharParsing m
class (CharParsing m, MonadPlus m, MonadFail m) => CabalParsing m

FinleyMcIlwaine · 2024-06-17T15:39:14Z

I agree with the points above as well. I'll close this.

alt-romes · 2024-06-17T15:40:56Z

@FinleyMcIlwaine perhaps in comparison the 7%/9% seem little, but I still think it would be worthwhile to SPECIALISE in a best-effort basis the parser to parsec for those 7-9%.

FinleyMcIlwaine · 2024-06-17T15:49:45Z

I can open another PR if you think it'd be worthwhile. My only concern is that the necessary combination of SPECIALIZE and a phase controlled INLINE pragmas might be a bit fragile and unnecessarily complex for such a small absolute performance boost.

FinleyMcIlwaine force-pushed the finley/step-x-rebased branch 2 times, most recently from 9c43166 to 4b35d84 Compare June 7, 2024 15:56

FinleyMcIlwaine mentioned this pull request Jun 7, 2024

Is the CabalParsing class necessary? #10080

Closed

FinleyMcIlwaine force-pushed the finley/step-x-rebased branch from 4b35d84 to b983469 Compare June 7, 2024 16:12

andreabedini approved these changes Jun 9, 2024

View reviewed changes

FinleyMcIlwaine force-pushed the finley/step-x-rebased branch 2 times, most recently from 2564616 to 9d83792 Compare June 16, 2024 05:01

FinleyMcIlwaine requested a review from andreabedini June 16, 2024 05:03

FinleyMcIlwaine force-pushed the finley/step-x-rebased branch from 9d83792 to ad1a0aa Compare June 17, 2024 04:22

Remove CabalParsing class, specialize to ParsecParser

71d9ff1

Remove the `CharParsing` and `Parsing` classes, specialize all parsing logic to `Parsec`. Rename the `Distribution.Parsec.Parsec` class to `CabalParsec` to avoid ambiguity with `Text.Parsec.Parsec`.

FinleyMcIlwaine force-pushed the finley/step-x-rebased branch from ad1a0aa to 71d9ff1 Compare June 17, 2024 04:54

andreabedini requested review from Mikolaj, ulysses4ever and Kleidukos June 17, 2024 06:38

Mikolaj added type: discussion Cabal: parser labels Jun 17, 2024

FinleyMcIlwaine closed this Jun 17, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove `CabalParsing` class, specialize to `ParsecParser` #10081

Remove `CabalParsing` class, specialize to `ParsecParser` #10081

FinleyMcIlwaine commented Jun 7, 2024 •

edited

Loading

andreabedini left a comment

phadej commented Jun 13, 2024

alt-romes commented Jun 14, 2024

FinleyMcIlwaine commented Jun 16, 2024

phadej commented Jun 16, 2024

andreabedini commented Jun 17, 2024

geekosaur commented Jun 17, 2024

phadej commented Jun 17, 2024 •

edited

Loading

Mikolaj commented Jun 17, 2024 •

edited

Loading

michaelpj commented Jun 17, 2024

phadej commented Jun 17, 2024 •

edited

Loading

phadej commented Jun 17, 2024 •

edited

Loading

mpickering commented Jun 17, 2024

andreabedini commented Jun 17, 2024

FinleyMcIlwaine commented Jun 17, 2024

alt-romes commented Jun 17, 2024

FinleyMcIlwaine commented Jun 17, 2024

Remove CabalParsing class, specialize to ParsecParser #10081

Remove CabalParsing class, specialize to ParsecParser #10081

Conversation

FinleyMcIlwaine commented Jun 7, 2024 • edited Loading

andreabedini left a comment

Choose a reason for hiding this comment

phadej commented Jun 13, 2024

alt-romes commented Jun 14, 2024

FinleyMcIlwaine commented Jun 16, 2024

phadej commented Jun 16, 2024

andreabedini commented Jun 17, 2024

geekosaur commented Jun 17, 2024

phadej commented Jun 17, 2024 • edited Loading

Mikolaj commented Jun 17, 2024 • edited Loading

michaelpj commented Jun 17, 2024

phadej commented Jun 17, 2024 • edited Loading

phadej commented Jun 17, 2024 • edited Loading

mpickering commented Jun 17, 2024

andreabedini commented Jun 17, 2024

FinleyMcIlwaine commented Jun 17, 2024

alt-romes commented Jun 17, 2024

FinleyMcIlwaine commented Jun 17, 2024

Remove `CabalParsing` class, specialize to `ParsecParser` #10081

Remove `CabalParsing` class, specialize to `ParsecParser` #10081

FinleyMcIlwaine commented Jun 7, 2024 •

edited

Loading

phadej commented Jun 17, 2024 •

edited

Loading

Mikolaj commented Jun 17, 2024 •

edited

Loading

phadej commented Jun 17, 2024 •

edited

Loading

phadej commented Jun 17, 2024 •

edited

Loading