Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: make atto parser faster than the scala parser combinator one #586

Merged
merged 1 commit into from
Nov 15, 2024

Conversation

mbaechler
Copy link
Collaborator

For some reason, I'd rather replace the scala parser combinator implementation with something else.

I read that the atto parser is not the default one because it's slower than the current one ( #468 (comment) )

This PR fixes some correctness issues and makes it faster than the current one, it may be enough to allow the switch?

Here are my numbers regarding performances

attoParser before                                10-35 2,4,6 * ? * *  avgt   20  69.297 ± 0.223  us/op
attoParser after                                 10-35 2,4,6 * ? * *  avgt   10  43.819 ± 2.469  us/op
parserCombinators                                10-35 2,4,6 * ? * *  avgt   20  43.239 ± 0.148  us/op

attoParser before  * 5,10,15,20,25,30,35,40,45,50,55/2 * ? * mon-fri  avgt   20  82.511 ± 0.298  us/op
attoParser after   * 5,10,15,20,25,30,35,40,45,50,55/2 * ? * mon-fri  avgt   10  45.966 ± 0.409  us/op
parserCombinators  * 5,10,15,20,25,30,35,40,45,50,55/2 * ? * mon-fri  avgt   20  60.507 ± 0.212  us/op

attoParser before                                    10-65 * * * * *  avgt   20  13.337 ± 0.052  us/op
attoParser after                                     10-65 * * * * *  avgt   10   9.366 ± 0.053  us/op
parserCombinators                                    10-65 * * * * *  avgt   20  26.571 ± 0.117  us/op

attoParser before                            * */10 5-10 ? * mon-fri  avgt   20  62.982 ± 0.262  us/op
attoParser after                             * */10 5-10 ? * mon-fri  avgt   10  32.665 ± 0.265  us/op
parserCombinators                            * */10 5-10 ? * mon-fri  avgt   20  42.597 ± 0.214  us/op

attoParser before     */30 10,20,40 5-15,25-35/4 ? 1,3,7,oct-dec sun  avgt   20  38.053 ± 0.194  us/op
attoParser after      */30 10,20,40 5-15,25-35/4 ? 1,3,7,oct-dec sun  avgt   10  26.602 ± 0.175  us/op
parserCombinators     */30 10,20,40 5-15,25-35/4 ? 1,3,7,oct-dec sun  avgt   20  47.154 ± 0.181  us/op

private val sexagesimal: Parser[Int] = int.filter(x => x >= 0 && x < 60)
private val decimal: Parser[Int] = int.filter(x => x >= 0)
private val literal: Parser[String] = takeWhile1(_ >= ' ')
private def oneOrTwoDigitsPositiveInt: Parser[Int] = {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the optimization that makes it much faster


private val sexagesimal: Parser[Int] = oneOrTwoDigitsPositiveInt.filter(x => x >= 0 && x < 60)

private val literal: Parser[String] = takeWhile1(x => x != ' ' && x != '-')
Copy link
Collaborator Author

@mbaechler mbaechler Oct 16, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correctness fix as things like mon-fri were failing previously

sepBy1(b, comma)
.filter(_.size > 1)
.map(values => SeveralNode.fromSeq[F](values.toList).get)
sepBy(b, comma)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We were actually checking the size several times, now it's done a single time

case ParseResult.Done(_, result) => Right(result)
case ParseResult.Fail("", _, _) => Left(ExprTooShort)
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a correctness fix : a too short expression was not detected as such

@@ -166,8 +184,9 @@ package object atto {
} yield CronExpr(sec, min, hour, day, month, weekDay)

def parse(e: String): Either[Error, CronExpr] =
(cron.parseOnly(e): @unchecked) match {
(phrase(cron).parseOnly(e): @unchecked) match {
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a correctness fix : it ensure the string is parsed entirely

@mbaechler
Copy link
Collaborator Author

Hi @alonsodomin , it seems you are not active anymore on github and cron4s, any chance you consider discussing about sharing the maintenance of cron4s project?

@alonsodomin
Copy link
Owner

hello @mbaechler, thanks for your submitted work and sorry for reviewing this earlier (btw, best way to nudge me into reviewing something is to add me as a reviewer in the PR). Anyway, I'm on and off at Github but definitely more active on other repos than this one.

I have no issue on sharing ownership of this repo, it's indeed useful to the community (even though I don't use it as much) and I think it would be great to do a clean up of the library. I'll email you my contact so we can continue the conversation.

Regarding the changes addressed in this PR, how confident you feel after this improvements to make Atto the default parser? In my view this should move in that direction but also there should be a period of time in which the scala combinators parser is still available and that users could choose to use it (before finally deleting it from the code).

@mbaechler
Copy link
Collaborator Author

Thank you for your answer.

Regarding the changes addressed in this PR, how confident you feel after this improvements to make Atto the default parser?

I'm not very confident, right now.
I wanted to see if lifting some barriers would make that hypothesis acceptable.
If it is, I would first implement a property test that compare parsing results for both parser on a wide variety of inputs (think golden master testing)
Once proven that both are doing the same thing, I would propose a switch.

In my view this should move in that direction but also there should be a period of time in which the scala combinators parser is still available and that users could choose to use it (before finally deleting it from the code).

It sounds reasonable.

@alonsodomin
Copy link
Owner

in that case, it may be worth moving the atto impl out of the bench code and into a separate module and make the test kit target it to at least ensure that functionality between the two is fully preserved.

@mbaechler
Copy link
Collaborator Author

in that case, it may be worth moving the atto impl out of the bench code and into a separate module and make the test kit target it to at least ensure that functionality between the two is fully preserved.

I will do in another commit, do you think we can merge this one?

@mbaechler
Copy link
Collaborator Author

in that case, it may be worth moving the atto impl out of the bench code and into a separate module and make the test kit target it to at least ensure that functionality between the two is fully preserved.

I will do in another commit, do you think we can merge this one?

I guess, as it touches a single file that is not a part of any library artifact, I can merge it safely.

@mbaechler mbaechler merged commit f7c2905 into alonsodomin:master Nov 15, 2024
18 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants