feat: Add bracketing to lexer #68

Merged
merged 25 commits into from
Dec 23, 2024

@croyzor commented Dec 17, 2024

Revived from #32. I was mistaken to close it.


@acl-cqc left a comment


Mostly this is nice - Bracket.hs does feel a little overblown, but generally it seems much nicer/neater than our earlier attempt at bracketing, and there are some big improvements in error messages etc. too :).

I'm not quite gonna approve yet, though, until you refactor `within` so that I understand it ;-) (and maybe improve `brackets(Worker)` :-) )

brat/Data/Bracket.hs (outdated, resolved)
brat/Brat/FC.hs (outdated, resolved)
brat/Brat/Error.hs (outdated, resolved)
,"at"
,show openFC
]
show (UnexpectedClose b) = unwords ["There is no"

This is pretty much an `OpenCloseMismatch` without an opening (or where the opening is the beginning of the file) - consider combining the two, i.e. `OpenCloseMismatch (Maybe (FC, BracketType)) BracketType`. Or not: the conceptual similarity could just be confusing. Up to you...
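A minimal sketch of what the combined constructor could look like, assuming simplified stand-ins for brat's `FC` and `BracketType` (the `Paren` constructor, the line/column `FC` and the message wording are hypothetical; only the "There is no" prefix comes from the diff). `Nothing` in the first field takes over the role of `UnexpectedClose`.

```haskell
-- Hypothetical, simplified stand-ins for brat's BracketType and FC.
data BracketType = Paren | Square | Brace
  deriving (Eq, Show)

-- Assumed: a source position as line and column only.
data FC = FC Int Int
  deriving Eq

instance Show FC where
  show (FC line col) = show line ++ ":" ++ show col

showOpen, showClose :: BracketType -> String
showOpen Paren = "("
showOpen Square = "["
showOpen Brace = "{"
showClose Paren = ")"
showClose Square = "]"
showClose Brace = "}"

-- The suggested combined constructor: the first field is the opener
-- (position and kind) if there was one, the second is the closer found.
-- Nothing covers what UnexpectedClose used to report.
data BracketError = OpenCloseMismatch (Maybe (FC, BracketType)) BracketType

instance Show BracketError where
  -- Hypothetical message wording.
  show (OpenCloseMismatch Nothing b) =
    unwords ["There is no", showOpen b, "matching this", showClose b]
  show (OpenCloseMismatch (Just (openFC, bOpen)) bClose) =
    unwords [ "Expected", showClose bOpen, "to close", showOpen bOpen
            , "at", show openFC, "but found", showClose bClose ]

main :: IO ()
main = do
  print (OpenCloseMismatch Nothing Square)
  print (OpenCloseMismatch (Just (FC 3 7, Paren)) Brace)
```

The trade-off is the one raised above: one constructor and one `Show` case analysis, at the cost of merging two errors a user may think of as different.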


instance Show BToken where
  show (FlatTok t) = show t
  show (Bracketed _ b ts) = showOpen b ++ show ts ++ showClose b

show ts here will insert commas between the elements of ts (a list), right? Shouldn't we also insert commas after the opening bracket and before the closing bracket? And presumably we do something about the comma token itself...


Or use showTokens ts rather than show ts, and then there'll be no separators added between tokens?


OK, so I realize you can't use `showTokens ts` here, because that needs a `Proxy` argument. Nonetheless, this inserts a comma between the tokens - is that what you want? I tried changing `show ts` to `concatMap show ts`, which is similar but without the inserted commas, and all tests still pass. That suggests we have no test coverage of this, so we may not even have been aware of the `show`-inserted commas???
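To make the difference concrete, here is a small sketch using hypothetical, simplified `Tok`, `FC` and `BToken` types (not brat's real ones). Note that the list `Show` instance wraps the elements in square brackets as well as separating them with commas, so a square-bracketed group ends up doubled. The `concatMap` variant drops both but also runs tokens together; `renderSpaced`, using `unwords`, is one further option that is not from this thread.

```haskell
-- Hypothetical, simplified stand-ins for brat's types.
data BracketType = Paren | Square | Brace
  deriving (Eq, Show)

type FC = (Int, Int)  -- assumed: (line, column)

newtype Tok = Tok String

instance Show Tok where
  show (Tok s) = s

data BToken = FlatTok Tok | Bracketed FC BracketType [BToken]

showOpen, showClose :: BracketType -> String
showOpen Paren = "("
showOpen Square = "["
showOpen Brace = "{"
showClose Paren = ")"
showClose Square = "]"
showClose Brace = "}"

-- Same shape as the instance under review: `show ts` goes through the
-- list instance, which wraps the elements in [ ] and separates them with commas.
instance Show BToken where
  show (FlatTok t) = show t
  show (Bracketed _ b ts) = showOpen b ++ show ts ++ showClose b

-- The concatMap variant tried in review: no list brackets, no commas,
-- but also nothing between adjacent tokens.
renderConcat :: BToken -> String
renderConcat (FlatTok t) = show t
renderConcat (Bracketed _ b ts) = showOpen b ++ concatMap renderConcat ts ++ showClose b

-- Hypothetical alternative: space-separated tokens.
renderSpaced :: BToken -> String
renderSpaced (FlatTok t) = show t
renderSpaced (Bracketed _ b ts) = showOpen b ++ unwords (map renderSpaced ts) ++ showClose b

-- The tokens f [x y], wrapped in parentheses.
sample :: BToken
sample =
  Bracketed (1, 1) Paren
    [ FlatTok (Tok "f")
    , Bracketed (1, 4) Square [FlatTok (Tok "x"), FlatTok (Tok "y")]
    ]

main :: IO ()
main = do
  putStrLn (show sample)          -- ([f,[[x,y]]])
  putStrLn (renderConcat sample)  -- (f[xy])
  putStrLn (renderSpaced sample)  -- (f [x y])
```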

brat/Brat/Lexer/Bracketed.hs (outdated, resolved)
brat/Brat/Lexer/Bracketed.hs (outdated, resolved)
brat/Brat/Lexer/Bracketed.hs (outdated, resolved)
brat/examples/karlheinz.brat (outdated, resolved)
acl-cqc and others added 2 commits December 23, 2024 09:20
This is targeted at #68.

After some preliminary refactorings to remove `Bwd` in #72, I realized that
`brackets` and `within` were 90% the same, but they resisted my first
attempts to combine them ;). Changing the contract on `within` (here
renamed `helper` and made local to `brackets`) allowed this to proceed.

Note that the second commit makes explicit, using `Maybe....NonEmpty`, an
invariant that in the first commit is expressed only through comments and
incomplete pattern matches; you may prefer the first way, though.

---------

Co-authored-by: Craig Roy <[email protected]>
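A rough sketch of the merged shape the commit message describes, assuming simplified token and error types (`Tok`, `BToken`, `Err` and their constructors here are hypothetical, not brat's). A single local `helper` handles both the top level and nested brackets: it takes `Maybe (FC, BracketType)` for the enclosing opener (`Nothing` at top level) and returns the group it parsed plus the remaining input. It does not reproduce the `Maybe....NonEmpty` invariant from the second commit. Errors carry `(FC, BracketType)` pairs, in the spirit of the later review suggestion to uncurry the error helpers.

```haskell
-- Hypothetical, simplified types; brat's real Tok, BToken and Error differ.
data BracketType = Paren | Square | Brace
  deriving (Eq, Show)

type FC = Int  -- assumed: just a token index

data Tok = Open BracketType | Close BracketType | Word String
  deriving (Eq, Show)

data BToken = FlatTok String | Bracketed FC BracketType [BToken]
  deriving (Eq, Show)

-- Every error carries the (FC, BracketType) "unit" of a bracket.
data Err
  = EOFInBracket (FC, BracketType)     -- opener never closed
  | UnexpectedClose (FC, BracketType)  -- closer with no opener
  | OpenCloseMismatch (FC, BracketType) (FC, BracketType)  -- opener, closer
  deriving (Eq, Show)

brackets :: [(FC, Tok)] -> Either Err [BToken]
brackets toks = fst <$> helper Nothing toks
  where
    -- `helper open input` parses until the closer for `open` (or, when
    -- `open` is Nothing, until end of input), returning the parsed group
    -- and whatever input follows the closer.
    helper :: Maybe (FC, BracketType) -> [(FC, Tok)]
           -> Either Err ([BToken], [(FC, Tok)])
    helper Nothing [] = Right ([], [])
    helper (Just open) [] = Left (EOFInBracket open)
    helper open ((_, Word w) : rest) = do
      (ts, rest') <- helper open rest
      Right (FlatTok w : ts, rest')
    helper open ((fc, Open b) : rest) = do
      (inner, rest') <- helper (Just (fc, b)) rest  -- the nested group
      (ts, rest'') <- helper open rest'             -- then carry on
      Right (Bracketed fc b inner : ts, rest'')
    helper Nothing ((fc, Close b) : _) = Left (UnexpectedClose (fc, b))
    helper (Just (ofc, ob)) ((fc, Close b) : rest)
      | ob == b = Right ([], rest)
      | otherwise = Left (OpenCloseMismatch (ofc, ob) (fc, b))

main :: IO ()
main = do
  print (brackets [(0, Word "f"), (1, Open Paren), (2, Word "x"), (3, Close Paren)])
  print (brackets [(0, Open Square), (1, Close Brace)])
```

The point of the changed contract is that the top-level case is just `helper` with no enclosing opener, so end of input is only an error when `open` is `Just`, and a stray closer is only an error when it is `Nothing`.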
@croyzor requested a review from acl-cqc December 23, 2024 09:21

@acl-cqc left a comment


Looks good to me, although please have a think about these 3 comments.

import Text.Megaparsec (PosState(..), SourcePos(..), TraversableStream(..), VisualStream(..))
import Text.Megaparsec.Pos (mkPos)

opener :: Tok -> Maybe BracketType

I suggest

data OpenClose = Opening BracketType | Closing BracketType -- maybe there's a better name
openClose :: Tok -> Maybe OpenClose

Then you can write `foo ... | Just (Opening b) <- openClose blah = ...` and so on, and you get a much better `case openClose ... of`.

Then zap both opener and closer, or if you really want, you can do

opener t | Just (Opening b) <- openClose t = Just b
opener _ = Nothing

Could also make `OpenClose` incorporate the `Maybe` directly, i.e. `Open BracketType | Close BracketType | Neither`.
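A sketch of the `Neither` variant, using a hypothetical subset of brat's `Tok` (`RSquare` and `RBrace` appear in the diff; the other constructors, and `Paren` in `BracketType`, are assumed). It shows `opener` and `closer` falling out of `openClose` via pattern guards, and a single `case` replacing separate checks.

```haskell
-- Hypothetical subset of brat's Tok: RSquare and RBrace appear in the
-- diff, the other constructors are assumed.
data Tok = LParen | RParen | LSquare | RSquare | LBrace | RBrace | Ident String
  deriving (Eq, Show)

data BracketType = Paren | Square | Brace
  deriving (Eq, Show)

-- The variant with the Maybe folded in.
data OpenClose = Open BracketType | Close BracketType | Neither
  deriving (Eq, Show)

openClose :: Tok -> OpenClose
openClose LParen = Open Paren
openClose RParen = Close Paren
openClose LSquare = Open Square
openClose RSquare = Close Square
openClose LBrace = Open Brace
openClose RBrace = Close Brace
openClose _ = Neither

-- The old helpers, if still wanted, fall out via pattern guards.
opener :: Tok -> Maybe BracketType
opener t | Open b <- openClose t = Just b
opener _ = Nothing

closer :: Tok -> Maybe BracketType
closer t | Close b <- openClose t = Just b
closer _ = Nothing

-- One case expression instead of testing opener and closer in turn.
describe :: Tok -> String
describe t = case openClose t of
  Open b -> "opens " ++ show b
  Close b -> "closes " ++ show b
  Neither -> "plain token"

main :: IO ()
main = mapM_ (putStrLn . describe) [LSquare, Ident "x", RSquare]
```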

closeTok Square = RSquare
closeTok Brace = RBrace

eofErr :: FC -> BracketType -> Error

I think I suggested you curry this earlier, and now I'm gonna make the reverse suggestion - sorry! But the "unit" of bracket in `openCloseMismatchErr` is the pair `(FC, BracketType)`, and I think it has to be that way, so we should probably take that pair here and in `unexpectedCloseErr` too. This would simplify `brackets`, as you would then be able to let-bind locals of type `(FC, BracketType)`.

Collaborator Author

I don't think I'm really getting it, but have uncurried anyway


acl-cqc commented Dec 23, 2024

And, thanks @croyzor - nice work! :-)


croyzor commented Dec 23, 2024

Re: `Show BToken` - this exists so that we can use it in the implementation of `VisualStream`, which we need for packaging up error messages, but AFAICT we don't actually call `showTokens` in our tests. It might come up when using the `dbg` debugging function from the parser library? So something vaguely sensible would be nice.

@croyzor merged commit 7725c16 into main Dec 23, 2024
2 checks passed
@croyzor deleted the refactor/parser-wc branch December 23, 2024 16:38