[[functions-and-types-for-parsing]]
= Functions and types for parsing
This file contains the source and an explanation of the parsing
functions we've been using, along with some brief notes about the
wrappers and the full types in Parsec.
> module FunctionsAndTypesForParsing where
>
> import Text.Parsec (ParseError)
> import Text.Parsec.String (Parser)
> import Text.Parsec.String.Parsec (parse)
> import Text.Parsec.String.Char (oneOf)
> import Text.Parsec.String.Combinator (eof,manyTill,anyToken)
> import Control.Applicative ((<$>), (<*>), (<*), (*>), many)
> import Control.Monad (void)
== Functions for parsing
Here are the testing functions which were used earlier.
The basic parse function is a simple wrapper. The parse function from
Parsec takes an extra filename argument, which is only used in parse
error messages; here it is set to the empty string.
> regularParse :: Parser a -> String -> Either ParseError a
> regularParse p = parse p ""
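Here is a minimal, self-contained sketch of using regularParse. It assumes the real parsec package (Text.Parsec and Text.Parsec.String) instead of the tutorial's Text.Parsec.String.* wrapper modules, and `number` is a hypothetical example parser. Note that regularParse does not require the parser to consume all of its input.

```haskell
import Text.Parsec (ParseError, parse, many1, digit)
import Text.Parsec.String (Parser)

-- Same wrapper as above: run a parser with an empty source name.
regularParse :: Parser a -> String -> Either ParseError a
regularParse p = parse p ""

-- Hypothetical example parser: one or more digits, read as an Integer.
number :: Parser Integer
number = read <$> many1 digit

main :: IO ()
main = do
  print (regularParse number "123")    -- Right 123
  print (regularParse number "123abc") -- also Right 123: the trailing "abc" is ignored
  print (regularParse number "abc")    -- a Left containing a ParseError
```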
'parse' is a basic function in the family of functions for running
parsers in Parsec. You compose parsers in the Parser monad, then run
the top-level parser using 'parse' and get back an
'Either ParseError a' as the result. Parsec has a few alternatives to
'parse', mostly for when you are using a more general parser type
instead of 'Parser a' (which is an alias for 'ParsecT String ()
Identity a'). Have a look at the Text.Parsec.Prim module for these:
<http://hackage.haskell.org/package/parsec-3.1.3/docs/Text-Parsec-Prim.html>.
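One of those alternatives is runParser, which takes the initial user state explicitly; 'parse' is the special case where that state is (). The sketch below assumes the real parsec package, and `countAs` / `runCountAs` are hypothetical names used only for illustration. The parser uses an Int user state to count the 'a' characters it consumes.

```haskell
import Text.Parsec (Parsec, ParseError, runParser, char, many, eof,
                    getState, modifyState)

-- A parser with an Int user state instead of (): each 'a' it consumes
-- increments the state, and at the end it returns the final state.
countAs :: Parsec String Int Int
countAs = do
  _ <- many (char 'a' >> modifyState (+ 1))
  eof
  getState

-- runParser parser initialState sourceName input
runCountAs :: String -> Either ParseError Int
runCountAs = runParser countAs 0 ""

main :: IO ()
main = print (runCountAs "aaa") -- Right 3
```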
This function runs the parser, but also fails if the parser doesn't
consume all the input.
> parseWithEof :: Parser a -> String -> Either ParseError a
> parseWithEof p = parse (p <* eof) ""
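A short sketch contrasting parseWithEof with plain parsing, assuming the real parsec package and a hypothetical `number` parser. Input that regularParse would accept with unparsed trailing characters is rejected here, because eof fails when any input remains.

```haskell
import Text.Parsec (ParseError, parse, many1, digit, eof)
import Text.Parsec.String (Parser)

-- Same definition as above; (<*) comes from the Prelude (base >= 4.8).
parseWithEof :: Parser a -> String -> Either ParseError a
parseWithEof p = parse (p <* eof) ""

-- Hypothetical example parser.
number :: Parser Integer
number = read <$> many1 digit

main :: IO ()
main = do
  print (parseWithEof number "123")    -- Right 123
  print (parseWithEof number "123abc") -- a Left: "abc" was not consumed
```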
This function applies the parser, then also returns any leftover
input which wasn't parsed.
> parseWithLeftOver :: Parser a -> String -> Either ParseError (a,String)
> parseWithLeftOver p = parse ((,) <$> p <*> leftOver) ""
> where leftOver = manyTill anyToken eof
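A sketch of parseWithLeftOver in use, assuming the real parsec package and a hypothetical `number` parser. It also includes a hypothetical variant, parseWithLeftOver', that collects the leftover input with `many anyToken <* eof`: since anyToken accepts any character and only fails at the end of the input, both versions return the same result here.

```haskell
import Text.Parsec (ParseError, parse, many, many1, digit, eof, manyTill, anyToken)
import Text.Parsec.String (Parser)

-- Same definition as above.
parseWithLeftOver :: Parser a -> String -> Either ParseError (a, String)
parseWithLeftOver p = parse ((,) <$> p <*> leftOver) ""
  where leftOver = manyTill anyToken eof

-- Hypothetical variant: many anyToken reads everything up to the end of
-- the input, so the eof afterwards always succeeds.
parseWithLeftOver' :: Parser a -> String -> Either ParseError (a, String)
parseWithLeftOver' p = parse ((,) <$> p <*> many anyToken <* eof) ""

-- Hypothetical example parser.
number :: Parser Integer
number = read <$> many1 digit

main :: IO ()
main = do
  print (parseWithLeftOver number "123abc")  -- Right (123,"abc")
  print (parseWithLeftOver' number "123abc") -- Right (123,"abc")
```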
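TODO: what happens when you use 'many anyToken <* eof' variations
instead? Maybe should talk about greediness? Or talk about it in a
better place in the tutorial.
This function skips any leading whitespace (spaces, newlines and
tabs), then runs the parser and fails if it doesn't consume all of the
remaining input.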
> parseWithWSEof :: Parser a -> String -> Either ParseError a
> parseWithWSEof p = parseWithEof (whiteSpace *> p)
> where whiteSpace = void $ many $ oneOf " \n\t"
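A sketch of parseWithWSEof in use, assuming the real parsec package and a hypothetical `number` parser. It shows that only leading whitespace is skipped: trailing whitespace is still unconsumed input, so the eof check fails.

```haskell
import Control.Monad (void)
import Text.Parsec (ParseError, parse, many, many1, digit, oneOf, eof)
import Text.Parsec.String (Parser)

parseWithEof :: Parser a -> String -> Either ParseError a
parseWithEof p = parse (p <* eof) ""

-- Same definition as above: skip leading spaces, newlines and tabs.
parseWithWSEof :: Parser a -> String -> Either ParseError a
parseWithWSEof p = parseWithEof (whiteSpace *> p)
  where whiteSpace = void $ many $ oneOf " \n\t"

-- Hypothetical example parser.
number :: Parser Integer
number = read <$> many1 digit

main :: IO ()
main = do
  print (parseWithWSEof number "  \n123") -- Right 123
  print (parseWithWSEof number "123 ")    -- a Left: trailing whitespace is not skipped
```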
You should have a look at the two helper executables, and see if you
can understand the code now. You can see them online here:
<https://github.com/JakeWheat/intro_to_parsing/blob/master/ParseFile.lhs>
<https://github.com/JakeWheat/intro_to_parsing/blob/master/ParseString.lhs>
== Type signatures revisited
TODO: update this to refer to real Parsec instead of the string
wrappers here.
I think you should always use type signatures with Parsec. Because the
Parsec code is very general, GHC will refuse to compile this code
without the type signature. Try commenting out the type signature
above and loading the code into ghci to see the error message.
There is an alternative: you can get this code to compile without a
type signature by using the NoMonomorphismRestriction language
pragma. You can also see the type signature that GHC will choose for
this function by commenting out the type signature and using -Wall and
-XNoMonomorphismRestriction together. Using NoMonomorphismRestriction
is a popular solution to these sorts of problems in Haskell.
It's up to you whether you prefer to always write type signatures when
you are developing parsing code, or use the NoMonomorphismRestriction
pragma. Even when NoMonomorphismRestriction would let the code
compile, explicit type signatures usually give you much simpler
compiler error messages.
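A minimal sketch of the pragma approach, assuming the real parsec package; `digits` is a hypothetical parser with no type signature. With NoMonomorphismRestriction, GHC generalizes the binding, and compiling with -Wall reports the inferred type in a missing-signature warning (something like `Stream s m Char => ParsecT s u m [Char]`; the exact wording depends on the GHC version).

```haskell
{-# LANGUAGE NoMonomorphismRestriction #-}
-- Compile with -Wall to see GHC's inferred type for 'digits'.
import Text.Parsec (parse, many1, digit)

-- No type signature: with the pragma, GHC generalizes this binding
-- instead of applying the monomorphism restriction.
digits = many1 digit

main :: IO ()
main = print (parse digits "" "123") -- Right "123"
```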
== Parser
The definition of Parser and a partial explanation of the full type
signature.
```
type Parser = Parsec String ()
```
This means that a parser of type Parser a parses from a String with ()
as the initial user state.
The Parsec type is defined like this:
```
type Parsec s u = ParsecT s u Identity
```
ParsecT is a monad transformer and the primitive parser type in the
Parsec library. The 'Parsec' type is a type alias which sets the base
monad to Identity.
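To see why the base monad matters, here is a sketch of a parser whose base monad is IO instead of Identity, run with runParserT from Text.Parsec.Prim (re-exported by Text.Parsec). It assumes the real parsec package and its MonadIO instance for ParsecT; `loudNumber` is a hypothetical name. Because the base monad is IO, the parser can perform IO actions (here, printing a log line) while it parses.

```haskell
import Control.Monad.IO.Class (liftIO)
import Text.Parsec (ParsecT, runParserT, many1, digit)

-- A parser over String, with () user state, on top of IO.
loudNumber :: ParsecT String () IO Integer
loudNumber = do
  ds <- many1 digit
  liftIO (putStrLn ("parsed digits: " ++ ds)) -- an IO action inside the parser
  return (read ds)

main :: IO ()
main = do
  -- runParserT parser initialState sourceName input;
  -- the result comes back inside the base monad (IO here).
  r <- runParserT loudNumber () "" "42"
  print r -- Right 42, printed after the log line
```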
Here is the haddock for the ParsecT type:
`ParsecT s u m a` is a parser with stream type `s`, user state type `u`,
underlying monad `m` and return type `a`.
The type variables in full types such as this one:
```
satisfy :: Stream s m Char => (Char -> Bool) -> ParsecT s u m Char
```
refer to the same things (stream type s, user state type u, underlying
monad m).
We are using String as the stream type (i.e. the input type), () as
the user state type (this effectively means no user state, since ()
only has one value), and Identity as the underlying monad, since we
are not using any other monad. So `Parser a` expands to `ParsecT
String () Identity a`.
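One way to check this expansion is to write the same parser at each of the three types and let GHC confirm they are interchangeable. This sketch assumes the real parsec package and a modern GHC where parsec's Identity is the one from Data.Functor.Identity; `vowel`, `vowel'` and `vowel''` are hypothetical names. It also uses satisfy, whose general type is instantiated here with s = String, u = () and m = Identity.

```haskell
import Data.Functor.Identity (Identity)
import Text.Parsec (Parsec, ParsecT, parse, satisfy)
import Text.Parsec.String (Parser)

-- A parser written against the short alias...
vowel :: Parser Char
vowel = satisfy (`elem` "aeiou")

-- ...can be used at the partially expanded type...
vowel' :: Parsec String () Char
vowel' = vowel

-- ...and at the fully expanded type, because all three are the same type.
vowel'' :: ParsecT String () Identity Char
vowel'' = vowel'

main :: IO ()
main = print (parse vowel'' "" "apple") -- Right 'a'
```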
== Other information
TODO: Here is some other information on Parsec and Haskell:
links, tutorials on fp, section in rwh, lyah?, old parsec docs,
parsec docs on hackage, other parser combinator libs (uu, polyparse,
trifecta?)