<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<link rel="alternate"
type="application/rss+xml"
href="https://magnus.therning.org/feed.xml"
title="RSS feed for https://magnus.therning.org/">
<title>Adventures in parsing, part 3</title>
<meta name="author" content="Magnus Therning"><meta name="referrer" content="no-referrer"><link href= "static/style.css" rel="stylesheet" type="text/css" /><link href= "static/htmlize.css" rel="stylesheet" type="text/css" /><link href= "static/extra_style.css" rel="stylesheet" type="text/css" /></head>
<body>
<div id="preamble" class="status"><div class="nav-bar"><a class="nav-link" href="./index.html">Top</a><a class="nav-link" href="./archive.html">Archive</a><a class="nav-link align-right" href="./feed.xml"><img src="static/rss-feed-icon.png" style="height: 24px;" /></a></div></div>
<div id="content">
<div class="post-date">03 Jun 2007</div><h1 class="post-title"><a href="https://magnus.therning.org/2007-06-03-295-adventures-in-parsing,-part-3.html">Adventures in parsing, part 3</a></h1>
<p>
I got a great many comments, at least by my standards, on my earlier <a href="https://magnus.therning.org/2007-05-29-290-more-adventures-in-parsing.html">two</a> <a href="https://magnus.therning.org/2007-05-27-289-adventures-in-parsing.html">posts</a>
on parsing in Haskell, especially on the latest one. Conal posted a comment on
the first pointing me towards <code>liftM</code> and its siblings, without telling me that
it would only be the first step towards "applicative style". So, here I go
again…
</p>
<p>
First off, I need to import <code>Control.Applicative</code>. Apparently <code><|></code> is defined both in
<code>Applicative</code> and in <code>Parsec</code>. I use the <code><|></code> from <code>Parsec</code>, so hiding
the one from <code>Applicative</code> on import seemed like a good idea:
</p>
<div class="org-src-container">
<pre class="src src-haskell"><span class="org-keyword">import</span> Control.Applicative hiding <span class="org-rainbow-delimiters-depth-1">(</span> <span class="org-rainbow-delimiters-depth-2">(</span><|><span class="org-rainbow-delimiters-depth-2">)</span> <span class="org-rainbow-delimiters-depth-1">)</span>
</pre>
</div>
<p>
Second, Cale pointed out that I need to make <code>GenParser</code> an instance of
<code>Control.Applicative.Applicative</code>. He was nice enough to show
how to do that, leaving the syntax as the only thing I had to struggle with:
</p>
<div class="org-src-container">
<pre class="src src-haskell"><span class="org-keyword">instance</span> <span class="org-type">Applicative</span> <span class="org-rainbow-delimiters-depth-1">(</span><span class="org-type">GenParser</span> c st<span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-keyword">where</span>
    pure = return
    <span class="org-rainbow-delimiters-depth-1">(</span><span class="org-operator"><*></span><span class="org-rainbow-delimiters-depth-1">)</span> = ap
</pre>
</div>
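<p>
A minimal sketch of the same trick on a toy parser, so it can be tried without Parsec. The <code>Parser</code> type and <code>satisfy</code> below are made up for illustration and are not Parsec's <code>GenParser</code>. With a current GHC, <code>Functor</code> and <code>Applicative</code> are superclasses of <code>Monad</code>, so <code>pure</code> is written out directly here and <code>fmap</code> and <code><*></code> are derived from the monad via <code>liftM</code> and <code>ap</code>:
</p>

```haskell
import Control.Monad (ap, liftM)

-- A toy parser (hypothetical, not Parsec's GenParser): consume a prefix of
-- the input and return a result plus the remaining input, or Nothing.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

instance Functor Parser where
  fmap = liftM            -- derived from the Monad instance below

instance Applicative Parser where
  pure x = Parser $ \s -> Just (x, s)
  (<*>)  = ap             -- the same definition as in the instance above

instance Monad Parser where
  p >>= f = Parser $ \s -> case runParser p s of
    Nothing      -> Nothing
    Just (a, s') -> runParser (f a) s'

-- Accept one character that satisfies the predicate.
satisfy :: (Char -> Bool) -> Parser Char
satisfy ok = Parser $ \s -> case s of
  (c:cs) | ok c -> Just (c, cs)
  _             -> Nothing

-- Two characters, paired up in applicative style.
twoChars :: Parser (Char, Char)
twoChars = (,) <$> satisfy (const True) <*> satisfy (const True)
```

<p>
Here <code>runParser twoChars "abc"</code> pairs up the first two characters and leaves <code>"c"</code> as the remaining input, while an input that is too short makes the whole parser fail.
</p>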
<p>
I decided to take baby-steps and I started with <code>parseAddress</code>. Here's what it
used to look like:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parseAddress = <span class="org-keyword">let</span>
        hexStr2Int = <span class="org-warning">Prelude</span>.read <span class="org-operator">.</span> <span class="org-rainbow-delimiters-depth-1">(</span><span class="org-string">"0x"</span> <span class="org-operator">++</span><span class="org-rainbow-delimiters-depth-1">)</span>
    <span class="org-keyword">in</span> <span class="org-keyword">do</span>
        start <- liftM hexStr2Int <span class="org-operator">$</span> thenChar <span class="org-string">'-'</span> <span class="org-operator">$</span> many1 hexDigit
        end <- liftM hexStr2Int <span class="org-operator">$</span> many1 hexDigit
        return <span class="org-operator">$</span> Address start end
</pre>
</div>
<p>
On Twan's suggestion I rewrote it using <code>where</code> rather than <code>let ... in</code>, and
since this was my first function I decided to go via the <code>ap</code> function first. At the
same time I broke out <code>hexStr2Int</code>, since it's used in so many places:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parseAddress = <span class="org-keyword">do</span>
    start <- return hexStr2Int `ap` <span class="org-rainbow-delimiters-depth-1">(</span>thenChar <span class="org-string">'-'</span> <span class="org-operator">$</span> many1 hexDigit<span class="org-rainbow-delimiters-depth-1">)</span>
    end <- return hexStr2Int `ap` <span class="org-rainbow-delimiters-depth-1">(</span>many1 hexDigit<span class="org-rainbow-delimiters-depth-1">)</span>
    return <span class="org-operator">$</span> Address start end
</pre>
</div>
<p>
Then on to applying some functions from <code>Applicative</code>:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parseAddress = Address start end
<span class="org-keyword">where</span>
start = hexStr2Int <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-1">(</span>thenChar <span class="org-string">'-'</span> <span class="org-operator">$</span> many1 hexDigit<span class="org-rainbow-delimiters-depth-1">)</span>
end = hexStr2Int <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-1">(</span>many1 hexDigit<span class="org-rainbow-delimiters-depth-1">)</span>
</pre>
</div>
<p>
By now the use of <code>thenChar</code> looked a little silly, so I changed that part into
<code>many1 hexDigit <* char '-'</code> instead. Finally I removed the <code>where</code> part
altogether and used <code><*></code> to string it all together:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parseAddress = Address <span class="org-operator"><$></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>hexStr2Int <span class="org-operator"><$></span> many1 hexDigit <span class="org-operator"><*</span> char <span class="org-string">'-'</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>hexStr2Int <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-2">(</span>many1 hexDigit<span class="org-rainbow-delimiters-depth-2">)</span><span class="org-rainbow-delimiters-depth-1">)</span>
</pre>
</div>
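<p>
To try this shape of parser without Parsec installed, here is a sketch that uses <code>Text.ParserCombinators.ReadP</code> from base as a stand-in. Several things are assumptions rather than code from the posts: the <code>Address</code> type with two <code>Integer</code> fields, the stand-alone <code>hexStr2Int</code> built on <code>Numeric.readHex</code>, and the helpers <code>hexDigits</code> and <code>parseAll</code>. Only the structure of <code>parseAddress</code> mirrors the version above:
</p>

```haskell
import Data.Char (isHexDigit)
import Numeric (readHex)
import Text.ParserCombinators.ReadP (ReadP, char, eof, munch1, readP_to_S)

-- Start and end of a memory region (the field types are an assumption).
data Address = Address Integer Integer
  deriving (Eq, Show)

-- Convert a string of hex digits to a number. It is only ever applied to
-- the output of hexDigits, which contains nothing but hex digits.
hexStr2Int :: String -> Integer
hexStr2Int s = case readHex s of
  [(n, "")] -> n
  _         -> error ("not a hex string: " ++ s)

-- One or more hex digits; plays the role of Parsec's many1 hexDigit.
hexDigits :: ReadP String
hexDigits = munch1 isHexDigit

-- Same structure as the final parseAddress: <* drops the '-' and <*>
-- feeds both numbers to the Address constructor.
parseAddress :: ReadP Address
parseAddress = Address
  <$> (hexStr2Int <$> hexDigits <* char '-')
  <*> (hexStr2Int <$> hexDigits)

-- Run a parser over the whole input and return the first complete parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing
```

<p>
With these definitions <code>parseAll parseAddress "08048000-0804c000"</code> yields an <code>Address</code> holding the two numbers, and input without the <code>-</code> separator gives <code>Nothing</code>.
</p>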
<p>
From here on I skipped the intermediate steps and went straight for the last
form. Here's what I ended up with:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parsePerms = Perms <span class="org-operator"><$></span>
    <span class="org-rainbow-delimiters-depth-1">(</span> <span class="org-rainbow-delimiters-depth-2">(</span><span class="org-operator">==</span> <span class="org-string">'r'</span><span class="org-rainbow-delimiters-depth-2">)</span> <span class="org-operator"><$></span> anyChar<span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span> <span class="org-rainbow-delimiters-depth-2">(</span><span class="org-operator">==</span> <span class="org-string">'w'</span><span class="org-rainbow-delimiters-depth-2">)</span> <span class="org-operator"><$></span> anyChar<span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span> <span class="org-rainbow-delimiters-depth-2">(</span><span class="org-operator">==</span> <span class="org-string">'x'</span><span class="org-rainbow-delimiters-depth-2">)</span> <span class="org-operator"><$></span> anyChar<span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>cA <span class="org-operator"><$></span> anyChar<span class="org-rainbow-delimiters-depth-1">)</span>
    <span class="org-keyword">where</span>
        cA a = <span class="org-keyword">case</span> a <span class="org-keyword">of</span>
            <span class="org-string">'p'</span> -> Private
            <span class="org-string">'s'</span> -> Shared
parseDevice = Device <span class="org-operator"><$></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>hexStr2Int <span class="org-operator"><$></span> many1 hexDigit <span class="org-operator"><*</span> char <span class="org-string">':'</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>hexStr2Int <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-2">(</span>many1 hexDigit<span class="org-rainbow-delimiters-depth-2">)</span><span class="org-rainbow-delimiters-depth-1">)</span>
parseRegion = MemRegion <span class="org-operator"><$></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>parseAddress <span class="org-operator"><*</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>parsePerms <span class="org-operator"><*</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>hexStr2Int <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-2">(</span>many1 hexDigit <span class="org-operator"><*</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-2">)</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>parseDevice <span class="org-operator"><*</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span><span class="org-warning">Prelude</span>.read <span class="org-operator"><$></span> <span class="org-rainbow-delimiters-depth-2">(</span>many1 digit <span class="org-operator"><*</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-2">)</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator"><*></span>
    <span class="org-rainbow-delimiters-depth-1">(</span>parsePath <span class="org-operator"><|></span> string <span class="org-string">""</span><span class="org-rainbow-delimiters-depth-1">)</span>
    <span class="org-keyword">where</span>
        parsePath = <span class="org-rainbow-delimiters-depth-1">(</span>many1 <span class="org-operator">$</span> char <span class="org-string">' '</span><span class="org-rainbow-delimiters-depth-1">)</span> <span class="org-operator">*></span> <span class="org-rainbow-delimiters-depth-1">(</span>many1 anyChar<span class="org-rainbow-delimiters-depth-1">)</span>
</pre>
</div>
<p>
I have to say I'm fairly pleased with this version of the parser. It reads about
as easily as the first version and there's none of the "reversing" that <code>thenChar</code>
introduced.
</p>
<p>
<i>Comment by Conal Elliott:</i>
</p>
<p>
A thing of beauty! I'm glad you stuck with it, Magnus.
</p>
<p>
Some much smaller points:
</p>
<ul class="org-ul">
<li>The pattern <code>(== c) <$> anyChar</code> (nicely written, btw) arises three times, so
it might merit a name.</li>
<li>Similarly for <code>hexStr2Int <$> many1 hexDigit</code>, especially when you rewrite <code>f
<$> (a <* b)</code> to <code>(f <$> a) <* b</code>.</li>
<li>The pattern <code>(a <* char ' ') <*> b</code> comes up a lot. How about naming it also,
with a nice infix op, say <code>a <#> b</code>?</li>
<li>The cA definition could use pattern matching instead (e.g., <code>cA 'p' = Private</code>
and <code>cA 's' = Shared</code>).</li>
<li>Some of your parens are unnecessary (3rd line of <code>parseDevice</code> and last of
<code>parseRegion</code>), since application binds more tightly than infix ops.</li>
</ul>
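<p>
A rough sketch of what the first, third and fourth suggestions could look like, again with <code>ReadP</code> from base standing in for Parsec (its <code>get</code> plays the role of <code>anyChar</code>). The names <code>flag</code>, <code>twoPerms</code> and <code>parseAll</code>, the shape of the <code>Perms</code> and <code>Access</code> types, and the fixity chosen for <code><#></code> are all assumptions made for the example:
</p>

```haskell
import Text.ParserCombinators.ReadP (ReadP, char, eof, get, readP_to_S)

-- Assumed shapes for the types used by parsePerms.
data Access = Private | Shared
  deriving (Eq, Show)

data Perms = Perms Bool Bool Bool Access
  deriving (Eq, Show)

-- The recurring pattern (== c) <$> anyChar, given a name.
flag :: Char -> ReadP Bool
flag c = (== c) <$> get

-- a <#> b behaves like a <*> b but expects one space between the two.
-- Same fixity as <$> and <*>, so the three can be chained freely.
infixl 4 <#>
(<#>) :: ReadP (a -> b) -> ReadP a -> ReadP b
a <#> b = (a <* char ' ') <*> b

-- cA with pattern matching. The last clause only makes the failure
-- explicit; the function is still partial in spirit.
cA :: Char -> Access
cA 'p' = Private
cA 's' = Shared
cA c   = error ("unexpected sharing flag: " ++ [c])

parsePerms :: ReadP Perms
parsePerms = Perms <$> flag 'r' <*> flag 'w' <*> flag 'x' <*> (cA <$> get)

-- Two space-separated permission fields, only here to exercise <#>.
twoPerms :: ReadP (Perms, Perms)
twoPerms = (,) <$> parsePerms <#> parsePerms

-- Run a parser over the whole input and return the first complete parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing
```

<p>
Because <code><#></code> is declared <code>infixl 4</code>, an expression such as <code>f <$> a <#> b <#> c</code> groups from the left just like a chain of <code><*></code>, which is what would let it replace the <code>(a <* char ' ') <*> b</code> pattern in <code>parseRegion</code>.
</p>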
<p>
<i>Comment by Twan van Laarhoven:</i>
</p>
<p>
First of all, note that you don't need parentheses around <code>parseSomething <*
char ' '</code>.
</p>
<p>
You can also simplify things a bit more by combining <code>hexStr2Int <$> many1
hexDigit</code> into a function, then you could say:
</p>
<div class="org-src-container">
<pre class="src src-haskell">parseHex = hexStr2Int <span class="org-operator"><$></span> many1 hexDigit
parseAddress = Address <span class="org-operator"><$></span> parseHex <span class="org-operator"><*</span> char <span class="org-string">'-'</span> <span class="org-operator"><*></span> parseHex
parseDevice = Device <span class="org-operator"><$></span> parseHex <span class="org-operator"><*</span> char <span class="org-string">':'</span> <span class="org-operator"><*></span> parseHex
</pre>
</div>
<p>
Also, in <code>cA</code>, should there be a case for characters other than 'p' or 's'?
Otherwise the program could fail with a pattern match error.
</p>
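<p>
One way to address that last point is to let the parser itself reject unexpected characters, so a bad flag becomes an ordinary parse failure rather than a runtime pattern match error. A minimal sketch, once more using <code>ReadP</code> from base in place of Parsec; <code>parseAccess</code> and <code>parseAll</code> are hypothetical names and the <code>Access</code> type is assumed:
</p>

```haskell
import Control.Applicative ((<|>))
import Text.ParserCombinators.ReadP (ReadP, char, eof, readP_to_S)

-- Assumed type for the last permission column.
data Access = Private | Shared
  deriving (Eq, Show)

-- Accept exactly 'p' or 's'; any other character makes the parser fail.
-- x <$ p runs p, discards its result and returns x instead.
parseAccess :: ReadP Access
parseAccess = (Private <$ char 'p') <|> (Shared <$ char 's')

-- Run a parser over the whole input and return the first complete parse.
parseAll :: ReadP a -> String -> Maybe a
parseAll p s = case readP_to_S (p <* eof) s of
  ((x, _) : _) -> Just x
  []           -> Nothing
```

<p>
In this sketch <code>parseAll parseAccess "p"</code> and <code>parseAll parseAccess "s"</code> succeed, and any other input simply returns <code>Nothing</code>.
</p>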
<p>
<i>Response to Conal and Twan:</i>
</p>
<p>
Conal and Twan, thanks for your suggestions. I'll put them into practice and
post the "final" result as soon as I find some time.
</p>
<div class="taglist"><a href="https://magnus.therning.org/tags.html">Tags</a>: <a href="https://magnus.therning.org/tag-haskell.html">haskell</a> <a href="https://magnus.therning.org/tag-parsec.html">parsec</a> <a href="https://magnus.therning.org/tag-parsing.html">parsing</a> </div>
<div id="comments">Comment <a href=mailto:[email protected]?subject=Comment%20on%20INSERT%20POST%20URL%20HERE>here</a>.</div></div>
<div id="postamble" class="status"><!-- org-static-blog-page-postamble --></div>
</body>
</html>