I needed to decode some SVGs, and used the XML-codec branch from this PR at commit f9efb78 (the last commit at the moment). There were a couple of similar issues with it; one of them is presented below.
On a simple `rects.svg` file the decoder failed:
<?xml version="1.0" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.0//EN" "http://www.w3.org/TR/2001/REC-SVG-20010904/DTD/svg10.dtd">
<svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" id="drawrects-structure-image-BE-06" viewBox="0 0 450 450" width="450" height="450" >
<g id="drawRects">
<rect x="225" y="0" width="225" height="225" style="fill:red" />
<rect x="0" y="225" width="225" height="225" style="fill:yellow" />
</g>
</svg>
It normally takes about half an hour to figure out what happens in such complicated code. Roughly it goes like this...
- Figure out what the 'main' rule (the entry point) is. Thankfully that's pretty simple: just look for the `parse` call, there's just one in the decoder:
  `result: either trace? [ parse-trace data document ] [ parse data document ]`
- I find the `document` rule and insert `p: (? p)` after each sub-rule (`prolog` and `element`):
  `document: [ opt prolog element ]`
- Try to decode my data and count the number of `p: ...` lines printed, to figure out where it's stuck
- Go on with this boring routine deeper and deeper into subrules, printing the trace after every change, until I find the last rule that should have succeeded but failed instead
- In this case I find the culprit to be `PubidLiteral`:
  `PubidLiteral: [ dq any PubidChar dq | sq any [not sq PubidChar] sq ]`
- By matching the input `"-//W3C//DTD SVG 1.0//EN"` against the actual charset of `PubidChar` I can notice that digits aren't there:
  `PubidChar: charset reduce [ space cr lf #"a" '- #"z" #"A" '- #"Z" {-'()+,./:=?;!*#@$_%} ]`
So that must be the issue.
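
The manual routine above can be reproduced in isolation. This is only a minimal sketch: it copies `PubidChar` as quoted above, and `sample` and `ok?` are names I made up for the illustration, they are not part of the codec. It inserts the same `p: (? p)` marker into the first alternative of `PubidLiteral` and then asks the charset directly about a digit.

```red
Red []

; PubidChar exactly as quoted above (note: no digit range)
PubidChar: charset reduce [
    space cr lf #"a" '- #"z" #"A" '- #"Z" {-'()+,./:=?;!*#@$_%}
]

; hypothetical name; the string is the public ID from rects.svg, quotes included
sample: {"-//W3C//DTD SVG 1.0//EN"}

; `p:` captures the current input position and `(? p)` prints it,
; so the printed position shows where `any PubidChar` stopped
ok?: parse sample [dq any PubidChar p: (? p) dq]

print ["matched:" ok?]
print ["digit 3 in PubidChar:" find PubidChar #"3"]
```

The position printed by `(? p)` should start at the `3` of `W3C`, which is the first character the charset rejects; the closing `dq` then has nothing to match, so the whole rule fails.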
With ParSEE it is only a matter of getting the progress log and then inspecting it:
- Include ParSEE at the top of the `XML.red` codec:
  `#include %parsee.red`
- Change the `parse` call to a `parsee` one (one extra `e`):
  `result: either trace? [ parse-trace data document ] [ parsee data document ]`
- Try to decode the data, which brings up the progress inspection window where I can see what happened.
In just a minute I know that:
- it didn't succeed past the `<xml>` header
- it failed in the `ExternalID` string after `"PUBLIC"`
- it didn't accept the `3` digit with `PubidChar`
So the solution is easy:
PubidChar: charset reduce [
- space cr lf #"a" '- #"z" #"A" '- #"Z" {-'()+,./:=?;!*#@$_%}
+ space cr lf #"a" '- #"z" #"A" '- #"Z" #"0" '- #"9" {-'()+,./:=?;!*#@$_%}
]
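
To double-check the patch without running the whole codec, the two rules can be tried on their own. Again this is just a sketch under assumptions: `PubidLiteral` is copied from the rule quoted earlier, `PubidChar` is the patched version, and `sample` is a made-up name holding the public ID from `rects.svg`.

```red
Red []

; PubidChar with the digit range added, as in the patch above
PubidChar: charset reduce [
    space cr lf #"a" '- #"z" #"A" '- #"Z" #"0" '- #"9" {-'()+,./:=?;!*#@$_%}
]

; PubidLiteral as quoted from the codec
PubidLiteral: [ dq any PubidChar dq | sq any [not sq PubidChar] sq ]

; hypothetical name; same input that failed before the patch
sample: {"-//W3C//DTD SVG 1.0//EN"}

; prints true once the digits are accepted
print parse sample PubidLiteral
```

This only confirms that the literal itself now matches; the other similar issues mentioned at the top are not covered by it.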