Multiple replacements with peg/match #1422
I would like to replace this (simplified) function with a PEG:

    (defn replace-utf8 [str]
      (->> str
           (string/replace-all "=C3=A4" "ä")
           (string/replace-all "=C3=B6" "ö")
           (string/replace-all "=C3=BC" "ü")))

    (test (replace-utf8 "M=C3=B6ller") "Möller")

I tried to do it like in the example in Janet for Mortals: Pegular Expressions:

    (defn replace-utf8-peg [str]
      (peg/match ~(any (+
                        (/ "=C3=A4" "ä")
                        (/ "=C3=B6" "ö")
                        (/ "=C3=BC" "ü")
                        1))
                 str))

    (test (replace-utf8-peg "M=C3=B6ller") @["Möller"])

But it doesn't work. The test …
Replies: 1 comment 1 reply
`1` matches the byte, but doesn't capture it. You can use:

    (defn replace-utf8-peg [str]
      (peg/match ~(accumulate (any (+
                                    (/ "=C3=A4" "ä")
                                    (/ "=C3=B6" "ö")
                                    (/ "=C3=BC" "ü")
                                    '1)))
                 str))

To do what you want (`accumulate` will join all the captured characters into a string efficiently, instead of giving you an array of characters at the end).

`peg/replace` is probably a better fit here, though. And assuming that you want to produce UTF-8 encoded strings in your final result (which I guess is ambiguous given the code you provided -- I don't know your file encoding), you can just parse the bytes and don't…

    (defn replace-utf8-peg [str]
      (string (peg/replace-all
                ~{:main (* "=" :hex-byte)
                  :hex-byte (number (* :hex-digit :hex-digit) 16)
                  :hex-digit (range "09" "AF")}
                (fn [_ digit] (string/from-bytes digit))
                str)))

Though the lookup table might be preferable depending on the circumstances, e.g. if you don't want to think about overlong encodings or illegal encodings or something. This function doesn't validate that the result is UTF-8, it just trusts it, whereas your approach parses a subset that is definitely valid.
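For comparison, here is a minimal sketch of how the lookup-table idea could be combined with `peg/replace-all`, next to the generic hex decoder from this reply. The names `utf8-table` and `replace-utf8-table` are made up for illustration, and the replacement strings are written as explicit byte escapes on the assumption that the desired output is UTF-8, so the result does not depend on the source file's encoding.

```janet
# Hypothetical lookup table: only these three escapes are recognised.
# Values are spelled as raw UTF-8 bytes (assumption: UTF-8 output is wanted).
(def utf8-table
  {"=C3=A4" "\xC3\xA4"   # ä
   "=C3=B6" "\xC3\xB6"   # ö
   "=C3=BC" "\xC3\xBC"}) # ü

# Hypothetical helper: replaces only the escapes listed in utf8-table.
(defn replace-utf8-table [str]
  (string
    (peg/replace-all
      ~(+ "=C3=A4" "=C3=B6" "=C3=BC")
      # With a function as the substitution, peg/replace-all passes
      # the matched text as the first argument.
      (fn [matched] (get utf8-table matched))
      str)))

# The generic decoder from the reply above, repeated here for contrast.
(defn replace-utf8-peg [str]
  (string
    (peg/replace-all
      ~{:main (* "=" :hex-byte)
        :hex-byte (number (* :hex-digit :hex-digit) 16)
        :hex-digit (range "09" "AF")}
      (fn [_ digit] (string/from-bytes digit))
      str)))

# Both produce the same bytes for a well-formed escape.
(print (replace-utf8-table "M=C3=B6ller"))
(print (replace-utf8-peg "M=C3=B6ller"))

# They differ on input outside the table: the table version leaves "=FF"
# untouched, the generic decoder emits the single byte 0xFF, which is not
# valid UTF-8 on its own.
(print (replace-utf8-table "=FF"))
(print (length (replace-utf8-peg "=FF")))
```

The table version can only ever emit the sequences it lists, which is the "definitely valid subset" trade-off mentioned above. The generic decoder handles any `=XX` escape but leaves validation to the caller.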