Skip to content

mvc-works/lilac-parser

Repository files navigation

Lilac parser

A toy DSL-based combinator parser with better failure reasons.

Online demo http://repo.mvc-works.org/lilac-parser/

Try with (def a (add 1 2)) or {"json": [1, 2]}.

Usage

Clojars Project

[mvc-works/lilac-parser "0.0.3-a5"]
(require '[lilac-parser.core :refer
            [parse-lilac defparser is+ many+ one-of+ other-than+
             some+ combine+ interleave+ label+
             replace-lilac find-lilac]])

(parse-lilac (string/split "aaaa" "") (many+ (is+ "a")))

Demo of a stupid S-expression parser:

(def number-parser (many+ (one-of+ "1234567890")))

(def space-parser (is+ " "))

(def word-parser (many+ (one-of+ "qwertyuiopasdfghjklzxcvbnm")))

(defparser
 s-expr-parser+
 ()
 identity
 (combine+
  [(is+ "(")
   (some+ (or+ [number-parser word-parser space-parser (s-expr-parser+)]))
   (is+ ")")]))

(parse-lilac (string/split "(def a (add 1 2))" "") (s-expr-parser+))

More demos.

Rules

Rule Example Description
is+ (is+ "a") or (is+ "abc") matches a piece of string
one-of+ (one-of+ "abc") or (one-of+ #{"a" "b" "c"}) matches a character in one of candidates
other-than+ (other-than+ "abc") matches a character that is not listed
optional+ (optional+ (is+ "a")) matching or nothing
some+ (some+ (is+ "a")) matches 0 or more items
many+ (many+ (is+ "a")) matches 1 or more items
or+ (or+ [(is+ "a") (is+ "b")]) matches one among listed items
combine+ (combine+ [(is+ "a") (is+ "b")]) matches items in ecxact order
interleave+ (interleave+ (is+ "a") (is+ ",")) matches two interleaving items
label+ (label+ "just a" (is+ "a")) simpler rule for adding comments in result
unicode-range+ (unicode-range+ 97 122) matches a with unicode in between given

defparser

defparser is a macro for defining parser that can be used recursively. The type is :component, which is like a more complicated version of :label. Notice that s-expr-parser+ defined with defparser is different from a normal rule, it's a function so it need to be called before being used as a rule.

Visual DEMO

lilac-parser would be pretter slow since it tries to store all information during parsing, which results in a piece of EDN data. The result can be rendered into a tree with GUI and that's what is demonstrated in the demo.

An example for EDN data in parsing a JSON number.

{
  :ok? true, :value 112, :parser-node :component, :label :value-parser+
  :rest ("," "1")
  :result {
    :ok? true, :value 112, :parser-node :or
    :rest ("," "1")
    :result {
      :ok? true, :parser-node :label, :label "number", :value 112
      :rest ("," "1")
      :result {
        :ok? true, :value 112, :parser-node :combine
        :rest ("," "1")
        :results [
          {
            :ok? true, :value nil, :parser-node :optional
            :result {
              :ok? false, :message "expects \"-\" but got \"1\"", :parser-node :is
              :rest ["1" "1" "2" "," "1"]
            }
            :rest ["1" "1" "2" "," "1"]
          }
          {
            :ok? true, :parser-node :many
            :value ("1" "1" "2")
            :rest ("," "1")
            :results [
              {
                :ok? true, :value "1", :parser-node :one-of
                :rest ("1" "2" "," "1")
              }
              {
                :ok? true, :value "1", :parser-node :one-of
                :rest ("2" "," "1")
              }
              {
                :ok? true, :value "2", :parser-node :one-of
                :rest ("," "1")
              }
            ]
            :peek-result {
              :ok? false, :message "\",\" is not in \"1234567890\"", :parser-node :one-of
              :rest ("," "1")
            }
          }
          {
            :ok? true, :value nil, :parser-node :optional
            :result {
              :ok? false, :parser-node :combine, :message "failed to combine"
              :result {
                :ok? false, :message "expects \".\" but got \",\"", :parser-node :is
                :rest ("," "1")
              }
              :previous-results []
              :rest ("," "1")
            }
            :rest ("," "1")
          }
        ]
      }
    }
  }
}

Preset rules

Under lilac-parser.preset:

  • lilac-digit matches \d
  • lilac-alphabet matches [a-zA-Z]
  • lilac-chinese-char matches [\u4e00-\u9fa5]
  • lilac-comma-space matches \s*\,\s*

Custom Rule

Parser rules can be expected by injecting functions. It could be quite tricky and is not recommended:

(lilac-parser.core/resigter-custom-rule! :xyz
  (fn [xs rule]
    ; TODO
    ))

(defn xyz+ [xs transform]
  ; TODO
  )

Replacer

A function is also provided for replacing text pieces matching a given rule:

(replace-lilac content rule (fn [x] (str "<<<" x ">>>>")))

which returns :result as well as parsing details in :attempts:

{
  :result "<<<MATCHED>>>>"
  :attempts [
    ; parsing summaries in vector
  ]
}

This is an experimental API serving jobs as a custom regular expression replacer.

Similarly matched pieces can be collected with find-lilac:

(find-lilac content rule)

Workflow

Workflow https://github.com/mvc-works/calcit-workflow

License

MIT