
Parsek

Maven Central

Parser library for Kotlin consisting of a tokenizer and expression parser.

Tokenization

Tokenization is the process of splitting the input into a stream of tokens that is consumed by a parser.

In Parsek, this work is split between two classes, Lexer and Scanner.

Lexer

The lexer (source, kdoc) is basically an iterator for a stream of tokens that is generated by splitting the input using regular expressions.

Regular expressions are mapped to token types by a function that typically just returns a fixed token type inline. The function can also be used to implement a second layer of mapping, but this should be fairly uncommon. Input mapped to null (typically whitespace) is not reported.
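
To make the mapping idea concrete, here is a small self-contained Kotlin sketch of the underlying concept. Note that this is not the Parsek API — TokenType, SimpleToken and tokenize are made-up names for illustration; see the kdoc links for the real signatures.

```kotlin
// Self-contained illustration of regex-based tokenization in the spirit of
// the Lexer. NOT the Parsek API; all names here are made up for illustration.
enum class TokenType { NUMBER, IDENTIFIER, SYMBOL }

data class SimpleToken(val type: TokenType, val text: String, val position: Int)

// Each regular expression is paired with a function mapping the matched text
// to a token type; null means "don't report" (typically whitespace).
val rules: List<Pair<Regex, (String) -> TokenType?>> = listOf(
    Regex("""\s+""") to { _ -> null },
    Regex("""\d+(\.\d+)?""") to { _ -> TokenType.NUMBER },
    Regex("""[A-Za-z_]\w*""") to { _ -> TokenType.IDENTIFIER },
    Regex("""[+\-*/()\[\],?:]""") to { _ -> TokenType.SYMBOL },
)

fun tokenize(input: String): List<SimpleToken> {
    val tokens = mutableListOf<SimpleToken>()
    var pos = 0
    while (pos < input.length) {
        // Take the first rule that matches at the current position.
        val (match, type) = rules.firstNotNullOfOrNull { (regex, mapper) ->
            regex.matchAt(input, pos)?.let { it to mapper(it.value) }
        } ?: error("Unrecognized input at position $pos")
        if (type != null) tokens.add(SimpleToken(type, match.value, pos))
        pos = match.range.last + 1
    }
    return tokens
}
```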

The lexer is usually not used directly; instead, it is handed to the Scanner, which in turn is used by the parser.

The reason for the Lexer/Scanner split is to separate "raw" parsing from providing a nice and convenient API. The small API surface of the Lexer allows us to easily install additional processing between the Lexer and Scanner, for instance for context-sensitive newline filtering.

Typically, the Lexer is constructed directly inline where the Scanner is constructed.

Token

The token class (source, kdoc) stores the token type (typically a user-defined enum), the token text and the token position. Token instances are generated by the Lexer.

RegularExpressions

The RegularExpressions object (source, kdoc) contains a set of useful regular expressions for source code and data format tokenization.

Scanner

The Scanner class (source, kdoc) provides a simple API for convenient access to the token stream generated by the Lexer.

  • The scanner provides a notion of a "current" token that can be inspected multiple times -- as opposed to iterator.next(), where the current token is "gone" after the call. This makes it easy to hand the scanner, with its current token, down through a recursive descent parser until the token is consumed and processed by the corresponding handler.

  • It provides unlimited dynamic lookahead.

  • It provides a tryConsume() convenience method that checks the current token against a given text and, if it matches, consumes the token and returns true (illustrated in the sketch below).
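
The pattern from the bullets above can be sketched in a few lines of self-contained Kotlin. SimpleScanner is a stand-in built on the tokenize() sketch from the Lexer section, not the Parsek Scanner class; the parse functions are likewise illustrative only.

```kotlin
// Minimal stand-in demonstrating the "current token + tryConsume" pattern.
// NOT the Parsek Scanner API; see the kdoc for the real signatures.
class SimpleScanner(private val tokens: List<SimpleToken>) {
    private var index = 0

    // The current token can be inspected any number of times.
    val current: SimpleToken? get() = tokens.getOrNull(index)

    fun consume(): SimpleToken = tokens[index++]

    // Consumes the current token and returns true if its text matches.
    fun tryConsume(text: String): Boolean {
        if (current?.text == text) { index++; return true }
        return false
    }
}

// Recursive descent usage: the scanner is handed down until the current
// token is consumed by the matching handler.
fun parseExpression(scanner: SimpleScanner): Double {
    var result = parseProduct(scanner)
    while (true) {
        if (scanner.tryConsume("+")) result += parseProduct(scanner)
        else if (scanner.tryConsume("-")) result -= parseProduct(scanner)
        else return result
    }
}

fun parseProduct(scanner: SimpleScanner): Double {
    var result = parsePrimary(scanner)
    while (true) {
        if (scanner.tryConsume("*")) result *= parsePrimary(scanner)
        else if (scanner.tryConsume("/")) result /= parsePrimary(scanner)
        else return result
    }
}

fun parsePrimary(scanner: SimpleScanner): Double =
    if (scanner.tryConsume("(")) {
        val result = parseExpression(scanner)
        require(scanner.tryConsume(")")) { "')' expected" }
        result
    } else {
        scanner.consume().text.toDouble()
    }

fun main() {
    // Prints 14.0
    println(parseExpression(SimpleScanner(tokenize("2 * (3 + 4)"))))
}
```

In Parsek, this kind of hand-rolled precedence handling is exactly what the configurable expression parser described below takes care of.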

Scanner Use Cases

Typical use cases that only need a scanner and no expression parser are data formats such as JSON or CSV.

For a simple example, please refer to the JSON parser example.
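
As a rough self-contained sketch of scanner-only parsing (again using the stand-in classes from above rather than the Parsek API; see the linked JSON example for the real thing), a hypothetical parseNumberList for a JSON-style list of numbers needs nothing beyond consume() and tryConsume():

```kotlin
// Sketch: parsing a JSON-style list of numbers with the scanner alone.
fun parseNumberList(scanner: SimpleScanner): List<Double> {
    require(scanner.tryConsume("[")) { "'[' expected" }
    val result = mutableListOf<Double>()
    if (!scanner.tryConsume("]")) {
        do {
            result.add(scanner.consume().text.toDouble())
        } while (scanner.tryConsume(","))
        require(scanner.tryConsume("]")) { "']' expected" }
    }
    return result
}

// parseNumberList(SimpleScanner(tokenize("[1, 2, 3.5]"))) == [1.0, 2.0, 3.5]
```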

Expression Parser

The configurable expression parser (source, kdoc) operates on a tokenizer, is stateless and should be shared / reused.

  • For ternary expressions, create a suffix expression and use the supplied tokenizer to consume the rest of the ternary (see the sketch after this list).
  • Functions / "Apply" can be implemented in a similar way. Alternatively, this can be handled in primary expression parsing by checking for an opening parenthesis after the primary expression.
  • "Grouping" brackets should be implemented where primary expressions are processed, too.

Expression Parser-Based Examples

  • A simple example evaluating mathematical expressions directly (as opposed to building an explicit parse tree) can be found in the tests

  • A complete PL/0 parser is included in the examples module to illustrate how to use the expression parser and tokenizer for a simple but computationally complete language: Parser.kt, Pl0Ttest.kt

  • A parser for mathematical expressions: ExpressionParser.kt, ExpressionsTest.kt

  • An example of using the scanner and expression parser to implement a simple indentation-based programming language: mython, MythonTest.kt

  • A BASIC interpreter using Parsek: https://github.com/stefanhaustein/basik