Use nim compiler to parse strange syntax #14

Open
YesDrX opened this issue Aug 25, 2022 · 1 comment


YesDrX commented Aug 25, 2022

Can we use the Nim compiler as a library (at runtime rather than at compile time) to parse edge-case statements? The example below contains an if expression. In theory, we could build a shared library that exposes C functions such as

bool isIfExpr(char* src, size_t src_size);

To run the following example, you first need to install the compiler package:

nimble install compiler

example

import compiler / [ast, idents, parser, options]
import strutils
import strformat

var
    identCache = newIdentCache()
    configRef = newConfigRef()
    code = """
var tmp = if 1 > 2 : "1 is bigger than 2" else : "1 is not bigger than 2"
"""

proc echoTree(tree : PNode, indent_level : int = 0) : string =
    if tree != nil:    
        if tree.kind == nkIfExpr:
            echo fmt"""
            ====================================================
                    WHAT? WE DETECTED AN IF_EXPRESSION
            ====================================================
            """

        result = result & " ".repeat(4 * indent_level) & tree.kind.`$` & fmt" : ({tree.info.line}:{tree.info.col}) "

        case tree.kind
        of nkCharLit .. nkUInt64Lit:
            result = result & fmt" : {tree.intVal}"&"\n"
        of nkFloatLit .. nkFloat128Lit:
            result = result & fmt" : {tree.floatVal}"&"\n"
        of nkStrLit .. nkTripleStrLit:
            result = result & fmt" : {tree.strVal}"&"\n"
        of nkSym:
            result = result & "\n"
        of nkIdent:
            result = result & fmt": {tree.ident.s}"&"\n"
        else:
            result = result & "\n"
            for son in tree.sons:
                result = result & son.echoTree(indent_level + 1)

echo  code.parseString(identCache, configRef).echoTree

Output

            ====================================================
                    WHAT? WE DETECTED AN IF_EXPRESSION
            ====================================================
            
nkStmtList : (1:0) 
    nkVarSection : (1:0) 
        nkIdentDefs : (1:4) 
            nkIdent : (1:4) : tmp
            nkEmpty : (1:8) 
            nkIfExpr : (1:10) 
                nkElifExpr : (1:13) 
                    nkInfix : (1:15) 
                        nkIdent : (1:15) : >
                        nkIntLit : (1:13)  : 1
                        nkIntLit : (1:17)  : 2
                    nkStmtList : (1:21) 
                        nkStrLit : (1:21)  : 1 is bigger than 2
                nkElseExpr : (1:42) 
                    nkStmtList : (1:49) 
                        nkStrLit : (1:49)  : 1 is not bigger than 2

Owner

aMOPel commented Aug 25, 2022

Interesting idea.

I don't know how to make that work, though.
Have you read the tree-sitter docs on creating parsers?

src/parser.c is generated entirely from the grammar.js file (using the tree-sitter CLI). I don't know of any interface for inserting things into parser.c at runtime, but there is src/scanner.cc, which offers more fine-grained control over parsing than the DSL in grammar.js.

Theoretically, you could import the Nim compiler library as C or C++ code in src/scanner.cc. However, the scanner (and probably the parser) works character by character, and I don't know how well that fits with the Nim compiler library.

To give an example, the triplestr_lit is currently handled in scanner.cc, or at least its content and closing quotes are.

https://github.com/aMOPel/tree-sitter-nim/blob/main/grammar.js#L1183

It works like this:
In grammar.js, we match a triplestr_lit when we find """ followed by _multi_string_content rules and a _multi_string_end rule. Those rules are implemented in src/scanner.cc here:

https://github.com/aMOPel/tree-sitter-nim/blob/main/src/scanner.cc#L147

The API works character by character. You can use
lexer->lookahead to look at the next char,
advance(lexer) to match the next char and go 1 char forward, and
skip(lexer) to not match the next char and go 1 char forward.
(There is also mark_end, which marks the end of the current token.)

That is pretty much the whole API.
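
To make the character-by-character style concrete, here is a minimal sketch in C of a scan that consumes triple-quoted string content up to the closing """. It is not the code from src/scanner.cc. MockLexer, mock_lexer, and scan_multi_string_content are hypothetical names, and MockLexer is a cut-down stand-in for tree-sitter's lexer (only lookahead, advance, and mark_end) so that the sketch compiles without tree-sitter. It also simplifies by stopping at the first run of three quotes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for tree-sitter's lexer, reduced to the members
   mentioned above. The last three fields exist only for this mock. */
typedef struct MockLexer MockLexer;
struct MockLexer {
    int32_t lookahead;                       /* next character, 0 at end of input */
    void (*advance)(MockLexer *, bool skip); /* move one character forward */
    void (*mark_end)(MockLexer *);           /* record the end of the current token */
    const char *src;
    size_t pos;       /* index of the lookahead character */
    size_t token_end; /* position recorded by mark_end */
};

static void mock_advance(MockLexer *lx, bool skip) {
    (void)skip; /* this mock does not distinguish skipped characters */
    if (lx->src[lx->pos] != '\0') lx->pos++;
    lx->lookahead = (unsigned char)lx->src[lx->pos];
}

static void mock_mark_end(MockLexer *lx) { lx->token_end = lx->pos; }

static MockLexer mock_lexer(const char *src) {
    MockLexer lx = { (unsigned char)src[0], mock_advance, mock_mark_end, src, 0, 0 };
    return lx;
}

/* Consume string content up to the first closing """.
   mark_end is called at the first quote of each run, so on success the
   token ends just before the delimiter. Returns false if the input ends
   before a closing delimiter is found. */
static bool scan_multi_string_content(MockLexer *lx) {
    assert(lx != NULL);
    int quotes = 0;
    while (lx->lookahead != 0) {
        if (lx->lookahead == '"') {
            if (quotes == 0) lx->mark_end(lx); /* content may end here */
            quotes++;
            lx->advance(lx, false);
            if (quotes == 3) return true;
        } else {
            quotes = 0; /* a shorter run of quotes is ordinary content */
            lx->advance(lx, false);
        }
    }
    return false;
}
```

Calling scan_multi_string_content on a lexer created with mock_lexer("hello \"\"\" rest") returns true and leaves token_end at the index of the first closing quote. The scan has to read three characters past that point before it knows the content has ended, which is the situation mark_end exists for.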
So I don't really know how to make this work with the Nim compiler library, but frankly I have never used it, so maybe you have an idea.
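
One possible way to bridge the two models, sketched here in C under heavy assumptions, is to let the scanner copy the rest of the current line into a buffer and hand that buffer to a whole-string predicate with the signature proposed in the issue. The isIfExpr below is only a placeholder that searches for the substring "= if "; a real version would have to come from a shared library built on the Nim compiler, which this sketch does not attempt. MockLexer, mock_lexer, and line_has_if_expr are hypothetical names, and MockLexer is again a cut-down stand-in for tree-sitter's lexer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for tree-sitter's lexer (lookahead, advance,
   mark_end only). The last three fields exist only for this mock. */
typedef struct MockLexer MockLexer;
struct MockLexer {
    int32_t lookahead;
    void (*advance)(MockLexer *, bool skip);
    void (*mark_end)(MockLexer *);
    const char *src;
    size_t pos;
    size_t token_end;
};

static void mock_advance(MockLexer *lx, bool skip) {
    (void)skip;
    if (lx->src[lx->pos] != '\0') lx->pos++;
    lx->lookahead = (unsigned char)lx->src[lx->pos];
}

static void mock_mark_end(MockLexer *lx) { lx->token_end = lx->pos; }

static MockLexer mock_lexer(const char *src) {
    MockLexer lx = { (unsigned char)src[0], mock_advance, mock_mark_end, src, 0, 0 };
    return lx;
}

/* PLACEHOLDER with the signature proposed in the issue. It only looks for
   the substring "= if " so that the sketch can run; it does not parse Nim. */
static bool isIfExpr(char *src, size_t src_size) {
    static const char needle[] = "= if ";
    size_t n = sizeof needle - 1;
    for (size_t i = 0; i + n <= src_size; i++)
        if (memcmp(src + i, needle, n) == 0) return true;
    return false;
}

/* Copy the rest of the current line into buf, then query the whole-buffer
   predicate. mark_end is called first so that the characters read ahead
   are not added to the token. Assumes ASCII input for simplicity. */
static bool line_has_if_expr(MockLexer *lx, char *buf, size_t cap) {
    assert(cap > 0);
    size_t len = 0;
    lx->mark_end(lx);
    while (lx->lookahead != 0 && lx->lookahead != '\n' && len + 1 < cap) {
        buf[len++] = (char)lx->lookahead;
        lx->advance(lx, false);
    }
    buf[len] = '\0';
    return isIfExpr(buf, len);
}
```

The design choice here is to treat the compiler as an oracle over a buffered slice of input, while tree-sitter keeps ownership of tokenization. Whether that is practical depends on how much input the oracle needs and on the cost of linking the compiler into the scanner, both of which remain open questions in this thread.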

I would be curious how large the parser becomes if you import the Nim compiler.
