I can not find a way of branching between expressions #25

Open
groteck opened this issue Feb 21, 2023 · 4 comments
Labels
question Further information is requested

Comments


groteck commented Feb 21, 2023

Hi, thanks for the work. I really like the library and enjoy how quickly I made progress with relatively little knowledge of language parsers.

Context: I'm trying to write a parser for the Conventional Commits spec, which looks something like this:

<type>[optional scope]: <description>

[optional body]

[optional footer(s)]

Example:

feat(homepage): Add a new header to my home page

We need to add a new link to the homepage so customers can do some action that was not there before

closes: #445

In my parser, the issue is that the body and the footer of a conventional commit look almost the same; the main difference is the tag present on the footer. But with the available macros plus the regex, I can't find a way to branch and say: if you find something like .+: it is a footer, so process the tags.

Please understand that my knowledge of language parsing is really limited and I'm using this library as one of my first approaches. If there is nonsense here, let me know and point me in the right direction if possible.

Member

shadaj commented Feb 23, 2023

This requires a bit of creativity with how the grammar is structured to make sure tokenization takes place correctly, but what I'd recommend is having two enum variants that both start by looking for a sequence of tokens without a :, and then only one variant can parse a token after it as well as more text.

pub enum BodyOrFooter {
    Footer(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
        #[rust_sitter::leaf(text = ":")]
        (),
        #[rust_sitter::leaf(pattern = r"[\w\s]+", transform = |v| v.to_string())]
        String,
    ),
    Body(
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        String,
    ),
}

This works because Tree Sitter always looks for the token matching the [^:]+ regex, and can then branch based on whether the following token is a : or not.
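To make the branching idea concrete, here is a minimal sketch in plain Rust that does not use rust_sitter at all. The `classify` helper and the two-field `Footer` variant are hypothetical names chosen for illustration; the sketch only mimics the decision described above (take the leading run of non-`:` characters, then branch on whether a `:` follows) and is not how Tree Sitter is implemented internally.

```rust
/// One line of a commit message, classified the same way the grammar branches.
#[derive(Debug, PartialEq)]
pub enum BodyOrFooter {
    /// A `tag: value` line.
    Footer(String, String),
    /// Text that contains no `:` separator.
    Body(String),
}

/// Hypothetical helper (not part of rust_sitter): take the leading run of
/// non-`:` characters, then branch on whether a `:` follows it.
pub fn classify(line: &str) -> BodyOrFooter {
    match line.split_once(':') {
        // A `:` was found: everything before it is the tag, the rest is the value.
        Some((tag, value)) => {
            BodyOrFooter::Footer(tag.trim().to_string(), value.trim().to_string())
        }
        // No `:` anywhere: the whole line is body text.
        None => BodyOrFooter::Body(line.trim().to_string()),
    }
}
```

With this sketch, `classify("closes: #445")` yields a `Footer` and a line without a colon yields a `Body`. Note that the sketch accepts any text after the colon, whereas the `[\w\s]+` pattern in the grammar above would, as far as I can tell, not match the `#` in `#445`, so that pattern may need widening for issue references.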

shadaj added the question label Feb 23, 2023
Member

shadaj commented Feb 23, 2023

Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the Body and Footer. So then you parse a Vec<Body> followed by a Vec<Footer>.
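As a rough illustration of the target shape (a list of bodies followed by a list of footers), here is a sketch in plain Rust without rust_sitter. The names `split_body_and_footers` and `parse_footer` are hypothetical, and the rules used here (paragraphs separated by a blank line, footers only in the last paragraph, tags without whitespace) are simplifying assumptions rather than the full Conventional Commits spec.

```rust
#[derive(Debug, PartialEq)]
pub struct Body(pub String);

#[derive(Debug, PartialEq)]
pub struct Footer {
    pub tag: String,
    pub value: String,
}

/// Hypothetical helper: parse a single `tag: value` line.
/// Assumes a tag is non-empty and contains no whitespace.
fn parse_footer(line: &str) -> Option<Footer> {
    let (tag, value) = line.split_once(": ")?;
    if tag.is_empty() || tag.contains(char::is_whitespace) {
        return None;
    }
    Some(Footer {
        tag: tag.to_string(),
        value: value.trim().to_string(),
    })
}

/// Hypothetical helper: split the text after the header line into body
/// paragraphs followed by footer lines.
pub fn split_body_and_footers(rest: &str) -> (Vec<Body>, Vec<Footer>) {
    // Paragraphs are separated by a blank line.
    let paragraphs: Vec<&str> = rest
        .split("\n\n")
        .map(str::trim)
        .filter(|p| !p.is_empty())
        .collect();

    let mut bodies = Vec::new();
    let mut footers = Vec::new();
    for (i, paragraph) in paragraphs.iter().enumerate() {
        let is_last = i + 1 == paragraphs.len();
        // Only the last paragraph may be a footer block, and only if every
        // one of its lines has the `tag: value` shape.
        let parsed: Option<Vec<Footer>> = if is_last {
            paragraph.lines().map(parse_footer).collect()
        } else {
            None
        };
        match parsed {
            Some(lines) => footers = lines,
            None => bodies.push(Body(paragraph.to_string())),
        }
    }
    (bodies, footers)
}
```

Calling `split_body_and_footers` on the text after the header of the example commit returns one `Body` paragraph and one `Footer` with tag `closes`. In a grammar, the same ordering would be expressed by two consecutive fields rather than by a loop like this one.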

Author

groteck commented Mar 2, 2023

Hi @shadaj, using your answer I was able to parse the footer and the body. But regarding the second part:

Alternatively, due to the structure of Conventional Commits, you may want to just have a separate struct for the Body and Footer. So then you parse a Vec<Body> followed by a Vec<Footer>.

This part is a bit more complex. The only way I found to parse into a Vec is using:

    pub struct Language {
        pub type_: Type,
        #[rust_sitter::leaf(pattern = r"\s")]
        _whitespace: (),
        #[rust_sitter::leaf(pattern = r".+", transform = |v| v.to_string())]
        pub description: String,
        #[rust_sitter::delimited(
            #[rust_sitter::leaf(text = "/n")]
            ()
        )]
        pub footer: Option<FooterLine>,
        #[rust_sitter::delimited(
            #[rust_sitter::leaf(text = "/n/n")]
            ()
        )]
        pub body: Option<BodyParagraph>,
    }
    
    #[derive(Debug, PartialEq)]
    pub struct FooterLine {
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub tag: String,
        #[rust_sitter::leaf(text = ": ")]
        _separator: (),
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub value: String,
    }

    #[derive(Debug, PartialEq)]
    pub struct BodyParagraph {
        #[rust_sitter::leaf(pattern = r"[^:]+", transform = |v| v.to_string())]
        pub value: String,
    }

But I'm not really sure what is wrong there 😄

Author

groteck commented Mar 2, 2023

But since this was about branching, I think we can close the issue as solved and maybe raise the question about vectors in a separate one?
