Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

token definitions #2

Open
colin-kiegel opened this issue Nov 22, 2015 · 3 comments
Open

token definitions #2

colin-kiegel opened this issue Nov 22, 2015 · 3 comments

Comments

@colin-kiegel
Copy link
Member

Tokens: I like it, that you are only storing references instead of strings. This is something that I tried to, but I postponed it. I think we would need some helper object to support mutability.

I was experimenting with something like this - but I decided it would be way too complex right now to implement a nice + safe interface for it: https://gist.github.com/colin-kiegel/735df3ddb40da853c923.

This is my current definition in lexer/token/mod.rs:

  • renamed Var to Expression
  • split Number to Integer vs. Floating without keeping the string
  • using owned String right now (idea to generalize later - see gist)
pub enum Token {
    _Eof,
    Text(String),
    BlockStart,
    ExpressionStart, // orig. Var
    BlockEnd,
    ExpressionEnd,
    Name(String),
    IntegerNumber(u64), // orig. Number
    FloatingNumber(f64), // orig. Number
    String(String),
    Operator(String),
    Punctuation(Punctuation),
    _InterpolationStart,
    _InterpolationEnd,
}
@Nercury
Copy link
Member

Nercury commented Nov 22, 2015

This looks almost the same. In my code I also have owned TokenValue in addition to TokenValueRef<'c> used for lexing. The 'c is lifetime of original string blob, short for 'code. I found no need to modify token values themselves, they are just the input data when parser creates another structure for node. So the only place I need TokenValue is error messages.

pub enum TokenValueRef<'a> {
    Text(&'a str),
    BlockStart,
    VarStart,
    BlockEnd,
    VarEnd,
    Name(&'a str),
    Value(ConstRef<'a>),
    Operator(&'a str),
    Punctuation(char),
    InterpolationStart,
    InterpolationEnd,
    CommentStart, // Not in vanilla Twig.
}

impl<'a> Into<TokenValue> for TokenValueRef<'a> {
    fn into(self) -> TokenValue {
        ...
    }
}

pub enum TokenValue {
    Text(String),
    BlockStart,
    VarStart,
    BlockEnd,
    VarEnd,
    Name(String),
    Value(Const),
    Operator(String),
    Punctuation(char),
    InterpolationStart,
    InterpolationEnd,
    CommentStart, // Not in vanilla Twig.
}

The differences:

  • I think we can safely get rid of Eof.
  • I combined Integer and Float into Value. As far as I remember, there was some test for big integers. So I had to keep this big integer there as str slice. I think it was for those cases where such value was going to be pushed directly to output. However, if that's the case, it could be stored in text.
  • I think your way of doing punctuation is better.
  • CommentStart is not necessary.

@colin-kiegel
Copy link
Member Author

In my notation everything with an underscore is 'unstable', because I don't need it yet. So _Eof is definetely a candidate for removal. Interpolation is still unimplemented in my parser.

Numbers: I suggest

  • I revert my split of IntegerNumber/FloatingNumber to something like your Value(Const)
  • Could we call it Number(Number)? Because almost everything is a value more or less
  • Number could be an enum with variants
    ** Integer(i64, &'a str)
    ** Floating(f64, &'a str)
    ** Possibly: Unparsed(&'a str) / BigInt or whatever

I was thinking about saving numbers redundantly before, i.e. str-slice + parsed value. This way one has to do the conversion once upfront, but saves later conversions. My implementation would currently have to convert back to string at a later stage. Yours would need to parse values later - possibly multiple times. This would be a mix of both our approaches.

With Punctuation it might also benefit to store the char representation additionally. But that's probably micro-optimazition not really worth it.

@Nercury
Copy link
Member

Nercury commented Dec 6, 2015

_Eof

Will be gone.

Value(Const); Could we call it Number(Number)?

Will do that.

Number could be an enum with variants ** Integer(i64, &'a str) ** Floating(f64, &'a str) ** Possibly: Unparsed(&'a str) / BigInt or whatever

Ok. Sadly, no built-in BigInt in Rust.

I was thinking about saving numbers redundantly before, i.e. str-slice + parsed value. Yours would need to parse values later - possibly multiple times.

I very much intended to have another value representation after the lexing, which would closely match the use case, so I don't see issue at this stage.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants