Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Variables should resolve before keywords. #31

Closed
widavies opened this issue Aug 24, 2020 · 3 comments · Fixed by #89
Closed

Variables should resolve before keywords. #31

widavies opened this issue Aug 24, 2020 · 3 comments · Fixed by #89
Assignees
Labels
bug Indicates an unexpected problem or unintended behavior

Comments

@widavies
Copy link
Contributor

widavies commented Aug 24, 2020

Great library, one small issue I'm having though. It seems like you resolve built-in keywords/functions before the variables (from the ContextModel) are resolved. This yields a couple of problems.

If I bind a variable name like score, the library seems to be trying to resolve e and subsequently doesn't properly resolve score (as I presume its then trying to resolve scor?). This happens whenever any variable name contains a built-in keyword. Other examples of invalid variable names:

variable
ceiling
resin

Basically anything that contains e or a built-in function like sin, ceil, etc.

Also, it should be noted in my use case that variables can be named anything (my users define variables).

So for example, a variable name could be something like qw%d. This is more of a special case, and not really needed, but it would be nice if your code was smart enough to tell that this is probably a variable name and not a wd and d separated by a modulus. Although, if you're resolving variables first, this actually should be a fairly trivial case to handle.

Thanks! I can also take a look at fixing this if you point me in a right direction / have any pointers.

@fkleon fkleon added the bug Indicates an unexpected problem or unintended behavior label Aug 24, 2020
@fkleon
Copy link
Owner

fkleon commented Aug 24, 2020

Thanks for the report, you're right that this is unintentional. It's is an issue with the parser and somewhat related to #10.

For example, score is currently parsed as e(scor):

$ dart bin/interpreter.dart
[..]
score
> Tokens: [(VAR: scor), (EFUNC: e)]
> RPN: [(VAR: scor), (EFUNC: e)]
> Expression: e(scor)

The issue is in the Lexer#tokenize function in lib/src/parser.dart which processes the input as individual characters first, and prioritises known keywords before considering longer variable or function names. This leads to situations where parts of word tokens are parsed incorrectly.

I think this could be improved fairly easily by simply avoiding splitting up coherent word tokens (e.g. potential variable/function names). The qw%d case is a bit trickier as that contains the modulus operator. I'd be inclined to restrict legal variable names to alphanumeric characters only, and probably also requiring the first character to be alphabetic to avoid confusion with number literals.

Feel free to have a stab at this, and let me know if you have any further questions.

On a side note, I've started rewriting the parser to use a grammar-based approach as described in this comment. The WIP code is on the feature-parser-rewrite branch but it's still missing support for parsing functions, and I'm not sure when I'll get around to finishing that off. I'd be happy to accept a contribution to either the existing or new WIP parser code.

@widavies
Copy link
Contributor Author

For now, I may just use a workaround of using a regex replace for all variables instead of using the ContextModel to bind variables. I may try to work on a temporary fix, but I'm rather pressed for time and may just wait for your feature-parser-rewrite. No rush. Thanks!

@fkleon
Copy link
Owner

fkleon commented Feb 5, 2025

The new parser handles this much better, so the bug with parsing identifiers containing e (#35) is hopefully solved.

This leaves two points from this ticket:

  • Variables should resolve before keywords (as per title). It's still not possible to "override" build-in constant symbols and functions. Currently this is not a feature I'd like to support.
  • Variable naming: Naming of identifiers (e.g. variables) is now much more restrictive: starting with a letter or $, followed by any number of word characters. This simplifies parsing a lot, but isn't a fully backwards compatible change so will undoubtedly cause issues for some users. The workaround of "pre-processing" variable names should still work though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Indicates an unexpected problem or unintended behavior
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants