Proof of concept: custom matcher for sankey langium; DRY, allows imports #4910
base: next
Conversation
Commit 6187f4b: commit message is "change test case with double quotes values".
So to summarize your approach, you're using a negative lookahead?

If so, then I wouldn't recommend this approach in terms of performance and complexity. It would perform 3 more operations for each rule to validate that the following rule isn't a11y title, a11y description, or title, for now.

After working on the language server for a while, it appears that we can't perform operations on hidden rules, e.g., directives. The ideal situation is to store all directives, highlight them with a warning indicating that they're deprecated, and give the ability to perform a quick-fix command to change them to front matter.

If we make the `DIRECTIVE` rule a regular terminal rule, then we would also need to validate that it's not a `DIRECTIVE` rule with the original three rules. This would also apply to the `YAML` rule and any other rule we need to make regular instead of hidden in `common.langium`.
Sorry, but no, that's not an accurate summary at all. You're missing the entire point of this PR. Again, the point is to show that we can use the existing Chevrotain mechanism (hook) to overcome Langium's limitation/problem with ordering TOKENS. Whether or not negative lookaheads are used in a particular implementation of a custom matcher is not the point.

Again, in the description of this PR I clearly said: There may very well be better ways to implement `matchSankeyLinkNode()`. I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.
@aloisklink @nirname we should look at this approach and see if it should be used before merging the Pie diagram. The plan is to get the release process started next week itself.
```
@@ -0,0 +1,24 @@
grammar Sankey
import "../common/common";
```
👍
@weedySeaDragon that's a much cleaner approach. I'd rather stick to it because it is uniform. As you mentioned, there might be better ways to implement matchers. Should we pay attention to them as well?
```typescript
/**
 * Matches sankey link source and target node
 */
export const sankeyLinkNodeRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+(?=%%)|[^\n\r,]+)/;
```
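For reference, here is how that regex behaves on a couple of sample inputs. This is a hedged illustration only (the node names are made up, not from the PR):

```typescript
// The regex from the diff above: a quoted node ("" escapes a quote inside),
// or unquoted text that stops at a comma, newline, or a %% comment.
const sankeyLinkNodeRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+(?=%%)|[^\n\r,]+)/;

// Quoted node: capture group 1 holds the inner text, escaped quotes intact.
const quoted = sankeyLinkNodeRegex.exec('"Wind ""power"""');
console.log(quoted?.[1]); // Wind ""power""

// Unquoted node: capture group 2 ends right before a %% comment.
const unquoted = sankeyLinkNodeRegex.exec('Electricity grid %% a comment');
console.log(unquoted?.[2]); // "Electricity grid " (trailing space kept)
```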
I know this line originates from #4799; nevertheless, I think we have to pre-process such things as `%%` so as not to include them in grammars/lexers at all, if possible. Removing these lines beforehand will reduce overall complexity.
totally agree.
%% lines are already removed by mermaid before parsing.
The parser should handle comments in general; it should be a standalone package. I'm not sure if we should have that pre-processing, but if we do, then it should be in the parser.

But how accurate is it? In some places it's possible to have `%%` in general. For example:

```
pie
"section
%% not comment
name": 1.0
```
And I want to highlight that if we modified the input, it would return an inaccurate AST in terms of each attribute value, position, and range.
I agree, if the new parser can handle the comments in a common manner, it should do it.
We're already facing issues with incorrect line numbers in errors and stuff.
I'm not 100% sure, but it seems that Chevrotain supports lexing context, hence we could try it.
@nirname I'm not sure if you're aware of the other new approach, but it has been discussed here: #4751 (comment).
@nirname - Regarding using lexing context and doing things like pre-processing to remove `%%` lines: I'm going to leave this PR as-is for now until a decision is made about the overall approach. (I don't want to spend time on this if it's not going to be used.) I'm happy to make changes if it is key to any discussion and/or decision.
```typescript
  _tokens: IToken[],
  _groups
): CustomMatchResult => {
  const regexpsToFail: RegExp[] = [accessibilityDescrRegex, accessibilityTitleRegex, titleRegex];
```
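To make the snippet's intent concrete, here is a self-contained sketch of the custom-matcher idea. The reserved-word regexes below are simplified placeholders, not the real ones from `../common/matcher`, and the function body is illustrative rather than the PR's exact code:

```typescript
// Placeholder regexes standing in for the ones imported from ../common/matcher.
const accessibilityDescrRegex = /accDescr/;
const accessibilityTitleRegex = /accTitle/;
const titleRegex = /title/;
const sankeyLinkNodeRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+(?=%%)|[^\n\r,]+)/;

type CustomMatchResult = RegExpExecArray | null;

// Sketch of a Chevrotain-style custom pattern: refuse to match if a
// higher-priority rule (title, a11y title, a11y description) matches
// at the current offset, otherwise fall back to the node regex.
const matchSankeyLinkNode = (text: string, offset: number): CustomMatchResult => {
  const rest = text.substring(offset);
  const regexpsToFail: RegExp[] = [accessibilityDescrRegex, accessibilityTitleRegex, titleRegex];
  for (const re of regexpsToFail) {
    // Anchor each regex so it only counts when it matches at this offset.
    if (new RegExp(`^(?:${re.source})`).test(rest)) {
      return null;
    }
  }
  return new RegExp(`^(?:${sankeyLinkNodeRegex.source})`).exec(rest);
};
```

This mirrors the ordering guarantee the PR is after: the SANKEY_LINK_NODE token yields to the reserved keywords regardless of how Langium orders imported terminals.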
I'm still pretty unfamiliar with Langium (I'm sure you're all more experienced than me), but this approach looks good to me.

Although we may want to consider defining this variable in `packages/parser/src/language/common/matcher.ts`, especially since then we can use something like https://github.com/spadgos/regex-combiner to create one optimized regex instead of three separate ones.

In the long term, it might be worth asking Langium to add some options for importing tokens and their precedence. I can imagine that other people might encounter the same issues as us, and having syntax like `import "../common/common" with xxxx;` might make sense.
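As a rough illustration of the combining idea (without the regex-combiner library, which additionally factors out common prefixes), the alternatives can already be merged by joining the regex sources. The three regexes below are hypothetical stand-ins, not the real definitions:

```typescript
// Hypothetical stand-ins for accessibilityDescrRegex, accessibilityTitleRegex,
// and titleRegex from packages/parser/src/language/common/matcher.ts.
const accessibilityDescrRegex = /accDescr[\t ]*:/;
const accessibilityTitleRegex = /accTitle[\t ]*:/;
const titleRegex = /title\b/;

// Naive combination: one alternation, so one test() call instead of three.
const combined = new RegExp(
  [accessibilityDescrRegex, accessibilityTitleRegex, titleRegex]
    .map((re) => `(?:${re.source})`)
    .join('|')
);

console.log(combined.test('accTitle: my title')); // true
console.log(combined.test('a, b, 10')); // false
```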
@Yokozuna59 can you combine the approach from this PR with yours and push?
@sidharthv96 Sorry for the late response; I don't actually have the time to do so. Feel free to do it yourself.
📑 Summary

This shows how we can overcome the problem: when importing one `.langium` grammar file into another, there is no way to control the order of TOKENS. See this comment in PR #4799 and the comments (thread) after it.

This is just a draft/spike to show that this approach works. I created a new branch from `next`, then incorporated all of the changes from @Yokozuna59's branch. I then added my 3 small commits.

📏 Design Decisions

Using an existing mechanism will make future code changes and maintenance much more straightforward. Future coders will be able to read the Chevrotain documentation to understand how the custom matcher works.

Having a solution to the problem (that Langium doesn't provide a way to control TOKEN order when importing one `.langium` grammar file into another) means that we can take advantage of common, standardized grammar tokens/rules.

Here are the changes I made:

- In `sankeyTokenBuilder`, I replaced `tokenType.PATTERN = regexPatternFunc(sankeyLinkNodeRegex);` with `tokenType.PATTERN = matchSankeyLinkNode;`.
- `matchSankeyLinkNode()` ensures that the SANKEY_LINK_NODE does not inadvertently match the diagram title, a11y title, or a11y description. I used the regexps already defined in `../common/matcher` (`accessibilityDescrRegex`, `accessibilityTitleRegex`, `titleRegex`).
- I simplified `sankey.langium` by removing the duplicated code from `common/common.langium` and replacing it with a simple `import "../common/common";` statement.

There may very well be better ways to implement `matchSankeyLinkNode()`. I put together a straightforward approach because I was focused on seeing if this approach (using a custom matcher) can work.
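The PATTERN swap described in the first bullet can be pictured with a stripped-down model of a token type. This is a sketch under the assumption that PATTERN accepts either a RegExp or a matcher function, as Chevrotain's custom token patterns allow; the `SimpleTokenType` shape and both identifiers are simplified stand-ins, not the PR's actual code:

```typescript
// Simplified stand-in for Chevrotain's TokenType: PATTERN may be a RegExp
// or a custom matcher function (text, offset) => RegExpExecArray | null.
type SimpleTokenType = {
  name: string;
  PATTERN: RegExp | ((text: string, offset: number) => RegExpExecArray | null);
};

// Placeholder node regex (the real one also handles quoting and %% comments).
const sankeyLinkNodeRegex = /[^\n\r,]+/;

// Trivial custom matcher standing in for matchSankeyLinkNode().
const matchSankeyLinkNode = (text: string, offset: number): RegExpExecArray | null =>
  new RegExp(`^(?:${sankeyLinkNodeRegex.source})`).exec(text.substring(offset));

const tokenType: SimpleTokenType = {
  name: 'SANKEY_LINK_NODE',
  PATTERN: sankeyLinkNodeRegex, // before: plain regex pattern
};
tokenType.PATTERN = matchSankeyLinkNode; // after: custom matcher hook
```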