Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: create sankey parser and integrate sankey parser into mermaid package #4799

Open
wants to merge 35 commits into
base: develop
Choose a base branch
from

Conversation

Yokozuna59
Copy link
Member

@Yokozuna59 Yokozuna59 commented Sep 1, 2023

📑 Summary

Brief description about the content of your PR.

📏 Design Decisions

Describe the way your implementation works or what design decisions you made if applicable.

📋 Tasks

Make sure you

@netlify
Copy link

netlify bot commented Sep 1, 2023

Deploy Preview for mermaid-js ready!

Name Link
🔨 Latest commit 681fbd0
🔍 Latest deploy log https://app.netlify.com/sites/mermaid-js/deploys/66759b018b59f0000826c971
😎 Deploy Preview https://deploy-preview-4799--mermaid-js.netlify.app
📱 Preview on mobile
Toggle QR Code...

QR Code

Use your smartphone camera to open QR code link.

To edit notification comments on pull requests, go to your Netlify site configuration.

@Yokozuna59 Yokozuna59 requested a review from nirname September 1, 2023 22:21
@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Sep 1, 2023

The remaining issues:

// @ts-ignore - figure how to stop using this approach
return (d.id = Uid.next('linearGradient-')).id;

@codecov
Copy link

codecov bot commented Sep 1, 2023

Codecov Report

Attention: Patch coverage is 0% with 462 lines in your changes missing coverage. Please review.

Project coverage is 5.70%. Comparing base (1e43ad1) to head (8fa0509).

Current head 8fa0509 differs from pull request most recent head 6aaa3ae

Please upload reports for the commit 6aaa3ae to get more accurate results.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           develop   #4799      +/-   ##
==========================================
- Coverage     5.73%   5.70%   -0.03%     
==========================================
  Files          278     282       +4     
  Lines        42019   42165     +146     
  Branches       516     520       +4     
==========================================
- Hits          2409    2407       -2     
- Misses       39610   39758     +148     
Flag Coverage Δ
unit 5.70% <0.00%> (-0.03%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files Coverage Δ
...s/mermaid/src/diagram-api/diagram-orchestration.ts 0.00% <0.00%> (ø)
packages/parser/src/language/info/module.ts 0.00% <0.00%> (ø)
packages/parser/src/language/packet/module.ts 0.00% <0.00%> (ø)
packages/parser/src/language/pie/module.ts 0.00% <0.00%> (ø)
packages/parser/src/language/sankey/index.ts 0.00% <0.00%> (ø)
...kages/mermaid/src/diagrams/sankey/sankeyDiagram.ts 0.00% <0.00%> (ø)
packages/parser/src/language/info/tokenBuilder.ts 0.00% <0.00%> (ø)
...ackages/parser/src/language/packet/tokenBuilder.ts 0.00% <0.00%> (ø)
packages/parser/src/language/pie/tokenBuilder.ts 0.00% <0.00%> (ø)
packages/parser/src/language/index.ts 0.00% <0.00%> (ø)
... and 12 more

... and 23 files with indirect coverage changes

@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Sep 1, 2023

It appears that it uses the node id as name, I'll fix that tomorrow.

Fixed

@Yokozuna59 Yokozuna59 force-pushed the add-sankey-langium-parser branch 8 times, most recently from ae9b5ff to 7ca02da Compare September 1, 2023 23:55
@Yokozuna59 Yokozuna59 marked this pull request as ready for review September 2, 2023 08:19
@Yokozuna59
Copy link
Member Author

@Yokozuna59 Yokozuna59 self-assigned this Sep 2, 2023
Copy link
Member

@sidharthv96 sidharthv96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome job with the types!

The additional complexity of langium is a bit concerning. Are you sure there is no simpler way?

packages/mermaid/src/diagrams/sankey/sankeyRenderer.ts Outdated Show resolved Hide resolved
packages/parser/src/language/sankey/sankey.langium Outdated Show resolved Hide resolved
Comment on lines 4 to 14
export const sankeyLinkSourceRegex = /(?:"((?:""|[^"])+)")|([^\n\r,]+)/;

/**
* Matches sankey link target node
*/
export const sankeyLinkTargetRegex = /,(?:(?:"((?:""|[^"])+)")|([^\n\r,]+))/;

/**
* Matches sankey link value
*/
export const sankeyLinkValueRegex = /,("(?:0|[1-9]\d*)(?:\.\d+)?"|[\t ]*(?:0|[1-9]\d*)(?:\.\d+)?)/;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear why this complex regex is necessary. It feels like there should be an easier way.
When comparing with the deleted .jison file, the langium one is a lot more complex. Which makes it harder for people to create new diagrams.

Something similar is duplicated inside the .langium file too.

terminal SANKEY_LINK_VALUE returns number: /,("(0|[1-9][0-9]*)(\.[0-9]+)?"|[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?)/;
terminal SANKEY_LINK_TARGET: /,("((?:""|[^"])+)"|[^\n\r,]+)/;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Matchers (most if the time) are just regexes to extract wanted data from tokens.

So here in these matcher regexes, it should match what is between doable quotes without including the wrapper doable quotes or everything until ,.
It could be simplified and replace the first and last char, but we'll need to make sure which pattern has been matched.


As for .langium, I'm not sure if we could simplify it without allowing weird pattern.
So in the value regex, disallowed some patterns:

, "0" %% no space after "
,0.   %% no empty `.`
,00   %% first digit 0 can't have another number after it

We can just use \d+\.?\d* but I wouldn't recommend it.

As for the second terminal rule, I'm not if we could simplify the doable quote part.
I think state in jison are similar to mode in chevrotain, so when the use " it should enters a state where it should consume everything until it finds another ", but I don't know how to use the same rule to enter and exist mode.
But with modes the following code would be invalid (where is should be since sankey supposed to be csv like grammar):

sankey-beta
source",target,0

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we could change langium rules to this:

terminal SANKEY_LINK_VALUE returns number: /,"0|[1-9][0-9]*)(\.[0-9]+)?"/ | /,[\t ]*(0|[1-9][0-9]*)(\.[0-9]+)?)/;
terminal SANKEY_LINK_TARGET: /,"(""|[^"])+"/ | /,[^\n\r,]+/;

Copy link
Member Author

@Yokozuna59 Yokozuna59 Sep 3, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It appears that it would throw a warning when using this approach:

src/language/sankey/sankey.langium:45:45 - Regular expression flags are only applied if the terminal is not a composition
src/language/sankey/sankey.langium:45:66 - Regular expression flags are only applied if the terminal is not a composition

@Yokozuna59 Yokozuna59 mentioned this pull request Sep 14, 2023
2 tasks
@Yokozuna59
Copy link
Member Author

Yokozuna59 commented Mar 25, 2024

I have asked the Langium team about this issue we're facing; see: eclipse-langium/langium#1429.

They're suggesting to annotate the greedy rule, which in this case is SANKEY_LINK_NODE, with a @greedy, for example, in the JSDoc comment, then use the CommentProvider service to move the greedy rules to the end of the array of rules in the TokenBuilder service. See eclipse-langium/langium#1429 (comment) for more information.

What do you guys think? This approach will allow us to add a comment to the greedy rules showing why we didn't write the regex directly in the grammar and annotate it with @greedy to handle it in the CommonTokenBuilder.

So something like this:

/**
 * @greedy - we can't directly write the rule in grammar because it's greedy
 */
terminal SANKEY_LINK_NODE: /sankey-link-node/;

@weedySeaDragon
Copy link
Contributor

Hi @Yokozuna59 -- You are exactly right. It is not sorting. (sigh)

I will look at the greedy... approach. Thanks for continuing to work with eclipse-langium to try to figure this out!

@Yokozuna59
Copy link
Member Author

@weedySeaDragon I have implemented the greedy approach in this comment (6c7b0f2). Please take a look, and we could revert the comment if it's not applicable.

Copy link
Contributor

@weedySeaDragon weedySeaDragon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great start.

  • needs tests so we can make sure the token order is correct. (I wrote some but can't add commits to your branch. See below)
  • buildTerminalTokens() will not work if there are two or more @greedy annotations (see my comments in code). (See below for my suggested way to fix it.)
  • some minor suggested changes in the code

Tests (parser/src/tests/sankey-tokenBuilder.test.ts)

import { beforeAll, describe, expect, it } from 'vitest';
import { createSankeyServices, type SankeyServices } from '../src/language/index.js';
import type { TokenType, TokenVocabulary } from 'chevrotain';

const sankeyServices: SankeyServices = createSankeyServices().Sankey;

describe('SankeyTokenBuilder', () => {
  describe('token order', () => {
    let tokenVocab: TokenVocabulary;
    let tokenVocabNames: string[];

    beforeAll(() => {
      // Get the ordered tokens (the vocabulary) from the grammar
      tokenVocab = sankeyServices.parser.TokenBuilder.buildTokens(sankeyServices.Grammar, {
        caseInsensitive: sankeyServices.LanguageMetaData.caseInsensitive,
      });
      // coerce the tokenVocab to a type that can use .map
      tokenVocabNames = (tokenVocab as TokenType[]).map((tokenVocabEntry: TokenType) => {
        return tokenVocabEntry.name;
      });
    });

    it('whitespace is always first', () => {
      expect(tokenVocabNames[0]).toEqual('WHITESPACE');
    });
    it('sankey-beta comes after whitespace', () => {
      expect(tokenVocabNames[1]).toEqual('sankey-beta');
    });

    describe('terminal rules with @greedy in comments are put at the end of the ordered list of tokens', () => {
      const NUM_EXPECTED_GREEDY_RULES = 2;

      let greedyGroupStartIndex = 0;
      beforeAll(() => {
        greedyGroupStartIndex = tokenVocabNames.length - NUM_EXPECTED_GREEDY_RULES - 1;
      });

      it('SANKEY_LINK_NODE rule has @greedy so it is in the last group of all @greedy terminal rules', () => {
        // @ts-ignore TokenVocabulary does not have an index type, so ts complains when we use a numeric as an index type
        expect(tokenVocabNames.indexOf('SANKEY_LINK_NODE')).toBeGreaterThanOrEqual(
          greedyGroupStartIndex
        );
      });
      it('SANKEY_LINK_VALUE rule has @greedy so it is in the last group of all @greedy terminal rules', () => {
        // @ts-ignore TokenVocabulary does not have an index type, so ts complains when we use a numeric as an index type
        expect(tokenVocabNames.indexOf('SANKEY_LINK_VALUE')).toBeGreaterThanOrEqual(
          greedyGroupStartIndex
        );
      });
    });

    // console.log(`tokenVocabNames: ${tokenVocabNames}`); // @ts-ignore show this in the console for helpful debugging info
  });
});

Suggested fix for AbstractMermaidTokenBuilder

1. Add a method to test if a rule has a @greedy annotation

  • extracting it into a separate method means it'll be easier to change (or move to different class) later
  // TODO: This responsibility might better belong in CommentProvider (e.g. AbstractMermaidCommentProvider that is a subclass of  CommentProvider).
  public ruleHasGreedyComment(rule: GrammarAST.AbstractRule): boolean {
    const comment = this.commentProvider.getComment(rule);
    return !!comment && /@greedy/.test(comment);
  }

2. Change the buildTerminalTokens method:

protected override buildTerminalTokens(rules: Stream<GrammarAST.AbstractRule>): TokenType[] {
    if (rules.some((rule: GrammarAST.AbstractRule) => this.ruleHasGreedyComment(rule))) {
      const notGreedyRules: GrammarAST.AbstractRule[] = [];
      const lastRules: GrammarAST.AbstractRule[] = [];
      // put terminal rules with @greedy in their comment at the end of the array
      rules.forEach((rule) => {
        if (this.ruleHasGreedyComment(rule)) {
          lastRules.push(rule);
        } else {
          notGreedyRules.push(rule);
        }
      });
      return super.buildTerminalTokens(stream([...notGreedyRules, ...lastRules]));
    } else {
      return super.buildTerminalTokens(rules);
    }
  }

Co-authored-by: Ashley Engelund (weedySeaDragon @ github) <[email protected]>
Signed-off-by: Reda Al Sulais <[email protected]>
@Yokozuna59
Copy link
Member Author

@weedySeaDragon Thanks for your fast review!

I have applied your suggestions locally, but one of the test cases is failing:

 FAIL  packages/parser/tests/sankey-tokenBuilder.test.ts > SankeyTokenBuilder > token order > terminal rules with @greedy in comments are put at the end of the ordered list of tokens > SANKEY_LINK_VALUE rule has @greedy so it is in the last group of all @greedy terminal rules
AssertionError: expected 3 to be greater than or equal to 9
 ❯ packages/parser/tests/sankey-tokenBuilder.test.ts:43:62
     41|       });
     42|       it('SANKEY_LINK_VALUE rule has @greedy so it is in the last group of all @greedy terminal rules', () => {
     43|         expect(tokenVocabNames.indexOf('SANKEY_LINK_VALUE')).toBeGreaterThanOrEqual(
       |                                                              ^
     44|           greedyGroupStartIndex
     45|         );

Anyway, I'm not entirely sure I have followed your suggestion correctly for this comment.

@weedySeaDragon
Copy link
Contributor

@Yokozuna59 - it's failing because it's expecting the SANKEY_LINK_VALUE rule in the grammar to also have a@greedy comment annotation. I totally forgot to make a comment about that.

@Yokozuna59
Copy link
Member Author

@weedySeaDragon Sorry to ask, but why should we annotate it with @greedy? It works fine without it, and it isn't considered greedy.

@weedySeaDragon
Copy link
Contributor

You're right -- it's not totally necessary for the grammar in sankey.langium. The point was to have a grammar that tested having more than one @greedy annotation in it. If there's another way to write a test without using that grammar file, we should definitely do it. I didn't spend the time to read thru the langium test code to figure it out.

If you add the @greedy annotation to the ...VALUE in the sankey grammar and the tests pass locally, I'd say go ahead and remove the 2nd @greedy from sankey.langium, comment out the test that assumes ...VALUE has the @greedy annotation, and change the NUM_EXPECTED_GREEDY_RULES to = 1 and then commit it.
(I'd prefer to leave the commented-out test in so that we can be reminded of it in the future. But others may feel different.)

@Yokozuna59
Copy link
Member Author

The point was to have a grammar that tested having more than one @greedy annotation in it. If there's another way to write a test without using that grammar file, we should definitely do it.

I think the only way is to create sample grammar using createServicesForGrammar and override the default TokenBuilder and use the CommonTokenBuilder.

If you add the @greedy annotation to the ...VALUE in the sankey grammar and the tests pass locally, I'd say go ahead and remove the 2nd @greedy from sankey.langium,

If both rules have the @greey and we declared the ...NODE before the ...VALUE then it won't pass the tests.

@Yokozuna59
Copy link
Member Author

@weedySeaDragon I have created a separate test case for the CommonTokenBuilder: 51896f3

@nirname
Copy link
Contributor

nirname commented Jun 20, 2024

@Yokozuna59 sorry for the long response, I'll skim through the questions ASAP

@Yokozuna59 Yokozuna59 force-pushed the add-sankey-langium-parser branch from 43a6176 to 6aaa3ae Compare June 21, 2024 02:35
Copy link

argos-ci bot commented Jun 21, 2024

The latest updates on your projects. Learn more about Argos notifications ↗︎

Build Status Details Updated (UTC)
default (Inspect) ✅ No changes detected - Dec 31, 2024, 11:15 PM

@Yokozuna59 Yokozuna59 force-pushed the add-sankey-langium-parser branch from 6aaa3ae to 681fbd0 Compare June 21, 2024 15:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants