Improve performance of hidden node parsing #1790
Merged
+126
−48
In a project containing a large file with many comments (~20 000 LoC, ~10 000 comments), I noticed that parsing took a huge performance hit. It seems that our current method of iterating through the CST to add hidden nodes leads to super-linear behavior in the parsing phase.
You can reproduce this by running the test I've added to `langium-parser.test.ts` on the current `main` branch. While the parse takes only ~150ms on my machine with the new changes, it takes ~5000ms on `main`.

This change replaces the existing hidden node placement algorithm with a different one that should perform linearly. Note that the placement of the hidden nodes has not changed with this algorithm. Existing tests (and also a few new ones) verify this behavior.
Some ideas for this new algorithm were taken from #1733.
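
To illustrate why a linear placement is possible, here is a minimal sketch of the general idea. It is not the code from this PR: the `Leaf` type and the `mergeHidden` function are hypothetical, and the CST is simplified to a flat list of leaves. It assumes that both the visible leaves and the hidden tokens are already sorted by offset, so a single two-pointer pass can interleave them instead of searching the tree again for every hidden token.

```typescript
// Hypothetical, simplified leaf type; not Langium's actual CST interfaces.
interface Leaf {
    offset: number;
    text: string;
    hidden: boolean;
}

// Interleave two offset-sorted lists in a single pass.
// Each element is visited once, so the cost is O(visible.length + hidden.length).
function mergeHidden(visible: Leaf[], hidden: Leaf[]): Leaf[] {
    const result: Leaf[] = [];
    let v = 0;
    let h = 0;
    while (v < visible.length && h < hidden.length) {
        if (hidden[h].offset < visible[v].offset) {
            result.push(hidden[h++]);
        } else {
            result.push(visible[v++]);
        }
    }
    // At most one of the two lists still has elements; append the rest.
    while (v < visible.length) {
        result.push(visible[v++]);
    }
    while (h < hidden.length) {
        result.push(hidden[h++]);
    }
    return result;
}
```

By contrast, an approach that restarts a search from the root for each hidden token does work proportional to the number of nodes per comment, which would be consistent with the slowdown described above for files with thousands of comments.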