-
Notifications
You must be signed in to change notification settings - Fork 66
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support discarding the CST to preserve memory #1733
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Mark,
I just did a quick 'scan' of the code and asked questions.
When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?
Daily reminder (and note to myself): It would be easier to have code formatting in a separate commit ;)
Thanks
Daniel
As soon as the document is changed (either on disk, or via the |
Sounds all good! Thx |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(see my last review)
)]; | ||
if (goToLink && goToLink.target.$segments) { | ||
const name = this.nameProvider.getNameProperty(goToLink.target); | ||
if (name) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we use the whole composite node as target in case a name is not available?
// Whenever the user reopens the document, the CST will be rebuilt | ||
discardCst(document.parseResult.value); | ||
} | ||
// Discard the diagnostics for the closed document |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shall we make this configurable (without method overriding)? It could be simply a public property on the service class to turn this behavior on or off?
I imagine there are users/language implementors who would like to keep all problem markers even for closed documents.
@@ -30,7 +30,7 @@ export interface AsyncParser { | |||
* | |||
* @throws `OperationCancelled` if the parsing process is cancelled. | |||
*/ | |||
parse<T extends AstNode>(text: string, cancelToken: CancellationToken): Promise<ParseResult<T>>; | |||
parse<T extends AstNode>(text: string, options: ParserOptions | undefined, cancelToken: CancellationToken): Promise<ParseResult<T>>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The PR contains multiple "minor" breaking changes, but in sum I'd say the changes are significant enough to move this to v4.0. Shall we start planning the new major version more concretely?
@@ -23,6 +23,7 @@ export interface AstNode { | |||
readonly $containerIndex?: number; | |||
/** The Concrete Syntax Tree (CST) node of the text range from which this node was parsed. */ | |||
readonly $cstNode?: CstNode; | |||
readonly $segments?: AstNodeSegments; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a comment explaining this new property.
@@ -61,6 +61,9 @@ export class DefaultDocumentValidator implements DocumentValidator { | |||
|
|||
async validateDocument(document: LangiumDocument, options: ValidationOptions = {}, cancelToken = CancellationToken.None): Promise<Diagnostic[]> { | |||
const parseResult = document.parseResult; | |||
if (!parseResult.value.$cstNode) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why the early-exit here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because the validator requires the CST to identify where to place the validation markers. If I want to place a diagnostic on a keyword, the validator doesn't know where to place it, because all information about that keyword was lost when we discarded the CST.
readonly locale: string; | ||
} | ||
|
||
export interface Environment extends EnvironmentInfo { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need this as a separate service? AFAICT the information is only used by the WorkspaceManager.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like this could benefit other services as well and is well suited as a stand-alone service.
@spoenemann I agree, this is something for a v4.0 of Langium. This PR isn't finished anyway - the formatter needs a refactoring as the new way of doing comments in the CST broke the formatter and I still haven't gotten to writing any tests for this. |
This (surprisingly small) change enables us to discard the CST on closed documents (in LSP use cases) to reduce our memory footprint. I've seen several crashes of Langium based language servers on large (> 100mb) workspaces, and this change should help alleviate those issues.
This change includes:
Environment
service that identifies whether we're currently running as a language server. Running in a CLI environment usually requires full access to the CST.CstParserMode
that allows to discard parsed CSTs. The default isRetain
, butDiscard
will be used when we encounter a closed document while running as a language server. This also goes in tandem with the newdiscardCst
function that removes CSTs from ASTs that were parsed inRetain
mode. Useful for documents that have been closed.$segments
property has been added for caching of CST data. In particular, it contains the full range of any AST node and the ranges for all properties. Ranges for keywords are not stored (yet).$segments
property.Specific tests for the discard behavior will follow soon. This PR just shows the feasibility and backwards compatibility of the changes at hand so far.