Support discarding the CST to preserve memory #1733

Draft: wants to merge 2 commits into base: main

@msujew (Member) commented Oct 29, 2024

This (surprisingly small) change enables us to discard the CST of closed documents (in LSP use cases) to reduce our memory footprint. I've seen several crashes of Langium-based language servers on large (> 100 MB) workspaces, and this change should help alleviate those issues.

This change includes:

  1. A new Environment service that identifies whether we're currently running as a language server. Running in a CLI environment usually requires full access to the CST.
  2. A new CstParserMode that allows discarding parsed CSTs. The default is Retain, but Discard is used when we encounter a closed document while running as a language server. This goes in tandem with the new discardCst function, which removes the CST from ASTs that were parsed in Retain mode; this is useful for documents that have been closed.
  3. A new $segments property has been added for caching CST data. In particular, it contains the full range of any AST node and the ranges of all its properties. Ranges for keywords are not stored (yet).
  4. This new property is used extensively in the LSP-related services to find the correct positions of properties within closed files. Therefore, most services that look up a specific position/range now use the $segments property.
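To make items 2 and 3 concrete, here is a minimal sketch of how the Retain/Discard modes and a discardCst-style function could interact. It assumes heavily simplified stand-ins for Langium's AstNode and CstNode types; the shape of AstNodeSegments (a `full` range plus a per-property map), the `TextRange` type and the `applyCstMode` helper are hypothetical, not the PR's actual code. The point it illustrates is that `$segments` must be filled in before the CST is dropped, so position lookups keep working afterwards.

```typescript
// Simplified, hypothetical stand-ins for Langium's types; the PR's real definitions may differ.
type CstParserMode = 'Retain' | 'Discard';

interface TextPosition { line: number; character: number }
interface TextRange { start: TextPosition; end: TextPosition }

interface CstNode { range: TextRange }

interface AstNodeSegments {
    /** Full range of the AST node. */
    full: TextRange;
    /** Ranges of property values, keyed by property name (keyword ranges are not stored). */
    properties: Record<string, TextRange[]>;
}

interface AstNode {
    $type: string;
    $cstNode?: CstNode;
    $segments?: AstNodeSegments;
    [property: string]: unknown;
}

function isAstNode(value: unknown): value is AstNode {
    return typeof value === 'object' && value !== null && typeof (value as AstNode).$type === 'string';
}

// Removes the CST from an AST that was parsed in 'Retain' mode.
// Assumes $segments was filled in during parsing, so ranges stay available afterwards.
function discardCst(root: AstNode): void {
    delete root.$cstNode;
    for (const [key, value] of Object.entries(root)) {
        // Skip $container, $segments, etc. to avoid walking back up the tree.
        if (key.startsWith('$')) continue;
        const children = Array.isArray(value) ? value : [value];
        for (const child of children) {
            if (isAstNode(child)) discardCst(child);
        }
    }
}

// Hypothetical helper: in 'Discard' mode the CST is dropped right after parsing.
function applyCstMode(root: AstNode, mode: CstParserMode): AstNode {
    if (mode === 'Discard') discardCst(root);
    return root;
}
```

Skipping `$`-prefixed keys keeps the traversal from following `$container` back to the parent, which would otherwise recurse forever.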

Specific tests for the discard behavior will follow soon. This PR just shows the feasibility and backwards compatibility of the changes at hand so far.

@msujew added the labels ast (AST structure related issue) and performance (Issues related to the runtime performance of Langium) on Oct 29, 2024
@danieldietrich (Contributor) left a comment

Hi Mark,
I just did a quick 'scan' of the code and asked questions.
When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?
Daily reminder (and note to myself): It would be easier to have code formatting in a separate commit ;)
Thanks
Daniel

@msujew (Member Author) commented Oct 30, 2024

> When I read "cache" my reflex is "and what about invalidation?", i.e. how do you ensure that the closed file contents don't deviate from the cached regions?

As soon as the document changes (either on disk or via the textDocument/didOpen notification), the language client notifies the language server, which then reparses the document. Reparsing the document automatically invalidates the cache, so we don't need to worry about desync in that regard.
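The invalidation argument can be illustrated with a small hypothetical model (not Langium's actual document handling): because the `$segments` cache is stored on the AST itself and every change produces a brand-new AST, stale ranges are discarded together with the old tree. `DocumentStore`, `parse` and `didChangeOnDisk` are illustrative names, and ranges are simplified to offsets.

```typescript
// Hypothetical sketch: the $segments cache lives on the AST, and every reparse builds a new AST.
type CstParserMode = 'Retain' | 'Discard';

interface TextRange { start: number; end: number } // offsets, for brevity
interface AstNode {
    $segments: { full: TextRange };
    $cstNode?: object; // placeholder for a real CST
}

// Stand-in parser: segments are always derived from the text that was just parsed.
function parse(text: string, mode: CstParserMode): AstNode {
    const root: AstNode = { $segments: { full: { start: 0, end: text.length } } };
    if (mode === 'Retain') {
        root.$cstNode = {};
    }
    return root;
}

class DocumentStore {
    private readonly roots = new Map<string, AstNode>();

    // textDocument/didOpen: the document is visible in the editor, so keep the CST.
    didOpen(uri: string, text: string): void {
        this.roots.set(uri, parse(text, 'Retain'));
    }

    // A closed file changed on disk: reparse, but drop the CST.
    didChangeOnDisk(uri: string, text: string): void {
        this.roots.set(uri, parse(text, 'Discard'));
    }

    get(uri: string): AstNode | undefined {
        return this.roots.get(uri);
    }
}
```

Since nothing holds on to the previous root, there is no separate cache that could drift out of sync with the text.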

@danieldietrich (Contributor) commented

Sounds all good! Thx

@danieldietrich (Contributor) left a comment

(see my last review)

```ts
)];
if (goToLink && goToLink.target.$segments) {
    const name = this.nameProvider.getNameProperty(goToLink.target);
    if (name) {
```
Contributor:

Shall we use the whole composite node as target in case a name is not available?
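A sketch of the suggested fallback, assuming `$segments` exposes a `full` range and per-property ranges as described in the PR summary (the exact shape and the `targetRange` helper are hypothetical): use the name property's range when there is one, otherwise the node's full range. Unlike a composite CST node, the full range is still available after the CST has been discarded.

```typescript
// Simplified, hypothetical shapes; not Langium's actual API.
interface TextPosition { line: number; character: number }
interface TextRange { start: TextPosition; end: TextPosition }
interface AstNodeSegments {
    full: TextRange;
    properties: Record<string, TextRange[]>;
}
interface AstNode { $segments?: AstNodeSegments }

// Returns the range to use as a go-to-definition target:
// the range of the name property if available, otherwise the node's full range.
function targetRange(node: AstNode, nameProperty: string | undefined): TextRange | undefined {
    const segments = node.$segments;
    if (!segments) {
        return undefined;
    }
    if (nameProperty) {
        const nameRange = segments.properties[nameProperty]?.[0];
        if (nameRange) {
            return nameRange;
        }
    }
    return segments.full;
}
```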

```ts
    // Whenever the user reopens the document, the CST will be rebuilt
    discardCst(document.parseResult.value);
}
// Discard the diagnostics for the closed document
```
Contributor:

Shall we make this configurable (without method overriding)? It could simply be a public property on the service class that turns this behavior on or off.

I imagine there are users/language implementors who would like to keep all problem markers even for closed documents.
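A hypothetical sketch of the reviewer's suggestion: public flags on the service instead of method overriding. The class name, flag names and the `ClosedDocument` shape are illustrative assumptions, and clearing the `diagnostics` array stands in for whatever the real service does to remove problem markers.

```typescript
// Hypothetical sketch: configuration via public properties instead of subclassing.
interface Diagnostic { message: string }
interface AstNode { $cstNode?: object }
interface ClosedDocument {
    root: AstNode;
    diagnostics: Diagnostic[];
}

class DocumentCloseHandler {
    /** Drop the CST of closed documents to save memory (only in a language server). */
    discardCstOnClose = true;
    /** Drop problem markers of closed documents; set to false to keep them. */
    discardDiagnosticsOnClose = true;

    onDidClose(document: ClosedDocument, isLanguageServer: boolean): void {
        if (isLanguageServer && this.discardCstOnClose) {
            // Whenever the user reopens the document, the CST will be rebuilt.
            delete document.root.$cstNode;
        }
        if (this.discardDiagnosticsOnClose) {
            document.diagnostics = [];
        }
    }
}
```

A language implementor who wants to keep all markers would only set `discardDiagnosticsOnClose = false` on the bound service instance.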

```diff
@@ -30,7 +30,7 @@ export interface AsyncParser {
      *
      * @throws `OperationCancelled` if the parsing process is cancelled.
      */
-    parse<T extends AstNode>(text: string, cancelToken: CancellationToken): Promise<ParseResult<T>>;
+    parse<T extends AstNode>(text: string, options: ParserOptions | undefined, cancelToken: CancellationToken): Promise<ParseResult<T>>;
```
Contributor:

The PR contains multiple "minor" breaking changes, but in sum I'd say the changes are significant enough to move this to v4.0. Shall we start planning the new major version more concretely?
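For the AsyncParser.parse signature change in particular, an implementation could keep accepting the old `(text, cancelToken)` call shape alongside the new `(text, options, cancelToken)` one by dispatching on the second argument at runtime. This is a hypothetical sketch, not the PR's code: `ParserOptions` is reduced to an assumed `cst` field, `CancellationToken` to its `isCancellationRequested` flag, and `SketchAsyncParser` is an illustrative name.

```typescript
// Minimal stand-ins (assumptions), not the real Langium or vscode-jsonrpc types.
interface CancellationToken {
    readonly isCancellationRequested: boolean;
}
type CstParserMode = 'Retain' | 'Discard';
// Hypothetical option shape; the PR's actual ParserOptions may differ.
interface ParserOptions {
    cst?: CstParserMode;
}
interface ParseResult {
    text: string;
    mode: CstParserMode;
}

function isCancellationToken(value: unknown): value is CancellationToken {
    return typeof value === 'object' && value !== null && 'isCancellationRequested' in value;
}

class SketchAsyncParser {
    // Old call shape, kept for backwards compatibility.
    parse(text: string, cancelToken: CancellationToken): Promise<ParseResult>;
    // New call shape with parser options.
    parse(text: string, options: ParserOptions | undefined, cancelToken: CancellationToken): Promise<ParseResult>;
    async parse(
        text: string,
        optionsOrToken: ParserOptions | CancellationToken | undefined,
        maybeToken?: CancellationToken
    ): Promise<ParseResult> {
        const options = isCancellationToken(optionsOrToken) ? undefined : optionsOrToken;
        const token = isCancellationToken(optionsOrToken) ? optionsOrToken : maybeToken;
        if (token?.isCancellationRequested) {
            throw new Error('OperationCancelled');
        }
        // Stand-in for real parsing: only reports which CST mode would be used.
        return { text, mode: options?.cst ?? 'Retain' };
    }
}
```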

```diff
@@ -23,6 +23,7 @@ export interface AstNode {
     readonly $containerIndex?: number;
     /** The Concrete Syntax Tree (CST) node of the text range from which this node was parsed. */
     readonly $cstNode?: CstNode;
+    readonly $segments?: AstNodeSegments;
```
Contributor:

Please add a comment explaining this new property.

```diff
@@ -61,6 +61,9 @@ export class DefaultDocumentValidator implements DocumentValidator {
 
     async validateDocument(document: LangiumDocument, options: ValidationOptions = {}, cancelToken = CancellationToken.None): Promise<Diagnostic[]> {
         const parseResult = document.parseResult;
+        if (!parseResult.value.$cstNode) {
```
Contributor:

Why the early-exit here?

Member Author:

Because the validator requires the CST to identify where to place the validation markers. If I want to place a diagnostic on a keyword, the validator doesn't know where to place it, because all information about that keyword was lost when we discarded the CST.
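The reasoning behind the early exit can be sketched as follows, under simplifying assumptions: property ranges survive in `$segments`, but keyword ranges are only available from the CST, so a keyword diagnostic has nowhere to go once the CST is gone. The `keywordRanges` map on the CST stand-in, the `DiagnosticTarget` type and both helper functions are hypothetical.

```typescript
// Hypothetical, simplified shapes; real CST nodes do not expose a keywordRanges map.
interface TextRange { start: number; end: number }
interface CstNode { keywordRanges: Record<string, TextRange> }
interface AstNode {
    $cstNode?: CstNode;
    $segments?: { full: TextRange; properties: Record<string, TextRange[]> };
}

type DiagnosticTarget =
    | { kind: 'property'; property: string; index?: number }
    | { kind: 'keyword'; keyword: string };

function resolveDiagnosticRange(node: AstNode, target: DiagnosticTarget): TextRange | undefined {
    if (target.kind === 'property') {
        // Property ranges are cached in $segments and survive discarding the CST.
        return node.$segments?.properties[target.property]?.[target.index ?? 0];
    }
    // Keyword ranges are not stored in $segments (yet), so without a CST there is no range.
    return node.$cstNode?.keywordRanges[target.keyword];
}

// Mirrors the early exit in validateDocument: skip validation once the CST was discarded.
function shouldValidate(root: AstNode): boolean {
    return root.$cstNode !== undefined;
}
```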

```ts
    readonly locale: string;
}

export interface Environment extends EnvironmentInfo {
```
Contributor:

Do we really need this as a separate service? AFAICT the information is only used by the WorkspaceManager.

Member Author:

I feel like this could benefit other services as well and is well suited as a stand-alone service.
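A hypothetical sketch of what such a stand-alone service could look like and how any consumer (not only the WorkspaceManager) might use it to pick the parser mode. The `locale` field mirrors the diff excerpt above; `isLanguageServer` and `cstModeFor` are assumed names.

```typescript
// Hypothetical sketch; isLanguageServer is an assumed property name.
type CstParserMode = 'Retain' | 'Discard';

interface EnvironmentInfo {
    readonly locale: string;
}

interface Environment extends EnvironmentInfo {
    readonly isLanguageServer: boolean;
}

// Any service with access to the Environment could make this decision.
function cstModeFor(environment: Environment, documentIsOpen: boolean): CstParserMode {
    if (!environment.isLanguageServer) {
        // CLI use cases usually require full access to the CST.
        return 'Retain';
    }
    // In a language server, only closed documents are parsed without keeping the CST.
    return documentIsOpen ? 'Retain' : 'Discard';
}
```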

@msujew (Member Author) commented Nov 7, 2024

@spoenemann I agree, this is something for a v4.0 of Langium. This PR isn't finished anyway: the formatter needs refactoring, since the new way of handling comments in the CST broke it, and I still haven't gotten around to writing any tests for this.

@msujew marked this pull request as draft on November 7, 2024, 11:45
3 participants