Semantic Token Support #533

Open · Doekeb wants to merge 9 commits into develop
Conversation


@Doekeb commented Mar 10, 2024

LSP supports Semantic Tokens, which editors and colorschemes can opt into to provide "smarter" language highlighting than pure tree-based highlighting:
https://code.visualstudio.com/api/language-extensions/semantic-highlight-guide
https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens

Notably, Neovim now supports semantic tokens (neovim/neovim#21100) and, more recently, semantic token modifiers (neovim/neovim#22022).

This feature has been requested in this repo here: #33
In the unmaintained base here: palantir/python-language-server#933
In another jedi-based language server here: pappasam/jedi-language-server#137
And its implementation has been attempted and abandoned twice in the latter: pappasam/jedi-language-server#196 and pappasam/jedi-language-server#231

There is a maintained fork of an alternative tool for Neovim at https://github.com/wookayin/semshi, but it suffers from two major drawbacks: it is only available for Neovim, and its highlight colors are hardcoded, so they are unlikely to match the user's colorscheme.

This PR implements only the full document protocol. Performance could be improved by also implementing the full document delta protocol and the range protocol.
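For reference, the full document protocol encodes every token as five integers relative to the previous token. A minimal sketch of that encoding (per LSP 3.17; the token values below are made up for illustration):

```python
# Each token is five integers: deltaLine, deltaStartChar, length,
# tokenType index, tokenModifiers bitset. Positions are relative
# to the previous token. These tokens are made-up examples.
tokens = [
    # (line, start_char, length, token_type, modifiers), absolute positions
    (0, 4, 11, 1, 0),   # e.g. a function name on line 0
    (2, 6, 10, 0, 0),   # e.g. a class name on line 2
    (2, 20, 3, 2, 0),   # e.g. a parameter later on the same line
]

data = []
prev_line = prev_char = 0
for line, char, length, ttype, mods in tokens:
    delta_line = line - prev_line
    # deltaStartChar is relative to the previous token only on the same line
    delta_char = char - prev_char if line == prev_line else char
    data.extend([delta_line, delta_char, length, ttype, mods])
    prev_line, prev_char = line, char

print(data)  # [0, 4, 11, 1, 0, 2, 6, 10, 0, 0, 0, 14, 3, 2, 0]
```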

Here are some examples in two different colorschemes, with only very simple rules implemented so far. Tree-based highlighting is always on the left, and the same highlighting augmented with semantic tokens is always on the right.

Functions and classes

  • Tree-based highlighting infers whether a reference is a class based on its name and therefore doesn't highlight dingus_mc_bingus as a class even though it is.
  • Similarly, tree-based highlighting can't determine that my_function and MyFunction are both functions.

[Screenshots: classes_functions_cp, classes_functions_tn]
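The snippet in the screenshots is roughly of this shape (a hypothetical reconstruction from the names mentioned above, not the exact code):

```python
class dingus_mc_bingus:  # a class with a lowercase, function-like name
    pass

def my_function():
    pass

MyFunction = my_function  # a function with a capitalized, class-like name

d = dingus_mc_bingus()  # tree-based highlighting misses that this is a class
f = MyFunction()        # ...and that this is a function
```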

Imports

Tree-based highlighting can't determine what kind of thing imported names are, other than by their naming conventions (which are often broken, even in standard-library modules in Python).

[Screenshots: imports_cp, imports_tn]
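As a hypothetical illustration (my own examples, not necessarily the ones in the screenshots), the standard library is full of convention-breaking names that semantic tokens can classify by what they actually resolve to:

```python
from datetime import datetime        # a class, despite the lowercase name
from collections import defaultdict  # also a class
from functools import reduce         # a function
```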

Parameters

  • Tree-based highlighting colors parameters in the signature of a function/method differently than when they are used in the body. Semantic token highlighting maintains parameter highlighting until the variable is re-assigned.
  • Note that tree-based highlighting treats the name self as a special token. This is not language smarts, as evidenced by the lack of highlighting of the language-equivalent this. Semantic token highlighting currently colors both self and this inside a method as regular parameters, but this could be improved using semantic token modifiers and a bit more inference (the colorschemes I'm using here don't apply any distinct styles to modifiers). Note that even in the semantic-token-augmented version, tree-based highlighting takes over on self when it's outside a method.

[Screenshots: parameters_cp, parameters_tn]
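A hypothetical sketch of the kind of code involved:

```python
class Widget:
    def resize(self, size):
        self.size = size  # `self` and `size` highlighted as parameters
        size = size * 2   # after re-assignment, `size` is a plain variable

def free_function(this):
    return this  # `this` gets no special tree-based treatment, unlike `self`
```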

Properties

Tree-based highlighting guesses whether an attribute is a property or a method based on the presence of parentheses. Semantic token highlighting knows the difference.

[Screenshots: properties_cp, properties_tn]
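A hypothetical sketch:

```python
class Circle:
    def __init__(self, radius):
        self._radius = radius

    @property
    def area(self):  # accessed without parentheses, but still a property
        return 3.14159 * self._radius ** 2

    def grow(self):  # called with parentheses, a regular method
        self._radius += 1

c = Circle(1.0)
print(c.area)  # semantic tokens mark `area` as a property
c.grow()       # ...and `grow` as a method
```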

@rchl (Contributor) commented Mar 11, 2024

Do you have some performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.

Ideally, such a feature would be implemented by jedi and use some form of caching to speed things up. LSP semantic tokens are designed in a way that should make the case of adding/removing text pretty fast, but in your implementation all the work seems like it will be done from scratch on every single change.
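(For context: the add/remove case is fast under LSP because textDocument/semanticTokens/full/delta lets a server return edits against the integer array it previously sent, rather than recomputing and resending everything. A minimal sketch with made-up values:)

```python
previous = [0, 4, 11, 1, 0, 2, 6, 10, 0, 0]  # last full response, resultId "1"
current  = [0, 4, 11, 1, 0, 3, 6, 10, 0, 0]  # one token moved down a line

# A conforming delta response replaces only the changed slice of integers:
delta = {
    "resultId": "2",
    "edits": [{"start": 5, "deleteCount": 1, "data": [3]}],
}
```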

@Doekeb (Author) commented Mar 11, 2024

> Do you have some performance data? For example, how long does it take to generate tokens in a 2000-line document? It feels like it would be very slow to trigger "goto" for each "name" like that.

I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on. Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose. From the LSP spec:

> There are two use cases where it can be beneficial to only compute semantic tokens for a visible range:
>
>   • for faster rendering of the tokens in the user interface when a user opens a file. In this use case, servers should also implement the textDocument/semanticTokens/full request to allow for flicker-free scrolling and semantic coloring of a minimap.
>   • if computing semantic tokens for a full document is too expensive, servers can provide only a range call. In this case the client might not render a minimap correctly, or might even decide to not show any semantic tokens at all.

Determining when to request full semantic tokens vs. a range would then be the client's responsibility.
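For reference, a range request has roughly this shape under LSP 3.17 (the URI and range values here are made up):

```python
request = {
    "method": "textDocument/semanticTokens/range",
    "params": {
        "textDocument": {"uri": "file:///path/to/module.py"},
        "range": {  # e.g. the editor's visible viewport
            "start": {"line": 0, "character": 0},
            "end": {"line": 120, "character": 0},
        },
    },
}
```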

> Ideally, such a feature would be implemented by jedi and use some form of caching to speed things up. LSP semantic tokens are designed in a way that should make the case of adding/removing text pretty fast, but in your implementation all the work seems like it will be done from scratch on every single change.

I agree that an upstream implementation is possible and preferable, and it would be great to contribute a portion of this to Jedi down the road. But hopefully this can work for the people who want it in the meantime.

If performance is a major concern (I agree that it would be good to gather more information on this front), we could begin by making this plugin opt-in, like many of the other bundled plugins are.
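Opt-in could follow the usual shape of pylsp plugin settings; the plugin key below is an assumption, not a final name:

```python
settings = {
    "pylsp": {
        "plugins": {
            # hypothetical plugin key; disabled by default, enabled on demand
            "semantic_tokens": {"enabled": True},
        }
    }
}
```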

@rchl (Contributor) commented Mar 19, 2024

> I don't have performance data on a behemoth like that, but I'm happy to gather some, especially if you can point me in the direction of a big project I can try it on.

Not as big, but maybe https://github.com/davidhalter/jedi/blob/master/jedi/plugins/stdlib.py

> Additionally, if performance ends up being an issue for huge files, it would be fairly simple to implement the range protocol, which exists exactly for this purpose. From the LSP spec:

Would it really be that easy? It really depends on whether the API you are using for this makes it possible.
