-
Notifications
You must be signed in to change notification settings - Fork 333
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: Add a design doc for Language Support for notebook cells #4309
Closed
tanishiking
wants to merge
1
commit into
scalameta:main
from
tanishiking:design-doc-language-support-notebook
Closed
Changes from all commits
Commits
File filter
Filter by extension
Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,303 @@ | ||
Language support for VSCode notebook | ||
|
||
# Overview | ||
|
||
We can run Scala programs on Jupyter Notebooks using a Jupyter Kernel for Scala called [Almond](https://almond.sh/). | ||
|
||
Currently, when using notebook cells in VSCode, Metals cannot provide language support such as code completion (by Metals) and code navigations. | ||
|
||
This design document describes how Metals can support many language features for the notebook environment in VSCode | ||
|
||
# Scope | ||
|
||
The scope of this document is limited to **notebook in VSCode**. It doesn't cover LSP support for jupyter notebook in general (like integration with [jupyter-lsp](https://github.com/jupyter-lsp/jupyterlab-lsp)). | ||
|
||
However, supporting the notebooks in VSCode would be a good first step for integrations with other notebook environments. | ||
|
||
## Goal | ||
|
||
- When users running Almond Kernel in VSCode, users can enjoy language features that Metals provides. | ||
- All the `textDocument/*` requests should be properly handled. | ||
- (Maybe we can omit some features for notebook (like `textDocument/foldingRange` may be useless in notebook) though). | ||
|
||
## Non-Goal | ||
|
||
- Provide language support for other notebook environments other than VSCode notebook (such as `jupyter-lsp`). | ||
- Automatically set up Almond Kernel based on the project setting | ||
- Install Almond Kernel when users open a `*.ipynb` file and no Scala Kernel is available. | ||
- Enable Almond to use project-defined components from notebooks. | ||
- If we have `A.scala` adjacent to `B.ipynb`, `B.ipynb` should be able to `import` something from `A.scala`. | ||
|
||
# Context | ||
|
||
## LSP 3.17 NotebookCellTextDocumentFilter | ||
|
||
Starting from LSP 3.17, LSP clients can recognize notebook cells as documents and request `textDocument/*` to LSP servers. | ||
For example, the following change unlocks the `textDocument/*` requests for jupyter-notebook (of Scala cells) in VSCode. | ||
|
||
```diff | ||
diff --git a/src/extension.ts b/src/extension.ts | ||
index 13b49f6f..32a99c36 100644 | ||
--- a/src/extension.ts | ||
+++ b/src/extension.ts | ||
@@ -291,6 +291,10 @@ function launchMetals( | ||
{ scheme: "file", language: "java" }, | ||
{ scheme: "jar", language: "scala" }, | ||
{ scheme: "jar", language: "java" }, | ||
+ { | ||
+ notebook: { scheme: "file", notebookType: "jupyter-notebook" }, | ||
+ language: "scala", | ||
+ }, | ||
], | ||
synchronize: { | ||
configurationSection: "metals", | ||
``` | ||
|
||
see: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#notebookCellTextDocumentFilter | ||
|
||
Now, notebook on VSCode starts sending the `textDocument/*` request to LSP server. | ||
|
||
```json | ||
[Trace - 01:09:28 PM] Received request 'textDocument/completion - (140)' | ||
Params: { | ||
"context": { | ||
"triggerKind": 2, | ||
"triggerCharacter": "." | ||
}, | ||
"textDocument": { | ||
"uri": "vscode-notebook-cell:/Users/tanishiking/src/github.com/tanishiking/scala3-playground/src/main/scala/Untitled-1.ipynb#W2sZmlsZQ%3D%3D" | ||
}, | ||
"position": { | ||
"line": 0, | ||
"character": 2 | ||
} | ||
} | ||
``` | ||
|
||
## LSP 3.17 Notebook Document syncronization | ||
|
||
In addition to `NotebookCellTextDocumentFilter`, LSP 3.17 provides a feature called Notebook Document syncronization. | ||
|
||
see: https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#notebookDocument_synchronization | ||
|
||
By setting server capabilities in `scalameta/metals` | ||
|
||
```diff | ||
diff --git a/metals/src/main/scala/scala/meta/internal/metals/MetalsLanguageServer.scala b/metals/src/main/scala/scala/meta/internal/metals/MetalsLanguageServer.scala | ||
index 00aa1d791f..35b0ea2dc9 100644 | ||
--- a/metals/src/main/scala/scala/meta/internal/metals/MetalsLanguageServer.scala | ||
+++ b/metals/src/main/scala/scala/meta/internal/metals/MetalsLanguageServer.scala | ||
@@ -860,6 +860,16 @@ class MetalsLanguageServer( | ||
ServerCommands.all.map(_.id).asJava | ||
) | ||
) | ||
+ val selector = new NotebookSelector() | ||
+ selector.setNotebook( | ||
+ JEither.forLeft[String, NotebookDocumentFilter]("*") | ||
+ ) | ||
+ selector.setCells(List(new NotebookSelectorCell("scala")).asJava) | ||
+ capabilities.setNotebookDocumentSync( | ||
+ new NotebookDocumentSyncRegistrationOptions( | ||
+ List(selector).asJava | ||
+ ) | ||
+ ) | ||
capabilities.setFoldingRangeProvider(true) | ||
capabilities.setSelectionRangeProvider(true) | ||
capabilities.setCodeLensProvider(new CodeLensOptions(false)) | ||
``` | ||
|
||
Notebook LSP client start sending several notifications such as `notebookDocument/didOpen`, and `notebookDocument/didChange`. | ||
LSP 3.17 expects LSP servers to syncronize the notebook contents by these notifications, and provide language features based on the synchronized content. | ||
|
||
# Problems | ||
|
||
Though LSP 3.17 provides some specifications around notebook support, it doesn't work out of the box with Metals for mainly two reasons. | ||
|
||
## Metals doesn't understand the Scala program in a cell | ||
|
||
As you found, LSP client (VSCode) sends `textDocument/*` request with | ||
|
||
- `uri` points to the notebook cell. | ||
- `position` represents the position in the cell. | ||
|
||
``` | ||
"textDocument": { | ||
"uri": "vscode-notebook-cell:/Users/tanishiking/src/github.com/tanishiking/scala3-playground/src/main/scala/Untitled-1.ipynb#W2sZmlsZQ%3D%3D" | ||
}, | ||
"position": { | ||
"line": 0, | ||
"character": 2 | ||
} | ||
``` | ||
|
||
However, Metals can't understand each cell of notebook. For example, in the following setting, Metals can't handle cell2 and cell3 (in Scala2, Metals can't cell1 too). Because each cell doesn't compile. | ||
|
||
```scala | ||
// cell1 | ||
def add(a: Int, b: Int): Int = a + b | ||
|
||
// cell2 | ||
add(1, 2) | ||
|
||
// cell3 | ||
import $ivy.`io.circe::circe-generic:0.11.1` | ||
``` | ||
|
||
**In order to analyze Scala code and provide compiler-based language features, Metals have to convert cells into compilable Scala program.** | ||
|
||
## Metals server cannot recognise `*.ipynb` file as a compile target | ||
|
||
Metals server provide language features to the files that are recognized as a `buildTarget/sources` of [Build Server Protocol](https://build-server-protocol.github.io/docs/specification.html#build-target-sources-request). | ||
|
||
However, current Metals doesn't recognize `*.ipynb` (or combined Scala source, that we'll mention later) for Build Servers. | ||
|
||
**Metals have to run build server that recognizes `*.ipynb` files as a `SourceItem`**. Otherwise, Metals cannot tell which Scala version of presentation compiler should run against the notebook (Kernel). | ||
|
||
# Solution / Technical Architecture | ||
|
||
In order to provide language support for notebooks, two things are needed. | ||
|
||
- Convert cells to compilable Scala sources. | ||
- Also, Metals have to convert the incoming request parameter (for cells) into the parameter for the (converted) compilable Scala source (back and forth). | ||
- Converted sources have to be on the filesystem, because Metals expect sources to be on the filesystem. | ||
- Build Server that recognizes `*.ipynb` as a build target. | ||
- Where `ScalaBuildTarget.scalaVersion` should match with the Scala version of the running Kernel. | ||
- When users open a notebook with Scala Kernel, Metals should prompt running the Build Server for the notebook, like Metals does for Ammonite script. | ||
|
||
## Components | ||
|
||
- `Almond BSP` | ||
- Run on the same process with `Almond Kernel`, but the I/O will be via unix domain socket at a pre-defined location (somewhere in `~/.local/share/almond/bsp/{kernel-name}.sock`). See: https://github.com/scalameta/metals/pull/4253#issuecomment-1222091839 | ||
- `buildTarget/compile` will convert cells into compilable Scala sources (as Almond Kernel / Ammonite does) and **write the generated Scala sources onto the file system** (as Ammonite Build Server does). Also, `buildTarget/compile` should enable SemanticDB plugin. | ||
- Answer `workspace/buildTargets`. One notebook - one buildTarget. | ||
- Scala version should be match with Kernel's one. | ||
- Accept `buildTarget/scalacOptions` and `ScalacOptions.classDirectory` should point to the directory where Metals can find the Almond-generated Scala sources. | ||
- (optional) instead, we can support `workspace/wrappedSources` as [ScalaCli does](https://github.com/VirtusLab/scala-cli/blob/6efbefb1d864c0ee36156f9ac8489d0e14ee54c4/modules/scala-cli-bsp/src/main/java/scala/build/bsp/ScalaScriptBuildServer.java#L9-L10) | ||
- `VSCode` | ||
- Notebook LSP client, send `textDocument/*` requests to `Metals` and receive responses. | ||
- Notify `notebookDocument/didOpen|didChange` to Metals server. | ||
- `Metals` | ||
- When Metals receive `notebookDocument/didOpen`, connect to Almond BSP and register the result of `workspace/buildTargets`. | ||
- Receive `textDocument/*` request and convert the params to the generated Scala source. And then convert the result back to the cell positions. | ||
|
||
## System flow | ||
|
||
### When VSCode start running Almond Kernel on VSCode | ||
|
||
```mermaid | ||
sequenceDiagram | ||
participant AlmondKernel | ||
actor VSCode | ||
participant Metals | ||
participant AlmondBSP | ||
VSCode->>AlmondKernel: (run and) connect to the kernel | ||
VSCode->>Metals: notebookDocument/didOpen | ||
Metals->>AlmondBSP: connect to AlmondBSP | ||
Metals->>AlmondBSP: workspace/buildTargets | ||
AlmondBSP->>Metals: buildTarget for the notebook | ||
``` | ||
|
||
- 1. When Almond Kernel started, Almond Kernel process will create a sock file (for Almond BSP) in a pre-defined position. | ||
- 2. When VSCode open a notebook, Metals will receive `notebookDocument/didOpen`, then Metals will try to connect to the Almond BSP via the sock file created at (1) | ||
- Concern: Metals may have to wait until the sock file is created (?) | ||
- 3. Then Metals asks `workspace/buildTargets` to Almond BSP, and Almmond BSP answers. | ||
- 4. Register the mapping URI of notebook cell -> generated Scala sources to [MappedSource](https://github.com/scalameta/metals/blob/626b2a65199ea39ffca68f1e6330b67b9142a1fe/metals/src/main/scala/scala/meta/internal/metals/TargetData.scala#L303-L308) | ||
- see: https://github.com/scalameta/metals/blob/626b2a65199ea39ffca68f1e6330b67b9142a1fe/metals/src/main/scala/scala/meta/internal/metals/ammonite/Ammonite.scala#L108-L129 | ||
- It would be even trickier than Almond one. | ||
|
||
### When editing notebook | ||
|
||
```mermaid | ||
sequenceDiagram | ||
actor VSCode | ||
participant Metals | ||
participant AlmondBSP | ||
VSCode->>Metals: notebookDocument/didChange | ||
Metals->>AlmondBSP: buildTarget/compile | ||
``` | ||
|
||
- 1. Every time users edit notebook, Metals receive `notebookDocument/didChange`. | ||
- 2. Then Metals will run `buildTarget/compile` to Almond BSP for generating the compilable Scala source files and generate SemanticDBs. | ||
- I guess it would be required because users usually keep coding on Notebook without saving. | ||
- Maybe we want to debounce the `buildTarget/compile` | ||
|
||
## When VSCode request `textDocument/*` | ||
|
||
- We can convert notebook cell's URI `<->` generated Scala source for the cell, using `MappedSource`. | ||
- Conversion between position in the cell `<->` position in the generated Scala source can be done as the same way with Almond script. | ||
|
||
# Alternative Solution | ||
|
||
## Can we just combine cells into Ammonite script, instead of communicating with Almond BSP? | ||
|
||
While Almond can generate accurate Scala source program, it takes many steps for providing notebook support. | ||
|
||
Instead of using Almond, Metals can combine cells into `*.sc` file. | ||
|
||
For example, given the following notebook cells, | ||
|
||
```scala | ||
// cell1 | ||
import $ivy.`io.circe::circe-generic:0.11.1` | ||
|
||
// cell2 | ||
def add(a: Int, b: Int): Int = a + b | ||
|
||
// cell3 | ||
add(1, 2) | ||
``` | ||
|
||
Metals can combine those cells into `combined.sc`. The combine source will be a valid Ammmonite script file. | ||
Now Metals can re-use the Ammonite BSP for the language support. | ||
|
||
```scala | ||
import $ivy.`io.circe::circe-generic:0.11.1` | ||
def add(a: Int, b: Int): Int = a + b | ||
add(1, 2) | ||
``` | ||
|
||
- pros | ||
- We don't need to implement BSP for Almond, and re-use the Ammonite BSP. | ||
- We will have `one notebook <-> one generated scala source`, that might be easier to handle (while Almond BSP will `one notebook cell <-> one generated scala source`). | ||
- cons | ||
- It doesn't work if the program contains Almond-pre-imported things or [User APIs](https://almond.sh/docs/api). | ||
- We have to run the same Scala version for Ammonite BSP with Almond Kernel, but it complicates the [Ammonite BSP logic](https://github.com/scalameta/metals/blob/fa03b00ad31307763979a00c4ccf794a2054c3fe/metals/src/main/scala/scala/meta/internal/metals/ammonite/Ammonite.scala#L184-L191) a bit. | ||
- It adds up one more layer of conversion: `cell <-> combined sc file <-> Ammonite generated scala file`. | ||
|
||
## Run Almond BSP only when it's from VSCode | ||
|
||
Instead of running Almond BSP all the time, maybe we can run Almond BSP only when it's from VSCode? | ||
I don't think we can, because: | ||
|
||
- Looks like Kernel don't have information who is the clients, AFAIK. | ||
- Kernel may accept connection from multiple clients. Kernel started from VSCode doesn't mean it works only with Metals. | ||
|
||
## Almond BSP shouldn't accept `buildTarget/compile`? | ||
|
||
Instead of accepting `buildTarget/compile` and generate Scala sources on notebook changes, we can let Kernel compile Scala sources (enabling SemanticDB and writing out the generated Scala sources to the filesystem). | ||
|
||
It works only if Almond Kernel can detect it's running for Metals, but it looks like impossible as mentioned in the above section. | ||
|
||
# Milestones | ||
|
||
- [ ] Implement bare-minimum Almond BSP | ||
- Create a sock file in a pre-defined location. | ||
- Accept `build/initialize` | ||
- `workspace/buildTargets` should returns the appropriate Scala version | ||
Comment on lines
+283
to
+286
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I would also include a request to execute/interpret a cell. |
||
- [ ] Metals can connect to Almond BSP | ||
- When Metals open `*.ipynb` with `notebookDocument/didOpen`, connect to Almond Build Server, and handshake. | ||
- [ ] Almond BSP can compile notebook cells | ||
- Almond should accept `buildTarget/compile` | ||
- Write out the generated Scala sources to a pre-defined location (based on base directory). | ||
- Accept `buildTarget/scalacOptions` and `ScalacOptions.classDirectory` should point to the directory where Metals can find the Almond-generated Scala sources. | ||
- [ ] Metals invoke `buildTarget/compile` for every notebook changes | ||
- (optional) Metals should debounce the incoming requests | ||
- [ ] Metals should register MappedSource between `notebook cell URI <-> generated Scala source (found from ScalacOptions.classDirectory)`. | ||
- [ ] Metals can handle `textDocument/*` requests using presentation compiler. | ||
- [ ] Metals can handle `textDocument/*` requests using SemanticDB | ||
- Almond should generate SemanticDB on `buildTarget/compile`. | ||
|
||
# Concerns | ||
|
||
- Almond will create a sock file for Almond BSP even if users won't use Metals. | ||
- Metals may receive `notebookDocument/didOpen` before Almond BSP starts (?) |
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking a bit further about this (that I proposed in #4253 (comment)), I'm very slightly worried about how Metals will be able to get the PID of the Almond process corresponding to a given notebook, or more generally how Metals can find a way to connect to the right Almond kernel.
I skimmed through the notebook section of LSP 3.17 (that you linked above), and couldn't find any way of letting servers know about the kernel processes that the LSP client starts.
To work around that, maybe we could have hashed the notebook path, and used that hash instead of the kernel PID. But the kernel doesn't know about the path of the notebook it is asked to run, it only knows about individual code cells passed to it via Jupyter message protocol messages. So hashing the notebook path isn't an option either.
Another workaround could consist in having Almond leave its PID in the metadata of some display data or execute_result (AFAIK, these are persisted in the notebook). That might be a bit flaky though, as for the PID to be written in the notebook file, this would require at least a cell to have run. Not to add too much overhead, we might want to return those metadata only for the first cell say, but then the metadata would go away if users delete that cell… (and the kernel doesn't know about cell deletion…) So it might need to be put in all execute_result metadata.
Also, those metadata, once written on disk, might be stale (they may correspond to the kernel that previously ran the notebook). (Maybe writing a date alongside it could help - LSP servers would just pick the PID with the latest date.)
Don't know if asking people involved in LSP to update the protocol, so that LSP servers could know about Jupyter kernels started by LSP clients, could be an option here…
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think it's difficult (or even impossible) to find a Kernel PID from Metals. That's why I'm thinking maybe we can use
kernel name
or something instead of PID (more details later)I think it works 👍
Not really, we should be able to get kernel's metadata like on
notebookDocument/didOpen
, and we'll have access to that information without running cellsOn the other hand, what do you think about using the Kernel spec name instead of embedding PID information to the name or display name?
One concern is: is the kernel name unique?; can we say there won't be multiple same kernels running at the same time?
I think LSP's responsibility is just to provide some interfaces between the editor(notebook) and language servers and how to connect with Kernel is out of the scope of LSP, I guess.
(kinda off-topic: the notebook LSP spec seems like the notebook content can be evaluated purely with its content (by naively concatenating cells into a script), and assuming there's no need to connect with Kernel).