Discussion: Should we have a front-end/back-end split? #5121
Comments
On the surface, this sounds like a great idea. Using half a gig of memory isn’t necessarily a problem if we’re not using the host application’s allocation to do it. I also very much like the idea of isolating at least some of the crashes in the LS process, where they wouldn’t drag the host app down too. Although, from what I see through issues (and someone please correct me if I’m wrong), it seems most crashes are happening around COM interop, so I don’t actually know if we get a ton of benefit on crash protection. I’m hoping someone who’s actually committed in the last few years can shed some light on that. Personally, I’d love to see a proof of concept, and would be even more excited if the language server could run Core, because then I would actually be able to contribute code again. If a PoC proves feasible, we would need to determine if/how things might be moved incrementally, or if it’s an all-or-nothing endeavor.
Yes, that's fair - most of the hard crashes are AVs from COM. I just listed it because it is a potential benefit, and the smaller our in-process footprint, the less opportunity for problems. The server can be any language we want. I vote Core :-)
I agree on the principle, but I'm worried implementing the current-tech stuff could turn into a can of worms, given there's obviously zero existing support for VB6/VBA - but definitely worth looking into whether & how it could be done. (and yes, I vote Core as well!)
What operating systems do we support these days? Still supporting 7? Does Core run on 7?
Core 3 will run on 7, and 3.1 (LTS) probably will too. Beyond that, probably not.
One advantage would be folks who like to use other editors being able to write their own front ends. I recall a few people writing their VBA in VSCode and copy/pasting it back to the VBE.
Benefits of running LSP: it's an existing, open protocol in this problem space. We don't have to write it from scratch. Side benefits: syntax highlighting for VBA/VB6 in Visual Studio, VSCode, Eclipse... NOT suggesting we officially support those...
Actually, not just syntax highlighting... Any capabilities which are supported by our LS and client X - navigation, diagnostics, code-completion...
Right now we support Vista at minimum, but with the mocking PR that will be dropped because of the bump to .NET FX 4.6.1. I can confirm that the large majority of the crashes are due to COM interactions, because those are where we play with unmanaged memory and pointers. That said, I do see 2 issues:
I think we need a PoC. Maybe pick up the SlimDucky, set up a toolwindow with a WPF control, then demonstrate that it doesn't crash, etc.
Agreed, this needs proving. Just a proposal at this point, looking for showstoppers...
Stupid (?) thought: does enabling VB6/VBA in VSC amount to suiciding Rubberduck as a VBIDE add-in?
Will the process be fast? Well, it won’t be as fast as an in-process call, because it is a network call and the requests/responses must be serialized/deserialized, but it won’t be as slow as going to an external network.
Well... maybe, years from now. We want to bring VBC[1] into the 21st century, IIUC? Not necessarily in the current VBE? Imagine swapping out the registry entry that launches the old VBE with a redirect to VSCode, running Rubberduck as its LSP and possibly with VBA-specific extensions. Are we sentimental about the classic VBE, or do we care more about VBA/VB6 developers? [1] Visual Basic "Classic"
I don’t think so @retailcoder, the compiler still lives inside the VBE. Even if someone wrote a front end for Code, you’ve got to move the code in and out of the VBE to make that workflow work. That’s a PITA.
Not necessarily. We want to give more choices, even if it means using VSC - and VSC still can't inspect VBA code anyway. Furthermore, there will be scenarios where installing VSC or whatever isn't feasible/practical, and we shouldn't tell the users, "Sucks to be you! Mwahaha!" No, we want to give them a better IDE right there, even if they have to use the VBIDE.
I think we are already using TPL for the backend work (parsing). We are also already hijacking the message pump. It's actually the WPF stuff that I'm not so sure about, WRT whether we can dispatch messages to WPF controls without using the main UI thread.
This. A small PoC is in order. Just enough to prove the concept isn’t a dead end from the get-go.
I meant create our own message pump that sends messages to and receives notifications from the server.
That’s standard WPF stuff. You have to be on the UI thread (or dispatch to it) in order to notify the UI in a full-blown desktop app too.
As a service we offer? I’m under the impression that the Rubberduck name refers to a process for developers - talking to yourself in a peer-review way - and I’m not sure whether that behaviour could be repeated elsewhere, i.e. in VSC.
“I meant create our own message pump that sends messages to and receives notifications from the server.” Does that mean what Thunderframe mentioned - making our own duckspeak?
@rubberduck203 - I don't follow; why do we want our own message pump? The reason I was wondering about dispatching to WPF controls directly is to minimize the events that the UI thread must process. My worry (perhaps illegitimately) is that with out-of-process, there will be many more events queued, and because the VBIDE is single-threaded, that might end up feeling quite... slow. If everything ends up going to the UI thread, I'm not sure having our own message pump will help, since it'd still have to forward to that thread?
We would still be processing the same number of those events today, I believe.
Not sure why we're talking about message pumps... We'd still do everything we do today, just in two separate places with an IPC in between...
I shouldn’t have used the words “message pump”. I momentarily forgot that has a particular meaning on Windows. I was speaking of the more general concept: just a background thread that listens for notifications from the server and notifies the proper client listeners.
Gotcha. My concern was more that the number of events may stay the same or increase once we go out of process, so in the PoC we must prove that just because we freed our back-end from the constraints doesn't mean it can go ahead and blow out the front-end's UI thread. It'll have to be reasonably smart about dispatching to the UI thread without making it feel unresponsive.
Just one small thought: we will have to push all type lib information from the front-end to the language server as well, as this information is only accessible from the UI thread of the VBE.
Something that we will also need to keep in mind is how the installer may need to be adapted for this to work properly. I also want to note that our DI registration and setup would need to be split; it's about time we deal with the large blob of code that the setup is right now. This dichotomy also plays into the Rubberduck.API sunsetting idea: if we have a language server for RD's parsing needs, the API could just use that very same language server, making maintenance a lot easier. An additional huge thing that we need to figure out would be settings and settings invalidation. As of now, settings invalidation just raises an event, and all consumers refresh their local cache and possibly perform invalidation calculations.
Agreed for the user projects. The referenced libraries, however, don't require the type lib extensions and can be loaded with standard TypeLib APIs. I am not sure whether processing the referenced type libs on the back-end is a good idea, because when processing the user projects' type libs, they may need to be able to reference the typelib/typeinfo from a referenced typelib, which implies some mechanism to share the already-discovered types. Fortunately, we already have
Couldn't the client just send the server a GUID, and then the referenced library could be loaded & dissected server-side?
Wouldn't the thousands of declarations need to live on the back-end in order to reduce in-process memory pressure?
I'm thinking of resolving references. Say I have a method
Doesn't the LSP support diagnostics? https://microsoft.github.io/language-server-protocol/specification#textDocument_publishDiagnostics |
What about the UserForms designer?
Does this also include projects in other document files (e.g. other Excel .xlsm files from Excel, other Word .docx files from within Word, etc.) added via the Add References dialog?
In order for other front ends to know as much as Rubberduck does, they must have access to the binary streams embedded within the files housing the VBA projects. ATM, we don’t really parse the FRX equivalent; we know about all the source code and the typelib representation (which is only available in-memory). Extracting the extra metadata that you would not get from the source code alone would thus be an entirely separate scope of work. There are malware analysis tools out there that can read a subset of possible VBA projects, but not for all possible VBA hosts, and I don’t know if they bother with extracting the binary contents that aren’t source code. Still, one must write the implementation and conform to LSP, so that’s for a much later stage of work, IMO. For diagnostics, the problem is more that our inspections and other subsystems don’t conform to the diagnostic specifications, so that would be another major piece of engineering work. The proposal here should first deal with demonstrating that we can successfully split into client/server to reduce the resource pressure on the host.
Yes. In those projects we make use of the extended TypeLib API to get the missing metadata not directly available in the source code alone.
Been reviewing this thread and the LSP spec, and have some more thoughts: Messages between the client and server should definitely use some kind of asynchronous queuing mechanism, possibly with a dwell time to limit message rates. In a LS, source is typically loaded from disk by the server upon receipt of a relevant message from the client (document opened, saved, closed...). After that, it keeps in sync with changes made in the client via document-changed messages. I don't see any reason in principle why this couldn't also apply to referenced libraries (obviously change events wouldn't apply). Settings and settings invalidation appear to be specifically catered for in the protocol. In an upcoming version of LSP, there's work on a serialization format for caching parse trees; this may or may not be useful to us. A UserForms designer would only be needed if we wanted to fully support VBA/VB6 development outside of the VBE, and would need to be client-specific. I'd suggest that's out of scope for RD (although it'd be amazing if someone else added it to VSC to complete the circle). LSP supports custom messaging extensions if we need them, although that then means that both client and server must understand them. As we'd only be officially supporting the RD add-in to the VBE as a client, this need not be a problem for us (although we should try to minimise it to maximise the potential for other clients). EDIT: all messages must be responded to.
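To make the document-sync flow concrete, here is a minimal sketch of the `textDocument/didChange` notification a client would send after an edit. The field names follow the LSP specification; the URI, version number and module text are invented examples, not Rubberduck code:

```python
import json

# A hypothetical textDocument/didChange notification, using field names
# from the LSP specification. The URI and module text are invented examples.
did_change = {
    "jsonrpc": "2.0",
    "method": "textDocument/didChange",
    "params": {
        "textDocument": {"uri": "file:///project/Class1.cls", "version": 2},
        "contentChanges": [{"text": "Option Explicit\n"}],
    },
}

print(json.dumps(did_change))
```

The server applies the change to its in-memory copy of the document, so it never needs to re-read the file from disk while the editor has it open.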
Regarding the change events, consideration needs to be given to the issue encountered in one very early version of Rubberduck (I think it was covered in this issue) where we tried to capture all keypresses and clicks, which unfortunately caused the VBIDE to slow down severely. As alluded to, that was alleviated by using a message pump. I'm not sure whether, once we start reacting to LSP events, that will overwhelm us again?
Ah yes, good point. I was originally thinking of a dwell time, like 2s since the last keyboard input. But even that may not be performant enough. The protocol does allow the client to issue arbitrary commands though, so IIUC we could just request a reparse on the same basis as now.
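The dwell-time idea above amounts to a debounce: restart a timer on every keystroke, and only fire the expensive action (e.g. a reparse request) once input has been quiet for the chosen interval. An illustrative sketch, not Rubberduck code - the `Debouncer` class and delay value are invented for the example:

```python
import threading
import time

class Debouncer:
    """Invoke a callback only after `delay` seconds with no new triggers.

    Each trigger cancels the pending timer and starts a fresh one, so
    the callback fires once, after input has gone quiet."""

    def __init__(self, delay: float, callback) -> None:
        self.delay = delay
        self.callback = callback
        self._timer = None
        self._lock = threading.Lock()

    def trigger(self) -> None:
        with self._lock:
            if self._timer is not None:
                self._timer.cancel()
            self._timer = threading.Timer(self.delay, self.callback)
            self._timer.start()

# Usage: five rapid triggers collapse into a single callback invocation.
calls = []
debounced = Debouncer(0.05, lambda: calls.append("reparse"))
for _ in range(5):
    debounced.trigger()
time.sleep(0.3)
```

Whether 2 seconds (or 50 milliseconds) is the right interval is exactly the kind of question the PoC would need to answer empirically.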
Been working through this walkthrough on creating language servers. It's a Typescript example for VSC, so not directly applicable, but it's pretty easy to follow along - got to the point where I can hit breakpoints at the server-side. |
Because the consensus seems to be to build a PoC, we can track the progress via #5176 and close this issue. |
Justification
Rubberduck is currently monolithic - everything runs in-process in the host. This raises performance, stability and testability concerns.
Details
Currently, like every other VBE add-in, Rubberduck runs entirely within the process space of the host VBE. Whilst this is convenient for accessing host resources through the VBIDE interface, it does have several drawbacks:
We are constrained by the resources available to the host process, primarily RAM. Large projects necessarily mean large parse trees to be held in memory, which can lead to OOM exceptions even on 64-bit systems with plenty of physical memory available. See Holy RAM, batman! (Out of memory errors and excessively high memory usage) #3347.
If we crash, we can take down the host or other add-ins. Even if we avoid this, the host can mark us as unreliable and disable us from loading without explicit user intervention at restart. This is not a good look.
There is no natural 'seam' between the COM-focused aspects of host interaction at the front-end and the language-focused aspects of lexing, parsing, resolving and analysing at the back-end. Facilitating testing of these concerns relies on careful design of abstractions to ensure proper decoupling.
Proposal
The front-end aspects of Rubberduck (interacting with the host through COM interop) should be strictly isolated from the back-end workloads. This is best achieved by running the front-end as a COM add-in (as currently), but communicating with a back-end running in a separate process on the same machine.
This requires inter-process communication over a high-performance, low-latency mechanism.
Background
Microsoft, during the design of VSCode, recognised the need to separate front-end editors from back-end language analysis. Their driver was primarily to avoid the n:n relationship between editors and supported languages; however, the solution they devised can be leveraged for our purposes too.
Language Server Protocol (CCA 3.0) is an open standard which describes an abstraction between IDEs ("Tools" in their documentation) and language servers. It allows IDEs to support multiple languages without having to include support directly in their native codebase. Instead, a JSON-RPC request is made to an external process, requesting information about source code at a given location.
For example, if a user wishes to find implementations of a virtual function, the client sends a request identifying the document (Class1.cls) and the position within it (line 20, character 36). Typically this exchange occurs over stdio, but the protocol itself is transport-agnostic.
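As a sketch of what goes over the wire: the LSP base protocol prefixes each JSON-RPC body with a `Content-Length` header and a blank line. The helper below frames a 'find implementations' request matching the example above (the URI is an invented example, and LSP positions are zero-based, so real numbers may be offset by one):

```python
import json

def frame(payload: dict) -> bytes:
    """Frame a JSON-RPC payload per the LSP base protocol:
    a Content-Length header, a blank line, then the UTF-8 body."""
    body = json.dumps(payload).encode("utf-8")
    return f"Content-Length: {len(body)}\r\n\r\n".encode("ascii") + body

# The 'find implementations' request from the example above.
# The file URI is an invented example path.
request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "textDocument/implementation",
    "params": {
        "textDocument": {"uri": "file:///project/Class1.cls"},
        "position": {"line": 20, "character": 36},
    },
}

message = frame(request)
```

The server replies with a message framed the same way, carrying the matching `id` and an array of locations.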
Language servers can be authored in any language, regardless of the client architecture. Many IDEs and editors, including Visual Studio, VSCode, Eclipse, Sublime, Atom and even emacs now offer LSP clients either by default or as options or extensions.
Implementation
Rubberduck could split itself into a client/server system, with both running on the user's machine. The front-end would remain a VBE COM add-in, but would limit itself to collecting user events and displaying a UI. The back-end would be a separate process, launched by the add-in on demand, which would load source files (either transmitted from the front-end or exported by it), parse, resolve and analyse them, then cache the results to respond to front-end requests.
Fortunately, we would not need to implement LSP from the spec. Middleware packages exist for many languages, including a highly active project for C#: OmniSharp (MIT), which is the mainline C# provider for VSCode and part of the .NET Foundation. It is made available as a NuGet package for the construction of C# LSP clients and servers.