Discussion: Should we have a front-end/back-end split? #5121

Closed
mansellan opened this issue Aug 31, 2019 · 42 comments

Comments

@mansellan
Member

mansellan commented Aug 31, 2019

Justification
Rubberduck is currently monolithic - everything runs in-process in the host. This raises performance, stability and testability concerns.

Details
Currently, like every other VBE add-in, Rubberduck runs entirely within the process space of the host VBE. Whilst this is convenient for accessing host resources through the VBIDE interface, it does have several drawbacks:

  1. We are constrained by the resources available to the host process, primarily RAM. Large projects necessarily mean large parse trees to be held in memory, which can lead to OOM exceptions even on 64-bit systems with plenty of physical memory available. See Holy RAM, batman! (Out of memory errors and excessively high memory usage) #3347.

  2. If we crash, we can take down the host or other add-ins. Even if we avoid this, the host can mark us as unreliable and disable us from loading without explicit user intervention at restart. This is not a good look.

  3. There is no natural 'seam' between the COM-focused aspects of host interaction at the front-end and the language-focused aspects of lexing, parsing, resolving and analysing at the back-end. Facilitating testing of these concerns relies on careful design of abstractions to ensure proper decoupling.

Proposal
The front-end aspects of Rubberduck (interacting with the host through COM interop) should be strictly isolated from the back-end workloads. This is best achieved by running the front-end as a COM add-in (as currently), but communicating with a back-end running in a separate process on the same machine.

This requires inter-process communication over a high-performance, low-latency mechanism.

Background
During the design of VSCode, Microsoft recognised the need to separate front-end editors from back-end language analysis. Their driver was primarily to avoid the m:n relationship between editors and supported languages; however, the solution they devised can be leveraged for our purposes too.

Language Server Protocol (Creative Commons Attribution 3.0) is an open standard which describes an abstraction between IDEs ("Tools" in their documentation) and language servers. It allows IDEs to support multiple languages without having to include support directly in their native codebase. Instead, a JSON-RPC request is made to an external process requesting information about the source code at a given location.

For example, if a user wishes to find implementations of a virtual function:

  1. The (LSP-aware) IDE makes a request to the language server stating "Give me implementations for source file Class1.cls at line 20, character 36." (see the wire-format sketch below)
  2. The language server uses its knowledge of the source file, coupled with its analysis, to deduce the correct answer and replies to the IDE.
  3. The IDE displays the result to the user in an appropriate manner.

Typically this occurs over stdio, but the protocol itself is transport-agnostic.
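To make the wire format concrete, here is a rough C# sketch of what the add-in might put on the server's stdin for step 1 above. The class, stream parameter and file path are made up for illustration; a middleware library would normally do this framing for us.

```csharp
// Illustrative only: hand-rolled framing of the "find implementations" request from
// the example above. Names and the file path are hypothetical.
using System.IO;
using System.Text;

static class LspWireExample
{
    public static void SendFindImplementations(Stream serverInput)
    {
        // LSP positions are zero-based: editor line 20, character 36 -> 19, 35.
        const string body = @"{
  ""jsonrpc"": ""2.0"",
  ""id"": 1,
  ""method"": ""textDocument/implementation"",
  ""params"": {
    ""textDocument"": { ""uri"": ""file:///c:/project/Class1.cls"" },
    ""position"": { ""line"": 19, ""character"": 35 }
  }
}";
        var payload = Encoding.UTF8.GetBytes(body);
        var header = Encoding.ASCII.GetBytes($"Content-Length: {payload.Length}\r\n\r\n");

        // Every LSP message is a JSON-RPC payload preceded by a Content-Length header.
        serverInput.Write(header, 0, header.Length);
        serverInput.Write(payload, 0, payload.Length);
        serverInput.Flush();
    }
}
```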

Language servers can be authored in any language, regardless of the client architecture. Many IDEs and editors, including Visual Studio, VSCode, Eclipse, Sublime, Atom and even emacs now offer LSP clients either by default or as options or extensions.

Implementation
Rubberduck could split itself into a client/server system, with both running on the user's machine. The front-end would remain a VBE COM add-in, but would limit itself to collecting user events and displaying a UI. The back-end would be a separate process, launched by the add-in on demand, which would load source files (either transmitted from the front-end or exported by it), parse, resolve and analyse them, then cache the results to respond to front-end requests.
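As a sketch of the "launched by the add-in on demand" part, the front-end could spawn the back-end with redirected standard streams and speak LSP over them. The executable name below is hypothetical.

```csharp
// Minimal sketch: the COM add-in starts the back-end process and owns its stdio.
// "Rubberduck.LanguageServer.exe" is a hypothetical name for the back-end binary.
using System.Diagnostics;

static class BackendLauncher
{
    public static Process StartBackend()
    {
        var startInfo = new ProcessStartInfo
        {
            FileName = "Rubberduck.LanguageServer.exe",
            UseShellExecute = false,
            CreateNoWindow = true,
            RedirectStandardInput = true,   // client -> server (requests/notifications)
            RedirectStandardOutput = true   // server -> client (responses/notifications)
        };
        return Process.Start(startInfo);
    }
}
```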

Fortunately, we would not need to implement the LSP protocol from the spec. Middleware packages exist for many languages, including a highly active project for C#: OmniSharp (MIT), which is the mainline C# provider for VSCode and part of the .NET Foundation. It is made available as a NuGet package for building C# LSP clients and servers.
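For illustration, a server skeleton built on the OmniSharp.Extensions.LanguageServer NuGet package might look roughly like this; this is an assumption about that package's API, and method names/overloads may differ between versions.

```csharp
// Rough sketch only, assuming the OmniSharp.Extensions.LanguageServer NuGet package;
// the exact API surface may differ between package versions.
using System;
using System.Threading.Tasks;
using OmniSharp.Extensions.LanguageServer.Server;

internal static class Program
{
    private static async Task Main()
    {
        // The middleware handles the JSON-RPC framing, the initialize handshake, etc.
        var server = await LanguageServer.From(options => options
            .WithInput(Console.OpenStandardInput())
            .WithOutput(Console.OpenStandardOutput())
            // .WithHandler<SomeVbaDocumentHandler>() would plug Rubberduck's existing
            // parser/resolver in behind LSP requests (handler name is hypothetical).
        );

        await server.WaitForExit;
    }
}
```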

@rubberduck203
Member

rubberduck203 commented Aug 31, 2019

On the surface, this sounds like a great idea. Using half a gig of memory isn’t necessarily a problem if we’re not using the host application’s allocation to do it. I also very much like the idea of isolating (at least some of) the crashes in the LS process, where they wouldn’t drag down the host app. Although, from what I see through issues (and someone please correct me if I’m wrong), it seems most crashes are happening around COM interop. So I don’t actually know if we get a ton of benefit on crash protection. I’m hoping someone who’s actually committed in the last few years can shed some light on that.

Personally, I’d love to see a proof of concept and would be even more excited if the language server could run Core, because then I would actually be able to contribute code again.

If a PoC proves feasible, we would need to determine if/how things might be moved incrementally or if it’s an all or nothing endeavor.

@mansellan
Member Author

Yes, that's fair - most of the hard crashes are from AVs from COM. I just listed it because it is a potential benefit, and the smaller our in-process footprint, the less opportunity for problems.

The server can be any language we want. I vote Core :-)

@retailcoder
Member

I agree on the principle, but I'm worried implementing the current-tech stuff could turn into a can of worms, given there's obviously zero existing support for VB6/VBA - but definitely worth looking into whether & how it could be done.
What are the advantages of implementing Microsoft's protocol (through a middleware package that handles VB6 code), over making our own server process?

(and yes, I vote Core as well!)

@rubberduck203
Member

What operating systems do we support these days? Still supporting 7? Does Core run on 7?

@mansellan
Member Author

Core 3 will run on Windows 7. 3.1 (LTS) probably will too. Beyond that, probably not.

@rubberduck203
Member

One advantage would be folks who like to use other editors being able to write their own front ends. I recall a few people writing their VBA in VSCode and copy/pasting it back to the VBE.

@mansellan
Member Author

Benefits of running LSP: it's an existing, open protocol in this problem space. We don't have to write it from scratch. Side benefits: syntax highlighting for VBA/VB6 in Visual Studio, VSCode, Eclipse... NOT suggesting we officially support those...

@mansellan
Member Author

Actually, not just syntax highlighting... Any capabilities which are supported by our LS and client X - navigation, diagnostics, code-completion...

@bclothier
Contributor

Right now we support Vista at minimum, but with the mocking PR that will be dropped because of the bump to .NET FX 4.6.1.

I can confirm that the large majority of the crashes are due to COM interactions because those are where we play with unmanaged memory and pointers.

That said, I do see 2 issues:

  1. VBIDE itself is single-threaded, so the COM add-in (rather, its UI thread) should be single-threaded. However, I really don't want to dispatch events from LSP to the UI thread. Can we dispatch to the threads behind the WPF controls independently? That would be a big gain, I think.

  2. Is the transport protocol going to be fast? I keep hearing about how massively slow IPC is compared to in-proc calls. We must minimize the blocking on the UI thread as much as possible, and if there are too many dispatches that must be made to the UI thread, that might end up being more fragile than keeping it in-process.

@mansellan
Member Author

  1. Yes, we still own the front end.
  2. It runs VSCode, so.... yes?

@bclothier
Contributor

I think we need a PoC. Maybe pick up the SlimDucky, set up a tool window with a WPF control, then demonstrate that it doesn't crash, etc.

@mansellan
Member Author

Agreed, this needs proving. Just a proposal at this point, looking for showstoppers...

@retailcoder
Member

Stupid (?) thought: does enabling VB6/VBA in VSC amount to suiciding Rubberduck as a VBIDE add-in?

@rubberduck203
Member

Is it possible to new Thread() from within the add-in? It’s a little hairier than more modern async approaches, but I’m willing to bet we could run a message pump that talks to the server if we can.

Will the process be fast? Well, it won’t be as fast as an in-process call, because it is a network call and the requests/responses must be serialized/deserialized, but it won’t be as slow as going to an external network.

@mansellan
Member Author

mansellan commented Aug 31, 2019

Well.... maybe, years from now. We want to bring VBC[1] into the 21st century, IIUC? Not necessarily in the current VBE? Imagine swapping out the registry entry that launches the old VBE with a redirect to VSCode, running Rubberduck as its LSP, possibly with VBA-specific extensions.

Are we sentimental about the classic VBE, or do we care more about VBA/VB6 developers?

[1] Visual Basic "Classic"

@rubberduck203
Member

I don’t think so @retailcoder, the compiler still lives inside the VBE. Even if someone wrote a front end for Code, you’ve got to move the code in and out of the VBE to make that workflow work. That’s a PITA.

@bclothier
Contributor

bclothier commented Aug 31, 2019

Stupid (?) thought: does enabling VB6/VBA in VSC amount to suiciding Rubberduck as a VBIDE add-in?

Not necessarily. We want to give more choices, even if it means using VSC and VSC still can't inspect VBA code anyway. Furthermore, there will be scenarios where installing VSC or whatever isn't feasible/practical, and we shouldn't tell the users, "Sucks to be you! Mwahaha!" No, we want to give them a better IDE right there, even if they have to use the VBIDE.

@bclothier
Contributor

Is it possible to new Thread() from within the add-in? It’s a little hairier than more modern async approaches, but I’m willing to bet we could run a message pump that talks to the server if we can.

I think we are already using TPL for the backend work (parsing). We are also already hijacking the message pump. It's actually the WPF stuff that I'm not so sure about WRT whether we can dispatch messages to WPF controls without using the main UI thread.

@rubberduck203
Member

I think we need a PoC. Maybe pick up the SlimDucky, set up a tool window with a WPF control, then demonstrate that it doesn't crash, etc.

This. A small PoC is in order. Just enough to prove the concept isn’t a dead end from the get go.

@rubberduck203
Member

We are also already hijacking the message pump.

I meant create our own message pump that sends messages to and receives notifications from the server.

It's actually the WPF stuff that I'm not so sure about WRT whether we can dispatch messages to WPF controls without using the main UI thread.

That’s standard WPF stuff. You have to be on the UI thread (or dispatch to it) in order to notify the UI in a full blown desktop app too.
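For reference, a minimal sketch of that standard pattern: whatever background thread receives server notifications marshals UI updates through the WPF Dispatcher. Type and member names below are illustrative, not existing Rubberduck code.

```csharp
// Sketch of standard WPF thread marshalling: a background (e.g. LSP listener) thread
// hands results to the UI thread via the Dispatcher. Names are illustrative.
using System;
using System.Windows.Threading;

public sealed class DiagnosticsPublisher
{
    private readonly Dispatcher _uiDispatcher;

    public DiagnosticsPublisher(Dispatcher uiDispatcher)
    {
        // e.g. the tool window control's Dispatcher, captured on the UI thread.
        _uiDispatcher = uiDispatcher;
    }

    public void Publish(string diagnosticMessage)
    {
        // BeginInvoke queues the update on the UI thread without blocking the caller.
        _uiDispatcher.BeginInvoke(new Action(() =>
        {
            // Update the bound ViewModel / ObservableCollection here.
        }));
    }
}
```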

@PeterMTaylor

As a service we offer? I’m under the impression that the Rubberduck name refers to a process for developers (talking to yourself in a peer-review way), and I’m not sure whether that behaviour could be repeated elsewhere, i.e. in VSC.

@PeterMTaylor

“I meant create our own message pump that sends messages to and receives notifications from the server.” Does that mean what I thought Thunderframe mentioned: making our own duckspeak?

@bclothier
Contributor

@rubberduck203 - I don't follow; why do we want our own message pump?

The reason I was wondering about dispatching to WPF controls directly is to minimize the events that the UI thread must process. My worry (perhaps illegitimate) is that with out-of-process, there will be many more events queued, and because the VBIDE is single-threaded, that might end up feeling quite.... slow.

If everything ends up going to the UI thread, I'm not sure having our own message pump will help since it'd still have to forward to the thread?

@rubberduck203
Member

We would still be processing the same number of events as we do today, I believe.

@mansellan
Member Author

Not sure why we're talking about message pumps... We'd still do everything we do today, just in two separate places with an IPC in between...

@rubberduck203
Member

I shouldn’t have used the words “message pump”. I momentarily forgot that has a particular meaning on Windows. I was speaking of the more general concept. Just a background thread that listens for notifications from the server and notifies the proper client listeners.
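A sketch of that idea, in case it helps the PoC discussion: a single background thread drains a queue of server notifications and raises events that client-side listeners subscribe to. All names below are hypothetical.

```csharp
// Rough sketch of a background "notification pump": one thread listens for server
// notifications and notifies client listeners. Names are hypothetical.
using System;
using System.Collections.Concurrent;
using System.Threading;

public sealed class ServerNotificationPump : IDisposable
{
    private readonly BlockingCollection<string> _notifications = new BlockingCollection<string>();
    private readonly Thread _worker;

    public event EventHandler<string> NotificationReceived;

    public ServerNotificationPump()
    {
        _worker = new Thread(Listen) { IsBackground = true };
        _worker.Start();
    }

    // Called by whatever reads the server's stdout.
    public void Enqueue(string notification) => _notifications.Add(notification);

    private void Listen()
    {
        foreach (var notification in _notifications.GetConsumingEnumerable())
        {
            // Raised on this background thread; subscribers dispatch to the UI thread
            // (see the Dispatcher sketch above) only when they actually must.
            NotificationReceived?.Invoke(this, notification);
        }
    }

    public void Dispose() => _notifications.CompleteAdding();
}
```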

@bclothier
Contributor

Gotcha. My concern was more that the number of events may be the same or may increase once we go out of process, so in the PoC we must prove that just because we freed our backend from the constraints doesn't mean it can go ahead and blow out the front-end's UI thread. It'll have to be reasonably smart about dispatching to the UI thread without making it feel unresponsive.

@MDoerner
Contributor

Just one small thought: we will have to push all type lib information from the front-end to the language server as well, as this information is only accessible from the UI thread of the VBE.

@Vogel612 pinned this issue Aug 31, 2019
@Vogel612
Member

Something that we will also need to keep in mind is how the installer may need to be adapted for this to work properly. I also want to note that our DI registration and setup would need to be split. It's about time we deal with the large blob of code that the setup is right now.

This dichotomy also plays into the Rubberduck.API sunsetting idea. If we have a language server for RD's parsing needs, the API could just use that very same language server, making maintenance a lot easier.

An additional huge thing that we need to figure out would be settings and settings invalidation. As of now, settings invalidation just raises an event and all consumers just refresh their local cache and possibly perform invalidation calculations.
If we split into Frontend and Backend, we need to deal with synchronizing settings across two processes.

@bclothier
Contributor

bclothier commented Aug 31, 2019

Just one small thought: we will have to push all type lib information from the front-end to the language server as well, as this information is only accessible from the UI thread of the VBE.

Agreed for the user projects. The referenced libraries, however, don't require the type lib extensions and can be loaded with the standard TypeLib APIs. I am not sure if processing the referenced type libs on the backend is a good idea, because when processing the user projects' type libs, they may need to reference the typelib/typeinfo from a referenced typelib, which implies some mechanism to share the already-discovered types.

Fortunately, we already have Com*** types, which can be used to push data to the backend, minimizing the access needed on the UI thread.

@retailcoder
Member

retailcoder commented Aug 31, 2019

Couldn't the client just send the server a GUID, and then the referenced library could be loaded & dissected server-side?

not sure if processing the referenced type libs on the backend is a good idea

Wouldn't the thousands of declarations need to live on the back-end in order to reduce in-process memory pressure?

@bclothier
Contributor

I'm thinking of the resolving references. Say I have a method Public Function Foo() As Excel.Range --- in the type lib for the project, it will have a type description but to resolve it, we need to know about the Excel typelib. Not sure if that is something that must be resolved in the front-end or simply can be deferred to the backend processing. That's one more thing to find out.

@zspitz

zspitz commented Aug 31, 2019

@bclothier

even if it means using VSC and VSC still can't inspect VBA code anyway.

Doesn't the LSP support diagnostics? https://microsoft.github.io/language-server-protocol/specification#textDocument_publishDiagnostics

@zspitz

zspitz commented Aug 31, 2019

What about the UserForms designer?

  1. Does Rubberduck currently do any work in parsing MS Forms?
  2. I imagine an LSP implementation would also have to provide a designer for MS Forms.

@zspitz

zspitz commented Aug 31, 2019

@bclothier

Agreed for the user projects.

This includes also projects in other document files (e.g. other Excel .xlsm from Excel, other Word .docx from within Word etc.) added via the Add References dialog?

@bclothier
Contributor

@zspitz

In order for other front ends to know as much as Rubberduck does, they must have access to the binary streams embedded within the files housing the VBA projects. ATM, we don’t really parse the FRX equivalent; we know about all the source code and the typelib representation (which is only available in-memory). Thus it would be an entirely separate scope of work to extract the extra metadata that you would not get from the source code alone.

There are malware analysis tools out there that can read a subset of possible VBA projects but not for all possible VBA hosts. I don’t know if they bother with extracting the binary contents that aren’t source code. Still, one must write the implementation and conform to LSP. So that’s for way later stage of works, IMO.

For diagnostics, the problem is more that our inspections and other subsystems don’t conform to the diagnostic specifications. Thus that would be another major engineering work.

The proposal here should first deal with demonstrating that we can successfully split into client/server to reduce the resource pressure on the host.

@bclothier
Contributor

@zspitz

This includes also projects in other document files (e.g. other Excel .xlsm from Excel, other Word .docx from within Word etc.) added via the Add References dialog?

Yes. In those projects we make use of extended TypeLib API to get the missing metadata not directly available in the source code alone.

@mansellan
Member Author

mansellan commented Sep 2, 2019

Been reviewing this thread and the LSP spec, and have some more thoughts:

Messages between the client and server should definitely use some kind of asynchronous queuing mechanism, possibly with a dwell time to limit message rates. I believe the protocol allows for certain message types to be dropped if latency gets too severe.
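As a sketch of the dwell-time idea (all names hypothetical): coalesce a burst of edits and only forward one didChange-style message once typing pauses.

```csharp
// Sketch of a dwell-time debouncer: only the last edit in a burst triggers a send.
// Names are hypothetical; this is not existing Rubberduck code.
using System;
using System.Threading;
using System.Threading.Tasks;

public sealed class ChangeDebouncer
{
    private readonly TimeSpan _dwell;
    private readonly Action _sendDidChange;
    private CancellationTokenSource _cts;

    public ChangeDebouncer(TimeSpan dwell, Action sendDidChange)
    {
        _dwell = dwell;                 // e.g. TimeSpan.FromSeconds(2)
        _sendDidChange = sendDidChange; // forwards the change notification to the server
    }

    // Called on every keystroke/edit; only the last call in a burst wins.
    public void OnEdit()
    {
        _cts?.Cancel();
        _cts = new CancellationTokenSource();
        var token = _cts.Token;

        Task.Delay(_dwell, token).ContinueWith(t =>
        {
            if (!t.IsCanceled) { _sendDidChange(); }
        }, TaskScheduler.Default);
    }
}
```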

In a LS, source is typically loaded from disk by the server, upon receipt of a relevant message from the client (document opened, saved, closed...). After that, it keeps in sync with changes made in the client via document-changed messages. I don't see any reason in principle why this couldn't also apply to referenced libraries (obviously change events wouldn't apply).

Settings and settings invalidation appear to be specifically catered for in the protocol.

In an upcoming version of LSP, there's work on a serialization format for caching parse trees. This may or may not be useful to us.

A UserForms designer would only be needed if we wanted to fully support VBA/VB6 development outside of the VBE, and would need to be client-specific. I'd suggest that's out-of-scope for RD (although it'd be amazing if someone else added it to VSC to complete the circle).

LSP supports custom messaging extensions if we need them, although that then means that both client and server must understand them. As we'd only be officially supporting the RD add-in to the VBE as a client, this need not be a problem for us (although we should try to minimise it to maximise potential for other clients).

EDIT: all messages must be responded to.

@bclothier
Contributor

Regarding the change events, consideration needs to be given to the issue encountered in one very early version of Rubberduck (I think it was covered in this issue) where we tried to capture all keypresses and clicks, but that unfortunately caused the VBIDE to slow down severely. As alluded to, that was alleviated by using the message pump. I'm not sure whether, once we start reacting to LSP events, that will overflow.

@mansellan
Member Author

Ah yes, good point. I was originally thinking dwell time, like 2s since last keyboard input. But even that may not be performant enough.

The protocol does allow the client to issue arbitrary commands though, so IIUC we could just request a reparse on the same basis as now.

@mansellan
Member Author

Been working through this walkthrough on creating language servers. It's a TypeScript example for VSC, so not directly applicable, but it's pretty easy to follow along - got to the point where I can hit breakpoints on the server side.

@bclothier
Contributor

Because the consensus seems to be to build a PoC, we can track the progress via #5176 and close this issue.

@Vogel612 unpinned this issue Oct 7, 2019