Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Java script engine switcher and highlight js #452

Merged
merged 7 commits into from
Feb 17, 2017
Merged

Java script engine switcher and highlight js #452

merged 7 commits into from
Feb 17, 2017

Conversation

phil-scott-78
Copy link
Contributor

This builds upon the PR #442. Didn't want to commit there and throw off any work that you might have done trying to get the Jint engine playing nicely incase this work is just too out there and needs to be tossed.

This additional commit adds the following packages to Core:

  • JavaScriptEngineSwitcher - an abstraction around common JS processing libraries
  • JsPool - a pool around the engine switcher. We'll be creating these a lot potentially so good pooling is a must
  • Jint - managed implementation of a javascript processor. We'll make this the default.
  • JavaScriptEngineSwitcher.Jint - allows Jint to be plugged into the JavaScriptEngineSwitcher

For a module that wants to use the javascript engine it's pretty simple. They just need to call GetJsEngineFromPool off of the context. When they are done then must call ReturnJsEngineToPool, This exposes more or less the same interface as IJsEngine from the JavaScriptEngineSwitcher so it should be easy to find help and examples for devs to work with. The implementation is just a wrapper of the default implementation plus a couple of helpers around ensuring libraries only get loaded once per engine.

Things get tricky in two spots

  1. Testing. With this being exposed off of IExecutionContext the TestContext project needed some implementation. Rather than bring in a JS engine here too I made it a Func<> that could be set to a factory for building up the JsEngine by the test. I added a Wyam.Tests.JavaScript with an implementation of an engine that could be used for testing. It's just a copy and paste from Wyam.Core but it does allow test projects to be able to avoid needing to reference Core just to get some JS processing to work
  2. Reloading the configuration. Because the configuration could potentially change the JS engine it needed the ability to wipe the singleton that JavaScriptEngineSwitcher uses. Had to add a couple static methods off of Engine and ExecutionContext to get those settings cleared out properly.

One big advantage of doing it this was is that I was able to swap out the JS engine from Jint pretty easily in my config for Vroom. All I did was add this code and wallah

using JavaScriptEngineSwitcher.Core;
using JavaScriptEngineSwitcher.Vroom;

#n JavaScriptEngineSwitcher.Vroom

JsEngineSwitcher.Instance.EngineFactories.Add(new VroomJsEngineFactory());
JsEngineSwitcher.Instance.DefaultEngineName = VroomJsEngine.EngineName;

@phil-scott-78
Copy link
Contributor Author

phil-scott-78 commented Feb 15, 2017

It's actually failing here due to the AutoCodeBlock unit test while using Jint. I'm going to mark it as skipped.

@daveaglick
Copy link
Member

Wondering a little bit about the utility of switching JS engines. Not that it isn't a good idea, I'd just like to understand some of the use cases a bit better. For example:

  • Who would typically switch out the engine? The module developer? The user running Wyam?
  • How can we be sure an expected engine is available in a particular environment? For example, what if the module swaps the engine to MSIE but then gets run on a non-Windows system?
  • Why would the engine be swapped from the default in the first place? Speed? Missing features? Compatibility?

@phil-scott-78
Copy link
Contributor Author

So my thought is that regardless of what engine that's used it would have to be abstracted out. Might as well useJavaScriptEngineSwitcher has that abstraction. Any new engine is gonna have support for that library, so no matter what engine picked today the next latest and greatest should be able to be plugged in pretty easily behind the scenes.

As to the use cases, I don't think a module developer would control it - that would be chaos with the different platform scenarios. Like if someone wrote a module that relied on ECMAScript 6 so it requests a specific engine...yeah. Might want to discourage those shenanigans.

Really I think the only time someone would be messing with that would be an end user. And the reason for doing so would definitely be perf. People swapping the engine out for compatibility sake would mean that modules are out there that only support a specific subset of JS features that only one engine supports - that'd be ugly. But take a look at these benchmarks - some drastic differences here - https://asmagin.com/2016/10/22/how-to-marry-reactjs-and-sitecore-with-webpack-part-3-js-engines-performance/.

I suspect once sass.js is brought in and it targets something like foundation that's where the difference would be striking between calling Vroom/ChakraCore vs Jint/Jurassic. Having the option for someone to tweak that is a nice to have that comes basically for free, and I don't think causes a ton of harm by exposing. Only real headache is that you gotta work around the configuration being a singleton so the reloading of the config needs to wipe it out.

@daveaglick
Copy link
Member

daveaglick commented Feb 16, 2017

That makes a ton of sense, thanks. It sounds like there's essentially two compelling reasons to go this route:

  • The library already has a well thought-out abstraction so we don't have to reinvent the wheel.
  • Users can have the option of swapping engines if they want (and are okay accepting any compatibility issues).

Agree with all your other points, especially discouraging swapping the engine inside a module. In fact, it probably makes sense not to even expose that option from the execution context and only allow it to be changed from the config file/calling code (similar to some of the other objects that become read-only during execution).

I was thinking a lot about this ability last night and today - could be a game changer. Generation time highlighting, Sass, Less, diagrams, list goes on. JS opens a whole new world of possibilities. Thanks for continuing to press this initiative! Hoping I have a chance to merge this first PR tomorrow.

@phil-scott-78
Copy link
Contributor Author

Cool. I'm trying to redo the Wyam.Web project to use the server side highlighting as a stress test before my wife gets home tonight. I'll keep you posted on any disasters or successes that might prevent a merge.

This applies the highlight.js highlighting on the server allowing the client side script to be dropped. It does this using the MsieJavaScriptEngine to execute the script. The script itself is actually the node version, but I applied browserify to it so that it wouldn't require node anymore. Kind of a hack, but it works. I included the steps I took to build the file in regenerating-highlight-all.js so that it could be updated in the future with a newer version.
MsieJsEngine is certainly not thread safe so I was rebuildingi t from scratch within the loop. No sense in doing that, so switch the code to use a ThreadLocal instance to pool the JS instances between threads.  Not a huge win for small sites, but for a 1000 document test run it went from 15s to 1s on my machine.
Adds a test where the code block doesn't have a language and highlightAuto
must be called. This not only covers that scenario but is an important
test case for javascript execution engines. Because highlight.js doesn't
know the language it must iterate through the languages it is configured
with and check them one by one using regular expressions. These have a
wide variety of implementations so it actually provides a good test case
for a javascript languages regex support. It seems anything doing JS -> IL
starts to fall short here due to the differences in supported Regex
escaping and language features. And with regex being a huge part of
linting and text manipulation supporting regex is a vital part of
evaluating a JS engine.
Adds the following packages to Core

* JavaScriptEngineSwitcher - an abstraction around common JS processing libraries
* JsPool - a pool around the engine switcher. We'll be creating these a lot potentially so good pooling is a must
* Jint - managed implementation of a javascript processor. We'll make this the default.
* JavaScriptEngineSwitcher.Jint - allows Jint to be plugged into the JavaScriptEngineSwitcher

For a module that wants to use the javascript engine it's pretty simple. They just need to call GetJsEngineFromPool off of the context. When they are done then must call ReturnJsEngineToPool, This exposes more or less the same interface as IJsEngine from the JavaScriptEngineSwitcher so it should be easy to find help and examples for devs to work with. The implementation is just a wrapper of the default implementation plus a couple of helpers around ensuring libraries only get loaded once per engine.

Things get tricky in two spots

1. Testing. With this being exposed off of IExecutionContext the TestContext project needed some implementation. Rather than bring in a JS engine here too I made it a Func<> that could be set to a factory for building up the JsEngine by the test. I added a Wyam.Tests.JavaScript with an implementation of an engine that could be used for testing. It's just a copy and paste from Wyam.Core but it does allow test projects to be able to avoid needing to reference Core just to get some JS processing to work
2. Reloading the configuration. Because the configuration could potentially change the JS engine it needed the ability to wipe the singleton that JavaScriptEngineSwitcher uses. Had to add a couple static methods off of Engine and ExecutionContext to get those settings cleared out properly.
Test is failing due to Jint's regex implementation, but is still valuable to keep around for testing those fixes in the future
@phil-scott-78
Copy link
Contributor Author

Give me a bit on this. Ran into some edge cases with the docs that I missed.

This was escaping ALL @ signs in the document instead of just the ones in the code blocks. Not good for things like actual javascript and the such. I tried to get AngularSharp to be nice and let me escape it via the InnerHtml but they insisted on changing it back. 

Taking a step back I realized all of this is foolish. This should really only be called after Razor is invoked. Added a test to make sure the highlighting is cool with the escaped html in a code block razor will generate.
The ADA and Lisp languages used some regex syntax that .Net couldn't handle. By changing this it allows Jint to run successfully.
@phil-scott-78
Copy link
Contributor Author

Ok, ran this through the docs site. Ran into an interesting problem with if statements not exposing the new IModuleCollection. I just hard-coded the highlight into the recipe for testing. So with the Vroom engine it added is churns through them pretty quickly. Without it...well it isn't pretty. About 8 minutes when running through Jint. 😑

Going to mess with performance but for now...
works badge

@daveaglick
Copy link
Member

About 8 minutes when running through Jint

😨

@phil-scott-78
Copy link
Contributor Author

Oh yeah. That's where the DotTrace PR came from.

I have some things I'm going to try tomorrow. I suspect a good chunk of the slow down is due to the fact many elements don't have an explicit language set on the code block so highlight.js is having to do more work than needed to figure out how to render it. Going to try and make that explicit in the recipe maybe. Along the same lines I'm going to add two configuration options. One that skips pre code blocks with no languages rather than calling the HighlightAuto. Maybe just dump a hljs class on it so the code block theme is there and move onto the next element. Ideally it isn't needed, but can't hurt to throw that in there. Second one would allow a list of valid languages so it doesn't need to check eleventy billion of them. Make that a DocKeys setting and see how perf goes if it is only configured to use a simple default subset.

Now I'm probably not the average use case, but my box has 8 cores and it was pretty bored while running through this. If I recall correctly, AsParallel will create "#numberOfCores" threads. On my machine though I'm sitting at 20% cpu usage while it does...something. I've got a lot of bored processors. Not quite sure why it isn't chewing up more processing. Compare that to Razor which basically warms my room by 2 degrees it maxes out my CPU while doing its thing. I suspect that'll be digging into Jint's guts which might be fun.

@daveaglick
Copy link
Member

Starting to look through this. What happens if a module doesn't call ReturnJsEngineToPool()?

@daveaglick
Copy link
Member

I'm thinking more about the pool and how it's being used here. It looks like the benefit of using JSPool is that you can bypass both the JS engine instantiation time and the long loading time of shared state (from a library or other "initialization" JS code) when you know you're going to use the engine over and over in a concurrent manner. This performance savings only happens once you grab the same instance over again, right? In other words, if my pool has 10 slots and I don't return the engines before I need the next one, I'll end up the same performance-wise as if I had just created 10 engines from scratch. Also, if you know you're only going to be running synchronously, then the pool doesn't make sense - just create a single JS engine, initialize it with your shared state, and reuse that over and over.

I'm wondering if we're making the best use of pooling here. It seems like you realize the most benefit when module-specific initialization (like loading highlight.js) is performed by the pool itself (granted, we're avoiding running the initialization more than once in the highlight case by using the engine's Require... methods). Also, by putting the pool in the context and sharing it between modules, wouldn't the global state carry over from one module to the next for a reused JS engine (until we reach the pool MaxUsagesPerEngine and it gets recreated)? I'm also a little concerned about pool starvation across modules if I even get module and/or pipeline parallelization to work - it's not a problem now, but if we go with a centralized pool model today it'll be harder to change in every module that uses it down the road.

I guess what I'm getting at is perhaps the pool itself should be created by the modules? Or at least provide custom pool creation as an option for the modules. I do really like the level of abstraction here, especially that we can consume JS engines from module libraries without bringing in Wyam.Core (even if it does mean essentially duplicating the interfaces from JavaScriptEngineSwitcher). Rough outline of what I'm thinking:

  • Add a new IJsEnginePool interface to Wyam.Common. Since the JSPool IJsPool interface is already pretty simple, this will probably look similar to that (though without the generic engine type).
  • Add a new GetJsEnginePool() method to IExecutionContext with several overloads to handle the various pool configuration options.
  • Remove the other pool methods on IExecutionContext and instead have it also implement IJsEnginePool

Here's what I'm thinking - I'll go ahead and merge this in right now and then make some changes. You can see what I adjusted from the commit diff and then we can iterate in another PR if needed.

@daveaglick daveaglick merged commit 6ab96dc into statiqdev:develop Feb 17, 2017
@daveaglick
Copy link
Member

Okay, my adjustments are in. First off, I loved the abstraction concept you build so I tried to stay close to that. The big differences are:

  • We're now abstracting the pool and you get a new pool from the IExecutionContext. Since the pools are not shared between modules, this should be safe and lets the module customize their pool if they need to.
  • Since the modules now set up the pool, we no longer need the RequireResource() and RequireFile() interface methods since any requirements can go in the pool initializer (including arbitrary script code).
  • I didn't like how JSPool handled disposal. The user shouldn't be directly able to dispose pooled engines since they may be reused. The ReturnEngineToPool() method is silly too - having to explicitly return the engine may result in accidentally keeping it and certainly means lots of try..catch blocks. Instead, disposing an engine should return it to the pool and then the disposable pattern and usings can be used to control the pool lifecycle (you can see this in action in the Highlight module).

Anyway, certainly still open for debate and comments - just figured it was easier to explain where I was thinking we should go with code.

@phil-scott-78
Copy link
Contributor Author

Nice. All that makes perfect sense. I was thinking about throwing the the pool into a using statement or making some sort of Pool<T> but this is far cleaner.

My wife is working the night shift tonight again so I'll pull this down and play with it. Going to give kick the tires of the changes by using sass.js a go as a test case of implementing a new module from scratch

@Daniel15
Copy link

Hi! I'm the developer of JSPool. I stumbled on this pull request as I'm considering switching the ReactJS.NET site to use Wyam, and was looking through the code of Wyam.Highlight to see how to extend it for my use cases.

I didn't like how JSPool handled disposal. The user shouldn't be directly able to dispose pooled engines since they may be reused. The ReturnEngineToPool() method is silly too - having to explicitly return the engine may result in accidentally keeping it and certainly means lots of try..catch blocks. I

@daveaglick - Yeah, ReturnEngineToPool is a bit silly. I've been meaning to clean up the API a bit. The reasoning behind it is that JSPool currently returns the JavaScript engine directly from JavaScriptEngineSwitcher, without any modifications or wrapper around it. This means that the engine has no idea about the pool, and disposing it will actually dispose the engine.

In order to override the Dispose handler to return the engine to the pool (rather than actually disposing it), I'd need to have a wrapper around the engine itself, and proxy all the methods. I do think this is a reasonable idea though, so I'll probably do it in a future JSPool release.

Thanks for the feedback 😄

@daveaglick
Copy link
Member

@Daniel15 Great to hear from you! We've gotten great use from JSPool - it's helped opened up a whole class of generation-time JavaScript execution cases. There's so much that could be done there and I've only started to scratch the surface of what we could bring in that way.

I've got my own share of "silly" APIs so no worries at all about the disposal pattern 😄. In fact...

I'd need to have a wrapper around the engine itself

That's exactly what I ended up doing: https://github.com/Wyamio/Wyam/blob/develop/src/core/Wyam.Core/JavaScript/PooledJsEngine.cs

Daniel15 referenced this pull request in Daniel15/JSPool May 15, 2017
…o call ReturnEngineToPool

To handle this, the engine is wrapped in a class that overrides Dispose and does the right thing.

References #7
References https://github.com/Wyamio/Wyam/pull/452
@Daniel15
Copy link

I started working on JSPool again and did exactly that (wrapped the engine and overrode Dispose to return it to the pool) so this should be in a release soon!

Out of curiosity, how did you find JSPool?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

3 participants