ScriptTagger and Nashorn engine #89

Open
danizen opened this issue Jan 31, 2019 · 5 comments

@danizen

danizen commented Jan 31, 2019

I am updating my code to work under OpenJDK 11, since Oracle will soon stop supporting Java 8 and my institution, as a government body may be expected to do, is moving on.

After some adjustments, my tests mostly work, but I see the following message, "Warning: Nashorn engine is planned to be removed from a future JDK release", in a verify test that runs with the actual importer configuration I use in production. The source of the warning, I gather, is the ScriptTagger.
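As a stopgap while a replacement engine is chosen, the warning itself can be silenced. The sketch below assumes the JDK 11 Nashorn option `--no-deprecation-warning` passed through the `nashorn.args` system property; the class and method names are hypothetical, and this only hides the message, it does not address the removal.

```java
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class QuietNashorn {

    // Nashorn option introduced alongside the JEP 335 deprecation (assumed available on JDK 11-14).
    static final String FLAG = "--no-deprecation-warning";

    /** Appends the flag to the nashorn.args system property if it is not already present. */
    static void suppressDeprecationWarning() {
        String current = System.getProperty("nashorn.args", "");
        if (!current.contains(FLAG)) {
            System.setProperty("nashorn.args", (current + " " + FLAG).trim());
        }
    }

    public static void main(String[] args) {
        // Must run before the first Nashorn engine is created.
        suppressDeprecationWarning();

        ScriptEngine engine = new ScriptEngineManager().getEngineByName("nashorn");
        // On JDKs that no longer ship Nashorn, the lookup simply returns null.
        System.out.println(engine == null
                ? "no nashorn engine on this JDK"
                : engine.getFactory().getEngineName());
    }
}
```

The same property can be set on the command line with `-Dnashorn.args=--no-deprecation-warning`, which avoids touching the code at all.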

JEP 335 states that the Nashorn engine will be removed from a future release. Long term, that's probably a good thing, but the ScriptTagger defaults to the Nashorn engine, so work is needed to find a better default ECMAScript implementation so that importer configurations like the following continue to work:

    <tagger class="com.norconex.importer.handler.tagger.impl.ScriptTagger">
      <script><![CDATA[
        /* create a domain field */
        var expr = new RegExp('[a-z]+://([^/]+).*');
        var url = metadata.url[0];
        var domain = url.replace(expr, '$1');
        metadata.addString('domain', domain);

        /* if keywords is not a list, make it one */
        if (metadata.containsKey('keywords')) {
           var keywords = metadata.get('keywords');
           if (typeof keywords == 'string') {
              metadata.set('keywords', [keywords])
           }
        }

        /* Clean the schemaorg_itemtype variables */
        if (metadata.containsKey('schemaorg_itemtype')) {
          var newdata = new java.util.ArrayList();
          var data = metadata.get('schemaorg_itemtype');
          for each (var datum in data) {
            newdata.add(datum.replaceFirst('^https?\://schema.org/', ''));
          }
          metadata.put('schemaorg_itemtype', newdata);
        }
      ]]></script>
    </tagger>
@danizen
Author

danizen commented Jan 31, 2019

Probably the best thing is to bow to the inevitable and bundle Groovy, with Groovy as the default script engine implementation.

@danizen
Author

danizen commented Feb 1, 2019

Another option, although slower:
https://search.maven.org/artifact/org.mozilla/rhino/1.7.10/jar

And then there is GraalVM's JS engine, but that looks a little harder to add.

@essiembre
Contributor

Given that GraalVM comes standard with JDK 11 and supports backward compatibility with Nashorn, it looks like a more than suitable alternative (being a more complete and more efficient implementation).

That being said, nothing prevents adding Groovy as an option as well.

@danizen
Author

danizen commented Apr 23, 2019

The OpenJDK 11 I have, from Azul.com, is a supported version we pay for, because Oracle will not offer support on JDK 11 long enough for us. This version, at least, does not include GraalJS.

Another significant performance issue is that the ScriptEngine cannot compile the code ahead of time and then invoke an Invocable within the evaluation code, as is done in the Maven repo linked above.

So, then, compilation is done for each invocation of the ScriptEngine. I guess that is OK.
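For comparison, here is a minimal sketch of what compile-once, evaluate-many looks like with the standard `javax.script` API, for engines that implement `Compilable`. This is not how ScriptTagger works; the class name, the `compile` helper, and the `url` binding are hypothetical and only illustrate the pattern.

```java
import java.util.List;
import java.util.Optional;
import javax.script.Bindings;
import javax.script.Compilable;
import javax.script.CompiledScript;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;
import javax.script.ScriptException;

public class CompileOnceSketch {

    /** Compiles the source once if the engine supports it; empty if it does not (or is null). */
    static Optional<CompiledScript> compile(ScriptEngine engine, String source) {
        if (!(engine instanceof Compilable)) {
            return Optional.empty();
        }
        try {
            return Optional.of(((Compilable) engine).compile(source));
        } catch (ScriptException e) {
            throw new IllegalArgumentException("Script does not compile", e);
        }
    }

    public static void main(String[] args) throws ScriptException {
        // May be null on JDKs that ship no JavaScript engine.
        ScriptEngine engine = new ScriptEngineManager().getEngineByName("JavaScript");
        Optional<CompiledScript> compiled = compile(engine, "'seen: ' + url");

        if (compiled.isEmpty()) {
            System.out.println("No compilable JavaScript engine available");
            return;
        }
        for (String url : List.of("https://example.com/a", "https://example.org/b")) {
            // Fresh bindings per document, so state does not leak between evaluations.
            Bindings bindings = engine.createBindings();
            bindings.put("url", url);
            System.out.println(compiled.get().eval(bindings));
        }
    }
}
```

Whether this is worth the added complexity depends on how expensive parsing is relative to the rest of the import pipeline.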

How I checked for Graal on my JDK:

    jshell> import javax.script.ScriptEngine;

    jshell> import javax.script.ScriptEngineFactory;

    jshell> import javax.script.ScriptEngineManager;

    jshell> var manager = new ScriptEngineManager();
    manager ==> javax.script.ScriptEngineManager@48974e45

    jshell> List<ScriptEngineFactory> factories = manager.getEngineFactories()
    factories ==> [jdk.nashorn.api.scripting.NashornScriptEngineFactory@6d3a388c]

    jshell> for (var factory : factories) {
       ...>    println(factory.getEngineName());
       ...>    println(factory.getEngineVersion());
       ...>    println(factory.getLanguageName());
       ...>    println(factory.getLanguageVersion());
       ...>    println(factory.getExtensions());
       ...>    println(factory.getMimeTypes());
       ...>    println(factory.getNames());
       ...>    println("");
       ...> }
    Oracle Nashorn
    11.0.2
    ECMAScript
    ECMA - 262 Edition 5.1
    [js]
    [application/javascript, application/ecmascript, text/javascript, text/ecmascript]
    [nashorn, Nashorn, js, JS, JavaScript, javascript, ECMAScript, ecmascript]
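The same lookup can drive engine selection at runtime. The sketch below walks a preference list and takes the first engine the JDK or classpath actually provides. The class and method names are hypothetical, and the engine names other than `nashorn` (which appears in the output above) are assumptions about what GraalJS and Rhino register.

```java
import java.util.Optional;
import javax.script.ScriptEngine;
import javax.script.ScriptEngineManager;

public class EnginePicker {

    /** Returns the first engine found for the given names, in order of preference. */
    static Optional<ScriptEngine> firstAvailable(ScriptEngineManager manager, String... preferredNames) {
        for (String name : preferredNames) {
            ScriptEngine engine = manager.getEngineByName(name);
            if (engine != null) {
                return Optional.of(engine);
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        ScriptEngineManager manager = new ScriptEngineManager();

        // "graal.js" and "rhino" are assumed registration names; "nashorn" is confirmed above.
        Optional<ScriptEngine> engine = firstAvailable(manager, "graal.js", "nashorn", "rhino");

        System.out.println(engine
                .map(e -> "Using " + e.getFactory().getEngineName())
                .orElse("No JavaScript engine found"));
    }
}
```

A preference list like this would let a default such as Nashorn keep working where it exists while falling through to whatever replacement is on the classpath.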
