Using Jython as your Expression Language

Tutorials

Full documentation for the Jython language is available at its official site, http://www.jython.org.

You can use almost any Python files (.py or .pyc) that are compatible with the bundled Jython 2.5.1 by dropping them into the path. Since Jython runs on the Java platform, you can even import Java libraries and use them from your Python code; see the tutorial "Working with Phone numbers using Java libraries inside Python" further down this page.

NOTE: Python libraries or code that use C bindings will not work in OpenRefine, which uses Jython / Java only and has no built-in CPython interpreter.

NOTE: OpenRefine now has most of the Jsoup.org library built in for parsing HTML and extracting elements; see GREL Other Functions.

NOTE: Remember to restart OpenRefine so that new Jython/Python libraries are initialized during Butterfly's startup.


Expressions in Jython must have a return statement:

return value[1:-1]
return rowIndex%2
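
One way to picture why the `return` is required: the expression behaves like the body of a function that receives the expression variables. The sketch below imitates that in plain Python. The function names `strip_ends` and `parity`, the sample arguments, and the exact shape of the wrapper are assumptions made for illustration, not OpenRefine's actual generated code.

```python
# Hypothetical stand-ins for the wrapper around an expression.
# The parameter names mirror the variables used on this page
# (value, cell, cells, rowIndex); the wrapper itself is an assumption.
def strip_ends(value, cell, cells, rowIndex):
    # Body as typed into the Expression box: drop the first and last character.
    return value[1:-1]


def parity(value, cell, cells, rowIndex):
    # Second expression: 0 for even row indexes, 1 for odd ones.
    return rowIndex % 2


# Sample calls with made-up inputs.
stripped = strip_ends('"quoted"', None, {}, 0)
row_parity = [parity(None, None, {}, i) for i in range(4)]
```

Without a `return`, such a function would yield `None`, which is why an expression that only computes a value produces nothing.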

Fields have to be accessed using the bracket operator rather than the dot operator:

return cells["col1"]["value"]

To access the Levenshtein distance between the cell value and the reconciled value, use the Recon Variables:

return cell["recon"]["features"]["nameLevenshtein"]
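
Cells that have not been reconciled may carry no recon data, so a guarded lookup can be safer than the direct chain of brackets. This is a minimal sketch that uses plain dictionaries as hypothetical stand-ins for the `cell` object. That an unreconciled cell yields `None` for `cell["recon"]` is an assumption here, and `name_levenshtein` is an illustrative helper name.

```python
def name_levenshtein(cell):
    # Return the nameLevenshtein feature, or None when recon data is missing.
    recon = cell["recon"] if cell is not None else None
    if recon is None:
        return None
    return recon["features"]["nameLevenshtein"]


# Hypothetical sample cells, modeled as dictionaries for this sketch only.
reconciled = {"value": "Pariss", "recon": {"features": {"nameLevenshtein": 1}}}
unreconciled = {"value": "Paris", "recon": None}

distance = name_levenshtein(reconciled)
missing = name_levenshtein(unreconciled)
```

In the Expression box, only the body of the helper would be used, with `cell` supplied by OpenRefine.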

To return value in lower case (if the value is not null):

if value is not None:
    return value.lower()
else:
    return None
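
The same null check can be written on one line with a conditional expression, which Python has supported since version 2.5. The sketch below wraps it in a hypothetical function, `lower_or_none`, so it can be tried outside OpenRefine; in the Expression box only the `return` line would be needed.

```python
def lower_or_none(value):
    # Lowercase the value, passing None through unchanged.
    return value.lower() if value is not None else None


lowered = lower_or_none("OpenRefine")
passthrough = lower_or_none(None)
```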

Tutorial - Working with Phone numbers using Java libraries inside Python

As mentioned before, you can drop Java .jar files into the classpath and use them from OpenRefine with Python/Jython as your expression language.

  1. Download libphonenumber-8.3.1.jar from http://repo1.maven.org/maven2/com/googlecode/libphonenumber/libphonenumber/8.3.1/libphonenumber-8.3.1.jar (repo and docs here: https://github.com/googlei18n/libphonenumber)
  2. Copy the .jar file into your openrefine-2.xx/webapp/WEB-INF/lib folder
  3. Start OpenRefine (restart it if it is already running)
  4. Choose Edit column -> Add column based on this column...
  5. Give it a new column name such as VALID_FORMATTED_NUMBER
  6. Choose Python/Jython as your Expression Language.
  7. Copy and paste the following code into the Expression input box:
   from com.google.i18n.phonenumbers import PhoneNumberUtil
   from com.google.i18n.phonenumbers.PhoneNumberUtil import PhoneNumberFormat

   phoneUtil = PhoneNumberUtil.getInstance()
   # Parse the cell value, using US as the default region
   number = phoneUtil.parse(value, 'US')
   formatted = phoneUtil.format(number, PhoneNumberFormat.NATIONAL)
   valid = phoneUtil.isValidNumber(number)

   # Return the formatted number only when it is valid
   if valid:
     return formatted

   To also see the validity flag and the parsed number, use this variant, which formats the number internationally and returns all three values joined by |||:

   from com.google.i18n.phonenumbers import PhoneNumberUtil
   from com.google.i18n.phonenumbers.PhoneNumberUtil import PhoneNumberFormat

   phoneUtil = PhoneNumberUtil.getInstance()
   number = phoneUtil.parse(value, 'US')
   formatted = phoneUtil.format(number, PhoneNumberFormat.INTERNATIONAL)
   valid = phoneUtil.isValidNumber(number)
   # Collect the validity flag, the formatted string, and the parsed number
   output = valid, formatted, number

   return '|||'.join(map(str, output))
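
The `|||`-joined result can be split back into its three parts in a later expression. This is a minimal sketch in plain Python. The helper name `split_phone_output` is hypothetical, and the sample string is an illustrative guess at what such a cell might contain, not captured OpenRefine output; it assumes the validity flag is rendered as the text `True` or `False`.

```python
def split_phone_output(joined):
    # maxsplit=2 keeps any later '|||' inside the third field intact.
    valid, formatted, number = joined.split('|||', 2)
    # The flag arrives as text, so compare against the string 'True'.
    return valid == 'True', formatted, number


# Illustrative sample value (an assumption, not real output).
sample = 'True|||+1 650-253-0000|||Country Code: 1 National Number: 6502530000'
parsed = split_phone_output(sample)
```

Within OpenRefine, the body of the helper would take `value` in place of `joined` and return whichever part is needed for the new column.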