- Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python
- Extending-Jython-with-pypi-modules
- StrippingHTML
Full docs on the Jython language are at its official site http:www.jython.org.
You can use almost any Python (.py)(.pyc) files compatible with the bundled Jython 2.5.1 and drop them into the path. Since Jython is essentially Java, you can even Jython#tutorial---working-with-phone-numbers-using-java-libraries-inside-python
NOTE: Python libraries or code that uses C bindings will not work in OpenRefine which uses Jython / Java only, and has no CPython interpreter built-in
NOTE: OpenRefine now has most of the Jsoup.org library built in for parsing and working with HTML elements and extraction - GREL Other Functions
NOTE: Remember to restart OpenRefine, so that new Jython/Python libraries are initialized during Butterfly's startup.
Expressions in Jython must have a return statement:
return value[1:-1]
return rowIndex%2
Fields have to be accessed using the bracket operator rather than the dot operator:
return cells["col1"]["value"]
To access the Levenshtein distance between the reconciled value and the cell value (?) use the Recon Variables:
return cell["recon"]["features"]["nameLevenshtein"]
To return the lower case of value (if the value is not null):
if value is not None:
return value.lower()
else:
return None
As mentioned before, you can drop in JAVA .jar files into the classpath and utilize them from OpenRefine with Python/Jython as your expression language.
- Download the http:repo1.maven.org/maven2/com/googlecode/libphonenumber/libphonenumber/8.3.1/libphonenumber-8.3.1.jar (repo and docs here: https:github.com/googlei18n/libphonenumber)
- Copy the .jar file into your openrefine-2.xx/webapp/WEB-INF/lib folder
- Start OpenRefine
- Choose Edit column -> Add column based on this column...
- Give it a new column name such as VALID_FORMATTED_NUMBER
- Choose Python/Jython as your Expression Language.
- Copy and paste the following code into the Expression input box:
from com.google.i18n.phonenumbers import PhoneNumberUtil
from com.google.i18n.phonenumbers.PhoneNumberUtil import PhoneNumberFormat
phoneUtil = PhoneNumberUtil.getInstance()
number = phoneUtil.parse(value, 'US')
formatted = phoneUtil.format(number, PhoneNumberFormat.NATIONAL)
valid = phoneUtil.isValidNumber(number)
if valid == 1:
return formatted
- You can now see in the Preview a nicely formatted number if your original string in your column was a valid US number.
- OPTIONAL - http:repo1.maven.org/maven2/com/googlecode/libphonenumber/libphonenumber/8.3.1/libphonenumber-8.3.1-javadoc.jar, unzip and extract it, then open the index.html file and read and experiment with other methods and classes in Google's libphonenumber library.
- OPTIONAL - If you want a list that you can later split into record rows, you can do something like this.
from com.google.i18n.phonenumbers import PhoneNumberUtil
from com.google.i18n.phonenumbers.PhoneNumberUtil import PhoneNumberFormat
phoneUtil = PhoneNumberUtil.getInstance()
number = phoneUtil.parse(value, 'US')
formatted = phoneUtil.format(number, PhoneNumberFormat.INTERNATIONAL)
valid = phoneUtil.isValidNumber(number)
output = valid, formatted, number
return '|||'.join(map(str,output))