
Releases: OpenSextant/Xponents

Xponents Core Python API v1.6

11 Jun 23:52

Please see related repo:
Python API release here - https://github.com/OpenSextant/Xponents-Core/releases/tag/python-v1.6.2

This tag, v3.7.4, is for the Xponents REST server that was tested with the Python library.
The library was also tested against v3.5.

OpenSextant Python API

29 Mar 16:01

OpenSextant ("Xponents") Python library

This library provides the most common data model, utilities and Xponents REST client to interact with OpenSextant/Xponents solutions. The main resources interfaced here are:

  • Xponents REST API (opensextant.xlayer package offers XlayerClient)
  • Xponents Gazetteer API (opensextant.gazetteer package offers Solr Gazetteer mechanics)
  • Xponents Gazetteer ETL (opensextant.gazetteer offers classes used to curate the gazetteer in ./solr from raw sources)

Version: opensextant library, v1.5 (2024-March), attached below.
Install: pip install opensextant-1.5.8.tar.gz

Details

For the REST client below, please deploy the Docker image per https://hub.docker.com/r/mubaldino/opensextant
Use the resulting server_host:port as the url value below.

Correct usage for opensextant.xlayer with REST API looks like this:

from opensextant import PlaceCandidate
from opensextant.xlayer import XlayerClient

client = XlayerClient(url)    # opensextant server URL or simply "server_host:port"
tags = client.process(docid, text, features=["geo", "postal"])

# tags is an array of opensextant.TextMatch:
#  -- Consult the subclass: PlaceCandidate is a match that has geographic information.
#  -- Consult label to determine the nature of the tag. It is one of "country", "coord", "postal", "place".
#  -- Consult the TextMatch.attrs dictionary for useful metadata, e.g., geolocation "confidence" as shown below.
#     Confidence should be used wisely: it is a 100 point scale, where 20 is the default cut-off
#     (below that, a tag is unlikely to be a location or to be correct).
#  -- Consult PlaceCandidate metadata in attrs as well as the place attribute for location metadata.
for t in tags:
    if t.filtered_out:
        # Add "filtered" to features to see what is filtered out.
        continue
    conf = int(t.attrs.get("confidence", -1))
    if isinstance(t, PlaceCandidate):
        if t.label == "coord":
            print("Found a coordinate")
        if conf >= 25:
            print("Found a high confidence place")

Xponents v3.7 baseline

07 Apr 13:26

Xponents 3.7 provides these refinements on tagging:

  • filtering noise and abbreviations
  • enhanced tag filtering for CJK and Arabic language groups, as well as improved Spanish stopwords
  • bare country codes and bare, short administrative codes are omitted. For example, UK, Uk, or uk is tagged as a country, but it is filtered out unless it is preceded by a city or province that it qualifies.
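
The bare-code rule in the last bullet can be pictured with a toy post-filter. This is only a sketch under stated assumptions, not the Xponents implementation: the Tag class, filter_bare_codes and max_code_len are hypothetical names, and "qualified" is assumed to mean that a place tag directly precedes the code with nothing but a comma or spaces in between.

```python
# Illustrative only: a toy post-filter, NOT the Xponents tagger logic.
from dataclasses import dataclass


@dataclass
class Tag:
    """Hypothetical stand-in for a tagger match."""
    text: str            # matched text
    label: str           # e.g. "country" or "place"
    start: int           # offset where the match begins
    end: int             # offset just past the match
    filtered_out: bool = False


def filter_bare_codes(tags, doc, max_code_len=3):
    """Mark short country codes as filtered unless a place tag directly precedes them."""
    ordered = sorted(tags, key=lambda t: t.start)
    for i, tag in enumerate(ordered):
        if tag.label != "country" or len(tag.text) > max_code_len:
            continue
        prev = ordered[i - 1] if i > 0 else None
        # Assumption: "qualified" means the previous tag is a place and only
        # a comma and/or spaces separate it from the code.
        qualified = (
            prev is not None
            and prev.label == "place"
            and doc[prev.end:tag.start].strip(" ,") == ""
        )
        tag.filtered_out = not qualified
    return ordered


doc = "She moved from London, UK to the uk office."
tags = [
    Tag("London", "place", 15, 21),
    Tag("UK", "country", 23, 25),
    Tag("uk", "country", 33, 35),
]
for t in filter_bare_codes(tags, doc):
    print(t.text, "filtered" if t.filtered_out else "kept")
```

In this sample the qualified "London, UK" survives, while the bare "uk" later in the sentence is marked as filtered.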

Library cleanup

The Docker release of the Xponents REST API and Gazetteer is here: https://hub.docker.com/r/mubaldino/opensextant . The mubaldino/opensextant:xponents-3.5 image is the latest revision; v3.7 of the Docker image is pending.

Xponents 3.5 Final

01 Jun 16:24

Xponents 3.5 addresses primarily:

  • addition of postal detection and geocoding, refined over the course of the past year
  • remediation of Log4J vulnerabilities, bringing the level of that library version to 2.17.2
  • the start of simplified documentation across gazetteer curation, Java API usage, and other references
  • formal release of a Python client API and scripting for interfacing to the REST API. See this Xponents release tag https://github.com/OpenSextant/Xponents/releases/tag/python-v1.4.7

The Docker release of the Xponents REST API and Gazetteer is here: https://hub.docker.com/r/mubaldino/opensextant . The mubaldino/opensextant:xponents-3.5 image is the latest revision.

Xponents 3.5 Begin Again, Again

07 Feb 16:07

Happy Valentines

Xponents 3.5.5 BeginAgain (Again)

  • Full Evaluation: internal evaluation work was redone start to finish to hone outlier gazetteer entries and
    patterns of rogue entries from new data sources. Evaluation work called out and fixed serious false-positive and recall
    errors
  • Log4J Remediation: While Log4J is not the primary choice of logging facility, it is a dependency that appears
    mainly in the Solr 7.x server distribution. Vulnerable Log4J JAR files were removed and latest ones were injected.
  • API Changes:
    • TextEntity is a text span and requires a start, end offset pair. Its only constructor
      requires that pair. Subclasses can have a zero-argument constructor by exception, such as PoLiMatch.
    • GeonamesUtility.isCountry() now returns true only for PCLI entries; others are historical country names or territories.
    • REST API now has method and match-id on most matches to be more consistent
    • codes feature can be requested in REST API: features=geo,taxons,patterns,codes for example.
      This will emit tagged acronyms for admin boundaries for now.
    • Xponents Core TextUtils now offers trivial text span testing for common punctuation.
      For example, to quickly test whether MARC & U looks like an entity or is a false positive
      when tagging the phrase Marc U, a common punctuation test was needed. These are fairly obvious
      pre-filters to employ just after tagging and before serious reasoning happens.
  • Geocoding: Tamped down on acronym false-positives on UPPERCASE and lowercase
    documents given the added gazetteer data includes lots of codes.
    • Default behavior: country codes and province codes are NOT emitted although tagged.
      These are requested explicitly by caller using the codes feature. Right, so USA
      or COD or MA are not emitted by default although those bare tokens may represent
      countries or provinces. Such codes qualifying other placenames will be emitted.
    • Gazetteer tagging omissions: numerous transliterated short names for Pacific/Asian islands (A xx, I-xx)
      and various other false-positive places are NOT tagged, although present in the gazetteer.
    • About 500 dictionary words in French, German and English were added to the stop-filter
      for tokens commonly not places. E.g., amend, adept, etc.
  • Bugs Fixed:
    • Geocoder Rule HeatMap memory leak fixed
    • German is removed as a country -- it's a nationality or an adjective
    • Tagger will throw ExtractionException if it tags 100,000 or more locations from the gazetteer
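
The punctuation pre-filter mentioned under API Changes can be sketched in a few lines. This is a minimal Python illustration of the idea, not the TextUtils API: has_irregular_punct and ALLOWED are hypothetical names, and the allow-list of punctuation that may appear inside a real name is an assumption.

```python
import string

# Assumption: punctuation that commonly appears inside genuine names.
ALLOWED = set(".'-")


def has_irregular_punct(span: str) -> bool:
    """Return True if the matched span contains punctuation outside the allow-list.

    A tagger that ignores punctuation may match the gazetteer phrase "Marc U"
    against the text "MARC & U"; this check flags such spans for filtering.
    """
    return any(ch in string.punctuation and ch not in ALLOWED for ch in span)


for span in ("MARC & U", "Marc U", "St. John's"):
    print(repr(span), has_irregular_punct(span))
```

A check like this is cheap enough to run on every match right after tagging, before any heavier geocoding rules are applied.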

DISTRIBUTIONS:

TESTING:

Deploy: https://github.com/OpenSextant/Xponents/blob/master/Examples/Docker/docker-compose.yml

Install client library (ATTACHED)

pip3 install opensextant-1.4.6.tar.gz

Use Test suite: https://github.com/OpenSextant/Xponents/blob/master/test/xlayer-test-suite.py

DEFAULT_URL=localhost:8787
python3 xlayer-test-suite.py   $DEFAULT_URL

Test output:

  • Consult the Docker logs for the container, e.g., docker logs xponents, to see that the server is alive
  • Review output to console -- unit test results for normal geotagging, postal geotagging and tests in Arabic and Japanese should appear.

Xponents 3.5 "Begin Again"

03 Jan 23:11

Happy New Year

Xponents 3.5.4 BeginAgain

  • Full Evaluation: internal evaluation work was redone start to finish to hone outlier gazetteer entries and
    patterns of rogue entries from new data sources. Evaluation work called out and fixed serious false-positive and recall
    errors
  • Log4J Remediation: While Log4J is not the primary choice of logging facility, it is a dependency that appears
    mainly in the Solr 7.x server distribution. Vulnerable Log4J JAR files were removed and latest ones were injected.
  • API Changes:
    • TextEntity is a text span and requires a start, end offset pair. Its only constructor
      requires that pair. Subclasses can have a zero-argument constructor by exception, such as PoLiMatch.
    • GeonamesUtility.isCountry() now returns true only for PCLI entries; others are historical country names or territories.
    • REST API now has method and match-id on most matches to be more consistent

DISTRIBUTIONS:

TESTING:

Deploy: https://github.com/OpenSextant/Xponents/blob/master/Examples/Docker/docker-compose.yml

Install client library

pip3 install opensextant-1.4.5.tar.gz

Use Test suite: https://github.com/OpenSextant/Xponents/blob/master/test/xlayer-test-suite.py

DEFAULT_URL=localhost:8787
python3 xlayer-test-suite.py   $DEFAULT_URL

Test output:

  • Consult the Docker logs for the container, e.g., docker logs xponents, to see that the server is alive
  • Review output to console -- unit test results for normal geotagging, postal geotagging and tests in Arabic and Japanese should appear.

Xponents Core & SDK v3.3.5 patch

06 May 21:06
  • Docker offline image
  • Bug: TaxCat .configure() method accidentally called a second time in PlaceGeocoder
  • JavaDoc 8+ maintenance on HTML5 and javadoc comments
  • Maven plugin versions updated
  • XText module moved to 3.3.5 to release with Xponents Examples
  • XCoord retested coordinate patterns on DMS and DM to ensure +/- symbols are detected and coordinate precision is provided. Moved TEST cases on certain patterns to appropriate family to test
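
To make the DMS/DM point concrete, here is a small Python sketch of parsing one signed coordinate component and reporting how fine its resolution is. It is an assumption-laden illustration, not XCoord code: the pattern, parse_dms, and the choice to express precision as a resolution in degrees (1/60 for DM, 1/3600 for DMS) are all hypothetical, and XCoord's real patterns are far more extensive.

```python
import re

# Hypothetical pattern for a single component such as -42°18'30" or +42°18'
COMPONENT = re.compile(
    r"""
    (?P<sign>[+-])?                   # optional +/- symbol
    (?P<deg>\d{1,3})\s*°\s*           # degrees
    (?P<min>\d{1,2})\s*'              # minutes
    (?:\s*(?P<sec>\d{1,2})\s*")?      # seconds are optional: DM vs DMS
    """,
    re.VERBOSE,
)


def parse_dms(text):
    """Return (decimal_degrees, resolution_in_degrees), or None if no match."""
    m = COMPONENT.fullmatch(text.strip())
    if m is None:
        return None
    value = int(m["deg"]) + int(m["min"]) / 60
    resolution = 1 / 60                # DM: resolved to one minute
    if m["sec"] is not None:
        value += int(m["sec"]) / 3600
        resolution = 1 / 3600          # DMS: resolved to one second
    if m["sign"] == "-":
        value = -value
    return value, resolution


print(parse_dms("-42°18'30\""))
print(parse_dms("+42°18'"))
```

The sign is applied after the minutes and seconds are summed, so a negative coordinate keeps its full magnitude instead of having the fractional part added in the wrong direction.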

Xponents Core & SDK v3.2.2 data patch

12 Dec 23:31
437f593
Pre-release

Not fully released for this round. Docker image will be produced for v3.3.

Issues:

  • "taxcat" data sets are now more reliably harvested and scripted so everything is easily reproduced.
  • Docker script simplification for Xponents REST
  • Substantial additions in patterns extraction: "Email" test case added in Java API; "FlexPat" capability now in beta in Python API.
  • Python 3.x readiness and testing
  • Streamlined and retested entire Solr Gazetteer build from latest geoname sources
  • Streamlined and retested script/dist.sh build and distribution

Xponents Core & SDK v3.2

24 Jul 02:31

release code name: Dead Heat, Summer 2019.

Xponents was refactored in the following way for this turning point release:

  • Core contains lighter weight parsers and base classes and data models (artifact: opensextant-xponents-core)
  • Tagger SDK contains the beefier Solr-based taggers and REST services (artifact: opensextant-xponents)
  • Xlayer REST was folded into SDK
  • XText only needs to make use of Core

NO functional changes or data changes were made. This release is strictly an organizational matter relative to 3.1.1.

This includes the binary distribution of Xponents SDK (JARs, config files, docs) but no Xponents Solr data, due to size limitations. Docker Hub will have the full release.

Xponents SDK v3.0 (2018-OCTOBER)

12 Oct 14:03

Download: Xponents SDK @ Data-Releases

Improvements:

  • CLI: Command line improvements on testing and running Example demos using a Groovy script (./script/xponents-demo.sh)
  • Stopword tuning by language, when language ID for text is known: see @genediazjr "stopwords-iso" project; stopwords for Tagalog, Urdu, Farsi, Chinese, Korean, etc., were contributed there. This contributes to noise reduction in post-processing naive tagger output.
  • Solr: SolrTextTagger was incorporated into Solr 7.4.0 formally. Solr 7.4 is minimum requirement. SolrJ partially deprecated; "FST50" postings format for FST is now used.
  • Geocoder Rules: NAME, CODE patterns teased apart, e.g., "Boise, ID", "Boise, Id." are valid locations, where admin boundary code qualifies city. "Boise id" is not valid, though.
  • Consolidation: Xponents is now one library. XText and Xlayer are separate related modules.
  • Social Geo: org.opensextant.data.social and org.opensextant.extractors.geo.social represent the core functionality that was previously in test TweetGeocoder from OpenSextant 1.0.
  • LangID: CyboZu LangDetect is incorporated as an extractor, but we still require the use of a valid ISO-639 table and a Language object model to manage LangID concepts.
  • JSON: Jodd JSON library is now formally supported by XText and XLayer and other areas where JSON is used.
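
The NAME, CODE rule from the Geocoder Rules bullet can be approximated with a regular expression. The Python sketch below is illustrative only and is not the Xponents rule: NAME_CODE and split_name_code are hypothetical names, and it assumes the code is two letters, starts with a capital, follows a comma, and may end with a period.

```python
import re

# Assumed shape: "<Name>, <2-letter code>" with an optional trailing period.
NAME_CODE = re.compile(r"(?P<name>[A-Z][\w .'-]*?),\s*(?P<code>[A-Z][A-Za-z])\.?")


def split_name_code(text):
    """Return (name, CODE) when text looks like 'Boise, ID'; otherwise None."""
    m = NAME_CODE.fullmatch(text.strip())
    if m is None:
        return None
    return m["name"], m["code"].upper()


for sample in ("Boise, ID", "Boise, Id.", "Boise id"):
    print(sample, "->", split_name_code(sample))
```

Requiring the comma is what separates "Boise, Id." from "Boise id", where the trailing token is more likely the ordinary word than an administrative code.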