11 Jun 23:52

5855354

Please see related repo:
Python API release here - https://github.com/OpenSextant/Xponents-Core/releases/tag/python-v1.6.2

This tag, v3.7.4, is for the Xponents REST server tested with the python library.
Also tested against v3.5.

29 Mar 16:01

mubaldino

python-v1.5.8

99e70ca

OpenSextant Python API

OpenSextant ("Xponents") Python library

This library provides the most common data model, utilities and Xponents REST client to interact with OpenSextant/Xponents solutions. The main resources interfaced here are:

Xponents REST API (opensextant.xlayer package offers XlayerClient)
Xponents Gazetteer API (opensextant.gazetteer package offers Solr Gazetteer mechanics)
Xponents Gazetteer ETL (opensextant.gazetteer offers classes used to curate the gazetteer in ./solr from raw sources)

Version: opensextant library, v1.5 (2024-March), attached below.
Install: pip install opensextant-1.5.8.tar.gz

Details

Xponents REST usage for Python
Gazetteer queries, for example; see the Python section at bottom.
opensextant API pydoc

For the REST client below, please deploy Docker image per https://hub.docker.com/r/mubaldino/opensextant
Use the resulting server_host:port as url below

Correct usage for opensextant.xlayer with REST API looks like this:

from opensextant import PlaceCandidate
from opensextant.xlayer import XlayerClient
client = XlayerClient(url)    # opensextant server URL or simply "server_host:port"
tags  = client.process(docid, text, features=["geo", "postal"])

# Confidence threshold = 20
# tags is an array of opensextant.TextMatch.
#  -- Consult the subclass: PlaceCandidate is a match that has geographic information.
#  -- Consult label to determine the nature of the tag. It is one of "country", "coord", "postal", "place".
#  -- Consult the TextMatch.attrs dictionary for useful metadata; e.g., geolocation "confidence" should be used wisely.
#      It is a 100-point scale, where 20 is the default cut-off (below that, the tag is unlikely to be a location or to be correct).
#  -- Consult PlaceCandidate metadata in attrs as well as the place attribute for location metadata.
for t in tags:
    if t.filtered_out:
        # Add "filtered" to features to see what is filtered out.
        continue
    conf = int(t.attrs.get("confidence", -1))
    if isinstance(t, PlaceCandidate):
        if t.label == "coord":
            print("Found a coordinate")
        if conf >= 25:
            print("Found a high confidence place")
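
The confidence cut-off described above can be tried without a running server. The following is a minimal sketch, not part of the opensextant library: FakeTag is a hypothetical stand-in that mimics only the filtered_out, label and attrs fields used in the loop above, and confident_tags is an illustrative helper name.

```python
from dataclasses import dataclass, field


@dataclass
class FakeTag:
    """Hypothetical stand-in for a tag from the REST client (not an opensextant class)."""
    text: str
    label: str
    filtered_out: bool = False
    attrs: dict = field(default_factory=dict)


def confident_tags(tags, min_confidence=20):
    """Return tags that are not filtered out and meet the confidence cut-off."""
    kept = []
    for t in tags:
        if t.filtered_out:
            continue
        # A missing confidence is treated as -1, as in the loop above.
        conf = int(t.attrs.get("confidence", -1))
        if conf >= min_confidence:
            kept.append(t)
    return kept


sample = [
    FakeTag("Boston", "place", attrs={"confidence": 63}),
    FakeTag("Ma", "place", filtered_out=True, attrs={"confidence": 40}),
    FakeTag("Reading", "place", attrs={"confidence": 12}),
    FakeTag("US", "country"),
]
kept_texts = [t.text for t in confident_tags(sample)]
print(kept_texts)
```

With real results, the same helper could be applied to the list returned by client.process(), assuming those objects expose the same three fields.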

07 Apr 13:26

mubaldino

v3.7.0

28100cb

Xponents v3.7 baseline

Xponents 3.7 provides these refinements on tagging:

filtering noise and abbreviations
enhanced tag filtering for CJK and Arabic language groups, as well as improved Spanish stopwords
bare country codes and bare, short administrative codes are omitted; e.g., UK, Uk, or uk is tagged as a country, but is filtered out unless it is qualified (preceded) by a city or province.
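
One way to picture the bare-code rule above is the sketch below. It is an illustration only, not the Xponents implementation: the labels "city", "province" and "code" and the helper drop_bare_codes are assumptions made for the example, and "preceded by" is simplified to mean the immediately preceding tag.

```python
QUALIFYING_LABELS = {"city", "province"}  # hypothetical labels for this sketch


def drop_bare_codes(tags):
    """Keep a code tag only when the tag just before it is a city or province.

    tags is a list of (text, label) pairs in document order.
    """
    kept = []
    previous_label = None
    for text, label in tags:
        is_bare_code = label == "code" and previous_label not in QUALIFYING_LABELS
        if not is_bare_code:
            kept.append((text, label))
        previous_label = label
    return kept


qualified = drop_bare_codes([("London", "city"), ("UK", "code")])
bare = drop_bare_codes([("call", "other"), ("uk", "code")])
print(qualified)
print(bare)
```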

Library cleanup

Geodesy and giscore libraries from opensextant are converged under Xponents, rather than kept as separate dependencies, to address long-term Java compatibility concerns.
Apache Commons and logging libraries updated to their latest versions
Updated opensextant python support library released at https://github.com/OpenSextant/Xponents/releases/tag/python-v1.5.8

Docker release of Xponents REST API and Gazetteer is here: https://hub.docker.com/r/mubaldino/opensextant . The mubaldino/opensextant:xponents-3.5 image is the latest revision; a v3.7 Docker image is pending.

01 Jun 16:24

mubaldino

v3.5.9

2871824

Xponents 3.5 Final

Xponents 3.5 addresses primarily:

addition of postal detection and geocoding, refined over the course of the past year
remediation of Log4J vulnerabilities, bringing the level of that library version to 2.17.2
the start of simplified documentation across gazetteer curation, Java API usage, and other references
formal release of a Python client API and scripting for interfacing to the REST API. See this Xponents release tag https://github.com/OpenSextant/Xponents/releases/tag/python-v1.4.7

Docker release of Xponents REST API and Gazetteer is here: https://hub.docker.com/r/mubaldino/opensextant . The mubaldino/opensextant:xponents-3.5 image is the latest revision.

07 Feb 16:07

mubaldino

v3.5.5

97a3c41

Xponents 3.5 Begin Again, Again

Happy Valentines

Xponents 3.5.5 BeginAgain (Again)

Full Evaluation: internal evaluation work was redone start to finish to hone outlier gazetteer entries and
patterns of rogue entries from new data sources. Evaluation work called out and fixed serious false-positive and recall
errors
Log4J Remediation: While Log4J is not the primary choice of logging facility, it is a dependency that appears
mainly in the Solr 7.x server distribution. Vulnerable Log4J JAR files were removed and latest ones were injected.
API Changes:
- TextEntity is a text span and requires a start, end offset pair. Its only constructor
  requires that pair. Subclasses can have a zero-argument constructor by exception, such as PoLiMatch
- GeonamesUtility.isCountry() now only returns true for PCLI entries; others are historical country names or territories.
- REST API now has method and match-id on most matches to be more consistent
- codes feature can be requested in REST API: features=geo,taxons,patterns,codes for example.
  This will emit tagged acronyms for admin boundaries for now.
- Xponents Core TextUtils now offers trivial text span testing for common punctuation.
  For example, a common punctuation test was needed to quickly check whether MARC & U looks like an entity
  or is a false positive when tagging the phrase Marc U. These are fairly obvious
  pre-filters to employ just after tagging and before serious reasoning happens.
Geocoding: Tamped down on acronym false-positives on UPPERCASE and lowercase
documents given the added gazetteer data includes lots of codes.
- Default behavior: country codes and province codes are NOT emitted although tagged.
  These are requested explicitly by the caller using the codes feature. For example, USA,
  COD, or MA are not emitted by default although those bare tokens may represent
  countries or provinces. Such codes qualifying other placenames will be emitted.
- Gazetteer tagging omissions: numerous transliterated short names for Pacific/Asian islands (A xx, I-xx)
  and various other false-positive places are NOT tagged, although present in the gazetteer.
- About 500 dictionary words in French, German and English were added to the stop-filter
  for tokens commonly not places. E.g., amend, adept, etc.
Bugs Fixed:
- Geocoder Rule HeatMap memory leak fixed
- German is removed as a country; it's a nationality or an adjective
- Tagger will throw ExtractionException if it tags 100,000 or more locations from the gazetteer
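
The punctuation pre-filter mentioned under API Changes (the MARC & U versus Marc U case) could look roughly like the sketch below. This is a Python illustration of the idea only; it does not reproduce the Java TextUtils methods, and has_unexpected_punct is a hypothetical name.

```python
import string


def has_unexpected_punct(matched_text, gazetteer_name):
    """Return True if the matched span contains punctuation absent from the gazetteer name."""
    span_punct = {ch for ch in matched_text if ch in string.punctuation}
    name_punct = {ch for ch in gazetteer_name if ch in string.punctuation}
    return bool(span_punct - name_punct)


# The tagger matched the gazetteer phrase "Marc U" against the text "MARC & U".
suspect = has_unexpected_punct("MARC & U", "Marc U")
clean = has_unexpected_punct("St. Louis", "St. Louis")
print(suspect, clean)
```

A span flagged this way is only a candidate false positive; whether to drop it is left to later filtering.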

DISTRIBUTIONS:

Python: See attached Opensextant Python API 1.4.6
Docker: https://hub.docker.com/r/mubaldino/opensextant - see "xponents-3.5" tag. "latest" is now also a tag
Gazetteer: see Docker image; Copy xponents-solr out of docker image to use it outside of Docker
Java, Maven:
- https://search.maven.org/artifact/org.opensextant/opensextant-xponents-core/3.5.5/jar
- https://search.maven.org/artifact/org.opensextant/opensextant-xponents/3.5.5/jar

TESTING:

Deploy: https://github.com/OpenSextant/Xponents/blob/master/Examples/Docker/docker-compose.yml

Install client library (ATTACHED)

pip3 install opensextant-1.4.6.tar.gz

Use Test suite: https://github.com/OpenSextant/Xponents/blob/master/test/xlayer-test-suite.py

DEFAULT_URL=localhost:8787
python3 xlayer-test-suite.py   $DEFAULT_URL

Test output:

Consult the Docker logs for the container, e.g., docker logs xponents, to see that the server is alive
Review output to console: unit test results for normal geotagging, postal geotagging and tests in Arabic and Japanese should appear.

03 Jan 23:11

mubaldino

v3.5.4

5291040

Xponents 3.5 "Begin Again"

Happy New Year

Xponents 3.5.4 BeginAgain

Full Evaluation: internal evaluation work was redone start to finish to hone outlier gazetteer entries and
patterns of rogue entries from new data sources. Evaluation work called out and fixed serious false-positive and recall
errors
Log4J Remediation: While Log4J is not the primary choice of logging facility, it is a dependency that appears
mainly in the Solr 7.x server distribution. Vulnerable Log4J JAR files were removed and latest ones were injected.
API Changes:
- TextEntity is a text span and requires a start, end offset pair. Its only constructor
  requires that pair. Subclasses can have a zero-argument constructor by exception, such as PoLiMatch
- GeonamesUtility.isCountry() now only returns true for PCLI entries; others are historical country names or territories.
- REST API now has method and match-id on most matches to be more consistent
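
The isCountry() change listed above can be illustrated with GeoNames feature codes, where PCLI is an independent political entity, PCLH a historical political entity, and TERR a territory. The sketch below is a Python illustration of that rule, not the Java GeonamesUtility code; is_country is a hypothetical helper.

```python
def is_country(feature_code):
    """Treat only independent political entities (PCLI) as countries."""
    return feature_code.upper() == "PCLI"


# PCLH (historical) and TERR (territory) no longer count as countries.
results = {code: is_country(code) for code in ("PCLI", "PCLH", "TERR")}
print(results)
```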

DISTRIBUTIONS:

Python: See attached Opensextant Python API 1.4.5
Docker: https://hub.docker.com/r/mubaldino/opensextant - see "xponents-3.5" tag
Gazetteer: see Docker image; Copy xponents-solr out of docker image to use it outside of Docker
Java, Maven:
- https://search.maven.org/artifact/org.opensextant/opensextant-xponents-core/3.5.4/jar
- https://search.maven.org/artifact/org.opensextant/opensextant-xponents/3.5.4/jar

TESTING:

Deploy: https://github.com/OpenSextant/Xponents/blob/master/Examples/Docker/docker-compose.yml

Install client library

pip3 install opensextant-1.4.5.tar.gz

Use Test suite: https://github.com/OpenSextant/Xponents/blob/master/test/xlayer-test-suite.py

DEFAULT_URL=localhost:8787
python3 xlayer-test-suite.py   $DEFAULT_URL

Test output:

Consult the Docker logs for the container, e.g., docker logs xponents, to see that the server is alive
Review output to console: unit test results for normal geotagging, postal geotagging and tests in Arabic and Japanese should appear.

06 May 21:06

mubaldino

v3.3.5a

b3a5554

Xponents Core & SDK v3.3.5 patch

Docker offline image
Bug: TaxCat .configure() method accidentally called a second time in PlaceGeocoder
JavaDoc 8+ maintenance on HTML5 and javadoc comments
Maven plugin versions updated
XText module moved to 3.3.5 to release with Xponents Examples
XCoord retested coordinate patterns on DMS and DM to ensure +/- symbols are detected and coordinate precision is provided. Moved TEST cases on certain patterns to appropriate family to test
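
To make the sign and precision point above concrete, here is a deliberately simplified sketch. It assumes a colon-separated DD:MM or DD:MM:SS form only, which is far narrower than the XCoord pattern families, and parse_angle is a hypothetical helper rather than an XCoord API. Precision is expressed as the angular resolution of the finest field present.

```python
import re

# Optional sign, degrees, minutes, optional seconds, colon-separated (assumed format).
DMS_OR_DM = re.compile(r"(?P<sign>[+-])?(?P<deg>\d{1,3}):(?P<min>\d{2})(?::(?P<sec>\d{2}))?")


def parse_angle(text):
    """Return (decimal_degrees, resolution_in_degrees), or None if the text does not match."""
    m = DMS_OR_DM.fullmatch(text.strip())
    if m is None:
        return None
    value = int(m.group("deg")) + int(m.group("min")) / 60
    resolution = 1 / 60  # minutes are the finest field in a DM pattern
    if m.group("sec") is not None:
        value += int(m.group("sec")) / 3600
        resolution = 1 / 3600  # seconds are the finest field in a DMS pattern
    if m.group("sign") == "-":
        value = -value
    return value, resolution


dm = parse_angle("-42:30")
dms = parse_angle("+42:30:36")
print(dm, dms)
```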

12 Dec 23:31

mubaldino

v3.2.2

437f593

Xponents Core & SDK v3.2.2 data patch

Pre-release

Not fully released for this round. Docker image will be produced for v3.3.

Issues:

"taxcat" data sets are now more reliably harvested and scripted so everything is easily reproduced.
Docker script simplification for Xponents REST
Substantial additions in patterns extraction: "Email" test case added in Java API; "FlexPat" capability now in beta in Python API.
Python 3.x readiness and testing
Streamlined and retested entire Solr Gazetteer build from latest geoname sources
Streamlined and retested script/dist.sh build and distribution

24 Jul 02:31

mubaldino

v3.2

5ad78c4

Xponents Core & SDK v3.2

release code name: Dead Heat, Summer 2019.

Xponents was refactored in the following way for this turning point release:

Core contains lighter weight parsers and base classes and data models (artifact: opensextant-xponents-core)
Tagger SDK contains the beefier Solr-based taggers and REST services (artifact: opensextant-xponents)
Xlayer REST was folded into SDK
XText only needs to make use of Core

NO functional changes or data changes were made. This release is strictly an organizational matter relative to 3.1.1.

This includes the binary distribution of Xponents SDK (JARs, config files, docs) but no Xponents Solr data, due to size limitations. Docker Hub will have the full release.

12 Oct 14:03

mubaldino

v3.0.4

99e7cf0

Xponents SDK v3.0 (2018-OCTOBER)

Download: Xponents SDK @ Data-Releases

Improvements:

CLI: Command line improvements on testing and running Example demos using a Groovy script (./script/xponents-demo.sh)
Stopword tuning by language, when language ID for text is known: see @genediazjr "stopwords-iso" project; Stopwords for Tagalog, Urdu, Farsi, Chinese, Korean, etc, contributed there. This contributes to noise reduction in post-processing naive tagger output.
Solr: SolrTextTagger was incorporated into Solr 7.4.0 formally. Solr 7.4 is minimum requirement. SolrJ partially deprecated; "FST50" postings format for FST is now used.
Geocoder Rules: NAME, CODE patterns teased apart, e.g., "Boise, ID", "Boise, Id." are valid locations, where admin boundary code qualifies city. "Boise id" is not valid, though.
Consolidation: Xponents is now one library. XText and Xlayer are separate related modules.
Social Geo: org.opensextant.data.social and org.opensextant.extractors.geo.social represent the core functionality that was previously in test TweetGeocoder from OpenSextant 1.0.
LangID: CyboZu LangDetect is incorporated as an extractor, but we still require the use of a valid ISO-639 table and a Language object model to manage LangID concepts.
JSON: Jodd JSON library is now formally supported by XText and XLayer and other areas where JSON is used.
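
The NAME, CODE distinction described under Geocoder Rules above can be approximated with a regular expression. This is a rough sketch under stated assumptions (a two-letter code, a comma as the required separator), not the rule as implemented in Xponents; NAME_CODE and is_name_code are hypothetical names.

```python
import re

# A capitalized name, a comma, then a two-letter code with an optional trailing period.
NAME_CODE = re.compile(r"(?P<name>[A-Z][\w .'-]*),\s*(?P<code>[A-Z][A-Za-z]\.?)")


def is_name_code(text):
    """Return True when text looks like 'City, CODE' with an explicit comma."""
    return NAME_CODE.fullmatch(text) is not None


checks = {s: is_name_code(s) for s in ("Boise, ID", "Boise, Id.", "Boise id")}
print(checks)
```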

Releases: OpenSextant/Xponents

Xponents Core Python API v1.6
