A Ruby library for tagging named entities in text, based on OpenNLP. Because OpenNLP is written in Java, NamedEntityTagger must be run on JRuby.
To use NamedEntityTagger, you need to download and build a few Java libraries and place the resulting jar files in the deps directory. You also need OpenNLP model data, which must be placed under models.
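As a quick sanity check before running the tagger, something like the following sketch can confirm that both directories are populated. The helper name `missing_prerequisites` is hypothetical and not part of NamedEntityTagger, and the check only assumes that jar files end in `.jar`; it makes no assumption about which jars or models are required.

```ruby
# Hypothetical helper (not part of NamedEntityTagger): reports which of the
# two directories described above look empty. It does not verify that the
# right jars or models are present, only that something is there.
def missing_prerequisites(root = Dir.pwd)
  problems = []
  if Dir.glob(File.join(root, "deps", "*.jar")).empty?
    problems << "no .jar files found in deps/"
  end
  if Dir.glob(File.join(root, "models", "*")).empty?
    problems << "no model files found in models/"
  end
  problems
end

problems = missing_prerequisites
if problems.empty?
  puts "deps/ and models/ are populated"
else
  problems.each { |problem| warn problem }
end
```

Run it from the project root; an empty result only means the directories are not empty.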
NamedEntityTagger exposes a minimal API. The main class is EntityTagger, which provides the
method #tag(text). This method takes the text that should be tagged and returns a new string
where the words that were identified as named entities are highlighted. The encoding of the
output is defined by a formatter object. Currently there is only the
CSSClassAnnotationFormatter that adds span tags around the named entities. The span tag has a
class corresponding to the model that matched the entity.
For example:
require 'lib/entity_tagger'
require 'lib/css_class_annotation_formatter'
tagger = EntityTagger.new(CSSClassAnnotationFormatter.new)
tagger.tag("Mrs. Smith flew to Berlin")
=> "Mrs. <span class=\"person\">Smith</span> flew to <span class=\"location\">Berlin</span>"
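To make the output format concrete, here is a minimal standalone sketch of the span-wrapping idea. It is not the code of CSSClassAnnotationFormatter, whose internal interface is not described here. The `Annotation` struct and the `wrap_in_spans` function are hypothetical names, and the character offsets are supplied by hand instead of coming from OpenNLP.

```ruby
# Hypothetical stand-in for a detected entity: character offsets into the
# text (stop is exclusive) plus the name of the model that matched.
Annotation = Struct.new(:start, :stop, :type)

# Wraps each annotated range in a span tag whose class is the model name.
# Assumes the annotations do not overlap.
def wrap_in_spans(text, annotations)
  result = String.new
  cursor = 0
  annotations.sort_by(&:start).each do |annotation|
    # Copy the untouched text before the entity, then the wrapped entity.
    result << text[cursor...annotation.start]
    entity = text[annotation.start...annotation.stop]
    result << %(<span class="#{annotation.type}">#{entity}</span>)
    cursor = annotation.stop
  end
  # Copy whatever follows the last entity.
  result << text[cursor..-1]
end

text = "Mrs. Smith flew to Berlin"
annotations = [
  Annotation.new(5, 10, "person"),    # "Smith"
  Annotation.new(19, 25, "location")  # "Berlin"
]
puts wrap_in_spans(text, annotations)
# prints: Mrs. <span class="person">Smith</span> flew to <span class="location">Berlin</span>
```

The sketch does not escape HTML special characters in the input text, so it is only suitable as an illustration of the format.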