Graphene

Knowledge Graph / Open Information Extraction

Motivation

Graphene is an information extraction pipeline that extracts knowledge graphs from text (n-ary relations and rhetorical structures extracted from complex factoid discourse). Given a sentence or a text, Graphene outputs a semantic representation of the input as a labeled directed graph (a knowledge graph). This knowledge graph can later be used to address different AI tasks, such as building question answering systems, extracting structured data from text, and supporting semantic inference. Unlike existing open relation extraction tools, which focus on the main relation expressed in a sentence, Graphene aims to maximize the extraction of contextual relations. For example:

Trump withdrew his sponsorship after the second Tour de Trump in 1990 because his business ventures were experiencing financial woes.

[Figure: Graphene's extraction for the example sentence]

In order to capture all the contextual information, Graphene performs the following steps:

  • Resolves co-references.
  • Transforms complex sentences (e.g. those containing subordinations, coordinations, or appositive phrases) into simple independent sentences (one clause per sentence).
  • Identifies rhetorical relations between those sentences.
  • Extracts binary relations (subject, predicate, object) from each sentence.
  • Merges all extracted relations into a relation graph (knowledge graph).

[Figure: Graphene's extraction pipeline]

Graphene's extracted graphs are represented in our RDFNL format, a simple format that facilitates the representation of complex contextual relations in a way that balances machine readability with human legibility. A description of the RDFNL format can be found here. To further increase the processability of the extracted relations, Graphene can materialize them into a proper RDF graph serialized under the N-Triples specification of the RDF standard. A description of the RDF format can be found here. Alternatively, developers can use the direct output class of the API, which is serializable to and deserializable from JSON.

Example Extractions

Sentence Extraction

Although the Treasury will announce details of the November refunding on Monday, the funding will be delayed if Congress and President Bush fail to increase the Treasury's borrowing capacity.

The serialized class: JSON
The RDFNL format:

# Although the Treasury will announce details of the November refunding on Monday , the funding will be delayed if Congress and President Bush fail to increase the Treasury 's borrowing capacity .

bacf06771e0f4fc5a8e68c30fc77c9c4    0    the Treasury    will announce    details of the November refunding
    S:TEMPORAL    on Monday .
    L:CONTRAST    948eeebd73564adab7dee5c6f177b3b9

948eeebd73564adab7dee5c6f177b3b9    0    the funding    will be delayed        
    L:CONDITION 006a71e51295440fab7a8e8c697d2ba6
    L:CONDITION e4d86228cff443b7a8e9f6d8a5c5987b
    L:CONTRAST    bacf06771e0f4fc5a8e68c30fc77c9c4

006a71e51295440fab7a8e8c697d2ba6    1    Congress    fail    to increase the Treasury 's borrowing capacity
    L:LIST    e4d86228cff443b7a8e9f6d8a5c5987b

e4d86228cff443b7a8e9f6d8a5c5987b    1    president Bush    fail    to increase the Treasury 's borrowing capacity
    L:LIST    006a71e51295440fab7a8e8c697d2ba6

The RDF N-Triples format: NT

Full-text extraction of the Barack Obama Wikipedia page (2017-11-06):

The serialized class: JSON
The RDFNL format: RDFNL
The RDF N-Triples format: RDF

Contributors (alphabetical order)

  • Andre Freitas
  • Bernhard Bermeitinger
  • Christina Niklaus
  • Leonardo Souza
  • Matthias Cetto
  • Siegfried Handschuh

Requirements

  • Java 8 (OpenJDK or Oracle)
  • Maven 3.3.9
  • Docker version 17.03+
  • docker-compose version 1.12+

Dependencies

Compiling and packaging require an additional package, DiscourseSimplification.

To install DiscourseSimplification, execute the following script:

./install-DiscourseSimplification.sh

Setup of Graphene

Graphene can be customized via configuration files. The default settings are specified in graphene-core/src/main/resources/reference.conf. You can override these settings in the corresponding src/main/resources/application.conf file of each module (graphene-core / graphene-server / graphene-cli).

(Optional) If you want to use PyCobalt as the coreference resolution system, you must have a PyCobalt instance running. One is provided in docker-compose-core.yml; start it with docker-compose -f docker-compose-core.yml up. Ensure that the PyCobaltCoref class is selected and that the PyCobalt coreference URL is set correctly in your application.conf:

graphene {
    coreference.resolver = org.lambda3.graphene.core.coreference.impl.pycobalt.PyCobaltCoref
    coreference.settings.pycobalt.url = "http://localhost:5128/resolve"
}

Install the Graphene-Core API into your local Maven repository with:

mvn clean install -DskipTests

Graphene can be used as a Java API, as a Web Service, or as a Command Line Interface.

REST-Like Web Service (Graphene-Server)

To build the server part, package with the server profile:

mvn -P server clean package -DskipTests

Command Line Interface (Graphene-CLI)

To build the command-line part, package with the cli profile:

mvn -P cli clean package -DskipTests   

Both

To build both interfaces, you can specify both profiles:

mvn -P cli -P server clean package -DskipTests

Video Tutorial

A short video tutorial on the Graphene setup for CLI usage is provided here. Note that the command-line arguments for relation extraction used in this video are outdated; please refer to the current set of commands here.

Setup of Graphene-Server using Docker-Compose

For simplified deployment, we have wrapped Graphene's web service, including all needed dependencies (e.g. PyCobalt), into a Docker image. The configuration settings for the Docker image are customized in conf/graphene.conf, which overrides the default settings from graphene-core/src/main/resources/reference.conf. Ensure that the PyCobalt coreference URL in your conf/graphene.conf is set to:

graphene {
    coreference.settings.pycobalt.url = "http://coreference:5128/resolve"
}

Ensure that you have installed the Graphene-Core API and packaged the server part:

mvn clean install -DskipTests
mvn -P server clean package -DskipTests

Then, you can build and start the composed images by running:

docker-compose -f docker-compose-server.yml up

Usage

Graphene-Core

Graphene comes with a Java API which is described here.

To use the Graphene API within your own Java application, import it as a Maven dependency by adding the following lines to your project's pom.xml:

<dependency>
    <groupId>org.lambda3.graphene</groupId>
    <artifactId>graphene-core</artifactId>
    <version>3.1.0</version>
</dependency>
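
As a quick orientation, the snippet below sketches a basic extraction call. It assumes an entry-point class org.lambda3.graphene.core.Graphene and a doRelationExtraction(text, doCoreference, isolateSentences) method; treat the exact class, method, and return-type names as assumptions and consult the API documentation linked above for the authoritative signatures.

import org.lambda3.graphene.core.Graphene;

public class GrapheneExample {
    public static void main(String[] args) {
        // NOTE: class and method names here are assumptions based on the package
        // naming used elsewhere in this README; verify against the API documentation.

        // Instantiate the pipeline with the default configuration
        // (reference.conf, optionally overridden by application.conf).
        Graphene graphene = new Graphene();

        String text = "Trump withdrew his sponsorship after the second Tour de Trump in 1990 "
                + "because his business ventures were experiencing financial woes.";

        // Run coreference resolution, sentence simplification, and relation extraction.
        // The result object can be serialized to JSON or rendered as RDFNL / N-Triples;
        // the assignment to Object is deliberate, since the concrete result type is an
        // assumption here.
        Object extraction = graphene.doRelationExtraction(text, true, false);
        System.out.println(extraction);
    }
}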

Graphene-Server

The usage of the Graphene-Server is described here.
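
For orientation only, a request against a locally running server might look like the sketch below. The port, endpoint path, and JSON field names are assumptions made for illustration; the authoritative REST API (endpoints, payloads, and output formats) is documented in the page linked above.

# Illustrative sketch only: port, endpoint path, and field names are assumptions,
# not the documented Graphene-Server API (see the wiki page linked above).
curl -X POST "http://localhost:8080/relationExtraction/text" \
     -H "Content-Type: application/json" \
     -d '{"text": "Trump withdrew his sponsorship after the second Tour de Trump in 1990 because his business ventures were experiencing financial woes.", "doCoreference": true}'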

Graphene-CLI

The usage of Graphene's command-line interface is described here.

Citation

@InProceedings{cetto2018graphene,
  author    = {Matthias Cetto and Christina Niklaus and Andr\'{e} Freitas and Siegfried Handschuh},
  title     = {Graphene: Semantically-Linked Propositions in Open Information Extraction},
  booktitle = {Proceedings of the 27th International Conference on Computational Linguistics},
  year      = {2018},
  publisher = {Association for Computational Linguistics},
  pages     = {2300--2311},
  location  = {Santa Fe, New Mexico, USA},
  url       = {http://aclweb.org/anthology/C18-1195}
}