GitHub - fusepool/patents-reengineering: A project to re-engineer patent documents from XML files into ontological resources. The content elements are mapped to a patent ontology.

#PATENTS RE-ENGINEERING

RDF-izer (Extractor) for patent XML documents.

##Stanbol Enhancement Engine

The outcome of this project is an OSGi bundle (enhancement engine) for Apache Stanbol that takes a patent document XML file as input and gives RDF triples as output. It uses an XSLT transformation to map the MAREC XML elements and attributes to classes and properties of a patent ontology as explained below.

The name of the engine is marecEngine. This same name must be used when configuring a chain in order to use it.

When a client calls the engine through the REST API, the bundle sends back the extracted RDF data.
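A REST call to the engine could be sketched as follows. This is a minimal sketch, not the project's own client: it assumes a local Stanbol instance at `http://localhost:8080`, a chain named `marec-chain` (a hypothetical name for a chain that you have configured to include `marecEngine`), and `application/xml` as the content type for the patent document. The function name `enhance_patent` and the `DRY_RUN` and `STANBOL_URL` variables are illustrative only.

```bash
#!/usr/bin/env bash
# Sketch: post a patent XML file to a Stanbol enhancement chain and
# print the RDF that comes back.
# Assumptions: host/port, chain name and content type are not taken
# from the project; adjust them to your installation.

enhance_patent() {
  file="$1"
  chain="${2:-marec-chain}"                      # hypothetical chain name
  base="${STANBOL_URL:-http://localhost:8080}"   # assumed Stanbol base URL
  url="$base/enhancer/chain/$chain"

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only show which endpoint would be called.
    echo "POST $url"
  else
    # Send the raw XML and ask for RDF/XML in the response.
    curl -s -X POST \
      -H "Content-Type: application/xml" \
      -H "Accept: application/rdf+xml" \
      --data-binary "@$file" \
      "$url"
  fi
}
```

Calling `DRY_RUN=1 enhance_patent patent.xml` prints the target endpoint without contacting the server, which is a quick way to check the chain name before sending real documents.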

XSLT 2.0 Transformation

XSLT 2.0 templates to transform MAREC XML to RDF/XML, ECLA to RDF/XML and supporting data.

Requirements

An XSLT 2.0 processor is required to run the transformations, plus some configuration if you want to change the defaults.

What can it do?

Currently it does two types of transformations: patents and classifications. The patents are based on the MAREC standard, which is a superset of the patents from the EP, US, JP and WO offices. The classifications are based on ECLA (and soon on CPC).

From patents, it transforms bibliographic-data, publication and application references, priority-claims, technical-data (classifications), parties (applicants, inventors, assignees) and claims.

There is also a small script that converts a list of filing offices from the EPO websites into N-Triples.

Classifications and filing offices are both used in patents.

What is inside?

It comes with scripts and sample data. It was tested from the command line with the saxonb-xslt tool for Debian.

Scripts

The scripts/ directory contains a Bash script to test the transformations on the sample data.

XSL

The xsl/ directory is for MAREC and ECLA transformations.

Data

There is some sample data under data/.

How-to

Either use the provided Bash script on the sample data, or copy marec.xsl and common.xsl into your application and run marec.xsl (it imports common.xsl). The same goes for ecla.xsl.
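Running the stylesheet by hand with saxonb-xslt could look like the sketch below. It is not the project's own script: the function name `transform_marec`, the `DRY_RUN` switch, the default stylesheet path `xsl/marec.xsl` and the sample file names are assumptions for illustration. The `-s:`, `-xsl:` and `-o:` options are the standard Saxon 9 command-line options for source, stylesheet and output.

```bash
#!/usr/bin/env bash
# Sketch: transform one XML file to RDF/XML with saxonb-xslt.
# Assumption: stylesheets live under xsl/ relative to the current
# directory; pass a third argument to use another stylesheet.

transform_marec() {
  src="$1"                       # input XML document
  out="$2"                       # output RDF/XML file
  xsl="${3:-xsl/marec.xsl}"      # marec.xsl imports common.xsl itself

  if [ "${DRY_RUN:-0}" = "1" ]; then
    # Only print the command that would be executed.
    echo "saxonb-xslt -s:$src -xsl:$xsl -o:$out"
  else
    saxonb-xslt "-s:$src" "-xsl:$xsl" "-o:$out"
  fi
}
```

For example, `transform_marec data/sample.xml sample.rdf` would transform a (hypothetical) sample patent, and `transform_marec data/ecla.xml ecla.rdf xsl/ecla.xsl` would do the same for a classification file. Setting `DRY_RUN=1` shows the command line without requiring Saxon to be installed.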

