Elasticsearch xml Ingest Processor

This is a generic XML data ingestion plugin. It automatically extracts fields and values from attributes and contents of the XML, without using an XML schema or XPATH. This plugin uses a DFS algorithm inside the XML structure. The most important constraint is that the XML must be well formed, with a root tag that contains all other tags.

If you have more tags with the same name, the plugin will name them with an incremental number.

<root>
    <event>event1</event>
    <event attr="attr_value">event2</event>
    <event>event3</event>
</root>

In the previous example, the algorithm will create four key-values pairs:

root-event#text : event1
root-event2#text : event2
root-event2@attr : attr_value
root-event3#text : event3

It also includes and exclude optional parameter: all fields that match the REGEX will be discarded. For example, root-event2(.*) will exclude the second and the third record in the previous example.

Usage

PUT _ingest/pipeline/xml-pipeline
{
  "description": "XML parsing pipeline",
  "processors": [
    {
      "xml" : {
        "field" : "message",
        "exclude" : ["root-event2(.*)"]
      }
    }
  ]
}

POST _ingest/pipeline/xml-pipeline/_simulate
{
  "docs" : [
    {
      "_source" :
        {
          "source" : "/path/to/the/file",
          "message" : "<root><event time=\"time_value\" name=\"name_value\" uid=\"uid_value\">event_value1</event><event>event_value2</event></root>"
        }
    }
  ]
}

Configuration

Parameter	Use	Required
field	The string that contains the XML document to parse.	Yes
exclude	Pattern that match this REGEX will be discarded (check `java.lang.String.matches` for the correct syntax).	No

Setup

In order to install this plugin, you need to create a zip distribution first by running

gradle clean check

This will produce a zip file in build/distributions.

After building the zip file, you can install it like this

/usr/share/elasticsearch/bin/elasticsearch-plugin install file:///path/to/ingest-xml/build/distribution/ingest-xml-0.0.1.zip

If you need to work on a different version, you can try to change the setting in gradle.properties before compiling. Use at your own risk!

Bugs & TODO

Only basic DOM tags are implemented. Should be added the support for more advanced tags (all possible tags are in the code, commented out, to give a guideline)
The visit of attributes can be improved, according to the DOM logic checking the ATTRIBUTE_NODE type
Define a custom char to separate each field of the path. Now it is - for fields (please note that . returns error), @ for attributes and # for DOM-defined keywords

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
src		src
.gitignore		.gitignore
LICENSE.txt		LICENSE.txt
README.md		README.md
build.gradle		build.gradle
gradle.properties		gradle.properties
settings.gradle		settings.gradle

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Elasticsearch xml Ingest Processor

Usage

Configuration

Setup

Bugs & TODO

About

Releases

Packages

Languages

License

andrearomagnoli/elasticsearch-ingest-xml

Folders and files

Latest commit

History

Repository files navigation

Elasticsearch xml Ingest Processor

Usage

Configuration

Setup

Bugs & TODO

About

Topics

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages