Handling xpath results #36

Open
wiibaa opened this issue May 25, 2016 · 1 comment

Comments

wiibaa commented May 25, 2016

GrahamHannington commented Mar 31, 2017

Background

From the XPath 1.0 W3C Recommendation:

The primary syntactic construct in XPath is the expression. [...] An expression is evaluated to yield an object, which has one of the following four basic types:

  • node-set (an unordered collection of nodes without duplicates)
  • boolean (true or false)
  • number (a floating-point number)
  • string (a sequence of UCS characters)

From the XPath 3.1 W3C Recommendation:

Sequences

An important characteristic of the data model is that there is no distinction between an item (a node, function, or atomic value) and a singleton sequence containing that item. An item is equivalent to a singleton sequence containing that item and vice versa.
A sequence may contain any mixture of nodes, functions, and atomic values.
[...] Sequences replace node-sets from XPath 1.0. In XPath 1.0, node-sets do not contain duplicates.

Sequences were introduced in XPath 2.0.

It’s useful—or at least, interesting—to establish the relevant XPath version in this context.

I am using Elastic Stack with Logstash 5.2.1. On the system where Elastic Stack is installed, entering the following Unix command:

find / -name "xpath"

returns:

/opt/logstash/vendor/bundle/jruby/1.9/gems/nokogiri-1.7.0.1-java/lib/nokogiri/xml/xpath

The corresponding version-specific Nokogiri web page contains a list of features that includes:

XPath 1.0 support for document searching

Further reading appears to confirm that the xpath setting in the Logstash xml filter supports a subset of XPath 1.0.

With that in mind—specifically, this:

node-set (an unordered collection of nodes without duplicates)

It’s interesting (to me 🙂 ) that the Logstash xpath returns an array: that is, an ordered collection.

My two cents

Ideally, Logstash should honor the spec (the XPath 1.0 W3C Recommendation), and return the corresponding (Ruby) data types.

I write “ideally” because, for an XPath 1.0 expression that yields a node-set, honoring the spec would mean changing the existing behavior of the Logstash xpath setting to return an unordered collection (such as a Ruby Set, corresponding to a node-set) instead of an array. A Ruby hash would not be the right match here, since hashes in Ruby 1.9 and later preserve insertion order. It’s more pragmatic to keep returning an array in this case. Looking ahead, an array is also a better fit for a sequence (XPath 2.0 and later), which is an ordered collection.
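
To make the suggestion concrete, here is a minimal Ruby sketch of how the four XPath 1.0 result types could be mapped onto Ruby values. The helper name normalize_xpath_result is hypothetical: it is not part of Logstash or Nokogiri, and this is not how the xml filter is implemented. It assumes that the XPath engine hands back true/false, a number, a string, or an enumerable collection of nodes (which, as far as I can tell, matches what Nokogiri does for non-node-set expressions). Plain arrays of strings stand in for node-sets so that the sketch runs without Nokogiri.

```ruby
# Hypothetical helper, for illustration only: maps the value produced by an
# XPath 1.0 evaluation onto a plain Ruby value.
def normalize_xpath_result(result)
  case result
  when true, false then result               # XPath boolean -> Ruby boolean
  when Numeric     then result.to_f          # XPath number  -> Float
  when String      then result               # XPath string  -> String
  when Enumerable  then result.map(&:to_s)   # node-set      -> Array, in iteration order
  else
    raise ArgumentError, "unexpected XPath result type: #{result.class}"
  end
end

# Stand-ins for what an XPath engine might return:
node_set = ["<a>1</a>", "<a>2</a>"]          # pretend node-set
normalize_xpath_result(node_set)             # Array of strings
normalize_xpath_result(2)                    # Float, e.g. from count(//a)
normalize_xpath_result("1")                  # String, e.g. from string(//a)
normalize_xpath_result(true)                 # boolean, e.g. from boolean(//a)
```

Returning an Array for the node-set case keeps the existing behavior, while the other three branches return the scalar as-is rather than wrapping or dropping it.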
