Optimize normal use case #76

dlee-libo · 2018-12-27T04:27:30Z

Use a handy DFA-based algorithm to parse the input when dealing with the most naive configuration.

…sting. and fix empty value bug
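For readers unfamiliar with the idea, here is a minimal sketch of a DFA-style key/value scan. It is not the code from this pull request: the method name `naive_kv_parse`, the two states, and the choice to drop empty values and bare tokens are all assumptions made for illustration.

```ruby
# Hypothetical sketch, not the implementation in this PR.
# Two states: :key (reading a key) and :value (reading a value).
def naive_kv_parse(text, field_split = ' ', value_split = '=')
  result = {}
  state = :key
  key = +''
  value = +''

  # Appending one field separator flushes the final pair.
  (text + field_split).each_char do |char|
    case state
    when :key
      if char == value_split
        state = :value
      elsif char == field_split
        key = +''            # bare token with no value separator: discard (assumption)
      else
        key << char
      end
    when :value
      if char == field_split
        # Assumption: pairs with an empty key or empty value are skipped.
        result[key] = value unless key.empty? || value.empty?
        key = +''
        value = +''
        state = :key
      else
        value << char
      end
    end
  end
  result
end

p naive_kv_parse('a=1 b=2')
```

Each character is inspected once and no regular expressions are involved, which is where a speed-up for the simple case could plausibly come from.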

elasticcla · 2018-12-27T04:30:11Z

Hi @dlee-libo, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in your Git commit. Can you please add both of these e-mails into your GitHub profile (they can be hidden), so we can match your e-mails to your GitHub profile?

colinsurprenant · 2019-01-03T17:02:17Z

@dlee-libo thanks for your contribution! Could you please verify the CLA notification above?
For the record, this follows up #75.
Let's ping @yaauie to see if he has any concerns about this approach, since he did the latest KV refactors.

yaauie

Thank you for taking a stab at this. I can definitely see an advantage to having an optimised implementation for simple use-cases.

I've left a number of comments in-line, mostly having to do with readability. We maintain a lot of plugins here, and we can't always count on the folks who need to jump on bugs having all of the context in their brains, so things like descriptive names go a long way toward making things maintainable.

I have some concern that this may be a little too naive when it comes to escape sequences, but believe that we can shake those out once some of my naming/structure recommendations have been addressed.

We will also want to be able to force the plugin into naive-only mode so we can prove that we are testing the right things. I imagine doing so with a non-advertised naive_only config directive that errors in register if configured with anything that would cause us to skip the optimised path, and a variety of specs that would validate that behaviour on a variety of edge-cases is the same. I'd be glad to help out with this bit when we get there.
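As a rough sketch of the kind of guard described above, assuming a plain Ruby stand-in rather than the real plugin class: `KvSketch`, its `ConfigurationError`, and the two options checked in `naive_conf?` are hypothetical and only illustrate the shape of the check.

```ruby
# Hypothetical stand-in for the plugin; names and options are illustrative only.
class KvSketch
  class ConfigurationError < StandardError; end

  def initialize(options = {})
    @field_split = options.fetch(:field_split, ' ')
    @value_split = options.fetch(:value_split, '=')
    @naive_only  = options.fetch(:naive_only, false)
  end

  # Fail fast at registration time if naive-only mode was requested
  # but the configuration would force the general parser.
  def register
    if @naive_only && !naive_conf?
      raise ConfigurationError,
            'naive_only is set, but the configuration requires the general parser'
    end
    self
  end

  # Reduced to two options here; the real check would cover many more.
  def naive_conf?
    @field_split == ' ' && @value_split == '='
  end
end

KvSketch.new(naive_only: true).register # passes: configuration is naive
```

Specs could then run the same edge-case inputs once with the flag set and once without it, and compare the resulting events.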

yaauie · 2019-02-21T20:15:49Z

lib/logstash/filters/kv.rb

@@ -431,6 +441,28 @@ def filter(event)

  private

+  def naive_conf?()
+    naive = true


Two things:

- We likely want to memoize the result of this operation, since it relies exclusively on inputs that do not change, and re-calculating it with every event we process would be wasteful.

- In Ruby, the English `and` and `or` operators are intended for control flow, and can have surprising results when combined with other statements (see: "How to use Ruby’s English and/or operators without going nuts"). It operates as intended here, but because the LHS of the `and` is always the variable we're assigning, I would prefer using logical assignment (`&&=`).

def naive_conf?
  @naive_conf ||= begin
    naive = true
    naive &&= (@allow_duplicate_values == true)
    naive &&= (@exclude_keys.empty?)
    naive &&= (@field_split == ' ')
    naive &&= (@field_split_pattern.nil?)
    naive &&= (@include_brackets == true)
    naive &&= (@include_keys.empty?)
    naive &&= (@recursive == false)
    naive &&= (@remove_char_key.nil?)
    naive &&= (@remove_char_value.nil?)
    naive &&= (@transform_key.nil?)
    naive &&= (@transform_value.nil?)
    naive &&= (@trim_key.nil?)
    naive &&= (@trim_value.nil?)
    naive &&= (@value_split == "=")
    naive &&= (@value_split_pattern.nil?)
    naive &&= (@whitespace == "lenient")
    naive &&= (@prefix == "")
    naive
  end
end
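One caveat worth keeping in mind with this pattern: `||=` only caches truthy values, so when the result is `false` the block would be evaluated again on every call. The sketch below illustrates the difference using a hypothetical `MemoSketch` class (not part of the plugin) and a `defined?`-based alternative that also caches `false`.

```ruby
# Hypothetical class used only to compare the two memoization styles.
class MemoSketch
  attr_reader :computations

  def initialize(field_split)
    @field_split = field_split
    @computations = 0
  end

  # `||=` re-runs the right-hand side whenever the cached value is false or nil.
  def naive_with_or_equals?
    @naive_a ||= compute
  end

  # `defined?` checks whether the variable was ever assigned, so false is cached too.
  def naive_with_defined?
    return @naive_b if defined?(@naive_b)
    @naive_b = compute
  end

  private

  def compute
    @computations += 1
    @field_split == ' '
  end
end

non_naive = MemoSketch.new(',')
3.times { non_naive.naive_with_or_equals? }
puts non_naive.computations # 3: the false result was never cached

cached = MemoSketch.new(',')
3.times { cached.naive_with_defined? }
puts cached.computations # 1: computed once, then reused
```

Whether this matters in practice depends on how expensive the check is; for a handful of comparisons the cost is small either way, but the `defined?` form matches the stated intent of computing the answer once.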

yaauie · 2019-02-21T20:17:36Z