Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize normal use case #76

Open
wants to merge 4 commits into
base: main
Choose a base branch
from

Conversation

dlee-libo
Copy link

Use a handy DFA based algorithm to parse the input, when dealing with most naive configuration.

@elasticcla
Copy link

Hi @dlee-libo, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile?

@colinsurprenant
Copy link
Contributor

@dlee-libo thanks for your contribution! Could you please verify the CLA notification above?
For the record, this follows up #75
Let's ping @yaauie to see if he has any concerns on this approach since he did the latests KV refactors.

Copy link
Contributor

@yaauie yaauie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking a stab at this. I can definitely see an advantage to having an optimised implementation for simple use-cases.

I've left a number of comments in-line, mostly having to do with readability. We maintain a lot of plugins here, and we can't always count on the folks who need to jump on bugs having all of the context in their brains, so things like descriptive names go a long way toward making things maintainable.

I have some concern that this may be a little too naive when it comes to escape sequences, but believe that we can shake those out once some of my naming/structure recommendations have been addressed.


We will also want to be able to force the plugin into naive-only mode so we can prove that we are testing the right things. I imagine doing so with a non-advertised naive_only config directive that errors in register if configured with anything that would cause us to skip the optimised path, and a variety of specs that would validate that behaviour on a variety of edge-cases is the same. I'd be glad to help out with this bit when we get there.

@@ -431,6 +441,28 @@ def filter(event)

private

def naive_conf?()
naive = true
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two things:

  1. we likely want to memoize the result of this operation, since it relies exclusively on inputs that do not change and re-calculating it with every event we process would be wasteful.
  2. in Ruby, the english and and or operators are intended for control flow, and can have surprising results when combined with other statements (see: "How to use Ruby’s English and/or operators without going nuts"). It operates as intended here, but because the LHS of the and is always the variable we're assigning, I would prefer using logical assignment (&&=) .
  def naive_conf?
    @naive_conf ||= begin
      naive = true
      naive &&= (@allow_duplicate_values == true)
      naive &&= (@exclude_keys.empty?           )
      naive &&= (@field_split == ' '            )
      naive &&= (@field_split_pattern.nil?      )
      naive &&= (@include_brackets == true      )
      naive &&= (@include_keys.empty?           )
      naive &&= (@recursive == false            )
      naive &&= (@remove_char_key.nil?          )
      naive &&= (@remove_char_value.nil?        )
      naive &&= (@transform_key.nil?            )
      naive &&= (@transform_value.nil?          )
      naive &&= (@trim_key.nil?                 )
      naive &&= (@trim_value.nil?               )
      naive &&= (@value_split == "="            )
      naive &&= (@value_split_pattern.nil?      )
      naive &&= (@whitespace == "lenient"       )
      naive &&= (@prefix == ""                  )
      naive
    end
  end

import java.util.List;
import java.util.Map;

public class KvFilter {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this filter only supports a specific naive subset of kv filter options, I would prefer that its name indicated such

Suggested change
public class KvFilter {
public class NaiveKvFilter {

int state = 0, valueBegin = 0, valueEnd = 0;
for (int msgLen = message.length(); i < msgLen; ++i) {
c = message.charAt(i);
switch (state) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using an enum for state using meaningful state names would make this a lot more readable.

}
}
if (targetChar == ')') {
this.noneBracket1 = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is surprising to me that a method called lookAhead would mutate the parser's state, and it is not immediately clear in which conditions it does so.

The call pattern seems to indicate that it will only get to this point when an open-bracket was left unclosed and end-of-input was encountered, but I'm not seeing how this value would then be useful afterwards for anything other than a short-circuit to prevent further bracket captures of that type.

If that is the case, variable names like shortCircuiteBracketParen, shortCircuitBracketSquare, and shortCircuitBracketAngle would be hugely helpful for anyone else who has to dive into this code in future. We could also then move the setting of the short-circuits back up to the call-site -- if lookAhead returns null, we know that we can safely short-circuit the type of bracket we were looking for in the first place.

public Map<String, Object> filter(String message) {
Map<String, Object> result = new HashMap<>();
for (int cursor = 0, msgLen = message.length(); cursor < msgLen;) {
ScanResult keyResult = phase1(cursor, message);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From their usage, phase1 and phase2 would be more helpfully named nextKey and nextValue

break;
}
ScanResult valueResult = phase2(keyResult.cursor, message);
if (valueResult.value.length() != 0) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nitpick: switching between tabs and spaces for indentation

for (int msgLen = message.length(); i < msgLen; ++i) {
char c = message.charAt(i);
switch (state) {
case 0:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using an enum for state using meaningful state names would make this a lot more readable.

import java.util.Map;

public class KvFilter {
private boolean noneBracket1 = false;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is not immediately clear to me that (1) these are state variables, (2) why there are three of them that only differ by number, or (3) what they are used for. I would prefer more-descriptive variable names.


private ScanResult lookAhead(int i, char targetChar, String message) {
for (int cursor = i + 1, msgLen = message.length(); cursor < msgLen; ++cursor) {
if (message.charAt(cursor) == targetChar) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🤔 I think that this would capture escaped target chars, which could be problematic and break some usage (such as a backslash-escaped close-square-bracket while looking for a close-square-bracket)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants