Conversation
|
Hi @dlee-libo, we have found your signature in our records, but it seems like you have signed with a different e-mail than the one used in yout Git commit. Can you please add both of these e-mails into your Github profile (they can be hidden), so we can match your e-mails to your Github profile? |
|
@dlee-libo thanks for your contribution! Could you please verify the CLA notification above? |
yaauie
left a comment
There was a problem hiding this comment.
Thank you for taking a stab at this. I can definitely see an advantage to having an optimised implementation for simple use-cases.
I've left a number of comments in-line, mostly having to do with readability. We maintain a lot of plugins here, and we can't always count on the folks who need to jump on bugs having all of the context in their brains, so things like descriptive names go a long way toward making things maintainable.
I have some concern that this may be a little too naive when it comes to escape sequences, but believe that we can shake those out once some of my naming/structure recommendations have been addressed.
We will also want to be able to force the plugin into naive-only mode so we can prove that we are testing the right things. I imagine doing so with a non-advertised naive_only config directive that errors in register if configured with anything that would cause us to skip the optimised path, and a variety of specs that would validate that behaviour on a variety of edge-cases is the same. I'd be glad to help out with this bit when we get there.
| private | ||
|
|
||
| def naive_conf?() | ||
| naive = true |
There was a problem hiding this comment.
Two things:
- we likely want to memoize the result of this operation, since it relies exclusively on inputs that do not change and re-calculating it with every event we process would be wasteful.
- in Ruby, the english
andandoroperators are intended for control flow, and can have surprising results when combined with other statements (see: "How to use Ruby’s English and/or operators without going nuts"). It operates as intended here, but because the LHS of theandis always the variable we're assigning, I would prefer using logical assignment (&&=) .
def naive_conf?
@naive_conf ||= begin
naive = true
naive &&= (@allow_duplicate_values == true)
naive &&= (@exclude_keys.empty? )
naive &&= (@field_split == ' ' )
naive &&= (@field_split_pattern.nil? )
naive &&= (@include_brackets == true )
naive &&= (@include_keys.empty? )
naive &&= (@recursive == false )
naive &&= (@remove_char_key.nil? )
naive &&= (@remove_char_value.nil? )
naive &&= (@transform_key.nil? )
naive &&= (@transform_value.nil? )
naive &&= (@trim_key.nil? )
naive &&= (@trim_value.nil? )
naive &&= (@value_split == "=" )
naive &&= (@value_split_pattern.nil? )
naive &&= (@whitespace == "lenient" )
naive &&= (@prefix == "" )
naive
end
end
| import java.util.List; | ||
| import java.util.Map; | ||
|
|
||
| public class KvFilter { |
There was a problem hiding this comment.
If this filter only supports a specific naive subset of kv filter options, I would prefer that its name indicated such
| public class KvFilter { | |
| public class NaiveKvFilter { |
| int state = 0, valueBegin = 0, valueEnd = 0; | ||
| for (int msgLen = message.length(); i < msgLen; ++i) { | ||
| c = message.charAt(i); | ||
| switch (state) { |
There was a problem hiding this comment.
using an enum for state using meaningful state names would make this a lot more readable.
| } | ||
| } | ||
| if (targetChar == ')') { | ||
| this.noneBracket1 = true; |
There was a problem hiding this comment.
It is surprising to me that a method called lookAhead would mutate the parser's state, and it is not immediately clear in which conditions it does so.
The call pattern seems to indicate that it will only get to this point when an open-bracket was left unclosed and end-of-input was encountered, but I'm not seeing how this value would then be useful afterwards for anything other than a short-circuit to prevent further bracket captures of that type.
If that is the case, variable names like shortCircuiteBracketParen, shortCircuitBracketSquare, and shortCircuitBracketAngle would be hugely helpful for anyone else who has to dive into this code in future. We could also then move the setting of the short-circuits back up to the call-site -- if lookAhead returns null, we know that we can safely short-circuit the type of bracket we were looking for in the first place.
| public Map<String, Object> filter(String message) { | ||
| Map<String, Object> result = new HashMap<>(); | ||
| for (int cursor = 0, msgLen = message.length(); cursor < msgLen;) { | ||
| ScanResult keyResult = phase1(cursor, message); |
There was a problem hiding this comment.
From their usage, phase1 and phase2 would be more helpfully named nextKey and nextValue
| break; | ||
| } | ||
| ScanResult valueResult = phase2(keyResult.cursor, message); | ||
| if (valueResult.value.length() != 0) { |
There was a problem hiding this comment.
nitpick: switching between tabs and spaces for indentation
| for (int msgLen = message.length(); i < msgLen; ++i) { | ||
| char c = message.charAt(i); | ||
| switch (state) { | ||
| case 0: |
There was a problem hiding this comment.
using an enum for state using meaningful state names would make this a lot more readable.
| import java.util.Map; | ||
|
|
||
| public class KvFilter { | ||
| private boolean noneBracket1 = false; |
There was a problem hiding this comment.
it is not immediately clear to me that (1) these are state variables, (2) why there are three of them that only differ by number, or (3) what they are used for. I would prefer more-descriptive variable names.
|
|
||
| private ScanResult lookAhead(int i, char targetChar, String message) { | ||
| for (int cursor = i + 1, msgLen = message.length(); cursor < msgLen; ++cursor) { | ||
| if (message.charAt(cursor) == targetChar) { |
There was a problem hiding this comment.
🤔 I think that this would capture escaped target chars, which could be problematic and break some usage (such as a backslash-escaped close-square-bracket while looking for a close-square-bracket)
Use a handy DFA based algorithm to parse the input, when dealing with most naive configuration.