Re-established previous behaviour without a default limit for 'decode_size_limit_bytes' #45
IMO it makes sense to default to limiting sizes, as that helps Logstash continue to function and also gives the user an idea that we're going into dangerous territory.
That said, I can agree that it's a breaking change for people doing legit 30, 60 or 100MB messages.
So I suggest we default to none now but warn that the default will change to 20MB in the future. cc @robbavey
I have some concerns that this PR conflates buffer full vs large messages and can lead to locking up the codec. A situation we can think about is the following:
tcp {
  port => 3333
  codec => json_lines { decode_size_limit_bytes => 1000 }
}
Then three TCP packets arrive:
Packet 1, size 900 bytes: contains the first 900 bytes of a 1050-byte message
What happens: nothing, the message is stored in the buffer
Packet 2, size 250 bytes: contains the last 150 bytes of the first message and an entire 100-byte second message
What happens: a new Logstash Event is created with the contents of packet 2 and tagged with _jsonparsetoobigfailure, but the buffer keeps the initial 900 bytes from packet 1.
Packet 3, size 200 bytes: an entire 200-byte third message
What happens: a new Logstash Event is created with the contents of packet 3 and tagged with _jsonparsetoobigfailure, but the buffer still keeps the initial 900 bytes from packet 1.
As long as the next message is bigger than the remaining 100 bytes in the buffer, Logstash will never get rid of the first 900 bytes of message 1.
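To make the lock-up concrete, here is a toy model of the scenario as described in this comment. It is only a sketch: `ToyTokenizer` is a hypothetical stand-in that checks the whole incoming chunk against the limit, not the actual BufferedTokenizerExt logic, and the packet sizes are picked so that message 1 is 1050 bytes including its newline.

```ruby
# Toy stand-in for a size-limited line tokenizer (hypothetical, illustration only).
class ToyTokenizer
  def initialize(size_limit)
    @size_limit = size_limit
    @buffer = +""
  end

  def buffered_bytes
    @buffer.bytesize
  end

  # Returns the complete lines found so far. Raises when buffer + data would
  # exceed the limit and, crucially, leaves the buffer untouched.
  def extract(data)
    if @buffer.bytesize + data.bytesize > @size_limit
      raise ArgumentError, "input buffer full"
    end
    @buffer << data
    parts = @buffer.split("\n", -1)
    @buffer = parts.pop || +""   # keep the trailing partial line buffered
    parts
  end
end

tokenizer = ToyTokenizer.new(1000)
packets = [
  "a" * 900,                           # first 900 bytes of message 1
  "a" * 149 + "\n" + "b" * 99 + "\n",  # rest of message 1 + all of message 2
  "c" * 199 + "\n"                     # all of message 3
]

# Record the outcome of each packet and how much data is still buffered.
trace = packets.map do |packet|
  outcome = begin
    tokenizer.extract(packet)
    :ok
  rescue ArgumentError
    :too_big
  end
  [outcome, tokenizer.buffered_bytes]
end

trace.each { |outcome, buffered| puts "#{outcome}: #{buffered} bytes still buffered" }
```

In this model every packet after the first is rejected and the buffer stays at 900 bytes, which is the stuck state described above.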
You can try this out with the following code:
# assumes the tcp input above is listening on localhost:3333
require "json"
require "socket"
socket = TCPSocket.new("localhost", 3333)
# first packet with part of message 1
data = {"a" => "a"*1050}.to_json + "\n"; socket.write(data[0...900])
# second packet with the rest of message 1 and the entire message 2
socket.write(data[900..] + {"b" => "bbb"}.to_json + "\n")
# third packet w/ entire message 3
socket.write({"c" => "c"*200}.to_json + "\n")
# fourth packet w/ entire message 4
socket.write({"c" => "c"*200}.to_json + "\n")
# fifth packet w/ entire message 5
socket.write({"c" => "c"*200}.to_json + "\n")
We need to flush out the buffer and generate the event using the buffer's data.
There may be other problematic cases with variable-sized messages; please check that it behaves correctly in these partial-message situations.
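One possible shape for the flush suggested above, as a rough sketch only. `FlushingToyTokenizer` and its `Oversized` wrapper are hypothetical names, not part of Logstash. On overflow it hands the buffered bytes plus the rest of the current line back to the caller and empties the buffer. A known gap: if the oversized line has no delimiter in the current chunk, its tail arriving later is treated as the start of a new line.

```ruby
# Hypothetical sketch: flush the buffer on overflow instead of keeping it.
class FlushingToyTokenizer
  # Marks a payload that exceeded the limit so the caller can tag the event.
  Oversized = Struct.new(:payload)

  def initialize(size_limit)
    @size_limit = size_limit
    @buffer = +""
  end

  def buffered_bytes
    @buffer.bytesize
  end

  # Returns complete lines (Strings) and, on overflow, an Oversized entry
  # carrying the flushed buffer plus the remainder of the offending line.
  def extract(data)
    tokens = []
    if @buffer.bytesize + data.bytesize > @size_limit
      head, _sep, data = data.partition("\n")
      tokens << Oversized.new(@buffer + head)
      @buffer = +""   # the stale bytes are gone, so later packets are not blocked
    end
    @buffer << data
    parts = @buffer.split("\n", -1)
    @buffer = parts.pop || +""
    tokens.concat(parts)
  end
end

tokenizer = FlushingToyTokenizer.new(1000)
tokens = []
tokens.concat(tokenizer.extract("a" * 900))                           # part of message 1
tokens.concat(tokenizer.extract("a" * 149 + "\n" + "b" * 99 + "\n"))  # rest of 1 + message 2
tokens.concat(tokenizer.extract("c" * 199 + "\n"))                    # message 3

oversized, *lines = tokens
```

Here the caller would build a tagged event from `oversized.payload` and decode `lines` as usual; the buffer is empty again after the second packet.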
I tried locally, and breakpointing at https://github.com/elastic/logstash/blob/ac034a14ee422148483d42a51b68f07d1a38494c/logstash-core/src/main/java/org/logstash/common/BufferedTokenizerExt.java#L80 and in the first iteration I would expect that
So a couple of observations:
Did you try with bigger packets, larger than the MTU (16k)? Same test but larger messages:
… not raise an error but create an Event, tag it appropriately and slice the offending data into 'message' field
…nd tag the event with an error
6e7ae5b
to
0db5a0f
Compare
Using the code suggested (but just keeping the A and B and avoiding the Cs) with this PR, which handles the
Detailed explanation
Considering the code at https://github.com/elastic/logstash/blob/701108f88b3c16a08fb501a71d812b804a79fe68/logstash-core/src/main/java/org/logstash/common/BufferedTokenizerExt.java#L79-L95
What happens here is that at iteration 3 the
What should be expected
Agreed on the observations.
Easier:
Harder:
The harder solution provides a last-resort attempt at consuming the bigger-than-payload message before discarding it. The easier version just drops it.
… condition, now instead the exception is raised producing a properly tagged event
After the merge of the BufferedTokenizer fix, I think we can proceed with the review of this PR.
Release notes
Update the behaviour of decode_size_limit_bytes so that no limit is applied to the length of a line by default; if the setting is configured, lines that exceed it produce a tagged event.
What does this PR do?
If decode_size_limit_bytes is unset, this re-establishes the behaviour from before version 3.2.0: if a long line, close to the available heap space, is processed by the codec, it throws an OOM and kills Logstash. If the setting is explicitly configured, then instead of continuing to loop with a warning log, the codec still produces an event, but it is tagged with an error and the message content is clipped to the first decode_size_limit_bytes bytes.
Why is it important/What is the impact to the user?
#43, which configured a default 20MB limit on the size of the line to parse, introduced a breaking change in the behaviour of existing installations. If a Logstash instance was correctly parsing 30MB lines, then after that change it would encounter an error.
With this PR the previous behaviour is re-established, but the user can explicitly set a limit. Once that limit is exceeded, instead of iterating continuously on the error, the codec still produces an event, which is tagged with an error and contains a slice of the offending input.
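A minimal sketch of the tag-and-clip behaviour described in this PR, under some assumptions: a plain Hash stands in for the Logstash event, and `oversized_event` is a hypothetical helper, not the codec's actual code. Only the tag name and the clip-at-`decode_size_limit_bytes` rule come from the description.

```ruby
TOO_BIG_TAG = "_jsonparsetoobigfailure"

# Builds a stand-in event (a Hash) for a line that exceeded the limit:
# the message keeps only the first decode_size_limit_bytes bytes and the
# event carries the failure tag.
def oversized_event(raw_line, decode_size_limit_bytes)
  {
    # byteslice cuts on bytes, so a multi-byte character at the boundary
    # may be split; a real implementation would need to decide how to handle that.
    "message" => raw_line.byteslice(0, decode_size_limit_bytes),
    "tags" => [TOO_BIG_TAG]
  }
end

event = oversized_event("x" * 1024, 512)
```

With a 1024-byte line and a limit of 512, `event["message"]` holds the first 512 bytes and `event["tags"]` contains the failure tag.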
Checklist
[ ] I have made corresponding changes to the default configuration files (and/or docker env variables)
Author's Checklist
How to test this PR locally
_jsonparsetoobigfailure and a message clipped at 512; as a sample file you can use 1kb_single_line.json
cat /path/to/1kb_single_line.json | bin/logstash -f /path/to/test_pipeline.conf
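If you don't have a sample file at hand, something like the following could generate one. This is an assumption about what such a file looks like (a single JSON object on one line, 1024 bytes including the trailing newline); `single_line_json` is a hypothetical helper and the `message` field name is arbitrary.

```ruby
require "json"

# Returns a one-line JSON document of exactly target_bytes bytes,
# including the trailing newline.
def single_line_json(target_bytes)
  overhead = ({ "message" => "" }.to_json + "\n").bytesize
  raise ArgumentError, "target too small" if target_bytes < overhead
  { "message" => "x" * (target_bytes - overhead) }.to_json + "\n"
end

sample = single_line_json(1024)
# To save it for the command above:
# File.write("/path/to/1kb_single_line.json", sample)
```

Any line longer than the configured limit should work just as well for the test.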
Related issues
Logs