Add decode_size_limit_bytes option. #30
base: main
Conversation
it "should raise an error if the max bytes are exceeded" do | ||
expect { | ||
subject.decode(maximum_payload << "z") | ||
}.to raise_error(RuntimeError, "input buffer full") |
Is there a way to provide more context to this exception? "input buffer full" feels a little vague, or am I missing something?
Unfortunately that exception is provided by the FileWatch library, so it'd require a patch there. I should probably wrap and re-raise it.
I just submitted https://github.com/jordansissel/ruby-filewatch/pull/82/files to enable us to catch a more precise exception.
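As a rough illustration of the wrap-and-re-raise idea being discussed (a sketch only, not the plugin's actual code: the buffer object, the helper name, and the wrapped message are assumptions based on the spec quoted above):

# Sketch only. Assumes a FileWatch-style buffer whose #extract raises
# RuntimeError("input buffer full") when a single line exceeds the limit,
# as exercised by the spec earlier in this review.
def decode(data, &block)
  @buffer.extract(data).each do |line|
    parse_line(line, &block)   # hypothetical helper turning one JSON line into events
  end
rescue RuntimeError => e
  raise e unless e.message == "input buffer full"
  # Re-raise with codec-specific context so the failure points at this option
  # instead of the generic FileWatch message.
  raise "json_lines: a single line exceeded decode_size_limit_bytes (#{@decode_size_limit_bytes} bytes)"
end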
# Maximum number of bytes for a single line before a fatal exception is raised
# which will stop Logstash.
# The default is 20MB, which is quite large for a JSON document.
config :decode_size_limit_bytes, :validate => :number, :default => 20 * (1024 * 1024) # 20MB
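For illustration, overriding the default in a spec might look like this (a sketch assuming the plugin's usual class name and the error already exercised by the spec quoted above):

require "logstash/codecs/json_lines"

describe LogStash::Codecs::JSONLines do
  # 1MB instead of the 20MB default
  subject { described_class.new("decode_size_limit_bytes" => 1024 * 1024) }

  it "rejects a single line larger than the configured limit" do
    expect {
      subject.decode("a" * (1024 * 1024 + 1)) { |event| }
    }.to raise_error(RuntimeError, "input buffer full")
  end
end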
- Is "decode" the right name? The description says this is bytes for a line, but then we call it "decode", which isn't something mentioned elsewhere in the docs.
- Exceeding this will cause a fatal error in Logstash and stop the process? Is this the desired behavior?
If the size limit is exceeded, where do we show that this exception will terminate Logstash? I don't see it when I read through the code.
Test plan

Used a single-line big JSON file (~1GB), limited the Java heap to 512MB, and processed it with a file input plugin. It goes OOM.

Generate one big JSON file

Use the script to generate it:

require "json"
part = [
{:name => "Jannik", :surname => "Sinner"},
{:name => "Novak", :surname => "Djokovic"},
{:name => "Rafa", :surname => "Nadal"},
{:name => "Roger", :surname => "Federer"},
{:name => "Pete", :surname => "Sampras"},
{:name => "André", :surname => "Agassi"},
{:name => "Rod", :surname => "Laver"},
{:name => "Ivan", :surname => "Lendl"},
{:name => "Bjorn", :surname => "Borg"},
{:name => "John", :surname => "McEnroe"},
{:name => "Jimmy", :surname => "Connors"}
]
json_part = JSON.generate(part)
out_file = File.open("big_single_line.json", "a")
out_file.write "{"
counter = 1
desired_size = 1024 * 1024 * 1024
actual_size = 0
while actual_size < desired_size do
json_fragment = "\"field_#{counter}\": #{json_part}"
actual_size += json_fragment.size
if actual_size < desired_size
json_fragment += ","
end
counter += 1
out_file.write json_fragment
end
out_file.write "}\r\n"
out_file.flush
puts "Done! output file is #{out_file.size} bytes"
out_file.close

Configure Logstash

Create a pipeline configuration that reads the generated file with a file input using this codec, and execute the pipeline.

Configure this patch PR: replace "logstash-codec-json_lines" with "logstash-codec-json_lines", :path => "/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines" and execute bin/logstash-plugin install --no-verify.

Result

It fails with the following logs:
As described in the comment, it doesn't work. If that's not the correct way to test, please provide guidance on how to verify the fix.
The problem with this approach is that the size limit only applies once the data has been fully loaded into memory. In this case, as the exception's stack trace shows, the problem is thrown earlier, while the data is still being read from the IO. It can't be caught by this codec because it's raised by the input.
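To make the distinction concrete, a rough sketch (same assumed class and option names as above; illustrative only, not the plugin's actual test):

require "logstash/codecs/json_lines"

# The codec-level limit can only fire once the oversized chunk already exists as a
# Ruby string and is handed to decode:
codec = LogStash::Codecs::JSONLines.new("decode_size_limit_bytes" => 10)
codec.decode("x" * 11) { |event| }   # raises RuntimeError ("input buffer full")

# With a real 1GB single-line file and a 512MB heap, the file input runs out of
# memory while reading the line, before decode is ever called, so the codec's
# limit never gets a chance to apply.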
Resolves #29. This is most useful when people accidentally use this codec on something that is not properly newline-delimited, which can easily lead to an OOM.
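To illustrate that failure mode, a sketch of what happens when the input never contains the expected delimiter (assumed class name as above; this is the pre-limit behaviour the option guards against):

require "logstash/codecs/json_lines"

codec = LogStash::Codecs::JSONLines.new          # before this change, nothing bounds the internal buffer
chunk = '{"field": "value", ' * 1_000            # JSON fragments with no trailing newline

10.times do
  codec.decode(chunk) { |event| puts event }     # no line is ever completed, so nothing is
end                                              # yielded and the buffer keeps growing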
Superseded by #43