Add decode_size_limit_bytes option. #30

Open · wants to merge 1 commit into base: main
Conversation

@andrewvc (Contributor) commented Aug 5, 2016

Resolves #29. This is most useful when people accidentally use this codec on input that is not properly newline delimited, which can easily lead to an OOM.

Superseded by #43

it "should raise an error if the max bytes are exceeded" do
  expect {
    subject.decode(maximum_payload << "z")
  }.to raise_error(RuntimeError, "input buffer full")
end
Contributor:

is there a way to provide more context to this exception? input buffer full feels a little vague, or am I missing something?

Contributor Author:

Unfortunately that exception is provided by the FileWatch library, so it would require a patch there. I should probably wrap and re-raise it.
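Wrapping and re-raising could look like the following sketch. This is illustrative only: `DecodeSizeLimitExceeded` and `extract_with_context` are hypothetical names, not part of the actual plugin or FileWatch API.

```ruby
# Hypothetical sketch: wrap FileWatch's vague "input buffer full"
# RuntimeError in a codec-level error that carries the configured limit.
class DecodeSizeLimitExceeded < StandardError; end

def extract_with_context(buffer, data, limit_bytes)
  buffer.extract(data)
rescue RuntimeError => e
  # Only translate the specific buffer-overflow error; re-raise anything else.
  raise e unless e.message == "input buffer full"
  raise DecodeSizeLimitExceeded,
        "JSON line exceeded decode_size_limit_bytes (#{limit_bytes} bytes): #{e.message}"
end
```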

Contributor Author:

I just submitted https://github.com/jordansissel/ruby-filewatch/pull/82/files to enable us to catch a more precise exception.

# Maximum number of bytes for a single line before a fatal exception is raised,
# which will stop Logstash.
# The default is 20MB, which is quite large for a JSON document.
config :decode_size_limit_bytes, :validate => :number, :default => 20 * (1024 * 1024) # 20MB
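As a rough illustration of the guard this option is meant to provide, a size-limited line tokenizer could look like the following standalone sketch. This is not the plugin's or FileWatch's actual implementation; `BoundedLineBuffer` is a hypothetical name.

```ruby
# Standalone sketch of a size-limited line tokenizer (illustrative only).
# Complete lines are returned; if the unterminated remainder grows past
# the limit, an error is raised instead of letting the buffer grow unbounded.
class BoundedLineBuffer
  DEFAULT_LIMIT = 20 * 1024 * 1024 # mirrors the 20MB default above

  def initialize(limit_bytes = DEFAULT_LIMIT)
    @limit = limit_bytes
    @buffer = +""
  end

  def extract(data)
    @buffer << data
    lines = @buffer.split("\n", -1)
    @buffer = lines.pop # the unterminated tail stays buffered
    raise "input buffer full" if @buffer.bytesize > @limit
    lines
  end
end
```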
Contributor:

  • Is decode the right name? The description says this is bytes for a line, but then we call it decode which isn't something mentioned elsewhere in the docs.
  • Exceeding this will cause a fatal error in Logstash and stop the process? Is this the desired behavior?

Contributor:

If the size limit is exceeded, where do we show that this exception will terminate Logstash? I don't see it when I read through the code.

@andsel (Contributor) commented Jun 7, 2024

Test plan

Used a single-line big JSON file (~1GB), limited the Java heap to 512MB, and processed it with the file input plugin. It results in an OOM.

Generate one big JSON file

Use this script to generate it:

require "json"

part = [
    {:name => "Jannik", :surname => "Sinner"},
    {:name => "Novak", :surname => "Djokovic"},
    {:name => "Rafa", :surname => "Nadal"},
    {:name => "Roger", :surname => "Federer"},
    {:name => "Pete", :surname => "Sampras"},
    {:name => "André", :surname => "Agassi"},
    {:name => "Rod", :surname => "Laver"},
    {:name => "Ivan", :surname => "Lendl"},
    {:name => "Bjorn", :surname => "Borg"},
    {:name => "John", :surname => "McEnroe"},
    {:name => "Jimmy", :surname => "Connors"}
]

json_part = JSON.generate(part)
out_file = File.open("big_single_line.json", "a")
out_file.write "{"

counter = 1
desired_size = 1024 * 1024 * 1024
actual_size = 0
while actual_size < desired_size do
  json_fragment = "\"field_#{counter}\": #{json_part}"
  actual_size += json_fragment.size
  if actual_size < desired_size
    json_fragment += ","
  end
  counter += 1
  out_file.write json_fragment
end
out_file.write "}\r\n"
out_file.flush

puts "Done! output file is #{out_file.size} bytes"
out_file.close

Configure Logstash

In config/jvm.options set

-Xms512m
-Xmx512m

and execute the pipeline

input {
  file {
    path => "/path/to/big_single_line.json"
    sincedb_path => "/tmp/sincedb"
    mode => "read"
    file_completed_action => "log"
    file_completed_log_path => "/tmp/processed.log"
  
    codec => json_lines {
      decode_size_limit_bytes => 32768
    }
  }
}

output {
  stdout {
    codec => rubydebug
  }
}

Configure this PR's code: in the Logstash Gemfile, replace

"logstash-codec-json_lines"

with

"logstash-codec-json_lines", :path => "/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines"

and execute

bin/logstash-plugin install --no-verify

Result

It fails with the following logs:

[2024-06-07T16:09:54,017][INFO ][filewatch.readmode.handlers.readfile][main][0cdeaf0672f90b760dedf003f2c0dcbca174fd7200057d0d92fa085651619d3f] buffer_extract: a delimiter can't be found in current chunk, maybe there are no more delimiters or the delimiter is incorrect or the text before the delimiter, a 'line', is very large, if this message is logged often try increasing the `file_chunk_size` setting. {"delimiter"=>"\n", "read_position"=>413007872, "bytes_read_count"=>32768, "last_known_file_size"=>1078985215, "file_path"=>"/Users/andrea/workspace/logstash_plugins/logstash-codec-json_lines/big_single_line.json"}
[2024-06-07T16:09:54,164][FATAL][org.logstash.Logstash    ][main][0cdeaf0672f90b760dedf003f2c0dcbca174fd7200057d0d92fa085651619d3f] uncaught error (in thread [main]<file)
java.lang.OutOfMemoryError: Java heap space
	at org.jruby.util.ByteList.<init>(ByteList.java:95) ~[jruby.jar:?]
	at org.jruby.RubyString.newStringLight(RubyString.java:466) ~[jruby.jar:?]
	at org.jruby.util.io.EncodingUtils.setStrBuf(EncodingUtils.java:1281) ~[jruby.jar:?]
	at org.jruby.RubyIO.sysreadCommon(RubyIO.java:3277) ~[jruby.jar:?]
	at org.jruby.RubyIO.sysread(RubyIO.java:3266) ~[jruby.jar:?]
	at java.lang.invoke.LambdaForm$DMH/0x00000008007d2000.invokeVirtual(LambdaForm$DMH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007e6800.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007d9800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007d9800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_andsel.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_4_dot_6.lib.filewatch.watched_file.RUBY$method$file_read$0(/Users/andrea/workspace/logstash_andsel/vendor/bundle/jruby/3.1.0/gems/logstash-input-file-4.4.6/lib/filewatch/watched_file.rb:229) ~[?:?]
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x000000080083d800.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007c0800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007c0800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_andsel.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_4_dot_6.lib.filewatch.watched_file.RUBY$method$read_extract_lines$0(/Users/andrea/workspace/logstash_andsel/vendor/bundle/jruby/3.1.0/gems/logstash-input-file-4.4.6/lib/filewatch/watched_file.rb:241) ~[?:?]
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x0000000800846c00.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007d9800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.DelegatingMethodHandle$Holder.delegate(DelegatingMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007d9800.guard(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.Invokers$Holder.linkToCallSite(Invokers$Holder) ~[?:?]
	at Users.andrea.workspace.logstash_andsel.vendor.bundle.jruby.$3_dot_1_dot_0.gems.logstash_minus_input_minus_file_minus_4_dot_4_dot_6.lib.filewatch.read_mode.handlers.read_file.RUBY$block$controlled_read$0(/Users/andrea/workspace/logstash_andsel/vendor/bundle/jruby/3.1.0/gems/logstash-input-file-4.4.6/lib/filewatch/read_mode/handlers/read_file.rb:50) ~[?:?]
	at java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x0000000800f88000.invoke(LambdaForm$MH) ~[?:?]
	at java.lang.invoke.LambdaForm$MH/0x00000008007f8800.invokeExact_MT(LambdaForm$MH) ~[?:?]

@andsel (Contributor) left a review:

As described in my comment above, this doesn't work.
If that's not the correct way to test it, please provide guidance on how to verify the fix.

@andsel (Contributor) commented Aug 27, 2024

The problem with this approach is that the size limit is checked only once the data has been fully loaded into memory:
https://github.com/jordansissel/ruby-filewatch/blob/4ae6ce52e069553516759c4e49389f19f65ec0dd/lib/filewatch/buftok.rb#L68-L74

In this case, as the exception's stack trace shows:

java.lang.OutOfMemoryError: Java heap space
	at org.jruby.util.ByteList.<init>(ByteList.java:95) ~[jruby.jar:?]
	at org.jruby.RubyString.newStringLight(RubyString.java:466) ~[jruby.jar:?]
	at org.jruby.util.io.EncodingUtils.setStrBuf(EncodingUtils.java:1281) ~[jruby.jar:?]
	at org.jruby.RubyIO.sysreadCommon(RubyIO.java:3277) ~[jruby.jar:?]
	at org.jruby.RubyIO.sysread(RubyIO.java:3266) ~[jruby.jar:?]

the error is thrown earlier, while the data is still being read from the IO. It can't be caught by this codec because it is raised by the input, at logstash-input-file-4.4.6/lib/filewatch/watched_file.rb:229:

https://github.com/logstash-plugins/logstash-input-file/blob/55a4a7099f05f29351672417036c1342850c7adc/lib/filewatch/watched_file.rb#L229
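The accumulate-then-check ordering described above can be modeled with a minimal sketch (illustrative only, not the actual buftok.rb code): the chunk is appended before the size test runs, so the memory for oversized data is already allocated by the time the limit fires.

```ruby
# Illustrative model (not the actual buftok.rb): the append happens first,
# so an oversized chunk is fully held in memory before the limit check.
class AccumulateThenCheck
  def initialize(size_limit)
    @size_limit = size_limit
    @input = +""
  end

  def extract(data)
    @input << data                                          # allocation happens here
    raise "input buffer full" if @input.size > @size_limit  # check only afterwards
    lines = @input.split("\n", -1)
    @input = lines.pop
    lines
  end
end
```

Under this model, a codec-level limit cannot prevent the OOM seen in the test plan: the oversized data is materialized before the check can run, and in the reported failure it is materialized even earlier, by the input's file reader.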


Successfully merging this pull request may close these issues.

Add max_line_size option
5 participants