
Large Input causing OutputBuffer-BufferOverflowException #69

Open
pgaref opened this issue Mar 16, 2016 · 5 comments

@pgaref
Contributor

pgaref commented Mar 16, 2016

Just noticed today a rather interesting issue:
I was testing the simple File source example reading String lines from a file.
Each of these lines could be rather big (hundreds of bytes, see below). When I used the MarkerSink (meaning the bytes would not have to go over the network) the example worked just fine.
On the other hand, when I tried to plug in a real Sink, I faced the exception below.
I think the exception comes from the OutputBuffer class, where we allocate a static buffer size:

    int headroomSize = this.BATCH_SIZE * 2;
    buf = ByteBuffer.allocate(headroomSize);

My question here is: how do we want to handle this case? Split the input into smaller chunks that fit into the buffers? Dynamically extend the buffers? I know some people have previously just increased the batch size to bypass it, but that is not really a solution, is it? A minimal reproduction is sketched after the log below.

23:29:49 [SingleThreadProcessingEngine] INFO  SingleThreadProcessingEngine$Worker - Configuring SINGLETHREAD processing engine with 1 inputAdapters
23:29:49 [File-Reader] INFO  Config - FileConfig values: 
    file.path = /home/pg1712/jmeter.log
    character.set = UTF-8
    text.source = true
    serde.type = 0

23:29:49 [SingleThreadProcessingEngine] INFO  SingleThreadProcessingEngine$Worker - Configuring SINGLETHREAD processing engine with 1 outputBuffers
[Processor] data send Size: 81 => total Size: 81
[Processor] data send Size: 129 => total Size: 210
23:29:49 [File-Reader] INFO  FileSelector$Reader - Finished text File Reader worker: File-Reader
Exception in thread "SingleThreadProcessingEngine" java.nio.BufferOverflowException
    at java.nio.HeapByteBuffer.put(HeapByteBuffer.java:189)
    at java.nio.ByteBuffer.put(ByteBuffer.java:859)
    at uk.ac.imperial.lsds.seepworker.core.output.OutputBuffer.write(OutputBuffer.java:93)
    at uk.ac.imperial.lsds.seepworker.core.Collector.send(Collector.java:116)
    at Processor.processData(Processor.java:26)
    at uk.ac.imperial.lsds.seepworker.core.SingleThreadProcessingEngine$Worker.run(SingleThreadProcessingEngine.java:111)
    at java.lang.Thread.run(Thread.java:745)
^C23:30:41 [Thread-2] INFO  WorkerShutdownHookWorker - JVM is shutting down...
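
For reference, a minimal standalone sketch (the batch size value and class name here are illustrative assumptions, not the actual OutputBuffer code) that reproduces the same BufferOverflowException once a single put no longer fits into the fixed 2 * BATCH_SIZE buffer:

    import java.nio.ByteBuffer;

    public class OverflowRepro {
        // Illustrative batch size; the real value comes from the worker config.
        static final int BATCH_SIZE = 128;

        public static void main(String[] args) {
            int headroomSize = BATCH_SIZE * 2;                 // fixed headroom, as in OutputBuffer
            ByteBuffer buf = ByteBuffer.allocate(headroomSize);

            buf.put(new byte[81]);   // first record fits  (position = 81)
            buf.put(new byte[129]);  // second record fits (position = 210 <= 256)
            buf.put(new byte[129]);  // does not fit -> java.nio.BufferOverflowException
        }
    }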
@raulcf
Owner

raulcf commented Mar 17, 2016

I think the exception comes from the OutputBuffer class, where we allocate a static buffer size:

We cannot do anything before knowing exactly where the exception comes from and what causes it. Once we know that, we can make a decision. It could just be a bug, right?

@pgaref
Contributor Author

pgaref commented Mar 17, 2016

Sure, I will test some more and update this thread. I thought it was a known limitation (that we are not splitting big chunks of data, which do not fit in a single buffer, into smaller ones).

@raulcf
Owner

raulcf commented Mar 17, 2016

I think one limitation is that the batch size must be larger than a single tuple. This is problematic for dynamically sized tuples (only when one single write can overflow the buffer), but it should be OK for statically sized ones, as that is something we can check statically and throw an error for.
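
For the statically sized case, a sketch of the kind of check that could run at setup time (the class and parameter names here are assumptions for illustration, not the actual SEEP API):

    // Hypothetical validation: refuse to start when a single fixed-size tuple
    // can never fit into one batch buffer.
    public final class BatchSizeCheck {

        public static void validate(int batchSizeBytes, int fixedTupleSizeBytes) {
            if (fixedTupleSizeBytes > batchSizeBytes) {
                throw new IllegalArgumentException(
                    "batch size (" + batchSizeBytes + " bytes) is smaller than a single tuple ("
                    + fixedTupleSizeBytes + " bytes); increase the configured batch size");
            }
        }
    }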

@WJCIV
Collaborator

WJCIV commented Apr 6, 2016

The problem exists in OutputBuffer.write because a tuple is read into a buffer (max size = 2 * batch size), and only once the buffer is "full" (size > batch size) are all tuples in the buffer processed. If the record is too long it does not fit into the buffer (which may already be partially full from an earlier, smaller record). A simplified sketch of this write path is shown below.

From this I think it is fair to say that the batch size must be set in the WorkerConfig to be at least as long as the longest record. It doesn't make sense to try to process less than a single record at a time. We could split up a long record across multiple batches and put it back together on the other side if necessary (presumably when there are variable-sized records and only a small fraction are "big"), but that doesn't seem to fit the model as well.
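
A simplified sketch of the write path as described above (names are illustrative, not the real OutputBuffer code). The buffer is only drained after its fill level crosses BATCH_SIZE, so a record larger than the remaining headroom overflows before any drain can happen:

    import java.nio.ByteBuffer;

    public class OutputBufferSketch {
        private final int batchSize;
        private final ByteBuffer buf;

        public OutputBufferSketch(int batchSize) {
            this.batchSize = batchSize;
            this.buf = ByteBuffer.allocate(batchSize * 2);  // fixed 2x headroom
        }

        public void write(byte[] record) {
            buf.put(record);                 // throws BufferOverflowException if record > remaining space
            if (buf.position() > batchSize) {
                drain();                     // batch is only flushed after crossing batchSize
            }
        }

        private void drain() {
            buf.flip();
            // ... hand the batched bytes to the network layer ...
            buf.clear();
        }
    }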

@raulcf
Owner

raulcf commented Apr 7, 2016

it is fair to say that batch size must be set in the WorkerConfig to be at least as long as the longest record.

This is exactly what should happen. The purpose of batching at this level is to decrease the communication/processing cost per record; for that reason a batch should naturally contain more than one record.

We could split up a long record across multiple batches and put it back together on the other side if necessary (presumably when there are variable-sized records and only a small fraction are "big"), but that doesn't seem to fit the model as well.

This would require a revamp of the current design. That is only justified if we have a use case.
