
Design Doc: Memory Usage


Prior to SiteMesh3, memory usage was not very efficient. This page outlines the problems and how they will be tackled.

Update April 2009: This work is now complete – the goals have been met.

Issues

  1. The buffered page content is copied between many different structures during its short lifetime (from char[] to String to CharBuffer to another char[], and so on) as it passes through the various layers: OutputStream, encoding, lexer, tag processor, content.
  2. Much of the page content is duplicated. For example, the entire original page is captured, but so are the contents of the body and head tags, which are already present in the original.

Goals

  1. Use just one structure to represent the page and ensure that each layer can work directly with it, without having to copy it into another structure.
  2. Do not copy data out of it; instead, just keep pointers to where the data resides in the original.

The outcome of this will be much more efficient memory usage, and improved performance (due to less allocation and copying).
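As an illustration of goal 2, a piece of extracted content can be represented as nothing more than an offset and length into the single shared buffer. The sketch below is illustrative only; the class name and shape are assumptions for this page, not the actual SiteMesh3 API:

```java
import java.nio.CharBuffer;

// Hypothetical slice type: a view into the one shared CharBuffer holding the
// original page. Selecting the head or body copies no characters.
// Assumes the source buffer's position is 0.
final class ContentSlice implements CharSequence {
    private final CharBuffer source; // the single buffer holding the original page
    private final int offset;
    private final int length;

    ContentSlice(CharBuffer source, int offset, int length) {
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    @Override public int length() { return length; }

    @Override public char charAt(int index) { return source.charAt(offset + index); }

    @Override public CharSequence subSequence(int start, int end) {
        // A sub-slice is again just pointers into the same buffer.
        return new ContentSlice(source, offset + start, end - start);
    }

    @Override public String toString() {
        // Characters are only copied if a caller explicitly asks for a String.
        return source.subSequence(offset, offset + length).toString();
    }
}
```

With this approach the head and body are just further slices over the same buffer, so nothing is duplicated until a caller explicitly needs a String.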

Implementation notes

  • The single structure will be java.nio.CharBuffer. That’s what it’s there for, and it is versatile enough to be accessed directly by the other layers.
  • Buffering the original content will be tricky as the length is not known in advance. If the Content-Length header is set, the buffer should be allocated at that size. Otherwise, the CharBuffer needs to be grown dynamically (see the sketch below).
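A CharBuffer has a fixed capacity, so "growing" one really means allocating a larger buffer and copying the accumulated content across. A minimal sketch of that idea (illustrative names, not the actual SiteMesh3 implementation):

```java
import java.nio.CharBuffer;

// Hypothetical append-only buffer that doubles its capacity when it runs out
// of room, for the case where no Content-Length header is available.
final class GrowingCharBuffer {
    private CharBuffer buffer;

    GrowingCharBuffer(int initialCapacity) {
        buffer = CharBuffer.allocate(Math.max(initialCapacity, 64));
    }

    void append(char[] chars, int offset, int length) {
        if (buffer.remaining() < length) {
            // Double the capacity until the new data fits, then copy once.
            int newCapacity = buffer.capacity();
            while (newCapacity - buffer.position() < length) {
                newCapacity *= 2;
            }
            CharBuffer bigger = CharBuffer.allocate(newCapacity);
            buffer.flip();
            bigger.put(buffer);
            buffer = bigger;
        }
        buffer.put(chars, offset, length);
    }

    CharBuffer toReadableBuffer() {
        // Expose the accumulated content for the lexer/processor to read.
        CharBuffer view = buffer.duplicate();
        view.flip();
        return view;
    }
}
```

Doubling keeps the number of reallocation copies logarithmic in the final page size, and the Content-Length case avoids them entirely.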

SM2 notes

This is the lifecycle of a chunk of content:

  1. ByteBufferBuilder.write(): copy the data into the current byte[] buffer chunk at the end of a LinkedList.
  2. ByteBufferBuilder.toByteBuilder(): allocate a new ByteBuffer for the entire contents and copy the chunks into it.
  3. TextEncoder.encode(): allocate a new CharBuffer and encode the ByteBuffer into it.
  4. HtmlContentProcessor.build(): CharBuffer.toString() – allocates a String and copies the contents.
  5. HtmlContentProcessor.build(): CharBuffer.toCharArray() – allocates a char[] and copies the contents.
  6. HtmlContentProcessor.build(): allocate a 4k output CharArray for the body, plus an additional CharArray for the head contents.
  7. Lexer.zzRefill(): reads sections of data into a 2k buffer, after reshuffling the data in the current buffer.
  8. TagTokenizer.pushBack(): for each token, create a new String from the buffer.
  9. State.handleText(): copy the current buffer to the output CharArray.
  10. In the decorators: copy the output CharArray to the final ServletOutputStream.

Ouch.
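To make the first two steps concrete, the pattern looks roughly like the sketch below. This is illustrative code only, not the real SM2 ByteBufferBuilder: bytes are appended to fixed-size chunks held in a LinkedList, and producing a single ByteBuffer means allocating it and copying every chunk in, which is only the first of the allocate-and-copy rounds in the list above.

```java
import java.nio.ByteBuffer;
import java.util.LinkedList;

// Hypothetical stand-in for the SM2-style chunked builder described above.
final class ChunkedByteBuilder {
    private static final int CHUNK_SIZE = 4096; // assumed chunk size, for illustration
    private final LinkedList<byte[]> chunks = new LinkedList<>();
    private int used;  // bytes used in the last chunk
    private int total; // total bytes written

    void write(byte b) {
        if (chunks.isEmpty() || used == CHUNK_SIZE) {
            chunks.add(new byte[CHUNK_SIZE]);
            used = 0;
        }
        chunks.getLast()[used++] = b;
        total++;
    }

    ByteBuffer toByteBuffer() {
        // Copy #1: every chunk is copied into one freshly allocated buffer.
        ByteBuffer result = ByteBuffer.allocate(total);
        for (byte[] chunk : chunks) {
            int length = (chunk == chunks.getLast()) ? used : CHUNK_SIZE;
            result.put(chunk, 0, length);
        }
        result.flip();
        return result;
    }
}
```

Each later step in the list repeats the same allocate-and-copy pattern on the same data, which is exactly what the SiteMesh3 rework eliminates.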

Status

  • Content Processor and Content Buffer completely rearchitected. The goals have been met, showing roughly half the memory usage and a 3x performance speed-up (on typical pages).