Add support for buffering records on disk allowing control over upload size. #5

Merged: 40 commits into y-scope:main on Jul 30, 2024

Conversation

davemarco (Contributor) commented Jul 16, 2024

Description

The plugin can now accumulate logs on disk and upload them to S3 once a size threshold is reached. Logs are stored using a "trash compactor" design, described below. A recovery mechanism was added for abrupt crashes: on startup, the plugin can find buffered logs stored on disk and send them to S3. Lastly, I added an index to the object key which increments after each upload.

I wanted to include a timeout threshold in this PR (i.e., send buffered logs to S3 after a timeout even if the size threshold is not reached); however, the PR is already too large. The timeout is non-trivial since we have no easy way to retake execution from Fluent Bit after a timeout if no new logs are sent to the output plugin. I believe the timeout requires the use of goroutines. I will add another PR to incorporate the timeout threshold.
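
For context, a goroutine-based timer is one way such a timeout could regain control when no new chunks arrive. The sketch below is purely illustrative and not part of this PR; startFlushTimer, the flush callback, and the reset-on-chunk pattern are hypothetical.

package sketch

import "time"

// startFlushTimer arms a timer that fires flush() after the timeout elapses
// with no activity. Each time a new chunk arrives, the caller would call
// Reset(timeout) on the returned timer to push the deadline back.
func startFlushTimer(timeout time.Duration, flush func()) *time.Timer {
	return time.AfterFunc(timeout, flush)
}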

Trash Compactor Disk Store

When designing the buffer, I had to decide whether to buffer logs on disk uncompressed or compressed. I chose compressed for a few reasons:

  1. Less disk usage.
  2. Better performance, since we avoid serializing log events to disk and then deserializing them prior to IR/Zstd encoding.
  3. More precise control over the upload size; with an uncompressed buffer, the upload size depends more on the compression ratio.

Using a compressed buffer introduced challenges related to data recovery and the compression ratio.

A simple approach for the buffer would be to send all the events destined for one S3 upload to a streaming compressor and only close the stream when the target upload size is reached. However, the streaming compressor keeps frames/blocks open in between receipt of Fluent Bit chunks, and open frames/blocks may not be recoverable after an abrupt crash. Therefore, I decided to "compact" each chunk into its own Zstd frame. When the upload size is reached, the stack of frames is sent to S3. As a result, for the majority of runtime, logs are stored as valid Zstd and can be sent to S3 on startup. An end-of-stream byte is appended to the Zstd data on upload to terminate the IR stream. This approach fixes the data recovery issue; however, if the chunks are small, the compression ratio will be poor.
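
As a rough illustration of the per-chunk "compaction" step, here is a minimal sketch assuming the klauspost/compress Zstd bindings; compactChunk and its signature are hypothetical and not the plugin's actual API.

package sketch

import (
	"os"

	"github.com/klauspost/compress/zstd"
)

// compactChunk compresses one chunk's IR bytes into its own, fully closed
// Zstd frame appended to the on-disk Zstd buffer. Because each frame is
// closed immediately, the buffer is always valid Zstd and remains
// recoverable after an abrupt crash.
func compactChunk(zstdFile *os.File, irBytes []byte) error {
	enc, err := zstd.NewWriter(zstdFile)
	if err != nil {
		return err
	}
	if _, err := enc.Write(irBytes); err != nil {
		enc.Close()
		return err
	}
	// Close flushes and terminates the frame; concatenated closed frames
	// still decode as a single valid Zstd stream.
	return enc.Close()
}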

To fix the compression ratio, I added a second, uncompressed IR buffer. First, log events are converted to uncompressed IR and buffered into "bins"; the uncompressed IR represents the uncompressed trash in the "trash compactor". Once a bin is full, it is "compacted" into its own Zstd frame. Each bin may contain multiple Fluent Bit chunks. Buffering uncompressed IR fixes the poor compression ratio associated with "compacting" each Fluent Bit chunk individually.

Below is a summary of the control flow for the disk buffer (a code sketch follows the list):

  • The Fluent Bit engine groups logs by tag and flushes them to the output plugin every second.
  • The output plugin recognizes the tag, parses the logs into IR, and stores the IR in a tag-specific disk IR buffer.
  • If the IR size < the IR size threshold, the IR is left in the buffer and control is returned to the Fluent Bit engine.
  • If the IR size > the IR size threshold, the buffered IR is compressed into Zstd and stored in the disk Zstd buffer. The disk IR buffer is truncated and the Zstd frame is explicitly closed.
  • If the Zstd size < the upload size threshold, nothing is uploaded and control is returned to the Fluent Bit engine.
  • If the Zstd size > the upload size threshold (i.e., multiple IR bins have been compressed), the Zstd frames are sent to S3 and the Zstd buffer is truncated. After the upload, control is returned to the Fluent Bit engine.
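
A minimal sketch of the decision points above; all names (onFlush, compactBin, uploadToS3, the threshold parameters) are hypothetical stand-ins for the plugin's actual buffer bookkeeping.

package sketch

import "os"

// onFlush sketches the per-tag flow when Fluent Bit hands a chunk of encoded
// IR to the output plugin. Sizes are read from the underlying buffer files;
// thresholds come from the plugin configuration.
func onFlush(irFile, zstdFile *os.File, ir []byte, irThreshold, uploadThreshold int64) error {
	// Append the new IR to the tag-specific IR buffer ("bin").
	if _, err := irFile.Write(ir); err != nil {
		return err
	}
	irSize, err := fileSize(irFile)
	if err != nil {
		return err
	}
	if irSize < irThreshold {
		// Bin not full yet; return control to the Fluent Bit engine.
		return nil
	}
	// "Compact" the full bin into one closed Zstd frame and truncate the bin.
	if err := compactBin(irFile, zstdFile); err != nil {
		return err
	}
	zstdSize, err := fileSize(zstdFile)
	if err != nil {
		return err
	}
	if zstdSize < uploadThreshold {
		// Not enough compressed data yet; keep stacking frames on disk.
		return nil
	}
	// Append the end-of-stream byte, upload the stacked frames to S3, then
	// truncate the Zstd buffer for the next object.
	if err := uploadToS3(zstdFile); err != nil {
		return err
	}
	return zstdFile.Truncate(0)
}

// fileSize returns the current size of an open file.
func fileSize(f *os.File) (int64, error) {
	info, err := f.Stat()
	if err != nil {
		return 0, err
	}
	return info.Size(), nil
}

// compactBin and uploadToS3 are placeholders for the real compression and
// upload steps (see the compaction sketch earlier in this description).
func compactBin(irFile, zstdFile *os.File) error { return nil }
func uploadToS3(zstdFile *os.File) error         { return nil }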

There is currently no timeout functionality, as mentioned at the top of the PR description. As a result, the plugin will not upload logs if the log quantity < the upload size threshold. A timeout is being added in the next PR.

Recovery

On startup, the plugin will look for IR and Zstd files in the store directory and group them by tag. It will then compress the leftover IR into the Zstd file and send it to S3.
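
A rough sketch of what that startup pass could look like; the file-naming scheme ("<tag>.ir" / "<tag>.zst") and helper names are assumptions for illustration, not the plugin's actual layout.

package sketch

import (
	"path/filepath"
	"strings"
)

// recoverStores scans the store directory for leftover buffer files, groups
// them by tag, compacts any remaining IR into the tag's Zstd file, and
// uploads the result to S3.
func recoverStores(storeDir string) error {
	irPaths, err := filepath.Glob(filepath.Join(storeDir, "*.ir"))
	if err != nil {
		return err
	}
	for _, irPath := range irPaths {
		tag := strings.TrimSuffix(filepath.Base(irPath), ".ir")
		zstdPath := filepath.Join(storeDir, tag+".zst")
		// Placeholder for: compress leftover IR into the Zstd file, append
		// the end-of-stream byte, upload to S3, and delete both buffers.
		if err := compactAndUpload(irPath, zstdPath, tag); err != nil {
			return err
		}
	}
	return nil
}

// compactAndUpload is a placeholder for the real recovery logic.
func compactAndUpload(irPath, zstdPath, tag string) error { return nil }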

Index

I added an index to the S3 object key which increments after each upload, helping prevent namespace collisions.
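
For illustration, the index could be folded into the key roughly like this; the key layout and function name are hypothetical, not the plugin's exact format.

package sketch

import "fmt"

// objectKey builds an S3 object key that embeds an upload index, so
// successive uploads for the same tag do not collide. The index is
// incremented after each upload.
func objectKey(prefix, tag string, index int) string {
	return fmt.Sprintf("%s/%s_%d.zst", prefix, tag, index)
}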

Validation performed

Tested that buffering and recovery work as expected with test logs. Tested that files sent to S3 could be archived by CLP.

@davemarco davemarco requested a review from davidlion July 16, 2024 20:33
@davemarco (Contributor, Author) commented:

Removed code to shorten the PR. Specifically, removed the recovery code that looks for stores on startup and sends them to S3. The recovery logic will be added back in the next PR.

Review comments on plugins/out_clp_s3/flush/flush.go (outdated, resolved)
Comment on lines 169 to 176
if ctx.Config.UseDiskBuffer {
	zstdFile, ok := tag.Writer.ZstdBuffer.(*os.File)
	if !ok {
		return fmt.Errorf("error type assertion from buffer to file failed")
	}
	// Seek to the start of the Zstd file so the upload reads it from the beginning.
	if _, err := zstdFile.Seek(0, io.SeekStart); err != nil {
		return fmt.Errorf("error failed to seek to start of Zstd file: %w", err)
	}
}
A reviewer (Member) commented:
I'm not sure if this logic should be in IrZstdWriter.Close, but ideally the underlying zstd and ir buffers shouldn't leak out of IrZstdWriter.

@davemarco davemarco requested a review from davidlion July 26, 2024 14:21
@davemarco davemarco changed the title from "Add support for buffering records to disk allowing control over upload size." to "Add support for buffering records on disk allowing control over upload size." Jul 30, 2024
@davidlion davidlion merged commit 3ed144a into y-scope:main Jul 30, 2024
2 checks passed