-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allocate direct buffers for multipart upload #559
base: main
Are you sure you want to change the base?
Allocate direct buffers for multipart upload #559
Conversation
I think this feature is good. |
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3Storage.java
Outdated
Show resolved
Hide resolved
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3Storage.java
Outdated
Show resolved
Hide resolved
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3Storage.java
Outdated
Show resolved
Hide resolved
storage/s3/src/main/java/io/aiven/kafka/tieredstorage/storage/s3/S3StorageConfig.java
Outdated
Show resolved
Hide resolved
8ffac2f
to
05ef2ed
Compare
@jeqo fixed tests, PTAL |
Allow to allocate multipart upload buffers as direct buffer rather than on the heap. We try to set a pretty large multipart upload part size on cluster to optimize throughput and reduce S3 requests. At the same time, we try to keep kafka JVM heap size contained on most kafka installation in order to leave as much memory as possible for the page cache. As a matter of example, we will use 4GB heap size on machines with 64GB available memory. The consequence of using pretty large multipart upload size on contained JVM heap size is that we can pretty easily run out of heap size if we suddenly have to upload many segments to tiered storage. The strategy we propose is to allocate multipart buffer in direct memory so that we can more easily configure direct buffer budget.
05ef2ed
to
9fcdda2
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @dopuskh3! I agree this is a valuable flag for deployments where JVM memory may not be enough for the part uploads.
I added few minor naming suggestions and some testing coverage; but overall looks good.
@@ -58,6 +58,11 @@ public class S3StorageConfig extends AbstractConfig { | |||
private static final String S3_MULTIPART_UPLOAD_PART_SIZE_DOC = "Size of parts in bytes to use when uploading. " | |||
+ "All parts but the last one will have this size. " | |||
+ "Valid values: between 5MiB and 2GiB"; | |||
|
|||
public static final String S3_MULTIPART_UPLOAD_DIRECT_BUFFERS_CONFIG = "s3.multipart.upload.direct.buffers"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
public static final String S3_MULTIPART_UPLOAD_DIRECT_BUFFERS_CONFIG = "s3.multipart.upload.direct.buffers"; | |
public static final String S3_MULTIPART_UPLOAD_BUFFER_ALLOCATION_DIRECT_CONFIG = "s3.multipart.upload.buffer.allocation.direct"; |
@@ -261,6 +272,10 @@ public Boolean pathStyleAccessEnabled() { | |||
return getBoolean(S3_PATH_STYLE_ENABLED_CONFIG); | |||
} | |||
|
|||
public Boolean multipartDirectBuffers() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
public Boolean multipartDirectBuffers() { | |
public Boolean uploadBufferAllocationDirect() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could these tests be parametrized to have coverage for the direct allocation?
@@ -44,12 +44,15 @@ public class S3Storage implements StorageBackend { | |||
private String bucketName; | |||
private int partSize; | |||
|
|||
private boolean multipartDirectBuffers; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit:
private boolean multipartDirectBuffers; | |
private boolean bufferAllocationDirect; |
Allow to allocate multipart upload buffers as direct buffer rather than on the heap.
Reasoning
We try to set a pretty large multipart upload part size on cluster to optimize throughput and reduce S3 requests.
At the same time, we try to keep kafka JVM heap size contained on most kafka installation in order to leave as much memory as possible for the page cache. As a matter of example, we will use 4GB heap size on machines with 64GB available memory.
The consequence of using pretty large multipart upload size on contained JVM heap size is that we can pretty easily run out of heap size if we suddenly have to upload many segments to tiered storage.
The strategy we propose is to allocate multipart buffers in direct memory so that we can more easily configure direct buffer budget.
Usage
We introduced
s3.multipart.upload.direct.buffers
boolean configuration flag (disabled by default).