#102 (at 5276167#diff-b445c304223f019653cd681dd06bbff5) replaced the hostname in the S3 object key (filename) with a UUID, to avoid a small possibility of clashes between multiple Logstash hosts that share the same hostname and timestamp (see review comment 58ad2a1#r79946161).
This also has the nice property of distributing object key prefixes across the S3 bucket. However, that is no longer needed: in July 2018, Amazon S3 announced that randomizing object prefixes is no longer required for performance.
Therefore, it seems like we could improve this filename format while still meeting the original intent of avoiding clashes.
Proposed solution
There is a common pattern used by various products (including AWS services): a common static prefix, followed by the dynamic timestamp, and then the UUID.
There are two large benefits to this:
we can filter for files in a rough time range (year, month, day, hour, etc.) by making an S3 list request (ListObjectsV2, which requires the s3:ListBucket permission) for all keys that begin with a specified prefix
we can request files created after a certain time by making an S3 list request for all keys that sort lexicographically after a specific object key
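The two listing behaviours above can be sketched locally without touching S3. This is only an illustration: the sample keys, the `ls.s3.` prefix, and the helper names `keys_with_prefix` and `keys_after` are assumptions made for the example, and plain string operations stand in for the prefix and start-after parameters of a real list request.

```ruby
# Sample keys in a hypothetical timestamp-first layout:
# <static prefix>.<timestamp>.<unique id>.<extension>
KEYS = [
  "ls.s3.2018-07-16T23.58.aaaa.txt",
  "ls.s3.2018-07-17T09.15.bbbb.txt",
  "ls.s3.2018-07-17T09.45.cccc.txt",
  "ls.s3.2018-07-18T00.02.dddd.txt",
].sort.freeze

# Mimics a prefix-filtered listing: keep only keys that begin with the prefix.
def keys_with_prefix(keys, prefix)
  keys.select { |key| key.start_with?(prefix) }
end

# Mimics a "start after" listing: keep only keys that sort strictly after the marker.
def keys_after(keys, marker)
  keys.select { |key| key > marker }
end

# Everything written on 17 July 2018.
day_keys = keys_with_prefix(KEYS, "ls.s3.2018-07-17")

# Everything written after the last key that was already processed.
later_keys = keys_after(KEYS, "ls.s3.2018-07-17T09.15.bbbb.txt")
```

Neither query works with the UUID-first format, because the random component comes before the timestamp and scatters keys from the same hour across the whole key space.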
Just swapping the position of the UUID and timestamp would be sufficient.
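As a rough sketch of what the swap could look like, the snippet below builds a key with the timestamp ahead of the UUID. The method name `timestamp_first_key`, the `ls.s3.` prefix, the minute-resolution format string, the use of UTC, and the `part`/`extension` suffix are all assumptions for illustration, not the plugin's confirmed code.

```ruby
require "securerandom"

# Assumed minute-resolution format; fixed-width fields keep keys sortable as strings.
TIMESTAMP_FORMAT = "%Y-%m-%dT%H.%M"

# Hypothetical helper: static prefix, then timestamp, then UUID.
# The UUID still guards against clashes between hosts writing in the same minute.
def timestamp_first_key(time:, uuid: SecureRandom.uuid, part: 0, extension: "txt")
  # getutc returns a new Time in UTC, leaving the caller's object untouched
  "ls.s3.#{time.getutc.strftime(TIMESTAMP_FORMAT)}.#{uuid}.part#{part}.#{extension}"
end

earlier = timestamp_first_key(time: Time.utc(2018, 7, 17, 9, 15))
later   = timestamp_first_key(time: Time.utc(2018, 7, 17, 9, 45))
same_minute = timestamp_first_key(time: Time.utc(2018, 7, 17, 9, 15))
```

Because the timestamp fields are zero-padded and ordered from year down to minute, string order matches chronological order, while two files from the same minute still differ in their UUID component.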
I do think fully configurable filenames should be an option. I would like to be able to refer back to the uploaded file, and at the moment that is impossible. Our use case is to crop massive query data out of the original event and store the full original in S3 instead. Now I have the cropped event and the full event in S3, but no way of linking them, because the filename is completely random.
Code referenced: logstash-output-s3/lib/logstash/outputs/s3/temporary_file_factory.rb, lines 66 to 74 in 9d02bc2
Related: #134 (for fully configurable filenames?)