Place UUID after timestamp in filename #197

cflee · 2018-10-29T09:16:45Z

#102 (at 5276167#diff-b445c304223f019653cd681dd06bbff5) replaced the hostname in the S3 object key (filename) with a UUID, to avoid the small possibility of clashes between multiple Logstash hosts that share the same hostname and timestamp (see review comment 58ad2a1#r79946161):

logstash-output-s3/lib/logstash/outputs/s3/temporary_file_factory.rb

Lines 66 to 74 in 9d02bc2

    
    def generate_name
      filename = "ls.s3.#{SecureRandom.uuid}.#{current_time}"
      if tags.size > 0
        "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
      else
        "#{filename}.part#{counter}.#{extension}"
      end
    end

This also has the nice property of distributing key prefixes within the S3 bucket. However, that is no longer needed: in July 2018, Amazon S3 announced that randomizing object prefixes is no longer required for performance.

Therefore, it seems we could improve this filename format while still meeting the original intent of avoiding clashes.

Proposed solution

A common pattern used by various products (including AWS services) is to start with a static prefix, followed by the dynamic values: first a timestamp, then the UUID.

There are two large benefits to this:

We can filter for files in a rough time range (year, month, day, hour, etc.) by making an S3 ListBucket request for all keys that begin with a specified prefix.
We can request files created after a certain time by making an S3 ListBucket request for all keys that are lexicographically after a specific object key.
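
The two listing patterns above can be illustrated without touching S3 at all. The following is only a sketch over an in-memory list: the keys are made up, and `with_prefix` and `after_key` are hypothetical helpers that mimic what a prefix filter and a "start after this key" filter would return from a lexicographically ordered listing (in the S3 API these would presumably map to the `Prefix` and `StartAfter` parameters of ListObjectsV2).

```ruby
# Hypothetical object keys in the proposed "ls.s3.<timestamp>.<uuid>" layout.
# S3 lists keys in lexicographic order, so sort to mimic a listing.
keys = [
  "ls.s3.2018-10-29T09.16.9b2f6c1e-0000-4000-8000-000000000001.part0.txt",
  "ls.s3.2018-10-28T23.59.1c3d5e7f-0000-4000-8000-000000000002.part0.txt",
  "ls.s3.2018-10-29T10.02.0a1b2c3d-0000-4000-8000-000000000003.part0.txt",
  "ls.s3.2018-11-01T00.00.ffffffff-0000-4000-8000-000000000004.part0.txt",
].sort

# Benefit 1: all files from a given day (or month, hour, ...) share a key prefix.
def with_prefix(keys, prefix)
  keys.select { |k| k.start_with?(prefix) }
end

# Benefit 2: all files written at or after a point in time sort after a marker key.
def after_key(keys, marker)
  keys.select { |k| k > marker }
end

day   = with_prefix(keys, "ls.s3.2018-10-29")
later = after_key(keys, "ls.s3.2018-10-29T09.16")

puts day
puts later
```

Neither filter works with the current format, because the random UUID comes first and the keys therefore sort in no meaningful order.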

Just swapping the position of the UUID and timestamp would be sufficient.
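
A minimal sketch of what the swap could look like, written as a standalone method so it can be run on its own. The keyword arguments stand in for the factory's `tags`, `counter`, `extension` and `current_time` helpers, and the `strftime` format is an assumption for illustration, not necessarily what the plugin uses.

```ruby
require "securerandom"

# Hypothetical standalone variant of generate_name with the timestamp
# placed before the UUID. Not the plugin's actual implementation.
def generate_name(tags:, counter:, extension:, time: Time.now.utc)
  # Assumed format; zero-padded fields make keys sort in time order.
  current_time = time.strftime("%Y-%m-%dT%H.%M")
  filename = "ls.s3.#{current_time}.#{SecureRandom.uuid}"
  if tags.size > 0
    "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
  else
    "#{filename}.part#{counter}.#{extension}"
  end
end

puts generate_name(tags: [], counter: 0, extension: "txt")
```

The UUID still guarantees uniqueness between hosts that share a hostname and timestamp, so the original intent of #102 is preserved.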

Related: #134 (for fully configurable filenames?)


anitakrueger · 2019-07-16T15:43:26Z

I do think fully configurable filenames should be an option. I would like to reference back to the uploaded file and at the moment, that is impossible. Our use case is to crop massive query data in the original event and instead store the original one in S3. Now I have the cropped event and the event in S3, but I have no way of linking them, because the filename is completely random.

qingkunl · 2019-10-16T23:54:36Z

I'd love to have configurable filename as well. Thanks!

Forbzy · 2020-03-17T13:52:14Z

I've been searching for this functionality. I'm very surprised it wasn't already an option. It would be useful to be able to set filenames.

jsvd assigned robbavey Oct 29, 2018

robbavey added the enhancement label Oct 29, 2018
