Place UUID after timestamp in filename #197

Open
cflee opened this issue Oct 29, 2018 · 3 comments

@cflee

cflee commented Oct 29, 2018

#102 at 5276167#diff-b445c304223f019653cd681dd06bbff5 switched out the hostname in the S3 object key (filename) for a UUID, in order to avoid a small possibility of clashes between multiple Logstash hosts if they have the same hostname and timestamp (see review comment 58ad2a1#r79946161):

def generate_name
  filename = "ls.s3.#{SecureRandom.uuid}.#{current_time}"
  if tags.size > 0
    "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
  else
    "#{filename}.part#{counter}.#{extension}"
  end
end

This also has the nice property of distributing object key prefixes across the S3 bucket. However, that distribution is no longer needed: in July 2018, Amazon S3 announced that randomizing object prefixes is no longer required for performance.

Therefore, it seems like we could improve this filename format in a way that still meets the original intent of avoiding clashes.

Proposed solution

A common pattern used by various products (including AWS services) is to start the key with a static prefix, then add the dynamic values, with the timestamp placed before the UUID.

There are two large benefits to this:

  • we can filter for files in a rough time range (year, month, day, hour, etc.) by making an S3 ListBucket request for all keys that begin with a specified prefix
  • we can request files created after a certain time by making an S3 ListBucket request for all keys that are lexicographically after a specific object key
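To illustrate both benefits without calling AWS, here is a minimal sketch that mimics the two listing modes on an in-memory array of keys. The keys, the UUID values, and the helper names `keys_with_prefix` and `keys_after` are all hypothetical, and the timestamp layout is assumed rather than taken from the plugin. In a real listing call these would roughly correspond to the prefix and start-after style parameters of the S3 list API.

```ruby
# Hypothetical object keys in the proposed "timestamp before UUID" layout.
# The timestamp format and UUID values are made up for illustration.
KEYS = [
  "ls.s3.2018-10-30T00.01.c2d3e4f5-0000-4000-8000-000000000004.part0.txt",
  "ls.s3.2018-10-29T08.15.9a7b2f10-0000-4000-8000-000000000002.part0.txt",
  "ls.s3.2018-10-28T23.59.5d1c0c3e-0000-4000-8000-000000000001.part0.txt",
  "ls.s3.2018-10-29T17.42.1f3e4d5c-0000-4000-8000-000000000003.part0.txt",
].freeze

# Stand-in for a listing restricted to a key prefix:
# returns, in sorted order, every key that begins with `prefix`.
def keys_with_prefix(keys, prefix)
  keys.sort.select { |key| key.start_with?(prefix) }
end

# Stand-in for a listing that starts after a marker:
# returns, in sorted order, every key that sorts strictly after `marker`.
def keys_after(keys, marker)
  keys.sort.select { |key| key > marker }
end

# All files written on 2018-10-29 (a "rough time range" by day).
on_the_29th = keys_with_prefix(KEYS, "ls.s3.2018-10-29")

# All files written after 12:00 on 2018-10-29.
after_noon = keys_after(KEYS, "ls.s3.2018-10-29T12.00")

puts on_the_29th
puts after_noon
```

Neither query is possible with the current format, because the random UUID comes first and the sort order of the keys carries no time information.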

Just swapping the position of the UUID and timestamp would be sufficient.
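As a rough sketch of what that swap might look like, the method below mirrors the existing `generate_name` with the two interpolations reversed. To keep it runnable on its own, the values the plugin reads from instance state (`current_time`, `tags`, `counter`, `extension`) are passed in as keyword arguments, and a `uuid:` argument is added so the output is predictable. That signature is an assumption for illustration, not the plugin's actual interface.

```ruby
require "securerandom"

# Sketch of the proposed ordering: static prefix, then timestamp, then UUID.
# The keyword arguments are hypothetical stand-ins for the plugin's state.
def generate_name(current_time:, tags: [], counter: 0, extension: "txt",
                  uuid: SecureRandom.uuid)
  # Only change from the current code: timestamp first, UUID second.
  filename = "ls.s3.#{current_time}.#{uuid}"
  if tags.size > 0
    "#{filename}.tag_#{tags.join('.')}.part#{counter}.#{extension}"
  else
    "#{filename}.part#{counter}.#{extension}"
  end
end

# Two hosts writing in the same minute still get distinct keys via the UUID,
# while keys from different minutes sort in time order.
earlier = generate_name(current_time: "2018-10-29T08.15")
later   = generate_name(current_time: "2018-10-29T08.16")
puts earlier
puts later
```

The UUID still guarantees uniqueness between hosts, so the original intent of avoiding clashes is preserved.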

Related: #134 (for fully configurable filenames?)

@anitakrueger

I do think fully configurable filenames should be an option. I would like to be able to refer back to the uploaded file, and at the moment that is impossible. Our use case is to crop massive query data out of the original event and store the full original in S3 instead. Now I have the cropped event and the event in S3, but I have no way of linking them, because the filename is completely random.

@qingkunl

I'd love to have configurable filename as well. Thanks!

@Forbzy

Forbzy commented Mar 17, 2020

I've been searching for this functionality. I'm very surprised it wasn't already an option. It would be useful to be able to set filenames.
