Support User-Defined Object Metadata #4754
Comments
A further wrinkle is that many of the listing APIs do not return this metadata |
We need this somewhat urgently (can hack around it for now but would like to unhack it asap) so I can work on this. |
Can you perhaps expand on your use-case, I'm not sure about the API as originally proposed by this ticket, and was considering instead providing a mechanism similar to what we provide for content type |
We need to read/write objects tags from S3 (and soon other cloud providers). I was planning on spending some time looking at the relevant Cloud provider APIs and seeing what a reasonable way to do this would be. I know with S3 at least it's a little bit annoying as you can set tags in the |
Do you mean object tagging (https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html) or metadata (https://docs.aws.amazon.com/AmazonS3/latest/userguide/UsingMetadata.html)? They're separate things, which is part of why I'm not sure about exposing this
|
Object tagging as in https://docs.aws.amazon.com/AmazonS3/latest/userguide/object-tagging.html
We use tags to drive retention policies
There is a static set of tags but which tags get applied to any given object is dynamic
No, it would not be possible to do this based on some static rules. It would have to be a mechanism that allows tagging of individual put requests. I'm also a little hesitant to try and abstract this as there are a lot of subtle differences between APIs so it would be a little bit hard to make sure the default Alternatively, maybe we could punt on the whole issue by providing a canonical way to extend the
Then there could be standard extensions in the default impl:
|
Yeah, GCS doesn't even have a notion of tags, only metadata 😄
I mean it isn't ideal but we do provide https://docs.rs/object_store/latest/object_store/aws/struct.AmazonS3.html#method.credentials and https://docs.rs/object_store/latest/object_store/aws/struct.AwsAuthorizer.html which would let you fairly easily construct your own requests, including https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutObjectTagging.html |
Right, but it may not really be an issue as long as the semantics are internally consistent within a provider. When it's unclear where to put the metadata (like in the case of AWS) that should be manageable through configuration. It's annoying that semantics are different between providers but that is what it is. I think something like:
where |
Are you referring to https://docs.aws.amazon.com/AmazonS3/latest/userguide/intro-lifecycle-rules.html or some custom system? I'm mainly interested in how important it is to be able to read them, as writing has a lot more potential options for achieving it that don't leak into the ObjectStore trait
Apart from the fact that this crate goes to great lengths to try to provide an API that is consistent across providers... 😅 |
Both. The data is in customer buckets and we add tags so they can manage their own retention. How they do that is up to them, we just provide the tags. Currently we only need to write them. We can obviously work around that (and will in the immediate term) without involving the
Yeah, agreed but the APIs are what they are :). So we can either provide a consistent API which always works the same across providers by always doing additional API calls to grab metadata/tags (which seems like a bad idea). Or we can make the semantics around metadata depend on the provider. Or of course we can do neither and just say that if we can't provide consistent semantics because of provider API differences then it's not going to be exposed in the |
This is not something we should be following, I fought very hard to not include that, and I am increasingly of the opinion we should remove it.
Or a third option is to make these details specified at the point of creation of the ObjectStore, e.g. via some middleware system or otherwise. That way, if people have requirements outside the ObjectStore trait, they can plug in at that point. |
This would all be much easier if we didn't also have to deal with local filesystems :) I'm leaning more and more towards some sort of extension mechanism. Either exposing the inner client so you can just make arbitrary API calls outside the |
Adding a tags block to PutOptions that is simply ignored by backends that don't support it seems harmless to me. I'm in the process of adding conditional put support, so I will sequence this after that |
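The "tags that are simply ignored" idea could be sketched roughly as follows. This is a self-contained illustration only: `PutOptions`, `TagOutcome`, `Store`, `TaggingStore` and `LocalStore` here are hypothetical stand-ins defined in the snippet, not the crate's real types, and returning the outcome is an assumption made so the silent drop is observable.

```rust
use std::collections::BTreeMap;

/// Hypothetical options for a put request; `tags` is the proposed addition.
#[derive(Debug, Clone, Default)]
pub struct PutOptions {
    pub tags: BTreeMap<String, String>,
}

/// What a backend did with the tags it was handed (illustrative only).
#[derive(Debug, PartialEq, Eq)]
pub enum TagOutcome {
    Applied(usize),
    Ignored,
}

/// Hypothetical minimal store interface.
pub trait Store {
    /// Write an object; tags are persisted when the backend can, dropped otherwise.
    fn put(&mut self, path: &str, data: &[u8], opts: &PutOptions) -> TagOutcome;
}

/// Stand-in for a backend with tagging support.
#[derive(Default)]
pub struct TaggingStore {
    pub objects: BTreeMap<String, (Vec<u8>, BTreeMap<String, String>)>,
}

impl Store for TaggingStore {
    fn put(&mut self, path: &str, data: &[u8], opts: &PutOptions) -> TagOutcome {
        self.objects
            .insert(path.to_string(), (data.to_vec(), opts.tags.clone()));
        TagOutcome::Applied(opts.tags.len())
    }
}

/// Stand-in for a backend without tagging, such as a local filesystem.
#[derive(Default)]
pub struct LocalStore {
    pub objects: BTreeMap<String, Vec<u8>>,
}

impl Store for LocalStore {
    fn put(&mut self, path: &str, data: &[u8], _opts: &PutOptions) -> TagOutcome {
        // The write still succeeds; the tags are simply not stored.
        self.objects.insert(path.to_string(), data.to_vec());
        TagOutcome::Ignored
    }
}
```

With this shape the caller's write path is identical for every backend, at the cost that a dropped tag is not an error.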
Turns out Azure doesn't even support this consistently... But then again Azure does seem to specialize in inconsistent APIs...
Edit: https://learn.microsoft.com/en-us/azure/storage/blobs/storage-feature-support-in-storage-accounts |
Having played around with this, I'm unsure how to support it consistently. Stores have different restrictions on which values are valid, and support across the stores is wildly inconsistent, even between stores from the same provider... Taking a step back, could your use-case encode the lifecycle details in the path of the object instead? |
No, ultimately it's not up to us (this was a solution in place before us and would be monumentally complex to change).
Why is this a problem? If a user adds incorrect metadata (values which are not allowed for whatever reason by the particular provider) then they get an error. It's no different than (for example) writing a multipart file to S3, where every part except the last must be at least 5 MiB. But the same limitation obviously wouldn't apply to local file systems. So at some level you have to know which provider you are using and what the individual semantics are. |
Because in general we try to hide these incompatibilities from you: you can't write to funky paths, the chunking for multipart upload is done for you, etc... We could add TagSets to the crate, and I have a mostly complete PR that does this, but it just seems strange to add something to the ObjectStore trait that is supported by only 1 and a half stores... |
Right, and I think it's a good idea to try and hide the incompatibilities, but if the only way to do that is not add the functionality at all then it may be better to just expose the incompatibilities and let users deal with it. I guess the "proper" way to do this would be through traits. You could have the base
|
Yeah, that's the approach we've taken for functionality that is disjoint, e.g. the MultiPartStore and Signer traits. This is a bit of a funny one because it is additive to existing functionality, which makes adding a separate trait a bit cumbersome, as you'll have to duplicate your write logic. My current plan is to proceed with the approach in #4999. Provided we add a config option to ignore tags, I think we'll be fine: people can always write the tags and just have them ignored if not supported |
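For readers following along, the separate-trait approach discussed above might look something like the sketch below. All names (`Store`, `TaggedStore`, `InMemory`, `write_with_retention`) are hypothetical and defined in the snippet itself; this is not the crate's API, just one way the idea could be laid out.

```rust
use std::collections::HashMap;

/// Hypothetical base trait: operations every store supports.
pub trait Store {
    fn put(&mut self, path: &str, data: Vec<u8>);
    fn get(&self, path: &str) -> Option<&[u8]>;
}

/// Hypothetical extension trait, implemented only by stores that can tag objects.
pub trait TaggedStore: Store {
    fn put_tagged(&mut self, path: &str, data: Vec<u8>, tags: Vec<(String, String)>);
    fn tags(&self, path: &str) -> Option<&[(String, String)]>;
}

/// Toy in-memory store that implements both traits.
#[derive(Default)]
pub struct InMemory {
    objects: HashMap<String, Vec<u8>>,
    tags: HashMap<String, Vec<(String, String)>>,
}

impl Store for InMemory {
    fn put(&mut self, path: &str, data: Vec<u8>) {
        self.objects.insert(path.to_string(), data);
    }

    fn get(&self, path: &str) -> Option<&[u8]> {
        self.objects.get(path).map(|v| v.as_slice())
    }
}

impl TaggedStore for InMemory {
    fn put_tagged(&mut self, path: &str, data: Vec<u8>, tags: Vec<(String, String)>) {
        // Reuse the plain write, then record the tags alongside it.
        self.put(path, data);
        self.tags.insert(path.to_string(), tags);
    }

    fn tags(&self, path: &str) -> Option<&[(String, String)]> {
        self.tags.get(path).map(|v| v.as_slice())
    }
}

/// Code that needs tags must bound on the extension trait, so a caller that
/// also wants to work with plain stores ends up with a second write path.
pub fn write_with_retention<S: TaggedStore>(store: &mut S, path: &str, data: Vec<u8>) {
    let tags = vec![("retention".to_string(), "30d".to_string())];
    store.put_tagged(path, data, tags);
}
```

The bound on `write_with_retention` is where the duplication shows up: the same write logic has to exist once for `Store` and once for `TaggedStore`.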
|
Checking in here. I would like to refocus this ticket on User-Defined Metadata (not tags) as the title suggests. Much of the discussion is around object tags, which are a separate thing. For User-Defined Metadata, I would like to implement a new For get requests, I propose we expose the user-defined metadata the same way as other attributes, as part of the Attribute object. This could be somewhat confusing to users since there's an If no one objects, I would be happy to try and submit a patch for this. I talked to @Xuanwo about this briefly on Twitter and it sounds like no one is actively working on it. |
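As a rough illustration of exposing user-defined metadata "the same way as other attributes", one possible shape is a single attribute map whose key type has a variant for user-defined entries. The `Attribute` and `Attributes` types below are hypothetical and defined in the snippet for illustration; they are an assumption about the design, not a statement of what the crate provides.

```rust
use std::collections::HashMap;

/// Hypothetical attribute key: well-known attributes plus user-defined entries.
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub enum Attribute {
    ContentType,
    /// A user-defined metadata entry, identified by its name.
    Metadata(String),
}

/// Hypothetical attribute map returned with a get and accepted by a put.
#[derive(Debug, Default)]
pub struct Attributes(HashMap<Attribute, String>);

impl Attributes {
    /// Insert a value, returning the previous value for that key if any.
    pub fn insert(&mut self, key: Attribute, value: impl Into<String>) -> Option<String> {
        self.0.insert(key, value.into())
    }

    pub fn get(&self, key: &Attribute) -> Option<&str> {
        self.0.get(key).map(String::as_str)
    }

    /// Collect only the user-defined entries as (name, value) pairs, sorted by name.
    pub fn user_metadata(&self) -> Vec<(&str, &str)> {
        let mut out: Vec<(&str, &str)> = self
            .0
            .iter()
            .filter_map(|(k, v)| match k {
                Attribute::Metadata(name) => Some((name.as_str(), v.as_str())),
                _ => None,
            })
            .collect();
        out.sort();
        out
    }
}
```

Keeping user-defined entries in the same map means readers use one lookup path for everything, while `user_metadata` separates them out when needed.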
I've posted a PR for user-defined metadata here: |
This is a draft proposal, and likely needs more polish
Is your feature request related to a problem or challenge? Please describe what you are trying to do.
Many stores provide the ability to associate arbitrary user-defined attributes with objects; it would be useful to expose this.
Describe the solution you'd like
I would like to propose a new put_opts call, in a similar vein to the existing get_opts. This would take a PutOptions
Stores that can't store metadata should return an error if passed metadata, and ObjectMeta should be updated to include such metadata.
Unix systems can likely make use of xattr to store user metadata
We will likely need to restrict the key names in some manner
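A minimal sketch of the proposal above, under stated assumptions: the `PutOptions`, `PutError`, `MetadataStore` and `PlainStore` types are hypothetical stand-ins defined in the snippet, and the key rule in `validate_key` (lowercase ASCII letters, digits and `-`) is invented purely for illustration, since the proposal does not say what the restriction should be.

```rust
use std::collections::HashMap;

/// Hypothetical options for a put call; only user metadata is modelled here.
#[derive(Debug, Clone, Default)]
pub struct PutOptions {
    pub metadata: HashMap<String, String>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PutError {
    /// The store has nowhere to keep user metadata.
    MetadataNotSupported,
    /// The key breaks the (assumed) naming rule below.
    InvalidKey(String),
}

/// Assumed rule, for illustration only: non-empty, and limited to
/// lowercase ASCII letters, digits and '-'.
pub fn validate_key(key: &str) -> Result<(), PutError> {
    let ok = !key.is_empty()
        && key
            .bytes()
            .all(|b| b.is_ascii_lowercase() || b.is_ascii_digit() || b == b'-');
    if ok {
        Ok(())
    } else {
        Err(PutError::InvalidKey(key.to_string()))
    }
}

/// Stand-in for a store that keeps metadata next to the object bytes.
#[derive(Default)]
pub struct MetadataStore {
    pub objects: HashMap<String, (Vec<u8>, HashMap<String, String>)>,
}

impl MetadataStore {
    pub fn put_opts(&mut self, path: &str, data: &[u8], opts: &PutOptions) -> Result<(), PutError> {
        // Reject the whole request if any key is invalid.
        for key in opts.metadata.keys() {
            validate_key(key)?;
        }
        self.objects
            .insert(path.to_string(), (data.to_vec(), opts.metadata.clone()));
        Ok(())
    }

    pub fn metadata_for(&self, path: &str) -> Option<&HashMap<String, String>> {
        self.objects.get(path).map(|(_, m)| m)
    }
}

/// Stand-in for a store with no metadata support: it returns an error when
/// passed metadata rather than dropping it silently.
#[derive(Default)]
pub struct PlainStore {
    pub objects: HashMap<String, Vec<u8>>,
}

impl PlainStore {
    pub fn put_opts(&mut self, path: &str, data: &[u8], opts: &PutOptions) -> Result<(), PutError> {
        if !opts.metadata.is_empty() {
            return Err(PutError::MetadataNotSupported);
        }
        self.objects.insert(path.to_string(), data.to_vec());
        Ok(())
    }
}
```

A put with empty metadata still succeeds on `PlainStore`, so existing callers that never set metadata would be unaffected under this sketch.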
Describe alternatives you've considered
Additional context
#4498 also calls for some sort of put_opts style API
#4753 would benefit from this functionality