Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add AWS presigned URL support #4876

Merged
merged 16 commits into from
Oct 12, 2023
Merged

Conversation

carols10cents
Copy link
Contributor

@carols10cents carols10cents commented Sep 28, 2023

Ok, I'm feeling pretty good about this-- I did test it against a real live S3 bucket and I was able to upload a file with cURL using signed URLs generated by these new functions!

There's always more refactoring that could be done... the private method where I told clippy to let me have many arguments points to the worst of it.

If you take a look commit-by-commit, the last commit adds a public signed_url method directly to AmazonS3, which I think makes it super convenient, but might be more than you'd like to expose. I'm happy to back that commit out!

Which issue does this PR close?

Closes #4763.
Connects to #3027, which is requesting signed GET URLs for all object store implementations that support it-- this PR implements support for both GET and PUT signed URLs, but only for AWS.

Rationale for this change

Enable reusing the authorization and signing code to generate signed URLs that can be used to give permission to upload or view particular objects.

What changes are included in this PR?

A new public method on AwsAuthorizer, currently called sign (as opposed to the existing authorize method) that adds AWS signature query parameters to a specified Url so that it can be given to another user to authorize uploading or viewing a particular resource.

This shares a lot of the signing code with the existing authorize method, so I've pulled out a few private methods, but I don't think the current structure is ideal yet. Thoughts on how far to take this are very welcome, such as if there's another struct or three starting to cry to be let out of AwsAuthorizer, if these should be free functions, etc.

Are there any user-facing changes?

Yes, one (or possibly two) new method(s) in the public API. See the included examples!

@github-actions github-actions bot added the object-store Object Store Interface label Sep 28, 2023
@carols10cents carols10cents force-pushed the refactor-authorize branch 2 times, most recently from f39e63e to eea6f17 Compare September 29, 2023 20:23
@carols10cents carols10cents changed the title WIP: Adding AWS presigned URL support Add AWS presigned URL support Sep 29, 2023
@carols10cents carols10cents marked this pull request as ready for review September 29, 2023 20:34
@carols10cents
Copy link
Contributor Author

Hm. I'm trying to put this into use in IOx and running into a bit of a problem... ideally, we'd like to not have to change much about the rest of the codebase that uses Arc<object_store::DynObjectStore> pretty much everywhere, so Dom and I were thinking of defining a new trait for presigned URLs (that would be a supertrait of object_store::ObjectStore and the bulk import could say it needed that, AmazonS3 could implement it, and the other object store types could implement the new trait but return an error (at some level TBD).

The problem with that I'm running into is that we're not actually using AmazonS3, we're using a LimitStore adapting an AmazonS3, and I can't impl<T: TheNewTrait> TheNewTrait for LimitStore becauseLimitStore(or for that matterThrottledStore) doesn't expose access to the inner T`.

The solutions I'm seeing to this need to be done in object_store, I think:

  • Add pub fn inner(&self) -> &T to LimitStore and ThrottledStore to enable implementing traits that delegate to these adapters' inner object stores, which could enable other object store extensions in the future
  • Define this new trait I tried to write within object_store itself, and have all the implementers other than AmazonS3 return an error saying it's not supported
  • ??? Something else I'm not seeing?

I'm going to try going down these paths to see if I run into different issues, but I'd love thoughts on the best way to solve this from the object_store perspective!

@tustvold
Copy link
Contributor

tustvold commented Oct 4, 2023

I think I might be missing some context on why the pre-signed URL functionality needs to be conflated with the ObjectStore trait.

I would have perhaps expected a dyn Signer that could be passed around to the places that need it. This is based on the potentially incorrect belief that these users would be largely disjoint from those that actually need ObjectStore?

Or in other words you'd just plumb the two separately and nothing would need to know that they potentially are the same thing underneath?

@carols10cents
Copy link
Contributor Author

I think I might be missing some context on why the pre-signed URL functionality needs to be conflated with the ObjectStore trait.

TL;DR IOx; I'm happy to make changes to IOx to do whatever.

That's purely due to the current IOx implementation that has a clap_blocks function for configuring any object store and currently returns an Arc<DynObjectStore>, thus erasing any type information after that point.

For the bulk import functionality, what we've planned for a piece of it is a gRPC service implemented in the router that has an action that returns a pre-signed URL. Currently, the router uses the aforementioned clap_blocks to configure the object store and has the Arc<DynObjectStore> plumbed through it everywhere-- I'd like to avoid changing all that if possible, and supporting presigned URLs through other object stores if those implementations gain the signing capability in the future... does that make more sense? I am relying on the IOx context you have, let me know if I glossed over too much :)

I would have perhaps expected a dyn Signer that could be passed around to the places that need it.

Right, but who defines trait Signer? I think it'd be nice if it was in object_store such that as more object store implementations gain that functionality, users wouldn't have to add the implementations themselves. Would that be ok or do you think it doesn't belong here?


Do you have any general comments about this PR as-is?

@tustvold
Copy link
Contributor

tustvold commented Oct 4, 2023

I need to sleep on it, but perhaps we just add it to the ObjectStore trait with a default not implemented impl, it wouldn't be the weirdest thing there - I'm looking at you append 😅

@tustvold
Copy link
Contributor

tustvold commented Oct 5, 2023

Codes looks good, will continue to think about how we expose this

@carols10cents
Copy link
Contributor Author

Ok, so here's another wrinkle.

The URL signing API should return a full URL so that if the user wants to, they can pass the URL to something that isn't using the object_store crate and use it with cURL, wget, a web browser, what have you.

However, if the user is doing all upload/download through the object_store crate, and particularly through wrappers like LimitStore and such, they will want to send the request using the presigned URL through their object_store, but currently put_request and friends take a Path that gets concatenated onto the object store implementation's full URL (which wouldn't be what we'd want in this case).

So I think the ObjectStore trait will need methods like fn put_presigned(&self, location: &Url, bytes: Bytes) and fn get_presigned(&self, location: &Url), and these should be on the trait so that the adapters like LimitStore and ThrottledStore can support them as well.

In IOx terms, the router will have something that can produce signed urls (configured using the code in clap_blocks, I think I've figured out a good way to do that) and will pass that full URL to a bulk import tool, that will also have an object store configured using the code in clap_blocks (that currently returns an Arc<DynObjectStore> which is a LimitStore wrapping an AmazonS3 in this case) so that we don't have to reimplement all those settings and code handling the upload request.

Does that sound right to you?

@tustvold
Copy link
Contributor

tustvold commented Oct 5, 2023

In IOx terms, the router will have something that can produce signed urls (configured using the code in clap_blocks, https://github.com/influxdata/influxdb_iox/pull/8927) and will pass that full URL to a bulk import tool

This looks sensible to me 👍

So I think the ObjectStore trait will need methods like fn put_presigned(&self, location: &Url, bytes: Bytes) and fn get_presigned(&self, location: &Url), and these should be on the trait so that the adapters like LimitStore and ThrottledStore can support them as well.

that will also have an object store configured using the code in clap_blocks (that currently returns an Arc which is a LimitStore wrapping an AmazonS3 in this case) so that we don't have to reimplement all those settings and code handling the upload request

The upload request is a simple PUT request with the object payload as the body. I'm not sure I follow why this would benefit from being ObjectStore-ified, or what the clap_blocks configuration would be adding?

@carols10cents carols10cents marked this pull request as draft October 5, 2023 17:41
@carols10cents
Copy link
Contributor Author

n IOx terms, the router will have something that can produce signed urls (configured using the code in clap_blocks, https://github.com/influxdata/influxdb_iox/pull/8927) and will pass that full URL to a bulk import tool, that will also have an object store configured using the code in clap_blocks (that currently returns an Arc which is a LimitStore wrapping an AmazonS3 in this case) so that we don't have to reimplement all those settings and code handling the upload request.

As Raphael and I were chatting, I realized this statement of mine was exactly wrong, as Raphael also commented-- the whole point of the thing using the signed URLs is that it DOESN'T have the object_store credentials etc configured-- everything needed will be in the URL. So yeah, ignore this-- the actual upload shouldn't happen through the object_store crate.

@carols10cents carols10cents force-pushed the refactor-authorize branch 3 times, most recently from 993060f to 66f481a Compare October 6, 2023 20:54
@carols10cents
Copy link
Contributor Author

Ok, I think I'm happy-ish with this? I have an iox branch working that I think isn't terrible, maybe?

CI is failing with something about ring, I don't know why :(

@carols10cents carols10cents marked this pull request as ready for review October 6, 2023 21:32
@tustvold
Copy link
Contributor

tustvold commented Oct 9, 2023

Ring failure was likely briansmith/ring#1707 and should be fixed by re-rerunning CI

Copy link
Contributor

@tustvold tustvold left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense to me

object_store/src/aws/mod.rs Outdated Show resolved Hide resolved
object_store/Cargo.toml Outdated Show resolved Hide resolved
@carols10cents
Copy link
Contributor Author

@tustvold fixed, and CI is passing now!! 🎉

@tustvold tustvold merged commit 11205a8 into apache:master Oct 12, 2023
13 checks passed
@carols10cents carols10cents deleted the refactor-authorize branch October 12, 2023 18:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
object-store Object Store Interface
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[object_store] Support generating and using signed upload URLs
2 participants