-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add support for generating Azure Blob Storage signed URLs #22
Conversation
Hi Tim, thank you for submitting this PR! I haven't delved into the details of the PR yet, but it looks great. That aside, I like your idea of the Delta Sharing Server SDK, the delta-sharing-server-rs project. Since the Delta Sharing protocol is defined through REST APIs, there may be some considerations to keep in mind; nevertheless, the idea is interesting. As for the cloud-file-signer crate, I believe it would be more beneficial to expand it into a larger project, similar to the delta-sharing-server-rs. If your contribution here serves any purpose, please feel free to incorporate it. |
Regarding the open questions, here are my answers:
|
Thanks for your quick reaction and enthusiasm @ognis1205. Regarding the Regarding the support for the underlying object stores (i.e. AWS, Azure, GCP). Thanks for your insights on my questions. I will be iterating on the implementation and let you know when it's ready for review. |
I am very grateful for your contribution @tdikland ! This is a quick review of the current commit. Regarding the 'cloud-file-signer' crate, I believe there is operational experience in your project, but I think it still needs a bit more incubation. Therefore, I believe it's still early to use the crate. Since this project itself is also in incubation, considering the future development speed, I would prefer to have the implementation of signed URLs for each cloud provider done here as much as possible. The experience gained here can then be reflected in 'cloud-file-signer,' and eventually, I would like to use the crate in 'delta-sharing-rs' as well. Please do not remove |
Regarding the 'cloud-file-signer' crate, even if, for instance, we are not currently using the crate, I am considering stating in the README that we anticipate potential future use of it as a related project. |
Thanks for the feedback. I agree it is not yet desirable to depend on the |
This reverts commit 16ac210.
You don't have to apologize for it at all! I apologize if I have caused any disruption to the development. We should have prepared a more accessible system, such as a Code of Conduct (COC), to make participation easier. It was very helpful that you made various suggestions and asked questions. |
Allright, I think I got it down now. |
Thank you for your cooperation @tdikland :) I believe we have reached a consensus on the general direction of the implementation. It seems there are still some parts that I would like to discuss though, before proceeding to the review, let's go ahead and run some tests for now. @roeap could you please approve the workflow? (I deliberately commented out the When you are ready to review, please remove [WIP] from the title. Or do you prefer to conduct detailed reviews gradually? |
Hi @ognis1205, @roeap, I'm ready for reviewing. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is a good idea to bring in a crate that can handle signing URLs for the different cloud object stores where the Delta Table lives. To this end I've published a crate that I used in some other personal projects to do exactly this https://crates.io/crates/cloud-file-signer. This crate defines a CloudFileSigner trait and implements it for a couple of popular object stores (S3, Azure Blob Storage, GCS) and could be extended to other stores (R2, MinIO, etc.)
In the larger rust-arrow ecosystem the community has been developing the object_store
crate which abstracts access to a large number of object stores, without depending on all the individual SDKs to avoid bloat.
@tustvold - would adding methods to acquire pre-signed URls something you see in-scope for object store, or would that dilute the focussed API too much?
I did not see the discussion around delta-sharing-server-rs
, but in the past @ognis1205 and I had discussed that we want to break this up into smaller crates. and split the different "sub-apis" such that one can easily mix-and-match a server that supports the sharing protocol with just the desired features.
Overall looking great - If @ognis1205 gives his OK, I am happy to merge - knowing that we have some way to go before we can release this project more "officially".
url: String::from(url.as_str()), | ||
}), | ||
"s3" | "s3a" => Ok(Self::Aws), | ||
"abfss" | "abfs" => Ok(Self::Azure), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had interesting discussions around this in the past :D - essentially in azure there are many ways a scheme could look like, some "official" some de-facto standards (e.g. what fsspec
adopted).. Also onelake did not make things less confusing.
Technically abfs[s]
refers to a specific implementation of an hdfs compliant driver...
In the end its fine to do this for now, I'll eave some comments later around the general signing question.
We already support this for S3. Azure and GCP are more complex but would be happy for contributions in this space - https://docs.rs/object_store/latest/object_store/signer/trait.Signer.html |
@tdikland @roeap
Is there any place to discuss this issue? I have some opinions on this since the protocol is defined over REST APIs. I think this project might be helpful: https://github.com/HeroicKatora/oxide-auth Depending on the conclusion of our discussion, in some cases, I think it would be acceptable to change the name of this project to something like "delta-sharing-sdk" and have @tdikland and @roeap take the lead in managing the project. This would allow for more efficient development, I believe. |
FWIW I've created two upstream tickets on object_store for adding support for Azure and GCP, along with links to the relevant docs. I think they should be relatively straightforward if anyone wanted to pick them up: |
@tustvold - i know the azure apis quite well, and signing using keys should (hopefully) be straignforward to leverage the existing signing using the master key ... user delegated are just API cal li think :) had a quick glance at GCP - might do that second - seems similar, cannonicalize request and sign ... |
@tdikland - just FYI, started to add url signing for azure to object store apache/arrow-rs#5259. |
@roeap @ognis1205 - FYI I have created two new issues to keep track of the discussion and next steps. |
This PR introduces support for sharing Delta Lake tables stored on Azure Blob Storage.
I've marked the PR as WIP, since I feel there are still a couple of open questions that we may want to address in this or subsequent issues.
The crate that I used to call the Azure SDK for building a pre-signed URL has an async method to do so. I figured this was okay, since all route handlers are async anyway.
Right now the urls are passed through to the client unsigned. We may want to present an error instead.
On a higher level:
https://crates.io/crates/cloud-file-signer
. This crate defines aCloudFileSigner
trait and implements it for a couple of popular object stores (S3, Azure Blob Storage, GCS) and could be extended to other stores (R2, MinIO, etc.)Looking forward to some feedback!
Closes #18