GH-44308: [C++][FS][Azure] Implement SAS token authentication #45021
…n. This avoids cheating by using the account key again to generate SAS tokens in tests
cpp/src/arrow/filesystem/azurefs.cc
// Assume these are part of a SAS token. It's not ideal to make such an assumption,
// but given that a SAS token is a complex set of URI parameters that could be
// tricky to exhaustively list, I think it's the best option.
credential_kind = CredentialKind::kSasToken;
We don't have the SAS token specification that includes parameter names used by a SAS token, right?
Yeah, I had a quick search and couldn't find what we need. If you think it's important I can try a bit harder. The closest I found seemed to be unabbreviated versions of what actually appears in the SAS token.
There are many parameters but can we check them...?
I think that doc is only for user delegation SAS tokens, so I unioned it with the parameters for account and service SAS tokens. Hopefully the spec only changes slowly.
I wasn't really confident on the best way to define a constant set of strings to do contains checks against in C++. Since there are only 27 values, I ended up with a constexpr array and std::find, but please let me know if this is not a good option.
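For readers following along, the constexpr array plus std::find approach described above might look roughly like the sketch below. The names kExampleSasParameters and IsExampleSasParameter are hypothetical, and the array holds only the six keys visible in the example URI from the PR description, not the full 27-entry list.

```cpp
#include <algorithm>
#include <array>
#include <string_view>

// Illustrative subset only: these six keys are the ones that appear in the
// example URI in the PR description, not the complete list from the PR.
constexpr std::array<std::string_view, 6> kExampleSasParameters = {
    "se", "sig", "sp", "spr", "sr", "sv"};

// Hypothetical helper: a linear scan, which is cheap for a few dozen short keys.
inline bool IsExampleSasParameter(std::string_view key) {
  return std::find(kExampleSasParameters.begin(), kExampleSasParameters.end(),
                   key) != kExampleSasParameters.end();
}
```

With this sketch, IsExampleSasParameter("sig") is true and IsExampleSasParameter("background_writes") is false.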
Wait... I might have just accidentally worked out how to avoid any of the special authentication stuff for copying...
d30e6ce to 7576c4c
+1
cpp/src/arrow/filesystem/azurefs.cc
@@ -147,6 +159,10 @@ Status AzureOptions::ExtractFromUriQuery(const Uri& uri) {
} else if (kv.first == "background_writes") {
ARROW_ASSIGN_OR_RAISE(background_writes,
::arrow::internal::ParseBoolean(kv.second));
} else if (std::find(sas_token_query_parameters.begin(),
Can we use std::binary_search() with sorted sas_token_query_parameters?
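As a sketch of what this suggestion could look like under C++17: std::is_sorted is not constexpr until C++20, but a hand-written constexpr loop can check the ordering at compile time, so an unsorted edit would fail the build. The names kSortedSasParameters, IsStrictlySorted and IsSortedSasParameter are hypothetical, and the array is again only a six-key subset.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string_view>

// Illustrative subset, kept in lexicographic order for std::binary_search.
constexpr std::array<std::string_view, 6> kSortedSasParameters = {
    "se", "sig", "sp", "spr", "sr", "sv"};

// Hand-written check, because std::is_sorted is only constexpr from C++20.
template <std::size_t N>
constexpr bool IsStrictlySorted(const std::array<std::string_view, N>& values) {
  for (std::size_t i = 1; i < N; ++i) {
    if (!(values[i - 1] < values[i])) return false;
  }
  return true;
}

// Fails the build if someone adds a key out of order (or adds a duplicate).
static_assert(IsStrictlySorted(kSortedSasParameters),
              "kSortedSasParameters must stay sorted for std::binary_search");

// Hypothetical helper: O(log n) lookup over the sorted array.
inline bool IsSortedSasParameter(std::string_view key) {
  return std::binary_search(kSortedSasParameters.begin(),
                            kSortedSasParameters.end(), key);
}
```

The static_assert is the design choice here: it moves the "did someone forget to keep it sorted" concern from code review to the compiler.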
Could do, but I would be a bit concerned about keeping sas_token_query_parameters sorted. It looks like C++20 allows using std::sort with constexpr, but I believe Arrow currently uses C++17. If you are concerned about the complexity of the lookup, I think my preference would be to use a std::set and forget about trying to make it constexpr.
std::set is OK.
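A minimal sketch of the std::set alternative agreed on here, assuming a function-local static so the set is built once on first use. The helper name IsSasParameterInSet is hypothetical, and the keys are only the six from the example URI in the PR description.

```cpp
#include <set>
#include <string_view>

// Hypothetical helper: the set keeps its own ordering, so entries can be listed
// in any order. A function-local static avoids static-initialization-order issues.
inline bool IsSasParameterInSet(std::string_view key) {
  static const std::set<std::string_view> kParameters = {
      "sv", "sr", "sp", "se", "spr", "sig"};  // illustrative subset only
  return kParameters.count(key) > 0;
}
```

This gives up constexpr in exchange for not having to maintain the order by hand.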
Rationale for this change
SAS token auth is sometimes useful and it is the last one we haven't implemented.
What changes are included in this PR?
ConfigureSasCredential
AzureOptions::FromUri so that simply appending a SAS token to a blob storage URI works, e.g. AzureOptions::FromUri("abfs://[email protected]/?se=2024-12-12T18:57:47Z&sig=pAs7qEBdI6sjUhqX1nrhNAKsTY%2B1SqLxPK%2BbAxLiopw%3D&sp=racwdxylti&spr=https,http&sr=c&sv=2024-08-04")
CopyFile to use StartCopyFromUri instead of CopyFromUri
Are these changes tested?
Yes
CopyFile
AzureOptions::FromUri with a SAS token.
I also made sure to run the tests which connect to real blob storage.
Are there any user-facing changes?
AzureOptions::FromUri instead of failing fast. IMO this is a regression, but it is still the best option to support SAS tokens.
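To illustrate the idea of appending a SAS token to a URI, here is a rough, standalone sketch of separating SAS-looking query parameters from other options in a query string. It is not the Arrow implementation: ExtractSasToken and kDemoSasKeys are hypothetical names, the key list is a six-entry subset, and no percent-decoding is attempted.

```cpp
#include <algorithm>
#include <array>
#include <string>
#include <string_view>

// Illustrative subset of SAS parameter keys (those in the example URI above).
constexpr std::array<std::string_view, 6> kDemoSasKeys = {
    "se", "sig", "sp", "spr", "sr", "sv"};

// Hypothetical helper: returns the "key=value" pairs whose key is a known SAS
// parameter, re-joined with '&'. Other pairs (e.g. background_writes) are skipped.
inline std::string ExtractSasToken(std::string_view query) {
  std::string sas_token;
  while (!query.empty()) {
    const auto amp = query.find('&');
    const std::string_view pair = query.substr(0, amp);
    // Advance past the '&', or finish if this was the last pair.
    query = (amp == std::string_view::npos) ? std::string_view{}
                                            : query.substr(amp + 1);
    const std::string_view key = pair.substr(0, pair.find('='));
    if (std::find(kDemoSasKeys.begin(), kDemoSasKeys.end(), key) !=
        kDemoSasKeys.end()) {
      if (!sas_token.empty()) sas_token += '&';
      sas_token.append(pair.data(), pair.size());
    }
  }
  return sas_token;
}
```

For example, ExtractSasToken("background_writes=true&sp=r&sv=2024-08-04") yields "sp=r&sv=2024-08-04".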