Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

GH-44308: [C++][FS][Azure] Implement SAS token authentication #45021

Merged

Conversation

Tom-Newton
Copy link
Contributor

@Tom-Newton Tom-Newton commented Dec 13, 2024

Rationale for this change

SAS token auth is sometimes useful and it the last one we haven't implemented.

What changes are included in this PR?

  • Implement ConfigureSasCredential
  • Update AzureOptions::FromUri so that simply appending a SAS token to a blob storage URI works. e.g. AzureOptions::FromUri("abfs://[email protected]/?se=2024-12-12T18:57:47Z&sig=pAs7qEBdI6sjUhqX1nrhNAKsTY%2B1SqLxPK%2BbAxLiopw%3D&sp=racwdxylti&spr=https,http&sr=c&sv=2024-08-04")
    • SAS tokens are made up of a bunch of URI query parameters that I'm not sure we can exhaustively list.
    • Therefore we now assume that any unrecognised URI query parameters are assumed to be part of a SAS token, instead of returning an error status.
  • Update CopyFile to use StartCopyFromUri instead of CopyFromUri

Are these changes tested?

Yes

  • Added new tests for authenticating with SAS and doing some operations including CopyFile
  • Added new tests for AzureOptions::FromUri with a SAS token.

I also made sure to run the tests which connect to real blob storage.

Are there any user-facing changes?

  • SAS token in now supported
  • Unrecognised URI query parameters are ignored by AzureOptions::FromUri instead of failing fast. IMO this is a regression but still the best option to support SAS token.

@Tom-Newton Tom-Newton marked this pull request as ready for review December 13, 2024 17:07
// Assume these are part of a SAS token. Its not ideal to make such an assumption
// but given that a SAS token is a complex set of URI parameters, that could be
// tricky to exhaustively list I think its the best option.
credential_kind = CredentialKind::kSasToken;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have the SAS token specification that includes parameter names used by a SAS token, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I had a quick search and couldn't find what we need. If you think it's important I can try a bit harder. The closest I found seemed to be unabbreviated versions of what actually appears in the sas token.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that doc is only for user delegated SAS tokens so I unioned it with the parameters for account and service SAS tokens and hopefully the spec is slowly changing.

I wasn't really confident on the best way to define a constant set of strings to do contains checks against in C++. Since there are only 27 values, I ended up with a constexpr array and std::find but please let me know if this is not a good option.

cpp/src/arrow/filesystem/azurefs.h Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs.cc Outdated Show resolved Hide resolved
@github-actions github-actions bot added awaiting changes Awaiting changes and removed awaiting review Awaiting review labels Dec 14, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 14, 2024
@Tom-Newton
Copy link
Contributor Author

Wait... I might have just accidentally worked out how to avoid any of the special authentication stuff for copying...

@Tom-Newton Tom-Newton marked this pull request as draft December 14, 2024 12:56
@Tom-Newton Tom-Newton marked this pull request as ready for review December 14, 2024 14:22
cpp/src/arrow/filesystem/azurefs.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs_test.cc Outdated Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs.cc Show resolved Hide resolved
cpp/src/arrow/filesystem/azurefs.cc Show resolved Hide resolved
@github-actions github-actions bot removed the awaiting change review Awaiting change review label Dec 15, 2024
@github-actions github-actions bot added the awaiting changes Awaiting changes label Dec 15, 2024
@github-actions github-actions bot added awaiting change review Awaiting change review and removed awaiting changes Awaiting changes labels Dec 15, 2024
@Tom-Newton Tom-Newton force-pushed the tomnewton/implement_sas_auth_on_azure/GH-44308 branch from d30e6ce to 7576c4c Compare December 15, 2024 11:15
Copy link
Member

@kou kou left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@@ -147,6 +159,10 @@ Status AzureOptions::ExtractFromUriQuery(const Uri& uri) {
} else if (kv.first == "background_writes") {
ARROW_ASSIGN_OR_RAISE(background_writes,
::arrow::internal::ParseBoolean(kv.second));
} else if (std::find(sas_token_query_parameters.begin(),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use std::binary_search() with sorted sas_token_query_parameters?

https://en.cppreference.com/w/cpp/algorithm/binary_search

Copy link
Contributor Author

@Tom-Newton Tom-Newton Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could do but I would be a bit concerned about keeping sas_token_query_parameters sorted. It looks like C++20 allows using std::sort with constexpr but I believe arrow currently uses C++17. If you are concerned about the complexity of the lookup I think my preference would be to use a std::set and forget about trying to make it a constexpr.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::set is OK.

@github-actions github-actions bot added awaiting merge Awaiting merge and removed awaiting change review Awaiting change review labels Dec 17, 2024
@kou kou merged commit ba2b9e5 into apache:main Dec 17, 2024
37 of 38 checks passed
@kou kou removed the awaiting merge Awaiting merge label Dec 17, 2024
@github-actions github-actions bot added the awaiting changes Awaiting changes label Dec 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants