Add auth type for Azure storage #77
Conversation
Thanks for taking a stab at this! It seems the general problem of needing to configure a Polaris deployment to possibly use "application defaults" is potentially common to all cloud providers, even if the mechanics of what "application defaults" entail will differ.
This could be worth some more discussion on some subtle points in your linked issue #69 -- I'll post some additional thoughts there.
spec/polaris-management-service.yml (Outdated)

```yaml
required:
  - tenantId
  - authType
```
We'll probably want to be conservative about adding required fields to the API objects, especially if they have an impact on persisted entities. In this case, it could probably at least be made optional to be minimally invasive, if the default preserves the existing behavior.
Thanks! I will make this optional and add another enum value, NONE, to fall back to SAS_TOKEN if no authType is specified.
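For illustration, the optional field might look roughly like this in the spec (a sketch only; the enum values follow this thread, but the schema layout and description wording are assumptions, not the actual change):

```yaml
# Sketch: authType no longer appears under "required"; NONE is the default
# and preserves the existing SAS-token behavior when authType is unset.
required:
  - tenantId
properties:
  authType:
    type: string
    enum:
      - NONE
      - SAS_TOKEN
      - APPLICATION_DEFAULT
    default: NONE
    description: Optional. NONE falls back to the existing SAS_TOKEN behavior.
```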
```java
      case APPLICATION_DEFAULT:
        break;
    }
    credentialMap.put(PolarisCredentialProperty.AZURE_SAS_TOKEN, sasToken);
```
Instead of overwriting this config key with "" when not using the SAS_TOKEN auth type, we could pull this under the SAS_TOKEN case. Then, in theory, the server could be configured to either allow total fallthrough to "application defaults" (which may look through environment variables, standard credential-config files, the VM "metadata server", etc.) or to inherit statically-configured credential settings from a Catalog's properties.
Such an option would need to be configurable at the top-level server config though, to specify whether individual catalogs should really be allowed to force using such defaults.
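The restructuring described above might look roughly like this (a sketch only; the enum, property, and method names are stand-ins inferred from the diff context, not the actual Polaris implementation):

```java
import java.util.EnumMap;
import java.util.Map;

public class AzureAuthSketch {
  // Hypothetical stand-ins for the real Polaris types in this diff.
  enum AuthType { SAS_TOKEN, APPLICATION_DEFAULT }
  enum PolarisCredentialProperty { AZURE_SAS_TOKEN }

  // Only the SAS_TOKEN case populates the token property; APPLICATION_DEFAULT
  // puts nothing, so callers fall through to the default credential chain
  // (env vars, credential files, VM metadata server) or statically
  // configured catalog properties.
  static Map<PolarisCredentialProperty, String> buildCredentialMap(
      AuthType authType, String sasToken) {
    Map<PolarisCredentialProperty, String> credentialMap =
        new EnumMap<>(PolarisCredentialProperty.class);
    switch (authType) {
      case SAS_TOKEN:
        credentialMap.put(PolarisCredentialProperty.AZURE_SAS_TOKEN, sasToken);
        break;
      case APPLICATION_DEFAULT:
        // Intentionally empty: no credential key is written.
        break;
    }
    return credentialMap;
  }

  public static void main(String[] args) {
    System.out.println(buildCredentialMap(AuthType.SAS_TOKEN, "sv=abc"));
    System.out.println(buildCredentialMap(AuthType.APPLICATION_DEFAULT, "sv=abc"));
  }
}
```

The key design point is that an absent key, rather than an empty-string value, signals "no SAS token configured" downstream.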
As in another RBAC rule to limit the authType?
Actually, I'm thinking one level higher, where the server-level global config can dictate whether or not credential-vending and subscoping is used at all. Some details in this comment: #69 (comment)
In particular,
At a high level we at least need to have a strict separation of effective privileges between the personas who can configure and run the Polaris server itself and those who can call createCatalog. In a mutual-trust setting, it makes sense to have relaxed constraints on the server-level configuration, but it needs to be possible to run the server in a secure mode as well where catalog creators are in a different realm of trust than the admins of the server.
Basically, instead of complicating the API model or RBAC model, maybe it'll be easier to do all this short-circuiting in BasePolarisCatalog.java. In particular, this line is an example of how to define a server-level configuration setting:
polaris/polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java (line 200 in e89ff19):

```java
Boolean allowSpecifyingFileIoImpl =
```
And maybe you can put the short-circuit here:
polaris/polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java (line 792 in e89ff19):

```java
tableLocations.forEach(tl -> validateLocationForTableLike(tableIdentifier, tl));
```
after the "validateLocationForTableLike" call and before any attempt to get a subscoped credential is made. Basically just LOGGER.atInfo and then return early.
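A minimal sketch of that short-circuit (hypothetical: the flag name, method shape, and logging are illustrative stand-ins, not the real BasePolarisCatalog code):

```java
import java.util.List;
import java.util.Map;

public class ShortCircuitSketch {
  // Hypothetical server-level flag; the real setting would be defined the
  // same way as allowSpecifyingFileIoImpl above.
  static final boolean SKIP_CREDENTIAL_SUBSCOPING = true;

  // Sketch of the tail of the credential-vending path: validate locations
  // first, then optionally return early before any subscoped credential
  // is fetched.
  static Map<String, String> vendCredentials(List<String> tableLocations) {
    // Stand-in for validateLocationForTableLike(tableIdentifier, tl).
    tableLocations.forEach(ShortCircuitSketch::validateLocation);
    if (SKIP_CREDENTIAL_SUBSCOPING) {
      // Real code would use LOGGER.atInfo() here, then return early.
      System.out.println("Skipping credential subscoping per server config");
      return Map.of();
    }
    // ...otherwise continue to obtain a subscoped credential.
    return Map.of("token", "subscoped");
  }

  static void validateLocation(String location) {
    if (location.isBlank()) {
      throw new IllegalArgumentException("empty table location");
    }
  }

  public static void main(String[] args) {
    System.out.println(vendCredentials(List.of("abfss://c@a.dfs.core.windows.net/t")));
  }
}
```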
Thanks @dennishuo, I agree that the "application defaults" concept is potentially common to all cloud providers. In fact, I'm borrowing the "application defaults" concept from Nessie.
@dennishuo can you take another look at this? I noticed you were reviewing it most recently.
Continuing discussion from #208 (comment). There are two use cases to consider:
It seems the current state of this PR would only provide a way to do (1), by allowing catalog creators to set per-catalog config values dictating that Polaris use APPLICATION_DEFAULT behavior when reading/writing files itself. However, this ability poses a problem for situations where the set of admins who run the Polaris server is different from the set of admins who interact with the Polaris server to create catalogs. For this scenario, it's preferable to control this at the server level.

For case (2), I don't think there's yet a proposed solution. The APPLICATION_DEFAULT concept itself is probably not sufficiently expressive for this, because by nature APPLICATION_DEFAULT hides a bunch of "convenience" fallthroughs for trying to look for credentials in the local environment, which might include standard credential files (e.g. ~/.awscredentials), environment variables, or local cloud VM "metadata servers" (e.g. http://169.254.169.254). Not all of these are equally suitable for credential vending, if at all. The most plausible use case would be a flow that simply hands out VM instance metadata-based tokens for credential vending:
I believe these are all designed to be "short-lived" credentials where security isn't compromised by handing them out, but they may lack the kinds of "downscoping" semantics normally needed in more advanced Polaris deployments. We could explore an option where these metadata-server-based tokens are returned for credential-vending purposes.
@dennishuo, I'm not really understanding this scenario, meaning declaring

I'm looking at how to use managed identities in Azure and hopefully can change the
@dennishuo, unfortunately my company policy doesn't allow me to create a managed identity either, and I'm not able to test the behaviour. I will test the skip-credential-subscoping path again with
@dennishuo, after more testing,
Closing it now. Feel free to reopen if needed.
Description

This PR addresses #69.
How Has This Been Tested?
This has been tested locally. With the changes in this PR, ADLS users will be able to choose either a SAS token or DefaultAzureCredentialBuilder for authentication. To verify that it works, I sent a curl request:
and, after following the README on creating a service principal and granting roles, I was able to run my Spark job to write data to ADLS successfully.