Add auth type for Azure storage #77
Conversation
Thanks for taking a stab at this! It seems the general problem of needing to configure a Polaris deployment to possibly use "application defaults" is potentially common to all cloud providers, even if the mechanics of what "application defaults" entail will differ.
This could be worth some more discussion on some subtle points in your linked issue #69 -- I'll post some additional thoughts there.
spec/polaris-management-service.yml (Outdated)

```yaml
required:
  - tenantId
  - authType
```
We'll probably want to be conservative about adding required fields to the API objects, especially if they have an impact on persisted entities. In this case, it could probably at least be made optional to be minimally invasive, if the default preserves the existing behavior.
Thanks! I will make this optional and add another enum value, NONE, to fall back to SAS_TOKEN if no authType is specified.
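For illustration, the optional field might look roughly like this in the spec (a sketch only; the enum values follow this thread, but the schema layout and description wording are assumptions, not the actual change):

```yaml
# Sketch: authType no longer appears under "required"; NONE is the default
# and preserves the existing SAS-token behavior when authType is unset.
required:
  - tenantId
properties:
  authType:
    type: string
    enum:
      - NONE
      - SAS_TOKEN
      - APPLICATION_DEFAULT
    default: NONE
    description: Optional. NONE falls back to the existing SAS_TOKEN behavior.
```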
```java
      case APPLICATION_DEFAULT:
        break;
    }
    credentialMap.put(PolarisCredentialProperty.AZURE_SAS_TOKEN, sasToken);
```
Instead of overwriting this config key with "" when not using the SAS_TOKEN auth type, we could pull this under the SAS_TOKEN case. Then, in theory, the server could be configured to either allow total fallthrough to "application defaults" (which may look through environment variables, standard credential-config files, the VM "metadata server", etc.) or to inherit statically-configured credential settings from a Catalog's properties.
Such an option would need to be configurable at the top-level server config though, to specify whether individual catalogs should really be allowed to force using such defaults.
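The restructuring described above might look roughly like this (a sketch only; the enum, property, and method names are stand-ins inferred from the diff context, not the actual Polaris implementation):

```java
import java.util.EnumMap;
import java.util.Map;

public class AzureAuthSketch {
  // Hypothetical stand-ins for the real Polaris types in this diff.
  enum AuthType { SAS_TOKEN, APPLICATION_DEFAULT }
  enum PolarisCredentialProperty { AZURE_SAS_TOKEN }

  // Only the SAS_TOKEN case populates the token property; APPLICATION_DEFAULT
  // puts nothing, so callers fall through to the default credential chain
  // (env vars, credential files, VM metadata server) or statically
  // configured catalog properties.
  static Map<PolarisCredentialProperty, String> buildCredentialMap(
      AuthType authType, String sasToken) {
    Map<PolarisCredentialProperty, String> credentialMap =
        new EnumMap<>(PolarisCredentialProperty.class);
    switch (authType) {
      case SAS_TOKEN:
        credentialMap.put(PolarisCredentialProperty.AZURE_SAS_TOKEN, sasToken);
        break;
      case APPLICATION_DEFAULT:
        // Intentionally empty: no credential key is written.
        break;
    }
    return credentialMap;
  }

  public static void main(String[] args) {
    System.out.println(buildCredentialMap(AuthType.SAS_TOKEN, "sv=abc"));
    System.out.println(buildCredentialMap(AuthType.APPLICATION_DEFAULT, "sv=abc"));
  }
}
```

The key design point is that an absent key, rather than an empty-string value, signals "no SAS token configured" downstream.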
As in another RBAC rule to limit the authType?
Actually, I'm thinking one level higher, where the server-level global config can dictate whether or not credential-vending and subscoping is used at all. Some details in this comment: #69 (comment)
In particular,
At a high level we at least need to have a strict separation of effective privileges between the personas who can configure and run the Polaris server itself and those who can call createCatalog. In a mutual-trust setting, it makes sense to have relaxed constraints on the server-level configuration, but it needs to be possible to run the server in a secure mode as well where catalog creators are in a different realm of trust than the admins of the server.
Basically, instead of complicating the API model or RBAC model, maybe it'll be easier to do all this short-circuiting in BasePolarisCatalog.java. In particular, this line is an example of how to define a server-level configuration setting:
polaris/polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java (line 200 in e89ff19):

```java
Boolean allowSpecifyingFileIoImpl =
```
And maybe you can put the short-circuit here:
polaris/polaris-service/src/main/java/io/polaris/service/catalog/BasePolarisCatalog.java (line 792 in e89ff19):

```java
tableLocations.forEach(tl -> validateLocationForTableLike(tableIdentifier, tl));
```
after the "validateLocationForTableLike" call and before any attempt to get a subscoped credential is made. Basically just LOGGER.atInfo and then return early.
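A minimal sketch of that short-circuit (hypothetical: the flag name, method shape, and logging are illustrative stand-ins, not the real BasePolarisCatalog code):

```java
import java.util.List;
import java.util.Map;

public class ShortCircuitSketch {
  // Hypothetical server-level flag; the real setting would be defined the
  // same way as allowSpecifyingFileIoImpl above.
  static final boolean SKIP_CREDENTIAL_SUBSCOPING = true;

  // Sketch of the tail of the credential-vending path: validate locations
  // first, then optionally return early before any subscoped credential
  // is fetched.
  static Map<String, String> vendCredentials(List<String> tableLocations) {
    // Stand-in for validateLocationForTableLike(tableIdentifier, tl).
    tableLocations.forEach(ShortCircuitSketch::validateLocation);
    if (SKIP_CREDENTIAL_SUBSCOPING) {
      // Real code would use LOGGER.atInfo() here, then return early.
      System.out.println("Skipping credential subscoping per server config");
      return Map.of();
    }
    // ...otherwise continue to obtain a subscoped credential.
    return Map.of("token", "subscoped");
  }

  static void validateLocation(String location) {
    if (location.isBlank()) {
      throw new IllegalArgumentException("empty table location");
    }
  }

  public static void main(String[] args) {
    System.out.println(vendCredentials(List.of("abfss://c@a.dfs.core.windows.net/t")));
  }
}
```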
Thanks @dennishuo, I agree that the "application defaults" concept is potentially common to all cloud providers. In fact, I'm borrowing the "application defaults" concept from Nessie.
@dennishuo can you take another look at this? I noticed you were reviewing it most recently.
Continuing discussion from #208 (comment). There are two use cases to consider:
It seems the current state of this PR would only provide a way to do (1), by allowing catalog creators to set per-catalog config values dictating that Polaris use APPLICATION_DEFAULT behavior when reading/writing files itself. However, this ability poses a problem for situations where the set of admins who run the Polaris server is different from the set of admins who interact with the Polaris server to create catalogs. For this scenario, it's preferable to control this at the server level.

For case (2), I don't think there's yet a proposed solution. The APPLICATION_DEFAULT concept itself is probably not sufficiently expressive for this, because by nature APPLICATION_DEFAULT hides a bunch of "convenience" fallthroughs for trying to look for credentials in the local environment, which might include standard credential files (e.g. ~/.awscredentials), environment variables, or local cloud VM "metadata servers" (e.g. http://169.254.169.254). Not all of these are equally suitable for credential vending, if at all. The most plausible use case would be a flow that simply hands out VM instance metadata-based tokens for credential vending:
I believe these are all designed to be "short-lived" credentials where security isn't compromised by handing them out, but they may lack the kinds of "downscoping" semantics normally needed in more advanced Polaris deployments. We could explore an option where these metadata-server-based tokens are returned for credential-vending purposes.
@dennishuo, I'm not really understanding this scenario, meaning declaring

I'm looking at how to use managed identities in Azure and hopefully can change the
@dennishuo, unfortunately my company policy doesn't allow me to create a managed identity either, and I'm not able to test the behaviour. I will test the skip-credential-subscoping path again with
@dennishuo, after more testing,
Closing it now. Feel free to reopen if needed.
Description

This PR addresses #69.
How Has This Been Tested?
This has been tested locally. With the changes in this PR, ADLS users will be able to choose either a SAS token or DefaultAzureCredentialBuilder for authentication. To verify that it works, I sent a curl request:
and, after following the README on creating a service principal and granting roles, I was able to run my Spark job to write data to ADLS successfully.