-
Notifications
You must be signed in to change notification settings - Fork 2.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Azure: Support vended credentials refresh in ADLSFileIO. #11577
base: main
Are you sure you want to change the base?
Azure: Support vended credentials refresh in ADLSFileIO. #11577
Conversation
cc// @nastra @jackye1995 @amogh-jahagirdar @munendrasn Can you please help with the review. |
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
private LoadCredentialsResponse fetchCredentials() { | ||
Map<String, String> headers = | ||
RESTUtil.merge( | ||
configHeaders(properties), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate why this is needed here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to implement the similar behaviour present in RESTSessionCatalog, where catalog can be configured to pass explicit headers to server by setting the configuration with header.
prefix.
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProvider.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
private static final long MIN_REFRESH_WAIT_MILLIS = 10; | ||
|
||
public AzureSasCredentialRefresher( | ||
Supplier<Pair<String, Long>> sasTokenWithExpirationSupplier, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this actually need to be a supplier given that it's being immediately used in L40?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the supplier is being used during initialization and also during each scheduled refresh to fetch the new credentials. Supplier holds the logic to fetch new credentials from the API endpoint, since we are going to use it multiple times I modelled it as supplier instead of single method call. Please suggest if there is a cleaner way to achieve the same.
azure/src/test/java/org/apache/iceberg/azure/adlsv2/VendedAzureSasCredentialProviderTest.java
Outdated
Show resolved
Hide resolved
.isEqualTo(credential.config().get(ADLS_SAS_TOKEN_PREFIX + STORAGE_ACCOUNT)); | ||
|
||
Thread.sleep(20); | ||
// Since expiration time past to current time, the refresh will fall back at minimum 10ms |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what this comment is trying to say. Also what happens if you remove the sleep time above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Since the credential refresh implementation is based on scheduling a new refresh using the server provided expiration time,
- this test is covering the scenario where server responds with already expired token (expires-at-ms < current-time).
- The scheduling delay for consequent refresh is minimum delay of 10ms as per the logic in refreshDelayMillis()
- If we remove the sleep time, the credentials will be fetched from API only once. Depending on the test case execution time (> 10ms) there can be second credential fetch. To keep the test case behaviour consistent I used thread sleep.
@ChaladiMohanVamsi thanks for working on this. Do you have a way of actually testing this PR with an ADLS environment and see whether the refreshes work? |
Co-authored-by: Eduard Tudenhoefner <[email protected]>
@nastra @amogh-jahagirdar Thanks for the review and suggestions. I have addressed the review comments. I was able to test the credentials refresh logic in ADLS environment, since I don't have the new credentials API endpoint spec implementation in my env I used loadTable endpoint for testing by tweaking few parts of this PR. |
azure/src/main/java/org/apache/iceberg/azure/AzureProperties.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/AzureSasCredentialRefresher.java
Outdated
Show resolved
Hide resolved
azure/src/main/java/org/apache/iceberg/azure/adlsv2/VendedAdlsCredentialProvider.java
Outdated
Show resolved
Hide resolved
import org.slf4j.Logger; | ||
import org.slf4j.LoggerFactory; | ||
|
||
public class VendedAdlsCredentialProvider implements Serializable, AutoCloseable { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this really need to be Serializable
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
VendedAdlsCredentialProvider
is an instance variable in AzureProperties (serializable) which is again an instance variable in ADLSFileIO (serializable). In distributed engines like Spark we serialize ADLSFileIO instance from driver to executor, to ensure each FileIO instance refresh the credentials independently we might need this class to be serializable.
@danielcweeks could you also take a look at this PR please? |
@ChaladiMohanVamsi Can you help to understand how you would initialize ADLSFileIO with these changes. Priorly we can call initialize with the SAS token or account key and initialize the credentials, can u share the sample code as to how you had initialized ADLSFileIO and tried testing with these changes ? |
Co-authored-by: Eduard Tudenhoefner <[email protected]>
@ShyamalaGowri There is no change in the way we initialize the FileIO. The initialization remains same (ref). This change is backward compatible, we can also initialize with the SAS token (existing way). This change introduces a new capability for ADLSFileIO to dynamically fetch sas token from the credential endpoint provided in the config during initialization. |
private LoadCredentialsResponse fetchCredentials() { | ||
Map<String, String> headers = | ||
RESTUtil.merge( | ||
configHeaders(properties), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you elaborate why we need the config headers here? I believe the auth headers should be enough
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tried to implement the similar behaviour present in RESTSessionCatalog for loadTable(), where catalog can be configured to pass explicit headers to server by setting the configuration with header. prefix.
While I was testing these changes with vended credentials from loadTable response (workaround to mimic new credential endpoint), I saw the possibility that REST server implementations may use headers as feature rollout gates/ additional metadata to decide credential expiration time. Similarly new credential endpoint may also support accepting headers from client as additional metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for now let's leave the config headers out, because we also don't have them for vended S3/GCS credential providers
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, will remove them.
Proposed Change
Add support to refresh and consume vended storage credentials for ADLSFileIO.
New Azure properties
client.refresh-credentials-enabled
property to control credential refresh and defaults to true.client.refresh-credentials-endpoint
defines endpoint to fetch the refresh credentials with below spec.Requirements of credential config
adls.sas-token-expire-at-ms.
similar to existing sas token prefixadls.sas-token.
Similar PRs for other FileIOs