[FEATURE REQUEST #32] On-Premises S3 / S3 Compatible... #389
...s-core/src/main/java/org/apache/polaris/core/storage/s3/S3CredentialsStorageIntegration.java
spec/polaris-management-service.yml
@@ -901,6 +903,58 @@ components:
required:
- roleArn

S3StorageConfigInfo:
Is this specific to s3compat, or is it also meant to be used for s3 itself?
This implementation focuses on on-premises S3 because a class for AWS already exists.
However, as a next step I can try to make it work seamlessly with AWS too:
- I think roleArn is not mandatory for AWS S3, so I would leave that scenario to the existing implementation.
- Using an access key and a secret key should work with AWS S3 too.
- I have overridden the AWS STS endpoint with the S3 endpoint. I could add an STS endpoint property: if the property is empty, the STS client calls the default AWS STS endpoint; otherwise it calls the endpoint that was set. Alternatively, a boolean with a clear and explicit description could control this.
- Region (a little more thought may be needed to avoid conflicts):
- Add a region property.
- I removed the cross-region tweak from the AWS FileIOClientFactory; it can be kept to ensure full compatibility.
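The "empty property means default AWS STS endpoint" idea from the list above could look roughly like the sketch below. It is only an illustration: `StsEndpointResolver` and its `resolve` method are hypothetical names that do not exist in this PR, and the host in `main` is a placeholder.

```java
import java.net.URI;
import java.util.Optional;

/** Hypothetical helper: decides which STS endpoint an STS client should be pointed at. */
final class StsEndpointResolver {
  private StsEndpointResolver() {}

  /**
   * Returns the endpoint override, or empty when the property is unset or blank.
   * An empty result means the caller leaves the SDK's default AWS STS endpoint in place.
   */
  static Optional<URI> resolve(String stsEndpointProperty) {
    if (stsEndpointProperty == null || stsEndpointProperty.isBlank()) {
      return Optional.empty();
    }
    return Optional.of(URI.create(stsEndpointProperty.trim()));
  }

  public static void main(String[] args) {
    // Placeholder values, for illustration only.
    System.out.println(resolve(""));
    System.out.println(resolve("http://minio.example.internal:9000"));
  }
}
```

With the AWS SDK for Java v2, a present value could then be passed to the builder's `endpointOverride(URI)`, while an empty value would leave the builder untouched.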
If we can unify these, I think that would be ideal. But I don't know enough about how S3 vs S3Compatible are similar/different to say how possible that is.
I think it will not be too hard to unify in a next step. I lack AWS access for testing, but from what I know or have seen:
- The STS endpoint is an AWS-specific endpoint; in S3-compatible solutions, when STS is available, it is merged with the S3 endpoint.
- Region is available in S3-compatible solutions, but it is not used much, or is mostly implemented for compliance with AWS SDK clients.
- Cross-region access is more related to Polaris and Iceberg, with the overloaded Iceberg S3FileIO, but I saw that "[AWS] S3FileIO - Add Cross-Region Bucket Access" (iceberg#11259) was merged 3 weeks ago, so the next Iceberg version should be fine soon.
MinIO claims that its product API is 100% compatible with the AWS S3 API. Much the same goes for many alternatives.
- The S3-compatible implementation could easily offer an optional "arnRole" parameter, like the mandatory one in the existing AWS class, with a less strict regexp pattern to allow more flexibility for implementations where "aws" inside the string is replaced by the product name (for example "ecs" for Dell ECS). It could help make the transition smooth.
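A relaxed role pattern of the kind described in the last bullet might be sketched as follows. The regular expression and the class name `RelaxedRoleArnValidator` are assumptions made for illustration; they are not the pattern used by the existing AWS class, and the sample identifiers in `main` are invented.

```java
import java.util.regex.Pattern;

/** Hypothetical validator that accepts a vendor name in place of "aws" in a role ARN. */
final class RelaxedRoleArnValidator {
  // Assumed shape: arn:<partition>:iam::<account-or-namespace>:role/<name>
  // The partition segment accepts any lowercase name ("aws", "ecs", ...).
  private static final Pattern ROLE_ARN =
      Pattern.compile("^arn:[a-z0-9-]+:iam::[^:]*:role/.+$");

  private RelaxedRoleArnValidator() {}

  static boolean isValid(String roleArn) {
    return roleArn != null && ROLE_ARN.matcher(roleArn).matches();
  }

  public static void main(String[] args) {
    // Illustrative identifiers only.
    System.out.println(isValid("arn:aws:iam::123456789012:role/reader"));
    System.out.println(isValid("arn:ecs:iam::namespace1:role/reader"));
    System.out.println(isValid("not-a-role-arn"));
  }
}
```

Keeping the overall structure fixed while loosening only the partition and account segments is one way to stay close to the AWS format while admitting other products.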
spec/polaris-management-service.yml
enum:
- TOKEN_WITH_ASSUME_ROLE
- KEYS_SAME_AS_CATALOG
- KEYS_DEDICATED_TO_CLIENT
How does this work? What identifies a client?
Here, a client is anything trying to obtain keys (or a token from the security token service) from this catalog (Spark, Trino, ...). There is no particular distinction of identity.
Is this not the right term to use in the context of Polaris?
I think the term is correct; I was just stuck trying to understand how the service will track which keys are dedicated to which client.
OK.
It is simply one key for the catalog itself, and another single key for all clients, whoever they are. I leave the distinction between clients to the principal/role/privilege level. I think it is hard, at the storage/credential class level, to tie a separate pair of keys to each client.
It is a basic way, when SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION is true and there is no temporary token, to avoid divulging the internal catalog key and instead serve a key that can be deactivated or rotated for security reasons without breaking the catalog itself.
After discussing with MonkeyCanCode in "Prod Deployment credentials", the advantage of this proposal is that you do not have to rely on the main credentials provided at the global Polaris service level.
Today, if you revoke the Polaris service credentials for AWS, all catalogs with AWS storage are instantly down.
In this implementation each catalog is independent. The same idea applies to client keys: the catalog does not break when client keys are revoked or rotated for security reasons.
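To make the three enum values concrete, here is a minimal sketch of how the key pair handed to a client could be selected. Only the enum constants come from the spec diff; `CredentialVendingSketch`, `KeyPair`, and `keysForClient` are hypothetical names, and the key values are placeholders.

```java
/** Hypothetical illustration of the three credential strategies from the spec enum. */
final class CredentialVendingSketch {
  enum Strategy { TOKEN_WITH_ASSUME_ROLE, KEYS_SAME_AS_CATALOG, KEYS_DEDICATED_TO_CLIENT }

  /** Simple holder for a static access key pair. */
  static final class KeyPair {
    final String accessKeyId;
    final String secretAccessKey;

    KeyPair(String accessKeyId, String secretAccessKey) {
      this.accessKeyId = accessKeyId;
      this.secretAccessKey = secretAccessKey;
    }
  }

  private CredentialVendingSketch() {}

  /** Picks the static key pair to vend; the token strategy never vends static keys. */
  static KeyPair keysForClient(Strategy strategy, KeyPair catalogKeys, KeyPair clientKeys) {
    switch (strategy) {
      case KEYS_SAME_AS_CATALOG:
        return catalogKeys;
      case KEYS_DEDICATED_TO_CLIENT:
        // One shared pair for all clients; it can be rotated without touching the catalog's own pair.
        return clientKeys;
      default:
        throw new IllegalStateException(
            "TOKEN_WITH_ASSUME_ROLE vends a temporary STS token, not static keys");
    }
  }

  public static void main(String[] args) {
    KeyPair catalog = new KeyPair("catalog-key-id", "catalog-secret");
    KeyPair client = new KeyPair("client-key-id", "client-secret");
    System.out.println(keysForClient(Strategy.KEYS_DEDICATED_TO_CLIENT, catalog, client).accessKeyId);
  }
}
```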
"you have not to rely on the main credentials provided at the global Polaris service level."
I think this is the key point here. I agree that the experience you describe is bad, but I'm not sure that fixing it should be a blocker for s3compat support (or that this is the right fix).
Would you be okay saving this for later, or carving it out into a different PR? In my view relying on the global credentials in production is universally a bad idea, regardless of what STORAGE_TYPE you're using.
dd8d860 to e2c296b
polaris-core/src/main/java/org/apache/polaris/core/storage/PolarisCredentialProperty.java
.../org/apache/polaris/core/storage/s3compatible/S3CompatibleCredentialsStorageIntegration.java
.../java/org/apache/polaris/core/storage/s3compatible/S3CompatibleStorageConfigurationInfo.java
I guess I don't understand what isn't supported today with catalog properties. E.g., in https://github.com/apache/polaris/blob/main/polaris-service/src/test/java/org/apache/polaris/service/catalog/PolarisSparkIntegrationTest.java , we use S3MockContainer as an S3 endpoint, which requires the same path-style access and custom endpoint configuration as what's included here. Can we not follow the same pattern for MinIO?
As a rule, I think vending static credentials is not a good idea. Some customization of how the STS client is instantiated, possibly with support for custom profiles for different catalogs, could make sense. But I think, ultimately, the credentials returned should always be a temporary session token. Even if we just call GetSessionToken without requiring an IAM role, it would be vastly more secure than sending raw credentials.
propertiesMap.put(PolarisCredentialProperty.AWS_ENDPOINT, storageConfig.getS3Endpoint());
propertiesMap.put(
    PolarisCredentialProperty.AWS_PATH_STYLE_ACCESS,
    storageConfig.getS3PathStyleAccess().toString());
These are catalog properties, not credential-vending properties. These should be set at the catalog-level when it is created. Those properties would then be passed into the FileIO when it is constructed.
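As a rough illustration of the catalog-level alternative, the endpoint and path-style settings could be expressed as plain catalog properties. The keys `s3.endpoint` and `s3.path-style-access` are Iceberg's S3FileIO property names; `CatalogPropertiesSketch` is a hypothetical helper, the endpoint is a placeholder, and how Polaris forwards these properties to the FileIO is not shown here.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper that builds catalog-level properties for an S3-compatible endpoint. */
final class CatalogPropertiesSketch {
  private CatalogPropertiesSketch() {}

  /** Returns properties keyed by Iceberg's S3FileIO property names. */
  static Map<String, String> s3CompatibleProperties(String endpoint, boolean pathStyleAccess) {
    Map<String, String> props = new LinkedHashMap<>();
    props.put("s3.endpoint", endpoint);
    props.put("s3.path-style-access", Boolean.toString(pathStyleAccess));
    return props;
  }

  public static void main(String[] args) {
    // Placeholder endpoint, for illustration only.
    System.out.println(s3CompatibleProperties("http://localhost:9000", true));
  }
}
```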
To be refactored to satisfy the requested change: the boolean "SKIP_CREDENTIAL_SUBSCOPING_INDIRECTION".
I will try to find a way to move this into the catalog properties. However, catalog properties are not forwarded to "S3CompatibleCredentialsStorageIntegration.java"; by default, only storage properties are.
I failed to use catalog properties; they are not forwarded to this class.
public void createStsClient(S3CompatibleStorageConfigurationInfo s3storageConfig) {

  LOGGER.debug("S3Compatible - createStsClient()");
  StsClientBuilder stsBuilder = software.amazon.awssdk.services.sts.StsClient.builder();
Why is this constructed here rather than being passed in as a constructor parameter?
Like the AWS class?
I find it odd to put something related to a storage type outside the class and provide it through the constructor. The Azure class seems to keep it inside the class too.
No?
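For reference, the constructor-parameter option raised in this thread can be sketched in a generic way. Every name here (`InjectedClientSketch`, `TokenClient`, `StorageIntegration`) is hypothetical and stands in for the real STS client and integration class; the sketch only shows the shape of the design, where the client is supplied from outside and can be swapped for a fake in tests.

```java
import java.util.function.Supplier;

/** Hypothetical illustration of passing a client in through the constructor. */
final class InjectedClientSketch {
  /** Stand-in for an STS-like client; not the AWS SDK interface. */
  interface TokenClient {
    String assumeRole(String roleArn);
  }

  /** Stand-in for a storage integration that receives its client from outside. */
  static final class StorageIntegration {
    private final Supplier<TokenClient> clientSupplier;

    StorageIntegration(Supplier<TokenClient> clientSupplier) {
      this.clientSupplier = clientSupplier;
    }

    String vendToken(String roleArn) {
      return clientSupplier.get().assumeRole(roleArn);
    }
  }

  private InjectedClientSketch() {}

  public static void main(String[] args) {
    // A fake client is enough to exercise the integration without any network access.
    StorageIntegration integration = new StorageIntegration(() -> arn -> "token-for-" + arn);
    System.out.println(integration.vendToken("example-role"));
  }
}
```

Constructing the client inside the class, as the Azure class reportedly does, keeps storage-specific wiring in one place; injecting it trades that locality for easier substitution.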
import software.amazon.awssdk.services.sts.model.AssumeRoleRequest;
import software.amazon.awssdk.services.sts.model.AssumeRoleResponse;

/** Credential vendor that supports generating */
comment seems to just ...
That was the unchanged comment copied from AWS class. Removed.
String cli = System.getenv(storageConfig.getS3CredentialsClientAccessKeyId());
String cls = System.getenv(storageConfig.getS3CredentialsClientSecretAccessKey());
Seems you could rely on the DefaultCredentialsProvider and maybe allow profiles to be specified? This would allow for env variables, but also file configuration or other means of retrieving credentials.
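The idea of trying several credential sources in order (environment variables first, then files or other means) can be illustrated with a small stand-alone chain. This is not the AWS SDK's DefaultCredentialsProvider; `CredentialsChainSketch`, `firstAvailable`, and `fromEnv` are hypothetical names, and the environment variable name in `main` is a placeholder.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Supplier;

/** Hypothetical credential lookup chain: the first source that yields a value wins. */
final class CredentialsChainSketch {
  private CredentialsChainSketch() {}

  /** Walks the sources in order and returns the first value found, if any. */
  static Optional<String> firstAvailable(List<Supplier<Optional<String>>> sources) {
    for (Supplier<Optional<String>> source : sources) {
      Optional<String> value = source.get();
      if (value.isPresent()) {
        return value;
      }
    }
    return Optional.empty();
  }

  /** Source backed by an environment variable; empty when the variable is unset. */
  static Supplier<Optional<String>> fromEnv(String variableName) {
    return () -> Optional.ofNullable(System.getenv(variableName));
  }

  public static void main(String[] args) {
    // Placeholder variable name and fallback value, for illustration only.
    Supplier<Optional<String>> env = fromEnv("EXAMPLE_CATALOG_ACCESS_KEY_ID");
    Supplier<Optional<String>> fallback = () -> Optional.of("value-from-another-source");
    System.out.println(firstAvailable(List.of(env, fallback)).isPresent());
  }
}
```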
I have not found how to reconcile this with the createCatalog() REST API, and I am not happy with the idea of leaving the catalog's credentials fully and exclusively at the Polaris service level.
Is it a bad compromise?
I have not found how the endpoint or other properties can be set from the REST API when creating or updating a catalog. Today only arnRole is accepted, as a mandatory parameter, in the AWS storage type.
I agree; the default is now STS "AssumeRole" without any "role". Raw credentials are a poor fallback scenario for when STS is not available. There are enterprise contexts where STS, AssumeRole, etc. are not allowed and only a pair of keys is available. For example, Dell ECS requires an additional policy to enable STS AssumeRole; it is not activated out of the box. I tried to be explicit about this degraded security pattern. "GetSessionToken" is not part of the S3 API; it is IAM API. It is not available in MinIO.
Hello everyone, this PR seems to have been blocked for a month now. Is there anything we can do to get it over the line? 🙏
fb42d0e to e5f227d
Sorry for the one-month break. I tried the approaches proposed in the comments.
e5f227d to eec522f
eec522f to 342f911
Refactored after many comments:
Thank you
Description (edited):
This is a proposal for a Polaris core storage implementation: a copy of the AWS one plus new parameters (endpoint, path style, ...).
By default it tries to follow the same credential behavior as AWS (IAM STS). The same dynamic policy is applied, limiting the scope to the data queried. This has been tested and works with MinIO, and should also work with Dell ECS, NetApp StorageGRID, etc.
Otherwise, if STS is not available, setting 'Skip_Credential_Subscoping_Indirection' = true disables Polaris "SubScoping" of the credentials.
Let me know your opinion about this design proposal.
Thank you