HADOOP-18708: Support S3 Client Side Encryption (CSE) With AWS SDK V2 #6884
Conversation
@steveloughran @ahmarsuhail Could you please review the changes?
sorry, no time to review right now, vector IO problems
I don't want another S3 client as it will only hurt boot time. We are already hitting problems here where client init time is O(jar), and I'm thinking of lazy creation of the async one.
look at how we did this before: we altered file lengths in listings.
- no to netty dependencies
- no to any dependency on the new library except when CSE is enabled.
Resolved review threads (outdated) on:
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AFileSystem.java
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
FYI, Jira on create performance. I'm worried about how to remove/delay client configuration, so adding a new client makes things way worse: HADOOP-19205
By default we are not initializing the new S3 client. It is only created, under a config flag, when backward compatibility is required.
It's still happening in FileSystem.initialize(). If I get time this week I'll do my PoC of lazy creation of the transfer manager and async client, as that's doubled startup time already. All of that will be moved behind S3Store with only the copy() methods exposed. Ultimately I want the S3Client itself hidden behind that API.

Actually, #6892 is the draft PR with lazy creation; a new S3 client would go in there, but ideally we'd actually push the listing/transformation code into S3AStore so the extra client would never be visible outside it.
@steveloughran - I really like the idea of lazy-initializing the async client and the new interface for creating the S3 client. I will wait for your changes to get in and refactor my code based on them. That way I could lazy-initialize the unencrypted S3 client as well when the config is enabled.
Exactly. When it's not needed: no penalty.
@steveloughran @ahmarsuhail I have rebased the changes on top of #6892. Thanks
Force-pushed 2593281 to bc9cfbd
(IntelliJ seems to be hiding or has lost some of the review comments... let me see once this review is submitted.)
I'm afraid one thing I now want changed across the entire code base is to stop passing S3 client instances around, so that we can move in future to using the AsyncClient everywhere. The S3AStoreImpl class would provide accessor methods for all the operations needed; S3AFS would invoke them, while subsidiary components/classes would invoke S3AStore via callbacks which would never call back into S3AFS. See BulkDeleteOperationCallbacksImpl for an example.
As well as delivering the alleged speed-ups of the async client, the isolation should help testing, maintenance etc. My ultimate goal is for the S3AFS class to be a lot more minimal, with no Callback ever being implemented in it.
Anyway, that is a big undertaking. For now: no new accessors of the S3 client. And no network IO operations in S3AUtils, which should already be small utility methods (which is also why error handling is moving to ErrorTranslation).
Proposed:
- ListingOperationCallbacks to provide a createFileStatus() call to create a file status
- different implementations/branches of that call to either the S3AUtils class, or something which does the extra IO

This is going to have to be pulled out into listing callbacks, executed within the same audit span and updating statistics. If a client is doing any IO we need to know who and why (server side) and how long (client); a sketch follows.
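A minimal sketch of that proposal (hypothetical names, not the committed API):

```java
import java.io.IOException;

import software.amazon.awssdk.services.s3.model.S3Object;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.s3a.S3AFileStatus;

/** Hypothetical shape of the proposed listing callback. */
public interface ListingStatusCallbacks {

  /**
   * Build the status for one listed object.
   * The plain implementation is a pure in-memory conversion (S3AUtils);
   * the CSE-aware one may issue extra IO to find the unpadded length,
   * executed within the caller's audit span and counted in the store
   * statistics.
   */
  S3AFileStatus createFileStatus(S3Object object, Path path)
      throws IOException;
}
```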
Resolved review threads on:
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/AWSHeaders.java (outdated)
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManagerImpl.java (several threads)
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/CSEMaterials.java
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/CSEUtils.java (outdated)
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/HeaderProcessing.java
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/S3AUtils.java
@steveloughran I really appreciate your time and effort reviewing these changes. Indeed it was a detailed review. Please take a look. Thanks
Force-pushed b2207a4 to 3b31b5a
@shameersss1 unless you have no choice, please try not to rebase and force-push PRs once reviews have started; it makes it a lot harder to see what has changed.
@@ -727,6 +736,28 @@ S3-CSE to work.
</property>
```

#### 2. CSE-CUSTOM
- Set `fs.s3a.encryption.algorithm=CSE-CUSTOM`.
- Set `fs.s3a.encryption.cse.custom.cryptographic.material.manager.class.name=<fully qualified class name>`.
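For illustration, the same settings applied programmatically; the keyring class name here is a placeholder:

```java
import org.apache.hadoop.conf.Configuration;

public class CseCustomConfig {
  /** Build a configuration enabling CSE-CUSTOM with a placeholder keyring. */
  public static Configuration cseCustomConf() {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.encryption.algorithm", "CSE-CUSTOM");
    conf.set(
        "fs.s3a.encryption.cse.custom.cryptographic.material.manager.class.name",
        "com.example.CustomKeyring");
    return conf;
  }
}
```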
Should the new fs.s3a.encryption.cse properties be added to core-default.xml and index.md?
These configurations will be shown as part of the encryption page, similar to https://hadoop.apache.org/docs/r3.4.0/hadoop-aws/tools/hadoop-aws/encryption.html
Yeah, I had to rebase for merge conflicts in the hadoop-common module. This commit (838dc24) addresses the review comments.
I think that there is a little bit of complexity in S3AFileSystem with if (isCSEEnabled) conditions in many places. I wonder if, instead of repeating the boolean checks all over the code, we could have a handler interface and extract the code into two implementations: one for CSE enabled and one for no CSE (see the sketch below).
Other than that, the change looks good to me.
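A minimal sketch of the handler idea, reduced to one representative method (names are hypothetical; the PR's actual S3AFileSystemHandler has a wider surface):

```java
import java.io.IOException;

/** One-method illustration of the proposed handler split. */
interface ObjectSizeHandler {
  long getObjectSize(String key, long listedLength) throws IOException;
}

/** No CSE: the length reported by the listing is already correct. */
class BaseHandler implements ObjectSizeHandler {
  @Override
  public long getObjectSize(String key, long listedLength) {
    return listedLength;
  }
}

/** CSE: strip the 16 bytes the V3 client appends (see the
 * compatibility discussion later in this thread). */
class CseHandler implements ObjectSizeHandler {
  @Override
  public long getObjectSize(String key, long listedLength) {
    return listedLength - 16;
  }
}
```

S3AFileSystem would then pick one implementation at initialize() time and call it unconditionally, instead of branching on isCSEEnabled at each call site.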
@steveloughran - Gentle reminder for the review.
@shameersss1 what do you think of @raphaelazzolini's comment?
HEAD

I want to push all S3 client invocations behind the S3AStore interface, so we can move to the async client with ease. Then move things like ListingOperationCallbacks out of S3AFS, and only invoke S3AStore operations.

I'm going to suggest you add a headObject(path) there which implements the core operation. Invoke it in S3A fs.getObjectMetadata() as well as in the new classes.

There are way too many HEAD requests going on now. The result of any previous HEAD MUST be passed down and its values used/reused.
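A sketch of the suggested accessor, assuming the store-interface shape described above:

```java
import java.io.IOException;

import software.amazon.awssdk.services.s3.model.HeadObjectResponse;

import org.apache.hadoop.fs.Path;

/** Hypothetical fragment of the store interface. */
public interface StoreHeadOperations {

  /**
   * The single HEAD entry point: issue one request, then pass the
   * response down to every consumer instead of re-issuing it.
   */
  HeadObjectResponse headObject(Path path) throws IOException;
}
```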
The requirements before any merge are:
- for a getFileStatus call, the number of HEAD requests SHALL be 1
- for directory listings, the number of HEAD requests SHALL equal the number of files in the list
- for delete and rename, there SHALL be no extra HEAD calls, and .instruction files are processed
Deletion/rename

The skipping of .instruction files means that rename and delete won't work if they exist, because OperationCallbacksImpl uses AcceptAllButS3nDirsAndCSEInstructionFile().

- Write a test to demonstrate this, or that I am wrong.
- Fix.

I would've expected the rename or delete tests to have thrown this up if they had been executed against any directories with some of these files. Tests should create these files, even if it is just by PUTting 0-byte files of that name. Then verify that delete and rename handle them.
Side issue: what about delete(key) and BulkDelete? It'll just leave the files around, won't it? That'll be something to document.
minor

Can you review the new failure conditions and make sure the troubleshooting docs are current?

Some of the javadoc and method argument lines are over 100 characters long. Cut them down, except in the special case that "the code actually looks worse" when they are cut. This is to make it easier to view code side-by-side, either in your IDE or the GitHub PR reviewer.
 * @throws IOException If an error occurs while retrieving the object.
 */
@Override
public ResponseInputStream<GetObjectResponse> getObject(S3AStore store, GetObjectRequest request,
move the second arg onto the next line
ack
Resolved review threads on:
- ...p-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/BaseS3AFileSystemHandler.java (outdated)
- hadoop-tools/hadoop-aws/src/main/java/org/apache/hadoop/fs/s3a/impl/ClientManager.java
@Override
public long getS3ObjectSize(String key, long length, S3AStore store, String bucket,
    RequestFactory factory, HeadObjectResponse response) throws IOException {
  return CSEUtils.getUnPaddedObjectLength(store.getOrCreateS3Client(), bucket,
pass the S3AStore into CSEUtils, have it create the client.
ack
/**
 * S3 client side encryption (CSE) utility class.
 */
@InterfaceAudience.Public
private
ack
 * @throws IOException
 */
@Test
public void testSkippingCSEInstructionFileWithV1Compatibility() throws IOException {
This is where you should add the new delete and rename tests: create the file in subdir 1 under methodPath, rename it, verify the source dir is gone; delete the rename target dir, again assert not found. A sketch:
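A hedged sketch of that test, assuming the usual AbstractS3ATestBase/ContractTestUtils helpers:

```java
import org.junit.Test;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.fs.contract.ContractTestUtils;
import org.apache.hadoop.fs.s3a.AbstractS3ATestBase;
import org.apache.hadoop.fs.s3a.S3AFileSystem;

public class ITestCSEInstructionFileRenameDelete extends AbstractS3ATestBase {

  @Test
  public void testRenameAndDeleteWithInstructionFile() throws Throwable {
    S3AFileSystem fs = getFileSystem();
    Path srcDir = new Path(methodPath(), "subdir1");
    ContractTestUtils.touch(fs, new Path(srcDir, "data"));
    // fake V1 sidecar: a zero-byte object with the .instruction suffix
    ContractTestUtils.touch(fs, new Path(srcDir, "data.instruction"));

    Path destDir = new Path(methodPath(), "renamed");
    fs.rename(srcDir, destDir);
    ContractTestUtils.assertPathDoesNotExist(fs, "rename left the source dir", srcDir);

    fs.delete(destDir, true);
    ContractTestUtils.assertPathDoesNotExist(fs, "delete left the target dir", destDir);
  }
}
```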
I agree with you: the instruction file is causing too much confusion/difficulty. I think let's ignore the instruction case for the time being and document that having an instruction file is not supported.
Class exceptionClass = AccessDeniedException.class;
if (CSEUtils.isCSEEnabled(getEncryptionAlgorithm(
    getTestBucketName(getConfiguration()), getConfiguration()).getMethod())) {
  exceptionClass = AWSClientIOException.class;
why a different exception? It should be access denied in both cases, shouldn't it? After all, that is what the test is meant to do: verify that delete-without-permissions is translated.
Unlike the normal S3 client, which throws AccessDeniedException, the S3 encryption client throws AWSClientIOException.
so the SDK is raising some exception? does it have the same semantics as Access Denied? that is: should it be remapped?
Yes. The semantics are the same: "Caused by: software.amazon.encryption.s3.S3EncryptionClientException: Access Denied (Service: S3, Status Code: 403,"
can you show me the full stack? 403 normally maps to AccessDeniedException, and it'd be good to keep the same. AWSClientIOException is just our "something failed" if there's nothing else.

- Add maybeTranslateEncryptionClientException() in ErrorTranslation to look at the exception: if the classname string value matches "software.amazon.encryption.s3.S3EncryptionClientException", then map to AccessDeniedException.
- Call that in S3AUtils.translateException just before the // no custom handling. bit.

It may seem bad, but look at maybeExtractChannelException() and other bits to see worse.
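A rough sketch of that helper; the version eventually committed unwraps the inner SdkException instead (see maybeExtractSdkExceptionFromEncryptionClientException later in this thread), so treat this purely as an illustration of the class-name match:

```java
import java.nio.file.AccessDeniedException;

import software.amazon.awssdk.core.exception.SdkException;

final class EncryptionExceptionTranslation {

  private static final String S3_ENCRYPTION_CLIENT_EXCEPTION =
      "software.amazon.encryption.s3.S3EncryptionClientException";

  /**
   * Match on the class name string so hadoop-aws needs no compile-time
   * dependency on the encryption client library; return null to let
   * the caller fall through to the generic handling.
   */
  static AccessDeniedException maybeTranslateEncryptionClientException(
      String operation, SdkException e) {
    if (!S3_ENCRYPTION_CLIENT_EXCEPTION.equals(e.getClass().getName())) {
      return null;
    }
    AccessDeniedException ade =
        new AccessDeniedException(operation + ": " + e);
    ade.initCause(e);
    return ade;
  }
}
```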
The following is the complete stack trace:
[ERROR] testSingleObjectDeleteNoPermissionsTranslated(org.apache.hadoop.fs.s3a.ITestS3AFailureHandling) Time elapsed: 26.964 s <<< ERROR!
org.apache.hadoop.fs.s3a.AWSClientIOException: delete on <>: software.amazon.encryption.s3.S3EncryptionClientException: Access Denied (Service: S3, Status Code: 403,: Access Denied (Service: S3, Status Code: 403, )
at org.apache.hadoop.fs.s3a.S3AUtils.translateException(S3AUtils.java:228)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:124)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:163)
at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.deleteObjectAtPath(S3AFileSystem.java:2637)
at org.apache.hadoop.fs.s3a.impl.DeleteOperation.deleteObjectAtPath(DeleteOperation.java:383)
at org.apache.hadoop.fs.s3a.impl.DeleteOperation.execute(DeleteOperation.java:234)
at org.apache.hadoop.fs.s3a.impl.DeleteOperation.execute(DeleteOperation.java:70)
at org.apache.hadoop.fs.s3a.impl.ExecutingStoreOperation.apply(ExecutingStoreOperation.java:76)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.trackDuration(IOStatisticsBinding.java:449)
at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteWithoutCloseCheck(S3AFileSystem.java:3595)
at org.apache.hadoop.fs.s3a.S3AFileSystem.delete(S3AFileSystem.java:3571)
at org.apache.hadoop.fs.s3a.ITestS3AFailureHandling.lambda$testSingleObjectDeleteNoPermissionsTranslated$2(ITestS3AFailureHandling.java:207)
at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:500)
at org.apache.hadoop.test.LambdaTestUtils.intercept(LambdaTestUtils.java:386)
at org.apache.hadoop.fs.s3a.ITestS3AFailureHandling.testSingleObjectDeleteNoPermissionsTranslated(ITestS3AFailureHandling.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:59)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:56)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
at org.junit.rules.TestWatcher$1.evaluate(TestWatcher.java:61)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:299)
at org.junit.internal.runners.statements.FailOnTimeout$CallableStatement.call(FailOnTimeout.java:293)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.lang.Thread.run(Thread.java:750)
Caused by: software.amazon.encryption.s3.S3EncryptionClientException: Access Denied (Service: S3, Status Code: 403,
at software.amazon.encryption.s3.S3EncryptionClient.deleteObject(S3EncryptionClient.java:377)
at org.apache.hadoop.fs.s3a.impl.S3AStoreImpl.lambda$deleteObject$5(S3AStoreImpl.java:677)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.invokeTrackingDuration(IOStatisticsBinding.java:547)
at org.apache.hadoop.fs.statistics.impl.IOStatisticsBinding.lambda$trackDurationOfOperation$5(IOStatisticsBinding.java:528)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:468)
at org.apache.hadoop.fs.s3a.Invoker.retryUntranslated(Invoker.java:431)
at org.apache.hadoop.fs.s3a.impl.S3AStoreImpl.deleteObject(S3AStoreImpl.java:668)
at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObject(S3AFileSystem.java:3223)
at org.apache.hadoop.fs.s3a.S3AFileSystem.deleteObjectAtPath(S3AFileSystem.java:3248)
at org.apache.hadoop.fs.s3a.S3AFileSystem$OperationCallbacksImpl.lambda$deleteObjectAtPath$0(S3AFileSystem.java:2638)
at org.apache.hadoop.fs.s3a.Invoker.lambda$once$0(Invoker.java:165)
at org.apache.hadoop.fs.s3a.Invoker.once(Invoker.java:122)
... 30 more
Caused by: software.amazon.awssdk.services.s3.model.S3Exception: Access Denied (Service: S3, Status Code: 403,
at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleErrorResponse(AwsXmlPredicatedResponseHandler.java:156)
at software.amazon.awssdk.protocols.xml.internal.unmarshall.AwsXmlPredicatedResponseHandler.handleResponse(AwsXmlPredicatedResponseHandler.java:108)
Yes, error translation should do the job.
- The V1 and V2 clients support reading unencrypted S3 objects, whereas the V3 client does not. In order to read S3 objects in a directory with a mix of encrypted and unencrypted objects.
- Unlike the V2 and V3 clients, which always pad 16 bytes, the V1 client pads extra bytes to the next multiple of 16. For example, if the unencrypted object size is 12 bytes, the V1 client pads an extra 4 bytes to make it a multiple of 16.
- The V1 client supports storing encryption metadata in an instruction file.
explain this in its own section, with the consequences: file rename loses this, file delete doesn't clean it up
ack
We won't be considering the instruction file, and have documented the same.
### Compatibility Issues

- The V1 and V2 clients support reading unencrypted S3 objects, whereas the V3 client does not. In order to read S3 objects in a directory with a mix of encrypted and unencrypted objects.
- Unlike the V2 and V3 clients, which always pad 16 bytes, the V1 client pads extra bytes to the next multiple of 16. For example, if the unencrypted object size is 12 bytes, the V1 client pads an extra 4 bytes to make it a multiple of 16.
- Unlike the V2 and V3 clients, which always append 16 bytes to a file, the V1 client pads extra bytes to the next multiple of 16. For example, if the unencrypted object size is 28 bytes, the V1 client pads an extra 4 bytes to make it a multiple of 16.
ack.
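To make the size rules concrete, a small sketch (assuming CBC-style padding for V1 and a fixed 16-byte tag for V2/V3, as described above):

```java
final class CsePaddedSizes {

  /** V1: pad up to the next multiple of 16; 12 -> 16, 28 -> 32,
   * and an already-aligned 32 -> 48 (a full extra block). */
  static long v1EncryptedSize(long plaintextLength) {
    return (plaintextLength / 16 + 1) * 16;
  }

  /** V2/V3: always exactly 16 extra bytes. */
  static long v2v3EncryptedSize(long plaintextLength) {
    return plaintextLength + 16;
  }
}
```

The consequence for V1 compatibility mode is that the plaintext length cannot be recovered from the object size alone, which is why CSEUtils.getUnPaddedObjectLength() has to do extra IO.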
 */
public static final String S3_ENCRYPTION_CSE_INSTRUCTION_FILE_SUFFIX = ".instruction";
nit: cut the surplus lines
ack
Thinking about this some more: those .instruction files raise a lot of problems; I don't like the way t... There's already the notion that files beginning with . or _ are hidden, so applications scanning directories skip them. So do we really need to hide them at all? Making them visible stops us playing yet more tricks with the listings. What is actually in the files? Is it just some list of attributes encrypted with the same CSE settings?
Just thinking if there is any better way to solve this. Or maybe we can simply state that ".instruction" files are not supported and should not be present. @steveloughran any thoughts on this?
Force-pushed 37eead5 to 1deb43a
Force-pushed to rebase with trunk. @steveloughran - Thanks a lot for a detailed review. I know it is a pain to review changes which are big (~50 files); I really appreciate the time you put into this. I have addressed your comments with the new commit. PS: Ignored instruction file handling for simplicity.
💔 -1 overall (automatically generated)
💔 -1 overall (automatically generated)
🎊 +1 overall (automatically generated)
@steveloughran - raised a revision addressing the comments and fixing the checkstyle issue.
OK. Last big bit is the exception translation.
Let's try getting this in this week!
configureClientBuilder(S3AsyncClient.builder(), parameters, conf, bucket)
    .httpClientBuilder(httpClientBuilder);

// TODO: Enable multi part upload with cse once it is available.
Create a follow-up JIRA and reference it here: "multipart upload pending with HADOOP-xyz".
/**
 * An interface that defines the contract for handling certain filesystem operations.
 */
public interface S3AFileSystemHandler {
yes
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
nit: put the block below the "other" package imports
ack
public ResponseInputStream<GetObjectResponse> getObject(S3AStore store, GetObjectRequest request,
    RequestFactory factory) throws IOException {
  boolean isEncrypted = isObjectEncrypted(store.getOrCreateS3Client(), factory, request.key());
  return isEncrypted ? store.getOrCreateS3Client().getObject(request)
aah.
import java.io.IOException;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
move the block below the software.* one
ack
Resolved review thread on:
- ...ls/hadoop-aws/src/test/java/org/apache/hadoop/fs/s3a/ITestS3AClientSideEncryptionCustom.java
String xAttrPrefix = "header.";

// Assert KeyWrap Algo
assertEquals("Key wrap algo isn't same as expected", KMS_KEY_WRAP_ALGO,
use AssertJ assertThat here
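For example (the variable name is assumed from the surrounding test):

```java
import static org.assertj.core.api.Assertions.assertThat;

// AssertJ form of the assertion above
assertThat(keyWrapAlgo)
    .describedAs("key wrap algorithm")
    .isEqualTo(KMS_KEY_WRAP_ALGO);
```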
ack
 * tests.
 * @param conf Configurations
 */
private static void unsetEncryption(Configuration conf) {
would it make sense to move this to S3ATestUtils and use elsewhere, unsetting all encryption options? Then use in ITestS3AClientSideEncryptionCustom
Yes, makes sense!
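A sketch of the shared helper, assuming the option names in Constants and the existing removeBaseAndBucketOverrides() helper; the exact list of options to unset may differ:

```java
// Sketch of a method inside S3ATestUtils, which already provides
// removeBaseAndBucketOverrides(Configuration, String...).
import org.apache.hadoop.conf.Configuration;

import static org.apache.hadoop.fs.s3a.Constants.S3_ENCRYPTION_ALGORITHM;
import static org.apache.hadoop.fs.s3a.Constants.S3_ENCRYPTION_KEY;

/** Remove encryption settings, including any per-bucket overrides. */
public static void unsetEncryption(Configuration conf) {
  removeBaseAndBucketOverrides(conf,
      S3_ENCRYPTION_ALGORITHM,
      S3_ENCRYPTION_KEY);
}
```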
@@ -156,7 +159,12 @@ public void testGeneratePoolTimeouts() throws Throwable {
ContractTestUtils.createFile(fs, path, true, DATASET);
final FileStatus st = fs.getFileStatus(path);
try (FileSystem brittleFS = FileSystem.newInstance(fs.getUri(), conf)) {
  intercept(ConnectTimeoutException.class, () -> {
    Class exceptionClass = ConnectTimeoutException.class;
won't be needed once the translation is fixed.
ack
let's make this the next big change, with everything else blocked before it goes in. That way, no more merge pain.
🎊 +1 overall (automatically generated)
@steveloughran - Thanks a lot for the review. I have addressed your comments.
Thanks; I like the translation now, and you can see from the tests how it makes invocation a lot more consistent. Yes, the error translation code is very complicated. But consider this: if we had not remapped all v1 SDK exceptions to IOEs before throwing them, it would have been nearly impossible to upgrade to the V2 SDK without breaking so much code downstream.
keyring =
    getKeyringProvider(cseMaterials.getCustomKeyringClassName(), cseMaterials.getConf());
} catch (RuntimeException e) {
  throw new IOException("Failed to instantiate a custom keyring provider", e);
can you include the error text of e in the new IOE? It's really easy to lose the base exception in log stack chains with deep wrapping/rewrapping:
IOException("Failed to instantiate a custom keyring provider: " + e, e)
Makes sense! ack.
 * @see SdkException
 * @see AwsServiceException
 */
public static SdkException maybeExtractSdkExceptionFromEncryptionClientException(
this is a bit long. How about maybeProcessEncryptionClientException()? That says it is handled, without saying exactly what happens.
ack.
return ReflectionUtils.newInstance(keyringProviderClass, conf,
    new Class[] {Configuration.class}, conf);
} catch (Exception ex) {
  throw new RuntimeException("Failed to create Keyring provider", ex);
same comment about including the wrapped exception text
ack
@steveloughran Yeah, the error translation feature makes sense now. I have addressed your comments.
🎊 +1 overall (automatically generated)
@steveloughran - Gentle reminder for review.
+1
ok, merged. provide a 3.4 version and I'll pull that in too.
Sure, will work on branch-3.4 after HADOOP-19336
Description of PR
This PR adds support for S3 client-side encryption (CSE). CSE can be configured in two modes: CSE-KMS, where keys are provided by AWS KMS, and CSE-CUSTOM, where custom keys are provided by implementing a custom keyring.
CSE is implemented using the S3EncryptionClient (the V3 client), and additional configurations (mentioned below) were added to make it compatible with the older V1 and V2 encryption clients; this compatibility mode is turned OFF by default. A minimal setup sketch follows.
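For illustration, a minimal CSE-KMS setup; the compatibility flag name is taken from this PR's documentation and should be checked against the final docs:

```java
import org.apache.hadoop.conf.Configuration;

public class CseKmsConfig {
  /** CSE-KMS with the optional V1-compatibility mode (off by default). */
  public static Configuration cseKmsConf(String kmsKeyId, boolean v1Compat) {
    Configuration conf = new Configuration();
    conf.set("fs.s3a.encryption.algorithm", "CSE-KMS");
    conf.set("fs.s3a.encryption.key", kmsKeyId);
    // assumed flag name: enables reading data written by the V1/V2 clients
    conf.setBoolean("fs.s3a.encryption.cse.v1.compatibility.enabled", v1Compat);
    return conf;
  }
}
```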
In order to have compatibility with the V1 client, the following operations are done.
Default Behavior
The configurations to make it backward compatible are turned OFF by default, considering the performance implications. The default behavior is as follows.
This PR is based on the initial work done by @ahmarsuhail as part of #6164
How was this patch tested?
`mvn -Dparallel-tests -DtestsThreadCount=16 clean verify`