REST encryption integration #13225
Conversation
```java
// TODO(smaheshwar): This test is taken from https://github.com/apache/iceberg/pull/13066, with the
// exception of testCtas, but adapted for the REST catalog. When that merges, we can directly use
// those tests for the REST catalog as well by adding to the parameters method there, to have a
// single test class for table encryption.
```
Highlighting this - this file is largely taken from #13066
```java
        : Integer.parseInt(dekLength);
}

// Force re-creation of encryptingFileIO and encryptionManager
```
See #13066 (comment)
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that's incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the [email protected] list. Thank you for your contributions.

(Bump to remove staleness)
Force-pushed from d007aa2 to 889cbbf
```java
 * @param dataKeyLength length of data encryption key (16/24/32 bytes)
 * @param kmsClient Client of KMS used to wrap/unwrap keys in envelope encryption
 */
public StandardEncryptionManager(
```
Could deprecate the old constructor
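One common way to deprecate a constructor while keeping old callers working is to mark it `@Deprecated` and delegate to the new one. A minimal sketch on a simplified stand-in class (the names and extra parameter below are hypothetical, not the actual `StandardEncryptionManager` signature):

```java
// Simplified stand-in for illustration; not the real StandardEncryptionManager API.
class EncryptionManagerSketch {
  private final String tableKeyId;
  private final int dataKeyLength;
  private final boolean pushdownEnabled; // hypothetical new parameter

  /** @deprecated use {@link #EncryptionManagerSketch(String, int, boolean)} instead. */
  @Deprecated
  EncryptionManagerSketch(String tableKeyId, int dataKeyLength) {
    // Delegate so old callers keep working while new behavior lives in one place.
    this(tableKeyId, dataKeyLength, false);
  }

  EncryptionManagerSketch(String tableKeyId, int dataKeyLength, boolean pushdownEnabled) {
    this.tableKeyId = tableKeyId;
    this.dataKeyLength = dataKeyLength;
    this.pushdownEnabled = pushdownEnabled;
  }

  int dataKeyLength() {
    return dataKeyLength;
  }

  boolean pushdownEnabled() {
    return pushdownEnabled;
  }
}
```

The delegation keeps a single source of truth for initialization, and the `@deprecated` Javadoc tag points callers at the replacement.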
```java
import org.junit.jupiter.api.Disabled;
import org.junit.jupiter.api.TestTemplate;

// TODO(smaheshwar-pltr): This test is taken from https://github.com/apache/iceberg/pull/13066, with
```
I added a CTAS test to the ones authored by @ggershinsky here. Perhaps there's a world in which this PR merges first, in which case we can undo this comment. There are also review comments on these tests in #13066 that are potentially relevant.
Given #7770 is merged, curious for thoughts on this PR.
Could you elaborate on the API additions? I think it would help to have some more description on the general direction of this or
```java
@Override
public FileIO io() {
  // TODO(smaheshwar-pltr): Hive fetches encryption props here from uncommitted metadata
  // RESTTableOperations.this.encryptionPropsFromMetadata(uncommittedMetadata);
```
I think the Hive table operations are a bit different in how they handle things. Commenting this out on the Hive side causes the new CTAS to fail for Hive, and leaving this in for REST causes that test to fail.
I believe the temporary operations using the same IO should not be problematic and the constructor that initialises the encryption props with the LoadTableResponse metadata means that things work here.
Happy to remove this TODO but @ggershinsky maybe you'd like to take a look? 🙏
Well, as long as the CTAS test is passing OK. Encryption properties are delivered somewhat differently in this catalog.
```java
}

@TestTemplate
public void testCtas() {
```
Adding this test, I think it's useful (https://github.com/apache/iceberg/pull/13225/files#r2467103538)
Maybe this can be done by extending the https://github.com/apache/iceberg/blob/main/spark/v4.0/spark/src/test/java/org/apache/iceberg/spark/sql/TestCTASEncryption.java ? This way, we won't have an unused table created before the test, and an undeleted table after the test.
also, the encryption is verified by a direct read without keys
Thank you! I somehow missed this new test class, must've been added after I started working on this - updated to use that, looks better now 🙌
Sorry for lack of responses, was out of office for a bit. @ggershinsky, I'm happy to continue with this PR unless the approaches significantly differ, in which case happy to also hand it off or accept a review here! 🙏

@ggershinsky, makes sense - though I struggled slightly; I think your comment here #13066 (comment) sums it up well

@huaxingao thanks for reminding, done! I think this PR is ready for an initial pass 🙏
```java
TableMetadata.Builder builder = TableMetadata.buildFrom(metadata);
for (Map.Entry<String, EncryptedKey> entry :
    EncryptionUtil.encryptionKeys(encryption()).entrySet()) {
  builder.addEncryptionKey(entry.getValue());
```
Pointing out that this adds the required REST updates to the metadata to be committed. I think that additional REST requirements for these keys are not needed
Sure, no problem. I think the approach is similar.
```java
private EncryptingFileIO encryptingFileIO;
private String tableKeyId;
private int encryptionDekLength;
private List<EncryptedKey> encryptedKeysFromMetadata;
```
Please also factor in the #14427 changes
Thanks, I'll need to think about whether those changes are actually needed here, given the metadata file equality check in updateCurrentMetadata (the test passes).

I suspect it's best to have such changes so that when there's new metadata, the old keys are kept around. I'll have to think about whether there's a case for this though.
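The behavior being discussed (old keys surviving a metadata refresh) could be sketched, purely as an assumption about the intended semantics and not the actual TableMetadata/EncryptedKey code, as a merge of key maps where entries from the new metadata win on key-ID collisions:

```java
import java.util.LinkedHashMap;
import java.util.Map;

class KeyRetentionSketch {
  // Merge keys cached from older metadata with keys from newly loaded metadata.
  // Keys present in both are taken from the new metadata; old-only keys are kept around
  // so files written under a previous key remain readable.
  static Map<String, String> mergeKeys(
      Map<String, String> cachedKeys, Map<String, String> newMetadataKeys) {
    Map<String, String> merged = new LinkedHashMap<>(cachedKeys);
    merged.putAll(newMetadataKeys);
    return merged;
  }
}
```

This is only a shape for the discussion; whether retention is actually needed depends on the metadata file equality check mentioned above.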
```java
append.appendFile(dataFiles.get(0));

// append to the table in the meantime. use a separate load to avoid shared operations
validationCatalog.loadTable(tableIdent).newFastAppend().appendFile(dataFiles.get(0)).commit();
```
@szlta , could you have a quick look at this addition?
Sure, this does look good to me
```java
    RESTCatalogProperties.METRICS_REPORTING_ENABLED,
    RESTCatalogProperties.METRICS_REPORTING_ENABLED_DEFAULT);
if (mergedProps.containsKey(CatalogProperties.ENCRYPTION_KMS_IMPL)) {
  this.kmsClient = EncryptionUtil.createKmsClient(mergedProps);
```
Should we add kmsClient to closeables so it’s closed with the catalog?
Great point, thanks! 0ea78c1
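The pattern being suggested is to register the client with the catalog's other closeable resources so one `close()` tears everything down. A minimal self-contained sketch of that pattern (names hypothetical, not the actual RESTCatalog internals):

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class CatalogCloseablesSketch implements Closeable {
  private final List<Closeable> closeables = new ArrayList<>();

  // Register any resource (e.g. a KMS client) that should die with the catalog.
  void register(Closeable resource) {
    closeables.add(resource);
  }

  @Override
  public void close() throws IOException {
    // Close in reverse registration order, a typical teardown convention.
    for (int i = closeables.size() - 1; i >= 0; i--) {
      closeables.get(i).close();
    }
  }
}
```

This avoids leaking the KMS client's connections when the catalog itself is closed.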
```java
encryptedKeysFromMetadata = metadata.encryptionKeys();

Map<String, String> tableProperties = metadata.properties();
if (tableKeyId == null) {
```
Shall we always get tableKeyId in case there is a key rotation?
I suppose we can always get it from the properties here, though I suspect automatic key rotation often keeps the key ID the same, so it's perhaps reasonable to require that (otherwise I'd imagine you'd need tasks to commit new metadata with changed properties on rotation). cc @ggershinsky, curious for thoughts / whether there've been discussions on this?
if the tableKeyId == null (here and below), it just means the table is not encrypted.
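That convention (no table key ID property means the table is plaintext) could be sketched like this, using a plain properties map and a hypothetical property name rather than the real TableMetadata and TableProperties constants:

```java
import java.util.Map;

class EncryptionPropsSketch {
  // Hypothetical stand-in for TableProperties.ENCRYPTION_TABLE_KEY.
  static final String ENCRYPTION_TABLE_KEY = "encryption.key-id";

  // A null/absent key ID is interpreted as "table is not encrypted",
  // matching the convention described in the comment above.
  static boolean isEncrypted(Map<String, String> tableProperties) {
    return tableProperties.get(ENCRYPTION_TABLE_KEY) != null;
  }
}
```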
```java
List<MetadataUpdate> updates;

TableMetadata metadataToCommit = metadata;
if (encryption() instanceof StandardEncryptionManager) {
```
Is it possible to add this update to the metadata before we call commit? Per the API contract, we should try to commit the metadata as given:
iceberg/core/src/main/java/org/apache/iceberg/TableOperations.java (lines 45 to 63 in aa14aae):

```java
 * Replace the base table metadata with a new version.
 *
 * <p>This method should implement and document atomicity guarantees.
 *
 * <p>Implementations must check that the base metadata is current to avoid overwriting updates.
 * Once the atomic commit operation succeeds, implementations must not perform any operations that
 * may fail because failure in this method cannot be distinguished from commit failure.
 *
 * <p>Implementations must throw a {@link
 * org.apache.iceberg.exceptions.CommitStateUnknownException} in cases where it cannot be
 * determined if the commit succeeded or failed. For example if a network partition causes the
 * confirmation of the commit to be lost, the implementation should throw a
 * CommitStateUnknownException. This is important because downstream users of this API need to
 * know whether they can clean up the commit or not, if the state is unknown then it is not safe
 * to remove any files. All other exceptions will be treated as if the commit has failed.
 *
 * @param base table metadata on which changes were based
 * @param metadata new table metadata with updates
 */
```
Thanks for pointing this out, makes sense 🙏

I've mirrored the HiveTableOperations in this PR (I will try to reduce the duplication).

@ggershinsky I think this review comment, along with others, applies to the Hive side too, which the REST side should ideally be consistent with. WDYT here? Maybe it makes sense to patch Hive first if relevant, then I can rebase and reflect the changes?
A couple of points:
- The stated requirement is to "check that the base metadata is current to avoid overwriting updates". This is met; we add a missing field to the new metadata after the check.
- There are lots of calls to commit from various classes. Moving this outside will be technically hard (if it works at all).
```java
if (base != null) {
  Set<String> removedProps =
      base.properties().keySet().stream()
          .filter(key -> !metadata.properties().containsKey(key))
```
nit: for consistency?

```diff
-.filter(key -> !metadata.properties().containsKey(key))
+.filter(key -> !metadataToCommit.properties().containsKey(key))
```
Thanks - eacbfe7
```java
if (removedProps.contains(TableProperties.ENCRYPTION_TABLE_KEY)) {
  throw new RuntimeException("Cannot remove key in encrypted table");
}
```
Shouldn't this be an IllegalStateException instead of a plain RuntimeException?
cc @ggershinsky perhaps similar situation to #13225 (comment).
Thoughts on IllegalArgumentException? (If you alter this property and try to commit, IllegalArgumentException maybe makes sense?)
I'll be ok with any decision on this; will add to the HiveTO
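Whichever exception type is settled on, the check can be phrased as a precondition. A self-contained sketch using IllegalStateException as a stand-in (the thread has not settled on the final type, and the property name here is a hypothetical stand-in for TableProperties.ENCRYPTION_TABLE_KEY):

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class RemovedKeyCheckSketch {
  // Hypothetical stand-in for TableProperties.ENCRYPTION_TABLE_KEY.
  static final String ENCRYPTION_TABLE_KEY = "encryption.key-id";

  // Fails if a commit would remove the table's encryption key property.
  static void checkKeyNotRemoved(Map<String, String> baseProps, Map<String, String> newProps) {
    Set<String> removedProps = new HashSet<>(baseProps.keySet());
    removedProps.removeAll(newProps.keySet());
    if (removedProps.contains(ENCRYPTION_TABLE_KEY)) {
      throw new IllegalStateException("Cannot remove key in encrypted table");
    }
  }
}
```

Changing the key's value still passes; only dropping the property entirely trips the check.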
```java
String dekLength = tableProperties.get(TableProperties.ENCRYPTION_DEK_LENGTH);
encryptionDekLength =
    (dekLength == null)
        ? TableProperties.ENCRYPTION_DEK_LENGTH_DEFAULT
        : Integer.parseInt(dekLength);
```
can use PropertyUtils ?
Agree, cc @ggershinsky maybe similar situation to #13225 (comment).
(Happy to think / put up these patches if it'd help, but the original code is @ggershinsky's so will wait 🙏)
Sure, I'm working on an "encryption clean up" patch, will add this to the todo list.
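The suggestion amounts to replacing the null check plus manual parse with a property helper in the style of Iceberg's PropertyUtil. A self-contained sketch of what such a helper does (the method below is an illustrative stand-in, not the library class itself):

```java
import java.util.Map;

class PropertyParseSketch {
  // Mirrors the shape of a propertyAsInt(props, name, default) helper:
  // a missing property yields the default, a present one is parsed as int.
  static int propertyAsInt(Map<String, String> properties, String name, int defaultValue) {
    String value = properties.get(name);
    return value == null ? defaultValue : Integer.parseInt(value);
  }
}
```

Using a shared helper keeps the default-handling logic in one place instead of repeating the ternary at every call site.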
```java
  return encryptionManager;
}

private void encryptionPropsFromMetadata(TableMetadata metadata) {
```
Thank you everyone for the great discussions!

There is one more perspective I want to share, which IMHO we should consider: the metadata.json should be consistent with what we have done in other catalogs. One of the reasons to have a metadata.json is portability between different catalogs; for example, one can migrate from Hive to REST and from REST to Hive. If we store part of the metadata in a different place and plan to return it in /config, IMHO it will become a concern for portability.

Also please consider that there are REST server implementations out there that cache the whole metadata.json in memory or in a DB to avoid lookups to the remote store.

In REST we can add a requirement for an update so that it can't be dropped, and the server can also double-assert. If the concern is malicious clients updating or dropping the key, then an implicit trust relation between catalog and client is required, meaning a catalog can reject the whole update if it doesn't trust the client. If the concern is the server tampering with things, then a lot more is at risk; we require a similar trust relation for other work as well:
- DEFINER views
- Access Decision Exchange
```java
encryptionProperties.put(TableProperties.ENCRYPTION_TABLE_KEY, tableKeyId);
encryptionProperties.put(
    TableProperties.ENCRYPTION_DEK_LENGTH, String.valueOf(encryptionDekLength));
```
any reason why we don't list these properties here : https://iceberg.apache.org/docs/1.10.0/docs/configuration/?h=write.m#table-properties
Good point, I'll send a patch to add them.
I wonder if #14465 could affect the safety of encrypted tables, and if yes, how this can be handled.

IIUC, the encryption key must be accessible to the query engine in order for it to decrypt and encrypt the data. Could you please help me understand why allowing a custom builder would raise security concerns? From my perspective, even without that PR, users could still copy the code and replace the default TableOperations implementation. The PR simply introduces a generic interface to make such customization cleaner and more maintainable.
Thanks @XJDKC, it's likely just a matter of documenting the new interface to make sure users are aware of the security aspects of the REST TableOperations (if they plan to use table encryption).

Maybe it's OK, but we need to check the risk to metadata integrity (if broken, it can be used for data leaks and other attacks), as discussed in this PR's comments: making sure the client gets the metadata from the REST server, and not from the metadata.json file.
Thanks both for the discussions here. I don't see anything concerning with the motivation behind the other PR. If a REST client wanted to read from the metadata JSON in storage, they can do so regardless after calling As such, I suspect the documentation should be at the spec level to advise clients if required, possibly depending on the corresponding REST spec discussions / conclusions (#14486 / https://lists.apache.org/thread/0nn11o4xf1nmw68d4px33sxw5tzzmgbo).
I'm also fine with the motivation. Regarding the support for encrypted tables, I have two concerns.

I think both concerns can be addressed by adding a few lines to the
This PR implements encryption for the REST catalog.
It's very similar to #13066 but for the REST catalog, not Hive.
cc @rdblue @RussellSpitzer @ggershinsky