@pvary
Contributor

@pvary pvary commented Nov 7, 2025

The EncryptedOutputFile objects generated by the StandardEncryptionManager.encrypt method hide the underlying OutputFile.

Unfortunately, the Parquet implementation has some hidden requirements that must be met for encryption to work properly.
Parquet encryption only worked when the target was a HadoopOutputFile. Calling StandardEncryptedOutputFile.encryptingOutputFile() returned an AesGcmOutputFile, and using it as the write target resulted in corrupt files.

Updated StandardEncryptedOutputFile to return itself from encryptingOutputFile(), instead of exposing the underlying AesGcmOutputFile. The methods coming from OutputFile are now wrappers that return the values directly from the wrapped output file.
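To illustrate the shape of the change, here is a minimal, self-contained sketch of the "return itself and delegate" pattern. It does not use Iceberg's classes: `SketchOutputFile`, `SketchEncryptedOutputFile`, `InMemoryOutputFile`, and `SelfReturningEncryptedOutputFile` are hypothetical, simplified stand-ins, and no actual encryption is performed. The real StandardEncryptedOutputFile may differ in its details.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical, simplified stand-in for an output file abstraction.
interface SketchOutputFile {
  OutputStream create() throws IOException;

  String location();
}

// Hypothetical, simplified stand-in for an encrypted output file abstraction.
interface SketchEncryptedOutputFile {
  SketchOutputFile encryptingOutputFile();
}

// Toy in-memory target, used only to make the sketch runnable.
final class InMemoryOutputFile implements SketchOutputFile {
  private final String location;
  private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();

  InMemoryOutputFile(String location) {
    this.location = location;
  }

  @Override
  public OutputStream create() {
    return buffer;
  }

  @Override
  public String location() {
    return location;
  }

  byte[] contents() {
    return buffer.toByteArray();
  }
}

// Implements both interfaces: encryptingOutputFile() returns this object,
// and the output file methods forward to the wrapped output file.
final class SelfReturningEncryptedOutputFile
    implements SketchEncryptedOutputFile, SketchOutputFile {
  private final SketchOutputFile wrapped;

  SelfReturningEncryptedOutputFile(SketchOutputFile wrapped) {
    this.wrapped = wrapped;
  }

  @Override
  public SketchOutputFile encryptingOutputFile() {
    // Return the wrapper itself rather than exposing the inner object.
    return this;
  }

  @Override
  public OutputStream create() throws IOException {
    return wrapped.create();
  }

  @Override
  public String location() {
    return wrapped.location();
  }
}

final class DelegationSketch {
  public static void main(String[] args) throws IOException {
    InMemoryOutputFile target = new InMemoryOutputFile("memory://data.parquet");
    SelfReturningEncryptedOutputFile encrypted = new SelfReturningEncryptedOutputFile(target);

    // Both the wrapper and the value from its accessor are the same object.
    SketchOutputFile viaAccessor = encrypted.encryptingOutputFile();
    try (OutputStream out = viaAccessor.create()) {
      out.write(new byte[] {1, 2, 3});
    }

    System.out.println(viaAccessor == encrypted);
    System.out.println(viaAccessor.location());
    System.out.println(target.contents().length);
  }
}
```

With this shape, a caller that receives the wrapper and a caller that asks for encryptingOutputFile() end up holding the same object, so a writer can inspect the same type in both cases.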

Added a test to highlight the issue.

Without the patch the test fails, but it succeeds if we change:

        Parquet.write(encryptedOutputFile.encryptingOutputFile())

to

        Parquet.write(encryptedOutputFile)

After the patch, both OutputFile objects can be used.

@pvary pvary force-pushed the stand_encry_fix branch 4 times, most recently from 8d5b0ef to 322292d Compare November 8, 2025 06:14
…ncryptionManager.ecrypt().encryptingOutputFile()