
ci: Add hdfs test case for s3 and azure (#4168) #4183

Closed
ArmandoZ wants to merge 2 commits

Conversation

@ArmandoZ (Contributor) commented Feb 15, 2024

Issue: #4168

Adds test cases for HDFS over S3 and Azure Blob Storage. Tested locally with my own environment.

TODO: split this into two PRs

```yaml
env:
  # Need an environment variable with this name to securely encrypt them - otherwise there will be a warning
  # See https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords
  HADOOP_CREDSTORE_PASSWORD: op://services/hdfs/credstore_password
```
@ArmandoZ (Contributor, Author):

We need an environment variable named HADOOP_CREDSTORE_PASSWORD to securely encrypt the credentials. If this variable is not set, we can still encrypt the credentials with a default password, but that is less secure.

Reference: https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords
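For context, here is a minimal sketch of how a credential ends up in that keystore; the alias, value, and password below are illustrative, not the ones used in this CI:

```sh
# Sketch only: the alias, value, and password here are illustrative.
# With HADOOP_CREDSTORE_PASSWORD set, the JCEKS keystore is protected by
# that password rather than by Hadoop's built-in default.
export HADOOP_CREDSTORE_PASSWORD='some-strong-password'
hadoop credential create fs.azure.account.key.example.blob.core.windows.net \
  -value "$OPENDAL_AZBLOB_ACCOUNT_KEY" \
  -provider jceks://file/tmp/hdfs/wasb.jceks
```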

Member:

Can you explain a bit more why we need to care about this in the test CI?

@ArmandoZ (Contributor, Author):

The credentials such as OPENDAL_AZBLOB_ACCOUNT_KEY are encrypted and saved to /tmp/hdfs/wasb.jceks on the CI machine's local file system, so that HDFS can pick them up. If we don't specify HADOOP_CREDSTORE_PASSWORD, the encryption key will be a default value that is publicly known.

If no one has a chance to read /tmp/hdfs/wasb.jceks on the CI machine, this is fine and we don't need HADOOP_CREDSTORE_PASSWORD. Otherwise, they would be able to decrypt the real credentials.
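To make the risk concrete, a hedged sketch of what a reader of that file could do, assuming a stock Hadoop client and the keystore path mentioned above:

```sh
# Sketch only: assumes someone has read access to the keystore file.
# Without HADOOP_CREDSTORE_PASSWORD, Hadoop falls back to a publicly
# known default password, so the keystore opens without any secret:
unset HADOOP_CREDSTORE_PASSWORD
hadoop credential list -provider jceks://file/tmp/hdfs/wasb.jceks
# This lists the stored aliases; the values themselves are recoverable
# through the same CredentialProvider API.
```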

Member:

> If no one has a chance to read /tmp/hdfs/wasb.jceks on the CI machine, this is fine and we don't need HADOOP_CREDSTORE_PASSWORD. Otherwise, they would be able to decrypt the real credentials.

Yes, no one can read it. We trust our committers not to let such a thing happen.

@Xuanwo (Member) commented Feb 15, 2024

Hi, @ArmandoZ, thanks for your efforts! Would you like to split them into two PRs for easier review and testing?

```yaml
# specific language governing permissions and limitations
# under the License.

name: hdfs_default_azblob
```
Member:

s3 has a MinIO setup and azblob has an Azurite setup. They can be tested even in a forked repo. Would you like to use the same setup?
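For reference, a rough sketch of the kind of emulator setup meant here; the images, ports, and credentials are assumptions, not the exact CI configuration:

```sh
# Sketch only: image tags, ports, and credentials are assumptions.
# MinIO stands in for S3:
docker run -d --name minio -p 9000:9000 \
  -e MINIO_ROOT_USER=minioadmin -e MINIO_ROOT_PASSWORD=minioadmin \
  minio/minio server /data
# Azurite stands in for Azure Blob Storage:
docker run -d --name azurite -p 10000:10000 \
  mcr.microsoft.com/azure-storage/azurite \
  azurite-blob --blobHost 0.0.0.0
```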

@ArmandoZ (Contributor, Author):

Sure! Let me have them updated in the new PRs.

@ArmandoZ (Contributor, Author):

Do you agree we should rename the test cases and file names accordingly? E.g.

- hdfs_default_s3 -> hdfs_default_minio_s3
- s3-core-site.xml -> minio-s3-core-site.xml

That would be more specific.

Member:

> Do you agree we should rename the test cases and file names accordingly?

Good idea, we can use hdfs_default_on_minio_s3 and minio-s3-core-site.xml.

@ArmandoZ (Contributor, Author):

The PR for the HDFS test case over MinIO S3 is ready for review, with tests passing: #4184

For the Azurite environment, it looks like hadoop-azure may not be able to support custom domains (other than *.blob.core.windows.net). Still looking into it.
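For the MinIO side, a minimal sketch of what such a minio-s3-core-site.xml could look like; the endpoint and credentials are placeholders, not the values from #4184:

```sh
# Sketch only: the endpoint and credentials are placeholders.
cat > minio-s3-core-site.xml <<'EOF'
<configuration>
  <!-- Point s3a at the local MinIO emulator instead of AWS. -->
  <property><name>fs.s3a.endpoint</name><value>http://127.0.0.1:9000</value></property>
  <!-- MinIO serves buckets under the path, not as subdomains. -->
  <property><name>fs.s3a.path.style.access</name><value>true</value></property>
  <property><name>fs.s3a.access.key</name><value>minioadmin</value></property>
  <property><name>fs.s3a.secret.key</name><value>minioadmin</value></property>
</configuration>
EOF
```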

@ArmandoZ (Contributor, Author):

The remaining PR for Azurite is also ready: #4185. Seems to me this PR can be closed if it is no longer needed.

@Xuanwo (Member) commented Feb 16, 2024

> The remaining PR for Azurite is also ready: #4185. Seems to me this PR can be closed if it is no longer needed.

Merged. Let's close this PR.

@Xuanwo closed this Feb 16, 2024