ci: Add hdfs test case for s3 and azure (#4168) #4183
Conversation
env:
  # Need an environment variable with this name to securely encrypt them - otherwise there will be a warning
  # See https://hadoop.apache.org/docs/current/hadoop-project-dist/hadoop-common/CredentialProviderAPI.html#Keystore_Passwords
  HADOOP_CREDSTORE_PASSWORD: op://services/hdfs/credstore_password
Need an environment variable named HADOOP_CREDSTORE_PASSWORD to securely encrypt the credentials. If this variable is not set, we can still encrypt the credentials with a default password, but that is less secure.
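For context, the keystore is wired into Hadoop through the credential provider path in core-site.xml. A minimal sketch, assuming the /tmp/hdfs/wasb.jceks location used in this CI setup (the property name is Hadoop's real config key; the path is illustrative):

```xml
<!-- Sketch: hadoop.security.credential.provider.path is a real Hadoop key;
     the jceks file location is an assumption matching this CI setup -->
<property>
  <name>hadoop.security.credential.provider.path</name>
  <value>jceks://file/tmp/hdfs/wasb.jceks</value>
</property>
```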
Can you explain more why we need to care about this in test CI?
The credentials such as OPENDAL_AZBLOB_ACCOUNT_KEY are encrypted and saved to /tmp/hdfs/wasb.jceks on the CI machine's local file system, so that hdfs can pick up the credentials. If we don't specify HADOOP_CREDSTORE_PASSWORD, the encryption key will be a default value that is publicly known. If no one has a chance to read /tmp/hdfs/wasb.jceks on the CI machine, this is fine and we don't need HADOOP_CREDSTORE_PASSWORD. But otherwise, they would be able to decrypt the real credentials.
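The risk described above can be illustrated with a toy Python sketch. This is not Hadoop's actual cipher or key derivation; the default-password value and the XOR cipher are stand-ins to show why a publicly known default password makes the keystore readable to anyone who can read the file:

```python
# Toy sketch (NOT Hadoop's real implementation) of why the default keystore
# password matters: when HADOOP_CREDSTORE_PASSWORD is unset, a well-known
# default is used, so anyone who can read the keystore file can decrypt it.
import hashlib
import os

DEFAULT_PASSWORD = "none"  # assumption: stands in for Hadoop's known default


def derive_key(password: str) -> bytes:
    """Derive a fixed-length key from a password (illustrative, not Hadoop's KDF)."""
    return hashlib.sha256(password.encode()).digest()


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Toy symmetric cipher: XOR with a repeating key. Encrypt == decrypt."""
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))


secret = b"example-azblob-account-key"  # hypothetical credential value
password = os.environ.get("HADOOP_CREDSTORE_PASSWORD", DEFAULT_PASSWORD)
keystore_bytes = xor_cipher(secret, derive_key(password))

# An attacker who can read the keystore file only needs the default password:
recovered = xor_cipher(keystore_bytes, derive_key(DEFAULT_PASSWORD))
print(recovered == secret)  # True whenever the default password was used
```

Setting HADOOP_CREDSTORE_PASSWORD to a secret value makes the last step fail for anyone who only knows the default.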
> If no one has a chance to read /tmp/hdfs/wasb.jceks on the CI machine, this is fine and we don't need HADOOP_CREDSTORE_PASSWORD. But otherwise, they would be able to decrypt the real credentials.
Yes, no one can read it. We trust our committers not to let such a thing happen.
Hi, @ArmandoZ, thanks for your efforts! Would you like to split them into two PRs for easier review and testing?
# specific language governing permissions and limitations
# under the License.

name: hdfs_default_azblob
s3 has a minio setup and azblob has an azurite setup. They can be tested even in a forked repo. Would you like to use the same setup?
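For reference, such a setup can be run locally too. A minimal docker-compose sketch, assuming the official minio and azurite images and their default ports (credentials are illustrative placeholders):

```yaml
# Sketch of a local emulator setup; image tags and credentials are assumptions
services:
  minio:
    image: minio/minio
    command: server /data
    ports:
      - "9000:9000"
    environment:
      MINIO_ROOT_USER: minioadmin
      MINIO_ROOT_PASSWORD: minioadmin
  azurite:
    image: mcr.microsoft.com/azure-storage/azurite
    ports:
      - "10000:10000"   # blob service endpoint
```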
Sure! Let me have them updated in new PRs.
Do you agree we should rename the test cases and file names accordingly? E.g. hdfs_default_s3 -> hdfs_default_minio_s3 and s3-core-site.xml -> minio-s3-core-site.xml would be more specific.
> Do you agree we should rename the test cases and file names accordingly?

Good idea, we can use hdfs_default_on_minio_s3 and minio-s3-core-site.xml.
The PR for the hdfs test case over minio s3 is ready to review with tests passed: #4184. For the azurite environment looks like
The remaining PR for azurite is also ready: #4185. Seems to me this PR can be closed if no longer needed.
Merged. Let's close this PR.
Issue: #4168
Adds test cases for hdfs over s3 and azure blob storage. Tested locally in my own environment.
TODO: split this into two PRs