Load content hash as the etag of the object when the UFS is S3 in 2.8 #18438
Open
Jackson-Wang-7 wants to merge 6 commits into Alluxio:branch-2.8 from Jackson-Wang-7:branch-2.8-etag
### What changes are proposed in this pull request?

Support STS for the OSS UFS.

### Why are the changes needed?

1. A plaintext AccessKey/AccessSecret is not safe and is not recommended for Aliyun.
2. STS (Security Token Service) for OSS is safer. For details, see https://help.aliyun.com/document_detail/32016.html

### Does this PR introduce any user facing changes?

Additional property keys:

1. UNDERFS_OSS_STS_ENABLED
2. UNDERFS_OSS_RETRY_MAX
3. UNDERFS_OSS_ECS_RAM_ROLE

Alluxio#16510

pr-link: Alluxio#16481
change-id: cid-7d486e84f82e5ab5211238b1f180b4a8fd8e5742
### What changes are proposed in this pull request?

Delete temporary files when uploading files to OBS fails.

### Why are the changes needed?

Currently, temporary files are not deleted when an OBS upload fails.

### Does this PR introduce any user facing changes?

No user facing changes.

pr-link: Alluxio#16056
change-id: cid-159915bffd7e18df4123ff7dd4c44048245cbb83
### What changes are proposed in this pull request?

Refactor the S3 low-level output stream and support OSS and OBS.

### Why are the changes needed?

1. Extract the generic logic into `ObjectLowLevelOutputStream` to make it easy to support new object storage.
2. Support streaming uploads for OBS and OSS.
3. Fix the bug that empty files cannot be persisted to the UFS.
4. Specify the MD5 when uploading parts.

### Does this PR introduce any user facing changes?

- `alluxio.underfs.oss.intermediate.upload.clean.age`: age after which incomplete multipart uploads are aborted and cleaned up for OSS.
- `alluxio.underfs.oss.streaming.upload.enabled`: whether to enable streaming upload for OSS.
- `alluxio.underfs.oss.streaming.upload.partition.size`: streaming upload partition size for OSS.
- `alluxio.underfs.oss.streaming.upload.threads`: thread pool size for OSS streaming upload.
- `alluxio.underfs.obs.intermediate.upload.clean.age`: age after which incomplete multipart uploads are aborted and cleaned up for OBS.
- `alluxio.underfs.obs.streaming.upload.enabled`: whether to enable streaming upload for OBS.
- `alluxio.underfs.obs.streaming.upload.partition.size`: streaming upload partition size for OBS.
- `alluxio.underfs.obs.streaming.upload.threads`: thread pool size for OBS streaming upload.

pr-link: Alluxio#16122
change-id: cid-91f6e2b5ec6b79d2175e71754654b59e650c0c32
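The refactoring described in this commit message can be illustrated with a minimal sketch. This is not Alluxio's actual `ObjectLowLevelOutputStream`; the class names `SketchLowLevelOutputStream` and `InMemoryStream`, the hook methods, and the `demo` helper are all hypothetical. It only shows the general shape: generic buffering in a base class, store-specific hooks in a subclass, an MD5 computed per part, and an explicit path for empty files.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical sketch of a generic multipart output stream (not Alluxio's real class). */
abstract class SketchLowLevelOutputStream extends OutputStream {
  private final int mPartitionSize;
  private final ByteArrayOutputStream mBuffer = new ByteArrayOutputStream();
  private int mNextPartNumber = 1;
  private boolean mClosed = false;

  SketchLowLevelOutputStream(int partitionSize) {
    mPartitionSize = partitionSize;
  }

  // Store-specific hooks (assumed names): each object store supplies its own client calls.
  protected abstract void uploadPart(int partNumber, byte[] data, String md5Base64)
      throws IOException;
  protected abstract void completeMultipartUpload() throws IOException;
  protected abstract void createEmptyObject() throws IOException;

  @Override
  public void write(int b) throws IOException {
    mBuffer.write(b);
    if (mBuffer.size() >= mPartitionSize) {
      flushPart();
    }
  }

  @Override
  public void close() throws IOException {
    if (mClosed) {
      return;
    }
    mClosed = true;
    if (mNextPartNumber == 1 && mBuffer.size() == 0) {
      // Nothing was written: persist an empty object instead of an empty multipart upload.
      createEmptyObject();
      return;
    }
    if (mBuffer.size() > 0) {
      flushPart();
    }
    completeMultipartUpload();
  }

  private void flushPart() throws IOException {
    byte[] data = mBuffer.toByteArray();
    mBuffer.reset();
    // The MD5 of each part is passed along so the store can verify the part's integrity.
    uploadPart(mNextPartNumber++, data, md5Base64(data));
  }

  static String md5Base64(byte[] data) {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}

/** In-memory stand-in for a store-specific subclass; it only records what would be sent. */
class InMemoryStream extends SketchLowLevelOutputStream {
  final List<String> mEvents = new ArrayList<>();

  InMemoryStream(int partitionSize) {
    super(partitionSize);
  }

  @Override
  protected void uploadPart(int partNumber, byte[] data, String md5Base64) {
    mEvents.add("part " + partNumber + " len=" + data.length);
  }

  @Override
  protected void completeMultipartUpload() {
    mEvents.add("complete");
  }

  @Override
  protected void createEmptyObject() {
    mEvents.add("empty");
  }

  /** Writes {@code bytes} zero bytes with the given partition size and returns the events. */
  static List<String> demo(int partitionSize, int bytes) {
    InMemoryStream stream = new InMemoryStream(partitionSize);
    try {
      stream.write(new byte[bytes]);
      stream.close();
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return stream.mEvents;
  }
}
```

Under these assumptions, `InMemoryStream.demo(4, 10)` records three parts followed by a completion, while `InMemoryStream.demo(4, 0)` takes the empty-object path. A real implementation would also need thread pools, retries, and abort handling, which this sketch omits.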
Currently, when complete is called on a file in Alluxio, a fingerprint of the file is created by performing a GetStatus on the file on the UFS. If, due to a concurrent write, the state of the file is different from what was written through Alluxio, the fingerprint will not actually match the content of the file in Alluxio. If this happens, the state of the file in Alluxio will always be out of sync with the UFS, and the file will never be updated to the most recent version. This is because metadata sync uses the fingerprint to see if the file needs synchronization, and if the fingerprint does not match the file in Alluxio there will be inconsistencies.

This PR fixes this by having the contentHash field of the fingerprint be computed while the file is actually written on the UFS. For object stores, this means the hash is taken from the result of the call to PutObject. Unfortunately, HDFS does not have a similar interface, so the content hash is taken just after the output stream is closed to complete the write. There is a small chance that someone changes the file in the window between these two operations.

pr-link: Alluxio#16597
change-id: cid-64723be309bdb14b05613864af3b6a1bb30cba6d
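The race described above can be reproduced with a small in-memory sketch. Everything here is hypothetical (`FakeObjectStore`, `ContentHashDemo`, and their methods are not Alluxio or S3 client APIs), and it assumes the store returns the hex MD5 of the data as the hash of a simple put, which is how S3 ETags commonly behave for single-part, non-KMS-encrypted uploads.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory object store; putObject returns an ETag-like hex MD5. */
class FakeObjectStore {
  private final Map<String, byte[]> mObjects = new HashMap<>();

  /** Stores the object and returns the hash of exactly the bytes that were written. */
  String putObject(String key, byte[] data) {
    mObjects.put(key, data.clone());
    return md5Hex(data);
  }

  /** Returns the hash of whatever is stored under the key right now. */
  String getStatusHash(String key) {
    return md5Hex(mObjects.get(key));
  }

  static String md5Hex(byte[] data) {
    try {
      StringBuilder sb = new StringBuilder();
      for (byte b : MessageDigest.getInstance("MD5").digest(data)) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}

class ContentHashDemo {
  static final byte[] OURS = "written through Alluxio".getBytes(StandardCharsets.UTF_8);
  static final byte[] THEIRS = "concurrent writer".getBytes(StandardCharsets.UTF_8);

  /** Returns {hash taken at write time, hash taken from a later status call}. */
  static String[] run() {
    FakeObjectStore ufs = new FakeObjectStore();
    // New behavior: keep the hash returned by the write itself.
    String hashAtWrite = ufs.putObject("file", OURS);
    // A concurrent writer replaces the object before the status call happens.
    ufs.putObject("file", THEIRS);
    // Old behavior: a separate status call now sees someone else's content.
    String hashFromStatus = ufs.getStatusHash("file");
    return new String[] {hashAtWrite, hashFromStatus};
  }
}
```

In this sketch the hash captured at write time still describes the bytes written through the cache, while the later status call describes the concurrent writer's bytes. A fingerprint built from the former will differ from the UFS state on the next comparison, so the change is detected instead of being masked.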
Jackson-Wang-7 force-pushed the branch-2.8-etag branch from d85a756 to 9554b7a on November 24, 2023 09:22
Automated checks report:
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Jackson-Wang-7 changed the title from "Branch 2.8 etag" to "Record content hash as the etag of the object when the UFS is S3" on Nov 24, 2023
Jackson-Wang-7 changed the title from "Record content hash as the etag of the object when the UFS is S3" to "Record content hash as the etag of the object when the UFS is S3 in 2.8" on Nov 24, 2023
elega approved these changes on Nov 24, 2023
Cherry-pick of existing commit.

orig-pr: Alluxio#18440
orig-commit: Alluxio/alluxio@91e045b
orig-commit-author: yuyang wang <[email protected]>
pr-link: Alluxio#18446
change-id: cid-1204732240a5e73959b917eb6d9f0c97e05820dc
Jackson-Wang-7 force-pushed the branch-2.8-etag branch from 9554b7a to 3f0d4f8 on November 28, 2023 07:33
Jackson-Wang-7 changed the title from "Record content hash as the etag of the object when the UFS is S3 in 2.8" to "Load content hash as the etag of the object when the UFS is S3 in 2.8" on Nov 28, 2023
Automated checks report:
All checks passed!
@Jackson-Wang-7 merge or close this? Is this a cherry pick?