Load content hash as the etag of the object when the UFS is S3 in 2.8 #18438

Open
wants to merge 6 commits into
base: branch-2.8

Conversation

Jackson-Wang-7
Contributor

What changes are proposed in this pull request?

Please outline the changes and how this PR fixes the issue.

Why are the changes needed?

Please clarify why the changes are needed. For instance,

  1. If you propose a new API, clarify the use case for a new API.
  2. If you fix a bug, describe the bug.

Does this PR introduce any user facing changes?

Please list the user-facing changes introduced by your change, including

  1. change in user-facing APIs
  2. addition or removal of property keys
  3. webui

StephenRi and others added 4 commits November 24, 2023 10:10
### What changes are proposed in this pull request?

Support STS for OSS ufs

### Why are the changes needed?

1. A plaintext AccessKey/AccessSecret is not safe and is not recommended by
Aliyun.
2. STS (Security Token Service) for OSS is safer. For details, see
https://help.aliyun.com/document_detail/32016.html

### Does this PR introduce any user facing changes?

Adds the following property keys:
1. UNDERFS_OSS_STS_ENABLED
2. UNDERFS_OSS_RETRY_MAX
3. UNDERFS_OSS_ECS_RAM_ROLE

Alluxio#16510

pr-link: Alluxio#16481
change-id: cid-7d486e84f82e5ab5211238b1f180b4a8fd8e5742
### What changes are proposed in this pull request?

delete temporary files when uploading files to OBS fails.

### Why are the changes needed?

Currently, temporary files are not deleted when an upload to OBS fails.

### Does this PR introduce any user facing changes?

No user facing changes.
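
For illustration only, the cleanup described above can be sketched with a `try`/`finally` around the upload call. This is a minimal sketch, not the code from the PR: the `Uploader` interface and the `TempFileUpload` class are hypothetical names that stand in for the OBS client call and the output stream logic.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical interface; stands in for the call that uploads a staged file to OBS. */
interface Uploader {
  void upload(Path file) throws IOException;
}

final class TempFileUpload {
  /**
   * Uploads the staged temporary file and always removes it afterwards,
   * whether the upload succeeded or threw.
   */
  static void uploadAndCleanUp(Path tempFile, Uploader uploader) throws IOException {
    try {
      uploader.upload(tempFile);
    } finally {
      // Runs on success and on failure, so a failed upload leaves no stale file behind.
      Files.deleteIfExists(tempFile);
    }
  }

  public static void main(String[] args) throws IOException {
    Path tmp = Files.createTempFile("obs-upload-", ".tmp");
    try {
      // Simulate a failing upload.
      uploadAndCleanUp(tmp, f -> {
        throw new IOException("simulated upload failure");
      });
    } catch (IOException e) {
      System.out.println("upload failed; temp file still exists: " + Files.exists(tmp));
    }
  }
}
```

The original upload exception still propagates to the caller; the `finally` block only guarantees that the local file is removed first.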

pr-link: Alluxio#16056
change-id: cid-159915bffd7e18df4123ff7dd4c44048245cbb83
### What changes are proposed in this pull request?

Refactor s3 low level output stream and support OSS and OBS.

### Why are the changes needed?

1. Extract the generic logic to `ObjectLowLevelOutputStream` to make it
easy to support new object storage.
2. Support streaming uploads for OBS and OSS.
3. Fix the bug that empty files cannot be persisted to UFS.
4. Specify the MD5 when uploading parts.
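
As a rough illustration of points 1, 3 and 4, the sketch below shows one way a generic base stream could own the buffering and partitioning while store-specific subclasses supply only the upload calls. It is not the actual `ObjectLowLevelOutputStream`: the class names, the three hook methods and the in-memory subclass are all hypothetical, and it assumes that an empty file is persisted through a separate "create empty object" call rather than a multipart upload.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;

/** Hypothetical base class: buffers writes and cuts them into fixed-size parts. */
abstract class PartitionedOutputStream extends OutputStream {
  private final int mPartitionSize;
  private final ByteArrayOutputStream mBuffer = new ByteArrayOutputStream();
  private int mPartsUploaded = 0;
  private boolean mClosed = false;

  PartitionedOutputStream(int partitionSize) {
    mPartitionSize = partitionSize;
  }

  // Store-specific hooks (assumed names); an S3, OSS or OBS subclass would call its SDK here.
  protected abstract void uploadPart(int partNumber, byte[] data, String md5Base64)
      throws IOException;

  protected abstract void completeUpload(int partCount) throws IOException;

  protected abstract void createEmptyObject() throws IOException;

  /** Base64-encoded MD5 digest of a part, so the store can verify what it received. */
  static String md5Base64(byte[] data) throws IOException {
    try {
      return Base64.getEncoder().encodeToString(MessageDigest.getInstance("MD5").digest(data));
    } catch (NoSuchAlgorithmException e) {
      throw new IOException(e);
    }
  }

  @Override
  public void write(int b) throws IOException {
    mBuffer.write(b);
    if (mBuffer.size() >= mPartitionSize) {
      flushPart();
    }
  }

  private void flushPart() throws IOException {
    byte[] part = mBuffer.toByteArray();
    mBuffer.reset();
    mPartsUploaded++;
    uploadPart(mPartsUploaded, part, md5Base64(part));
  }

  @Override
  public void close() throws IOException {
    if (mClosed) {
      return;
    }
    mClosed = true;
    if (mPartsUploaded == 0 && mBuffer.size() == 0) {
      // Nothing was written: persist an empty object instead of skipping the file.
      createEmptyObject();
      return;
    }
    if (mBuffer.size() > 0) {
      flushPart();
    }
    completeUpload(mPartsUploaded);
  }
}

/** In-memory subclass used only to exercise the sketch. */
final class InMemoryObjectStream extends PartitionedOutputStream {
  final List<byte[]> mParts = new ArrayList<>();
  final List<String> mMd5s = new ArrayList<>();
  boolean mCompleted = false;
  boolean mEmptyObjectCreated = false;

  InMemoryObjectStream(int partitionSize) {
    super(partitionSize);
  }

  @Override
  protected void uploadPart(int partNumber, byte[] data, String md5Base64) {
    mParts.add(data);
    mMd5s.add(md5Base64);
  }

  @Override
  protected void completeUpload(int partCount) {
    mCompleted = true;
  }

  @Override
  protected void createEmptyObject() {
    mEmptyObjectCreated = true;
  }
}
```

With this shape, supporting a new object store would mean implementing only the three hooks. Real implementations would also upload parts on a thread pool and abort the multipart upload on failure, which this sketch leaves out.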

### Does this PR introduce any user facing changes?

`alluxio.underfs.oss.intermediate.upload.clean.age`: age after which
incomplete multipart uploads are aborted and cleaned up for OSS.
`alluxio.underfs.oss.streaming.upload.enabled`: whether to enable
streaming upload for OSS.
`alluxio.underfs.oss.streaming.upload.partition.size`: streaming upload
partition size for OSS.
`alluxio.underfs.oss.streaming.upload.threads`: thread pool size for OSS
streaming upload.
`alluxio.underfs.obs.intermediate.upload.clean.age`: age after which
incomplete multipart uploads are aborted and cleaned up for OBS.
`alluxio.underfs.obs.streaming.upload.enabled`: whether to enable
streaming upload for OBS.
`alluxio.underfs.obs.streaming.upload.partition.size`: streaming upload
partition size for OBS.
`alluxio.underfs.obs.streaming.upload.threads`: thread pool size for OBS
streaming upload.

pr-link: Alluxio#16122
change-id: cid-91f6e2b5ec6b79d2175e71754654b59e650c0c32
Currently when complete is called on a file in Alluxio, a fingerprint of
the file will be created by performing a GetStatus on the file on the
UFS. If due to a concurrent write, the state of the file is different
than what was written through Alluxio, the fingerprint will not actually
match the content of the file in Alluxio. If this happens the state of
the file in Alluxio will always be out of sync with the UFS, and the
file will never be updated to the most recent version.
This is because metadata sync uses the fingerprint to see if the file
needs synchronization, and if the fingerprint does not match the file in
Alluxio there will be inconsistencies.

This PR fixes this by having the contentHash field of the fingerprint be
computed while the file is actually written on the UFS. For object
stores, this means the hash is taken from the result of the call to
PutObject. Unfortunately HDFS does not have a similar interface, so the
content hash is taken just after the output stream is closed to complete
the write. There could be a small chance that someone changes the file
in this window between the two operations.
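
To make the race concrete, here is a minimal sketch under stated assumptions. `FakeObjectStore` is a hypothetical in-memory stand-in for an object store UFS, MD5 is used as the content hash purely for illustration, and `needsSync` is a simplified model of the fingerprint comparison that metadata sync performs; none of these names come from Alluxio.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory stand-in for an object store UFS. */
final class FakeObjectStore {
  private final Map<String, byte[]> mObjects = new HashMap<>();

  /** Stores the object and returns the hash of exactly the bytes that were written. */
  String putObject(String key, byte[] data) {
    mObjects.put(key, data.clone());
    return hash(data);
  }

  /** Returns the hash of whatever is stored right now, as a separate status call would. */
  String getStatusHash(String key) {
    return hash(mObjects.get(key));
  }

  /** Hex-encoded MD5, used here only as an example content hash. */
  static String hash(byte[] data) {
    try {
      byte[] digest = MessageDigest.getInstance("MD5").digest(data);
      StringBuilder sb = new StringBuilder();
      for (byte b : digest) {
        sb.append(String.format("%02x", b));
      }
      return sb.toString();
    } catch (NoSuchAlgorithmException e) {
      throw new IllegalStateException(e);
    }
  }
}

final class ContentHashDemo {
  /** Simplified sync check: a file is re-synced only if the stored hash differs from the UFS. */
  static boolean needsSync(String storedHash, String currentUfsHash) {
    return !storedHash.equals(currentUfsHash);
  }

  public static void main(String[] args) {
    FakeObjectStore ufs = new FakeObjectStore();
    byte[] fromAlluxio = "written through Alluxio".getBytes(StandardCharsets.UTF_8);
    byte[] fromOther = "written by another client".getBytes(StandardCharsets.UTF_8);

    // New behavior: take the hash from the result of the write itself.
    String hashAtWrite = ufs.putObject("file", fromAlluxio);

    // A concurrent writer replaces the object before the file is completed.
    ufs.putObject("file", fromOther);

    // Old behavior: take the hash from a status call made after the write.
    String hashFromStatus = ufs.getStatusHash("file");

    String currentUfsHash = ufs.getStatusHash("file");
    System.out.println("old fingerprint triggers sync: "
        + needsSync(hashFromStatus, currentUfsHash));
    System.out.println("new fingerprint triggers sync: "
        + needsSync(hashAtWrite, currentUfsHash));
  }
}
```

In this model, the hash taken from the later status call describes the other client's data, so it equals the current UFS hash and the stale file is never re-synced. The hash taken from the write result describes the data Alluxio actually holds, so the mismatch is detected.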

pr-link: Alluxio#16597
change-id: cid-64723be309bdb14b05613864af3b6a1bb30cba6d
@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word of title ("Branch") is not an imperative verb. Please use one of the valid words
  • Commits associated with Github account: PASS

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

@Jackson-Wang-7 Jackson-Wang-7 changed the title Branch 2.8 etag Record content hash as the etag of the object when the UFS is S3 Nov 24, 2023
@Jackson-Wang-7 Jackson-Wang-7 changed the title Record content hash as the etag of the object when the UFS is S3 Record content hash as the etag of the object when the UFS is S3 in 2.8 Nov 24, 2023
Jackson-Wang-7 and others added 2 commits November 28, 2023 15:30
Cherry-pick of existing commit.
orig-pr: Alluxio#18440
orig-commit: Alluxio/alluxio@91e045b
orig-commit-author: yuyang wang <[email protected]>

pr-link: Alluxio#18446
change-id: cid-1204732240a5e73959b917eb6d9f0c97e05820dc
@Jackson-Wang-7 Jackson-Wang-7 changed the title Record content hash as the etag of the object when the UFS is S3 in 2.8 Load content hash as the etag of the object when the UFS is S3 in 2.8 Nov 28, 2023
@alluxio-bot
Contributor

Automated checks report:

  • PR title follows the conventions: PASS
  • Commits associated with Github account: PASS

All checks passed!

@jiacheliu3
Contributor

@Jackson-Wang-7 merge or close this? Is this a cherry pick?

7 participants