Add extra header to signed url #4971

Merged
merged 24 commits into from
Mar 12, 2024

@pingsutw (Member) commented Feb 28, 2024

Why are the changes needed?

  • Dataproxy fails to generate a signed URL when the file already exists, because metadata.etag does not equal contentMD5.
  • Flytekit fails to upload the same file twice, because dataproxy cannot generate the signed URL.

What changes were proposed in this pull request?

  • Add an extra header (x-amz-meta-flyteContentMD5, x-goog-meta-flyteContentMD5, or x-ms-meta-flyteContentMD5, depending on the storage provider) to the signed URL.
  • The client has to send that header with its request as well. (Dataproxy returns the header for the client to use.)
  • Dataproxy checks whether the flyteContentMD5 metadata equals ContentMD5.
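
A minimal sketch of the client side of this change, under stated assumptions: the signed-URL response is assumed to carry a map of extra headers (called extra_headers here, a hypothetical name), and the client is assumed to also send a standard Content-MD5 header. The provider keys and helper names are illustrative and are not taken from flytekit or dataproxy.

```python
import base64
import hashlib
from typing import Dict

# Header names listed in this PR. The provider keys are illustrative labels.
FLYTE_MD5_HEADERS: Dict[str, str] = {
    "s3": "x-amz-meta-flyteContentMD5",
    "gcs": "x-goog-meta-flyteContentMD5",
    "azure": "x-ms-meta-flyteContentMD5",
}


def content_md5_b64(data: bytes) -> str:
    """Return the base64-encoded MD5 digest, the form used by Content-MD5."""
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")


def build_upload_headers(data: bytes, extra_headers: Dict[str, str]) -> Dict[str, str]:
    """Merge the headers returned with the signed URL into the upload headers.

    Assumption: the metadata header was part of the signature, so the client
    forwards it unchanged alongside its own Content-MD5 header.
    """
    headers = {"Content-MD5": content_md5_b64(data)}
    headers.update(extra_headers)
    return headers


if __name__ == "__main__":
    payload = b"a,b\n1,2\n"
    md5 = content_md5_b64(payload)
    # Simulated response headers for an Azure-backed deployment.
    returned = {FLYTE_MD5_HEADERS["azure"]: md5}
    print(build_upload_headers(payload, returned))
```

The point of merging rather than recomputing is that the client does not need to know which storage provider sits behind the signed URL; it only echoes back whatever headers it was given.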

How was this patch tested?

pyflyte --verbose run --remote test2.py wf --f test/test.csv --d test

test2.py:

import pandas as pd

from flytekit import task, workflow
from flytekit.types.directory import FlyteDirectory
from flytekit.types.file import FlyteFile


@task()
def print_data_directory(f: FlyteFile, d: FlyteDirectory):
    df = pd.read_csv(f)
    print(d)
    print(df.head())


@workflow
def wf(f: FlyteFile, d: FlyteDirectory):
    print_data_directory(f=f, d=d)

Setup process

Screenshots

  • AWS
  • Azure
  • GCP
  • MinIO

Check all the applicable boxes

  • I updated the documentation accordingly.
  • All new and existing tests passed.
  • All commits are signed-off.

Related PRs

Blocked by flyteorg/stow#13

Docs link

NA

@dosubot (bot) added the size:M and enhancement labels Feb 28, 2024

codecov bot commented Feb 28, 2024

Codecov Report

Attention: Patch coverage is 63.63636%, with 16 lines in your changes missing coverage. Please review.

Project coverage is 59.00%. Comparing base (2256c2b) to head (afe2bf6).
Report is 29 commits behind head on master.

Files                                Patch %   Lines
flyteadmin/dataproxy/service.go      56.52%    9 Missing and 1 partial ⚠️
flytestdlib/storage/stow_store.go    78.94%    3 Missing and 1 partial ⚠️
flytestdlib/storage/mem_store.go     0.00%     2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4971      +/-   ##
==========================================
+ Coverage   58.97%   59.00%   +0.02%     
==========================================
  Files         645      645              
  Lines       55561    55578      +17     
==========================================
+ Hits        32766    32792      +26     
+ Misses      20200    20194       -6     
+ Partials     2595     2592       -3     
Flag        Coverage Δ
unittests   59.00% <63.63%> (+0.02%) ⬆️


Signed-off-by: Kevin Su <[email protected]>
@dosubot (bot) added the size:XL label and removed the size:M label Feb 28, 2024
@pingsutw (Member, Author) commented

cc @ddl-ebrown @vsbus as well

@pingsutw mentioned this pull request Feb 28, 2024
@wild-endeavor previously approved these changes Feb 28, 2024
@dosubot (bot) added the lgtm label Feb 28, 2024
@ddl-ebrown (Contributor) commented

Thanks for taking this on so quickly, @pingsutw!

@EngHabu (Contributor) left a comment

Overall, I have two comments:

  1. Maintain the right abstractions: the storage package should be the one handling the different behaviors of the different storage backends. If you find yourself adding half the logic in one package and the other half in another package, something smells...
  2. Backward compatibility:
    a. Overwriting files in GCS and Azure just never worked... no need to attempt to maintain backward compatibility here.
    b. We should always add the ContentMD5 tag in all storage providers when CreateSignedURL is called.
    c. When the SDK calls CreateSignedURL, we should check whether the target exists and has that tag set; if so, we should compare the request's MD5 to that value.
    d. When the SDK calls CreateSignedURL and the target doesn't have the MD5 tag set, we should compare the request's MD5 against the ETag for S3 and just fail for all others (existing behavior).
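
The checks in points 2c and 2d could be sketched roughly as follows. This is an illustration only, not the code in this PR (which is Go): ObjectMetadata, may_issue_signed_url, and the provider strings are hypothetical names, and the sketch assumes that a match lets the upload proceed, that a mismatch is rejected, and that both MD5 values use the same encoding.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ObjectMetadata:
    """Hypothetical view of what the storage layer knows about the target."""

    exists: bool
    flyte_content_md5: Optional[str] = None  # the MD5 tag from point 2b, if set
    etag: Optional[str] = None


def may_issue_signed_url(meta: ObjectMetadata, request_md5: str, provider: str) -> bool:
    """Decide whether a signed URL may be issued for this upload request."""
    if not meta.exists:
        # Nothing to overwrite, so there is nothing to compare against.
        return True
    if meta.flyte_content_md5 is not None:
        # 2c: the target carries the MD5 tag; compare it with the request.
        return meta.flyte_content_md5 == request_md5
    if provider == "s3":
        # 2d: no tag set; fall back to the ETag comparison on S3.
        return meta.etag == request_md5
    # 2d: no tag set on any other provider; fail (existing behavior).
    return False


if __name__ == "__main__":
    tagged = ObjectMetadata(exists=True, flyte_content_md5="abc")
    print(may_issue_signed_url(tagged, "abc", "azure"))
    untagged = ObjectMetadata(exists=True, etag="abc")
    print(may_issue_signed_url(untagged, "abc", "gcs"))
```

Keeping this decision in one function mirrors point 1: the caller passes in what the storage layer reports and does not branch on provider-specific details itself.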

pingsutw added 2 commits March 5, 2024 12:25
@pingsutw merged commit 1d2e305 into master Mar 12, 2024
48 checks passed
@pingsutw deleted the fix-azure-upload branch March 12, 2024 00:27
yubofredwang pushed a commit to yubofredwang/flyte that referenced this pull request Mar 26, 2024
Labels: enhancement, lgtm, size:XL
5 participants