Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: return file metadata on read #11212

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

jshore1296
Copy link

implements #11211

This isn't in a mergeable state quite yet, but I wanted to start the discussion. This is only supported by the XML and gRPC apis. I'm unsure of how to handle the JSON api. Should I just document that the JSON api does not support metadata on read? Or is there a more elegant way to block the user from trying to read it and being surprised by it being empty?

@jshore1296 jshore1296 requested review from a team as code owners December 3, 2024 17:48
@product-auto-label product-auto-label bot added the api: storage Issues related to the Cloud Storage API. label Dec 3, 2024
@tritone
Copy link
Contributor

tritone commented Dec 4, 2024

implements #11211

This isn't in a mergeable state quite yet, but I wanted to start the discussion. This is only supported by the XML and gRPC apis. I'm unsure of how to handle the JSON api. Should I just document that the JSON api does not support metadata on read? Or is there a more elegant way to block the user from trying to read it and being surprised by it being empty?

I'd say we can just document that it is unavailable via the JSON API. There are other fields in ReaderObjectAttrs which are only populated sometimes.

I need to look at the rest of the code more deeply; just let me know when it's ready for review.

@jshore1296
Copy link
Author

Alright, this is ready for review.

@jshore1296
Copy link
Author

@tritone sending a ping in case the previous comment didn't notify you

Copy link
Contributor

@tritone tritone left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments; overall looks good. Thanks for adding tests!

@@ -795,6 +795,62 @@ func TestOpenReaderEmulated(t *testing.T) {
})
}

func TestOpenReaderEmulated_Metadata(t *testing.T) {
transportClientTest(skipHTTP("metadata on read not supported by JSON api"), t, func(t *testing.T, ctx context.Context, project, bucket string, client storageClient) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe the HTTP client should use XML by default, does the test not pass with a default HTTP client?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

right you are.

t.Fatalf("closing object: %v", err)
}
if _, err := veneerClient.Bucket(bucket).Object(want.Name).Update(ctx, ObjectAttrsToUpdate{
Metadata: map[string]string{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's do at least 2 keys in here just to make sure the decoding logic works correctly.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@@ -1133,6 +1133,7 @@ func (c *grpcStorageClient) NewRangeReader(ctx context.Context, params *newRange
CacheControl: obj.GetCacheControl(),
LastModified: obj.GetUpdateTime().AsTime(),
Metageneration: obj.GetMetageneration(),
Metadata: obj.GetMetadata(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was concerned this wouldn't be sufficient because we didn't have it in the unmarshaler for gRPC, but it looks like we did that part already so this should be fine.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was surprised that was already in there!

@@ -204,10 +205,10 @@ func initIntegrationTest() func() error {
if err != nil {
log.Fatalf("NewStorageControlClient: %v", err)
}
if err := client.Bucket(bucketName).Create(ctx, testutil.ProjID(), nil); err != nil {
if err := client.Bucket(bucketName).Create(ctx, testutil.ProjID(), &BucketAttrs{SoftDeletePolicy: &SoftDeletePolicy{RetentionDuration: 0}}); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this should maybe be a separate change in our test harness? Was it impossible to delete the bucket if this wasn't included?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops, this is leftover. My org has a policy set that made this necessary for me to be able to run the tests. I'll remove this before merge.

@@ -5041,7 +5042,58 @@ func TestIntegration_ReaderAttrs(t *testing.T) {
Metageneration: attrs.Metageneration,
CRC32C: crc32c(c),
}
if got != want {
if !reflect.DeepEqual(got, want) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can use https://pkg.go.dev/github.com/google/go-cmp/cmp#Diff as elsewhere in these tests?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done.

if err := writeObject(ctx, o, defaultType, c); err != nil {
t.Errorf("Write for %v failed with %v", o.ObjectName(), err)
}
defer func() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use t.Cleanup to ensure this is run.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

}
}()

oa, err := o.Update(ctx, ObjectAttrsToUpdate{Metadata: map[string]string{"Custom-Key": "custom-value"}})
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Again, let's use at least 2 keys here.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Metageneration: attrs.Metageneration,
CRC32C: crc32c(c),
}
if !reflect.DeepEqual(got, want) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use cmp.Diff and let's only check the Metadata field in this test.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -59,6 +59,10 @@ type ReaderObjectAttrs struct {
// Generation is the generation number of the object's content.
Generation int64

// Metadata represents user-provided metadata, in key/value pairs.
// Not supported by the JSON api. Use ObjectHandle.Attrs instead.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would rephrase this as follows:

// Metadata represents user-provided metadata, in key/value pairs.
//
// It can be nil if no metadata is present, or if the client uses the JSON
// API for downloads. Only the XML and gRPC APIs support getting
// custom metadata via the Reader; for JSON make a separate call to 
// ObjectHandle.Attrs.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@@ -586,6 +587,9 @@ func TestRetryConformance(t *testing.T) {
if host == "" {
t.Skip("This test must use the testbench emulator; set STORAGE_EMULATOR_HOST to run.")
}
if runtime.GOOS == "darwin" {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's remove this from the PR; seems non-relevant to this particular issue.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 there was handling for this in the test script so I moved it in here, good to remove though.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: storage Issues related to the Cloud Storage API.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants