-
Notifications
You must be signed in to change notification settings - Fork 13
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify asset path at top level in API #1700
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good overall, but I had one question about the user-facing interface changes.
if 'path' not in data['metadata'] or not data['metadata']['path']: | ||
raise serializers.ValidationError({'metadata': 'No path specified in metadata.'}) | ||
if 'path' in data['metadata']: | ||
raise serializers.ValidationError( | ||
{'metadata': 'Path must only be specified at top level.'} | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this the part that makes this a breaking change?
Related question: it would be reasonable under the general usage of PUT endpoints to copy the metadata from a GET call and then paste it into the PUT call; if someone does that, will the presence of the path
entry cause this error?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is half of the breaking change yes, the other half is being required to specify the path
at the top level (i.e. violating either of these will fail the request).
Related question: it would be reasonable under the general usage of PUT endpoints to copy the metadata from a GET call and then paste it into the PUT call; if someone does that, will the presence of the path entry cause this error?
Yes it will cause this error. If we'd like, we could still allow path
in the metadata and just ignore it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming we'll disallow path: does this check belong in the serializer or the service layer? Since we would want to prevent it regardless of where it came from, I think it belongs in the service layer. And if that's the case, is path
still a necessary element in ASSET_COMPUTED_FIELDS
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we'd still need to keep path
as an entry in ASSET_COMPUTED_FIELDS
, since it's still returned as a part of the computed metadata (and as such we'd want to prevent storing it in the metadata directly).
The validation error I added here is more a signal externally to prevent confusion / false bug reporting (e.g. "I changed path in the metadata, why does nothing change?"). If we assume we're keeping path
as a field in ASSET_COMPUTED_FIELDS
, I don't see what the benefit of performing a check in the service layer is, since the field will be stripped out anyway.
What do you think? It's possible I'm missing something.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI Jake and I discussed this and settled on generalizing the "path can't be specified as metadata" requirement to all reserved (computed) fields. As it stands, the same bug as #1682 exists for the other fields.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this context computed means "computed from other properties of the asset or asset metadata". In the case of path
, since it's stored directly on an asset, it's "computed" by just injecting that field into the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks @AlmightyYakob - i still find it weird that the API requires this as a separate element as opposed to being able to pull it out internally from the metadata as it used to. locally the CLI validates the asset with these fields inside, and hence seems easiest from a programmer standpoint using the API to have one point of entry. since this is a breaking API change, just want to use the opportunity to both reflect that dandi objects are the metadata and ensuring a good pattern for outside clients, which are growing beyond us, interacting with the API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The intent of this change is to make it more clear the distinction between modifiable and un-modifiable metadata. The issue with the current state of things is as follows:
You have an asset and it's metadata, and you want to change a field in that asset's metadata. Depending on what field you want to modify, it will actually result in no change when the request is made to the API, since some fields are computed/derived/etc, and some are actually modifiable. Things like contentUrl
and digest
are examples of fields that you cannot modify directly. They are computed/derived from the asset, or its associated asset blob / zarr.
The path
field is a unique example, where it's a field that is computed (since that data is stored on the asset itself and is then injected into the metadata), yet prior to this proposed change, was also specified in the metadata, and then stripped out by the server. This is counter-intuitive behavior, and is the reason I originally intended on barring path
from being specified in the metadata with this PR. Dan and I also discussed extending this to all computed fields, so that there is a clear one-way data flow for any computed fields (top level -> metadata).
Leaving things the way they are now, how would a consumer of the API know which fields can be modified and which can't? The only way they could know (aside from reading the source code), is through trial and error, which seems undesirable from a user-facing point of view. If we were to update the API to make it explicit the fields which can and can't be modified, I think that would result in a better experience for consumers of the API.
I do see your point, in the sense that the format of the API and that of the Dandi schema would be diverging somewhat. However, regardless of the uniformity between the two formats, the dichotomy between metadata that can be updated, and computed metadata, still exists.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@AlmightyYakob - sorry for the delay.
The path field is a unique example, where it's a field that is computed (since that data is stored on the asset itself and is then injected into the metadata), yet prior to this proposed change, was also specified in the metadata, and then stripped out by the server.
regarding path
, i don't know why it was being stripped out or inserted by the server. path
is indeed special as it is the unique key to an asset. i.e. the same path cannot point to multiple blobs.
The only way they could know (aside from reading the source code), is through trial and error, which seems undesirable from a user-facing point of view.
indeed this is the case right now, and is really because of the API service not trusting clients to determine what those fields are. however, the exception is still for path
- which is definitely a client prerogative to set, not for the server to decide. the API should have nothing to say about paths except regarding "our" policies on clashes with either POST
/PUT
.
can we better understand why path should be computed?
the dichotomy between metadata that can be updated, and computed metadata, still exists.
let's figure out a way to add this info to dandischema itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I also do not grasp why it is exactly computed, besides may be that it is first extracted from metadata for the asset record and then placed back into it (why?).
it may really be helpful to document the lifecycle of an asset somewhere (can point to code pieces as needed). this should include what is injected asset metadata and when. also if we are making API changes related to the asset, can we also address the blob/zarr vs blob_id/zarr_id difference between API returns and API POST/PUT endpoints and make them consistent. |
Good point, I agree.
You're referring to #1686, correct? I think we can do that, but I'd prefer to do it in a separate PR, to keep things clean. We should delay the merging/deployment of these two PRs until the CLI has the corresponding changes ready to go. |
I feel that this a "good to have" and not really "has to have" change, and that is why not sure if it is worth the API breakage at this point. Should we may be better
WDYT? |
I'm closing this PR, and will be making a new PR containing the generalized solution discussed here. That will still constitute a breaking API change, but will be less disruptive, as it'll be contained in one PR. |
Closes #1687, follow-up to #1689
This is a breaking API change.
Previously, for asset creation/update, the path of the asset was specified within the
metadata
field. This is confusing, as an asset's path is stored directly on the model itself (not in the metadata), and was the leading cause of #1682. This PR changes that behavior, by requiring an asset's path to be specified directly at the top level, alongsidemetadata
,blob_id
, andzarr_id
.