Fix validation error when only Zarr assets are uploaded #2062

Merged: jjnesbitt merged 1 commit into master from ak-validation on Dec 4, 2024


@aaronkanzer aaronkanzer commented Oct 30, 2024

Description of error

As documented in #1814, we receive the following validation error on the web application when uploading only Zarr asset(s) to a Dandiset (and not blobs):

assetsSummary: A Dandiset containing no files or zero bytes is not publishable

The error is raised by dandi-schema when it validates the assetsSummary field in the dandiset.yaml. See the validator in the models module:

@validator("assetsSummary")
def check_filesbytes(cls, values: AssetsSummary) -> AssetsSummary:
    if values.numberOfBytes == 0 or values.numberOfFiles == 0:
        raise ValueError(
            "A Dandiset containing no files or zero bytes is not publishable"
        )
    return values

It arises because the query below evaluates only blob assets:

'numberOfBytes': 1 if version.assets.filter(blob__size__gt=0).exists() else 0,
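
The interaction between the validator and this blob-only query can be reproduced without Django or pydantic. The sketch below is a simplified model, not the actual dandi-archive code: the `Asset` dataclass, its `blob_size` and `zarr_size` fields, and the helper `number_of_bytes_blob_only` are hypothetical stand-ins for the ORM query, while `check_filesbytes` copies only the condition from the validator quoted above.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Asset:
    """Hypothetical stand-in for an asset row; only one of the sizes is set."""
    blob_size: Optional[int] = None
    zarr_size: Optional[int] = None


def number_of_bytes_blob_only(assets: list[Asset]) -> int:
    # Mirrors `1 if version.assets.filter(blob__size__gt=0).exists() else 0`:
    # an asset without a blob can never satisfy the filter.
    has_blob_bytes = any(
        a.blob_size is not None and a.blob_size > 0 for a in assets
    )
    return 1 if has_blob_bytes else 0


def check_filesbytes(number_of_bytes: int, number_of_files: int) -> None:
    # Same condition as the dandi-schema validator, minus the pydantic wiring.
    if number_of_bytes == 0 or number_of_files == 0:
        raise ValueError(
            "A Dandiset containing no files or zero bytes is not publishable"
        )


# A Dandiset whose only asset is a Zarr archive holding 1024 bytes.
zarr_only = [Asset(zarr_size=1024)]
placeholder = number_of_bytes_blob_only(zarr_only)

try:
    check_filesbytes(placeholder, number_of_files=len(zarr_only))
    outcome = "valid"
except ValueError as exc:
    outcome = str(exc)
```

Here `placeholder` ends up as 0 even though the Dandiset holds data, so the check raises the same message shown in the web application.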

Description of proposed fix

The proposed changes additionally evaluate the size of Zarr assets that are in the COMPLETE state when generating the assetsSummary.numberOfBytes metadata field.
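
A minimal sketch of the intended behavior, written in plain Python rather than against the Django ORM: an asset counts toward the byte check if it has a non-empty blob, or if it has a non-empty Zarr archive in the COMPLETE state. The `Asset` dataclass, its field names, the `COMPLETE` constant value, and the helpers `has_bytes` and `number_of_bytes` are all assumptions made for illustration; the merged change expresses the same idea as a database query whose Zarr-side field names are not shown in this thread.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed label for the COMPLETE Zarr state; the real value may differ.
COMPLETE = "Complete"


@dataclass
class Asset:
    """Hypothetical stand-in for an asset backed by a blob or a Zarr archive."""
    blob_size: Optional[int] = None
    zarr_size: Optional[int] = None
    zarr_status: Optional[str] = None


def has_bytes(asset: Asset) -> bool:
    # Blob-backed asset: same test as the original blob-only query.
    if asset.blob_size is not None:
        return asset.blob_size > 0
    # Zarr-backed asset: count it only once the archive is COMPLETE.
    return asset.zarr_status == COMPLETE and (asset.zarr_size or 0) > 0


def number_of_bytes(assets: list[Asset]) -> int:
    # Placeholder value (0 or 1), matching the shape of the original expression.
    return 1 if any(has_bytes(a) for a in assets) else 0


complete_zarr = Asset(zarr_size=1024, zarr_status=COMPLETE)
pending_zarr = Asset(zarr_size=1024, zarr_status="Pending")
```

With this rule, a Dandiset holding only `complete_zarr` passes the zero-bytes check, while one holding only `pending_zarr` is still rejected.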

cc @kabilar

Closes #1814

@waxlamp waxlamp left a comment

I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

Review comment on dandiapi/api/services/metadata/__init__.py (outdated, resolved)

aaronkanzer commented Oct 31, 2024

> I think we need a bit more explanation of the actual bug, and why this fixes it. We need to be careful with Zarr statuses because those have a complex relationship with Assets that results in (perhaps overly) complex semantics.

When an Asset has only a Zarr foreign key and no blob, validation fails because this query:

'numberOfBytes': 1 if version.assets.filter(blob__size__gt=0).exists() else 0,

only evaluates the blob foreign key. I ran into this bug when uploading a Zarr-only Dandiset; it should be reproducible with the current state of DANDI Archive and dandischema.

@kabilar kabilar changed the title Resolve all asset types being evaluated during upload validation Fix validation error when only Zarr asset(s) is uploaded Nov 25, 2024
@kabilar kabilar changed the title Fix validation error when only Zarr asset(s) is uploaded Fix validation error when only Zarr assets are uploaded Nov 25, 2024

kabilar commented Nov 25, 2024

Hi @aaronkanzer, thank you for the original fix and pushing these changes upstream to DANDI.

Hi @waxlamp @jjnesbitt, I made some minor changes to the original comment above to further describe the issue and fix. Please take a look at it when you have a chance. Thank you.

jjnesbitt commented

The original implementation of this PR modified both the version_aggregate_assets_summary and validate_version_metadata functions, but tested only version_aggregate_assets_summary. However, only validate_version_metadata needed to change to address the issue this PR was created for; the changes to version_aggregate_assets_summary would have had no impact. I've reworked this PR to reflect that.

@aaronkanzer I believe the test_validate_version_metadata_only_zarr_assets test reflects the situation in the PR description. If that's not the case, please elaborate.

@mvandenburgh mvandenburgh left a comment

LGTM

aaronkanzer commented

> The original implementation of this PR was modifying both the version_aggregate_assets_summary and validate_version_metadata functions, testing only version_aggregate_assets_summary. However, only validate_version_metadata required any change to address the issue this PR was created for, and the changes to version_aggregate_assets_summary would have no impact. I've reworked this PR to reflect that.
>
> @aaronkanzer I believe the test_validate_version_metadata_only_zarr_assets test reflects the situation in the PR description. If that's not the case, please elaborate.

@jjnesbitt Just reviewed the tests you added -- thanks so much for this -- yes, this makes sense, and looks good to me

@jjnesbitt jjnesbitt added the patch and release labels Dec 4, 2024
@jjnesbitt jjnesbitt merged commit c2a737c into master Dec 4, 2024
11 checks passed
@jjnesbitt jjnesbitt deleted the ak-validation branch December 4, 2024 18:48

dandibot commented Dec 4, 2024

🚀 PR was released in v0.3.113 🚀

@dandibot dandibot added the released label Dec 4, 2024
Linked issue (may be closed by merging this pull request): Asset Summary validation fails to evaluate all "assets" size during validation of Version metadata