Overhauling validation results to get them closer to cover different types of validators #1514

Draft: wants to merge 2 commits into base: master

Conversation

yarikoptic
Member

Ref:

and result of our chat with @candleindark while also reviewing results of linkml validation over dandisets:

Notes/TODOs:

BREAKING CHANGE: we renamed .bids_version to more generic .standard + .standard_version

codecov bot commented Oct 18, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.68%. Comparing base (6aa414c) to head (45db66c).
Report is 153 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1514      +/-   ##
==========================================
+ Coverage   88.58%   88.68%   +0.09%     
==========================================
  Files          78       78              
  Lines       10589    10867     +278     
==========================================
+ Hits         9380     9637     +257     
- Misses       1209     1230      +21     
Flag Coverage Δ
unittests 88.68% <100.00%> (+0.09%) ⬆️




class Severity(Enum):
# TODO: decide on the naming consistency -- either prepend all with Validation or not
Member

@yarikoptic Are you talking about naming of the high level entities in this module, such as Severity and ValidationOrigin, or are you talking about the naming of the enum values within Severity?

Comment on lines +20 to +25
  class Severity(IntEnum):
      HINT = 1
-     WARNING = 2
-     ERROR = 3
+     INFO = 2  # new/unused, available in linkml
+     WARNING = 3
+     ERROR = 4
+     CRITICAL = 5  # new/unused, linkml has FATAL
Member

I see three issues with this, though they are minor.

  1. We may include all the severity levels in the outputs of all validators. However, a particular validator may use only a subset of them.
  2. The same severity level may mean different things in different validators.
  3. Different validators may also use different severity level names to mean the same level.

Member Author

  1. agree
  2. better not
  3. better not

But you are pointing to ambiguous semantics (not clearly defined across validators). Ideally, we should then be able to establish a "mapping" wherever validators disagree.

I think we should just clarify here what we mean by each of the levels, e.g.:

  • HINT: data is correct but could be improved
  • INFO: ... ???
  • WARNING: when "SHOULD" or "SHOULD NOT" level of requirement is violated
  • ERROR: when "MUST" or "MUST NOT" level of requirement is violated
  • CRITICAL: when a violation makes the given data completely unusable

Maybe we would also want

  • INTERNAL: signals a problem with the validator itself. Although @candleindark rightfully notes that it is still an ERROR, just about the validator rather than the data... so we might want to explicitly point to the object of the error (the validator, not the data)
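
The level definitions proposed above could be written down next to the enum, together with an explicit per-validator mapping. What follows is only a minimal sketch, not the PR's code: the numeric values are taken from the diff under review, while the `LINKML_SEVERITY_MAP` name, the `from_linkml` helper, and the linkml level names other than FATAL and INFO are assumptions made for illustration.

```python
from enum import IntEnum


class Severity(IntEnum):
    """Severity levels as discussed in this thread (INFO is still undefined)."""

    HINT = 1  # data is correct but could be improved
    INFO = 2  # meaning not yet agreed on
    WARNING = 3  # a "SHOULD"/"SHOULD NOT" requirement is violated
    ERROR = 4  # a "MUST"/"MUST NOT" requirement is violated
    CRITICAL = 5  # the data is completely unusable


# Hypothetical mapping from linkml severity names to the levels above.
# FATAL and INFO are mentioned in the diff; "WARN" and "ERROR" are assumed.
LINKML_SEVERITY_MAP = {
    "INFO": Severity.INFO,
    "WARN": Severity.WARNING,
    "ERROR": Severity.ERROR,
    "FATAL": Severity.CRITICAL,
}


def from_linkml(name: str) -> Severity:
    """Translate a linkml severity name; fail loudly on unknown names."""
    try:
        return LINKML_SEVERITY_MAP[name.upper()]
    except KeyError:
        raise ValueError(f"no mapping for linkml severity {name!r}") from None
```

Keeping one such table per validator would make any disagreement between validators visible in a single place, and an unknown name raises instead of silently defaulting to some level.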



class Scope(Enum):
FILE = "file"
FOLDER = "folder"
# Isaac: make it/add "dandiset-metadata" to signal specific relation to metadata
Member

Need clarification for this.

Member Author

Maybe forget about it for now -- it seems to overlap with ValidationObject below, but we might need to rethink/define some additional levels

Member Author

As for the folder, an example could be a folder name unknown to BIDS under a sub-.../ses-.../ folder. I am not sure ATM whether the BIDS validator complains about it, and how

name: str # Validator name
version: str # Validator version
standard: str | None = (
None # Standard being validated against # TODO: Enum for the standards??
Member

@yarikoptic We can create an enum class to define the supported standards. Do you know what those standards are ATM?

Member Author

yarikoptic, Mar 3, 2025

Sure. For starters:

  • BIDS
  • DANDI-LAYOUT -- our "ad-hoc" naming schema. ATM we validate it "explicitly" in dandi-cli code, in validate_organized_path, and not as part of dandi-schema
  • DANDI-SCHEMA
  • NWB
  • HED
  • OME-ZARR

Could be also at the level of the "file format" standard:

  • JSON
  • YAML
  • TSV
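
The list above could be captured in an enum along these lines. This is only a sketch: the member values are the names given in this comment, whereas the class name `Standard` and the `str` mixin (chosen so that members serialize as plain strings) are assumptions, not something decided in this thread.

```python
from enum import Enum


class Standard(str, Enum):
    """Standards named in this thread; the class name is hypothetical."""

    BIDS = "BIDS"
    DANDI_LAYOUT = "DANDI-LAYOUT"
    DANDI_SCHEMA = "DANDI-SCHEMA"
    NWB = "NWB"
    HED = "HED"
    OME_ZARR = "OME-ZARR"
    # "File format" level standards
    JSON = "JSON"
    YAML = "YAML"
    TSV = "TSV"


# Members can be looked up by the value used in reports
ome_zarr = Standard("OME-ZARR")
```

Because of the `str` mixin, a member compares equal to its plain string value, so existing code that passes strings such as "BIDS" would keep working during a transition.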

Member Author

We might also want to create an Enum for validators, since we have a set of them as well.

@candleindark
Member

In updating the use of ValidationOrigin, it is not clear to me what should be provided for the standard and standard_version attributes of many ValidationOrigin objects. For example, this is not clear in ValidationOrigin objects in organized.py.

origin=ValidationOrigin(name="dandi", version=__version__),

Do we have a dandi standard? If we do, it doesn't seem that we version this standard.

I think we should indeed define an enum class for the supported standards, and also talk about how to retrieve the version of those standards.
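
One possible way to handle standards that are not versioned is to leave standard_version optional. The sketch below assumes a dataclass with the four attributes discussed in this PR; it is not the PR's actual class, and the "DANDI-LAYOUT" value and the placeholder version string are illustrative assumptions.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class ValidationOrigin:
    name: str  # Validator name
    version: str  # Validator version
    standard: str | None = None  # Standard being validated against
    standard_version: str | None = None  # None if the standard is unversioned


# Hypothetical origin for the layout checks in organized.py: the standard is
# named, but no standard_version is given because the layout is not versioned.
dandi_layout_origin = ValidationOrigin(
    name="dandi",
    version="0.0.0",  # placeholder for dandi's __version__
    standard="DANDI-LAYOUT",
)
```

With this shape, a missing standard_version stays distinguishable from an empty string, which could matter when results from several validators are aggregated.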

@@ -209,17 +209,20 @@ def get_validation_errors(
import zarr

errors: list[ValidationResult] = []
origin: ValidationOrigin = ValidationOrigin(
name="zarr",
version=zarr.__version__,
Member

Is this line intentional? The original one was version=zarr.version.version.

origin: ValidationOrigin = ValidationOrigin(
name="zarr",
version=zarr.__version__,
standard="zarr",
Member Author

Add standard_version here to record which version of zarr it is.
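
If "version of zarr" here means the storage format version rather than the library version, it could be read from the store's root metadata. A minimal sketch under that assumption, for a local directory store only: the `detect_zarr_format` helper is hypothetical and relies solely on the `zarr_format` key that the Zarr specifications place in `.zgroup`/`.zarray` (v2) and `zarr.json` (v3).

```python
import json
from pathlib import Path
from typing import Optional


def detect_zarr_format(root: str) -> Optional[str]:
    """Return the zarr_format recorded in the store's root metadata, if any."""
    root_path = Path(root)
    # Zarr v3 keeps metadata in zarr.json; v2 uses .zgroup or .zarray
    for fname in ("zarr.json", ".zgroup", ".zarray"):
        meta = root_path / fname
        if meta.is_file():
            fmt = json.loads(meta.read_text()).get("zarr_format")
            if fmt is not None:
                return str(fmt)
    return None
```

The returned string could then be passed as standard_version, with None signalling that no root metadata was found.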

DANDISET = "dandiset"
DATASET = "dataset"


# new/unused, may be should be gone
Member Author

Maybe drop this for now indeed

2 participants