EVA-3564 - Simplify metadata conversion and validation #55

Merged
apriltuesday merged 5 commits into EBIvariation:main from EVA-3564 on Sep 16, 2024

Conversation

apriltuesday
Contributor

@apriltuesday apriltuesday commented Sep 10, 2024

  • Metadata conversion never fails (unless it is unable to open the Excel file) and always generates a JSON file to validate
  • Scientific name and BioSample name errors are correctly parsed and propagated to the report
  • Added scientific name / taxonomy ID coherence check to semantic validation
  • Refactored and updated tests
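
For illustration, a scientific name / taxonomy ID coherence check of the kind listed above might look roughly like the sketch below. This is not the code from this PR: the field names `taxId` and `scientificName`, the `/sample/<index>/...` path format, and the injected `lookup_scientific_name` callable (standing in for whatever taxonomy service is queried) are all assumptions.

```python
def check_taxonomy_coherence(samples, lookup_scientific_name):
    """Return (json_path, message) pairs for samples whose scientific name
    does not match the name registered for their taxonomy ID.

    `lookup_scientific_name` is a hypothetical callable that maps a taxonomy ID
    to its scientific name, or returns None when the ID is unknown.
    """
    errors = []
    for index, sample in enumerate(samples):
        tax_id = sample.get('taxId')              # assumed field name
        declared = sample.get('scientificName')   # assumed field name
        if tax_id is None or declared is None:
            # Missing fields are left to schema validation.
            continue
        expected = lookup_scientific_name(tax_id)
        if expected is None:
            errors.append((f'/sample/{index}/taxId',
                           f'{tax_id} is not a known taxonomy ID'))
        elif expected.strip().lower() != declared.strip().lower():
            errors.append((f'/sample/{index}/scientificName',
                           f"'{declared}' does not match taxonomy {tax_id} ('{expected}')"))
    return errors
```

Passing the lookup in as a parameter keeps the check testable without network access, for example by supplying `some_dict.get` in unit tests.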

@apriltuesday apriltuesday changed the title from "Eva 3564" to "EVA-3564 - Simplify metadata conversion and validation" Sep 10, 2024
/sample/3/bioSampleObject/name
must have required property 'name'
must have required property 'name'
must have required property 'name'
Contributor Author

not sure what's going on here, but this is the output I get from biovalidator...

@apriltuesday apriltuesday marked this pull request as ready for review September 11, 2024 09:17
@apriltuesday apriltuesday self-assigned this Sep 11, 2024
Member

@tcezard tcezard left a comment

Looks great

Comment on lines +197 to +198
if scientific_name in data:
    data[SPECIES] = data[scientific_name]
Member

Is it worth having a mechanism that converts a single Excel spreadsheet column into more than one BioSample field?
We could then provide this in the spreadsheet2json_conf.yaml as

Sample:
  header_row: 3
  optional:
    Scientific Name: [scientificName, species]

Contributor Author

Could do, it would only be used for this case though as far as I know - the name field goes outside of characteristics so has to be handled differently. I probably won't add it to this PR but we should keep it in mind.
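
As a rough illustration of the suggestion in this thread, a converter could accept either a single target field or a list of target fields per spreadsheet column. The sketch below is hypothetical and not part of this PR: `apply_column_mapping` is an invented helper, the mapping is written as a plain dict equivalent to the parsed YAML `optional` section above, and the `Tax Id` entry is an assumed example of an existing single-target column.

```python
def apply_column_mapping(row, column_mapping):
    """Copy each spreadsheet cell to every target field configured for its column.

    `column_mapping` maps a column header to either one field name (str)
    or a list of field names, as in the proposed YAML configuration.
    """
    converted = {}
    for column, targets in column_mapping.items():
        if column not in row:
            continue  # optional column absent from this row
        if isinstance(targets, str):
            targets = [targets]  # keep existing single-target entries working
        for target in targets:
            converted[target] = row[column]
    return converted


# Dict equivalent of the proposed `optional` section (the 'Tax Id' entry is assumed).
sample_optional = {
    'Scientific Name': ['scientificName', 'species'],
    'Tax Id': 'taxId',
}

row = {'Scientific Name': 'Homo sapiens', 'Tax Id': 9606}
print(apply_column_mapping(row, sample_optional))
```

Accepting both strings and lists would let existing configuration entries stay unchanged. As noted in the reply above, a field that lives outside of characteristics, such as the name, would still need separate handling.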

Comment on lines +138 to +139
# Sometimes there are multiple (possibly redundant) errors listed under a single property,
# we only report the first
Member

We probably should report that to biovalidator.js

Contributor Author

Good point, I'll make an issue for them.
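
The "report only the first error per property" behaviour described in the code comment in this thread can be sketched as follows. This is a minimal illustration, not the parser from this PR: it assumes the text layout seen in the excerpt earlier in the conversation, where a line starting with `/` names a property path and the lines after it are that property's error messages. `first_error_per_property` is a hypothetical name.

```python
def first_error_per_property(lines):
    """Keep only the first error message listed under each property path.

    Assumes lines starting with '/' are property paths and every other
    non-empty line is an error message for the most recent path.
    """
    errors = {}
    current_path = None
    for line in lines:
        text = line.strip()
        if not text:
            continue
        if text.startswith('/'):
            current_path = text
        elif current_path is not None and current_path not in errors:
            # Later (possibly redundant) messages for the same path are ignored.
            errors[current_path] = text
    return errors


report = [
    "/sample/3/bioSampleObject/name",
    "must have required property 'name'",
    "must have required property 'name'",
    "must have required property 'name'",
]
print(first_error_per_property(report))
```

Keeping only the first message hides the redundancy from users, but it would also hide a genuinely different second error on the same property, which is one reason to raise the duplication upstream.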

@apriltuesday apriltuesday merged commit faee2c8 into EBIvariation:main Sep 16, 2024
1 check passed
@apriltuesday apriltuesday deleted the EVA-3564 branch September 16, 2024 13:06