More robust name validation #703

Open
wants to merge 13 commits into
base: main

Conversation

aeisenbarth
Contributor

@aeisenbarth aeisenbarth commented Sep 9, 2024

Closes #624

  • This pull request changes name validation rules:
    • additionally allow . (names may now contain _, -, . and alphanumeric characters, which include 0-9a-zA-Z but also other Unicode characters such as ɑ and ²)
    • forbid the full names . and ..
    • forbid the prefix __
    • forbid names that differ only in character case, such as abc and Abc (only one of them is allowed, regardless of which)
  • Name validation is now also applied to AnnData tables (keys/columns in obs, obsm, obsp, var, varm, varp, uns).
    • For obs and var dataframes, _index is forbidden.
  • Validation happens at construction time when adding elements to an element type dictionary (as before).
  • Additionally, validation happens before writing to Zarr.
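The rules listed above can be illustrated with a minimal sketch. The helper names `check_valid_name` and `check_unique_case_insensitive` are hypothetical and only mirror the rules as described here; they are not necessarily the functions added in this PR. Rejecting empty names and comparing case via `str.lower()` are assumptions.

```python
from collections.abc import Iterable

ALLOWED_PUNCTUATION = "_-."


def check_valid_name(name: str) -> None:
    """Raise ValueError if `name` breaks one of the rules described above."""
    if not name:
        # Assumption: empty names are rejected (not stated in the PR description).
        raise ValueError("Name must not be empty.")
    if name in (".", ".."):
        raise ValueError(f"Name must not be {name!r}.")
    if name.startswith("__"):
        raise ValueError("Name must not start with '__'.")
    # str.isalnum() is Unicode-aware, so characters like 'ɑ' and '²' pass.
    if not all(ch.isalnum() or ch in ALLOWED_PUNCTUATION for ch in name):
        raise ValueError("Name may only contain alphanumeric characters, '_', '-' and '.'.")


def check_unique_case_insensitive(names: Iterable[str]) -> None:
    """Raise ValueError if two names differ only in character case."""
    seen: dict[str, str] = {}
    for name in names:
        key = name.lower()  # assumption: lower-casing is the comparison used
        if key in seen:
            raise ValueError(f"Names {seen[key]!r} and {name!r} differ only in case.")
        seen[key] = name


check_valid_name("image_1.v2-ɑ²")  # passes silently
check_unique_case_insensitive(["abc", "abd"])  # passes silently
```

With this sketch, `check_valid_name("..")`, `check_valid_name("__private")` and `check_unique_case_insensitive(["abc", "Abc"])` each raise a `ValueError`.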

@aeisenbarth aeisenbarth marked this pull request as draft September 9, 2024 14:41

codecov bot commented Sep 9, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 91.89%. Comparing base (8323e15) to head (726657f).
Report is 3 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #703      +/-   ##
==========================================
+ Coverage   91.84%   91.89%   +0.04%     
==========================================
  Files          44       46       +2     
  Lines        6791     6956     +165     
==========================================
+ Hits         6237     6392     +155     
- Misses        554      564      +10     
Files with missing lines Coverage Δ
src/spatialdata/_core/_elements.py 91.86% <100.00%> (-0.10%) ⬇️
src/spatialdata/_core/spatialdata.py 91.47% <100.00%> (+0.57%) ⬆️
src/spatialdata/_core/validation.py 100.00% <100.00%> (ø)
src/spatialdata/models/__init__.py 100.00% <100.00%> (ø)
src/spatialdata/models/models.py 87.74% <100.00%> (-0.08%) ⬇️

... and 4 files with indirect coverage changes

@aeisenbarth aeisenbarth marked this pull request as ready for review September 9, 2024 19:47
@LucaMarconato
Member

Excellent PR @aeisenbarth, thank you!

I reviewed the code and applied my changes directly. They are:

  • I added a check for layers in tables as well (and updated the design docs and tests accordingly)
  • There was a bug in _validate_all_elements(): the condition should be element_type == 'tables' (instead of 'table')
    • That if condition was not covered by tests, so I added a test for that
  • In test_spatialdata_operations.py, some checks for tables were missing (due to old code that expected a single table); I updated that
  • Same for some code in test_readwrite.py
  • In test_writing_invalid_name(), a test for labels was commented out; I uncommented it
  • I extended test_writing_invalid_name() to consider:
    • writing of a table with a valid name but invalid "subnames" (this is the test I mentioned above that was not covering the "table" vs "tables" bug)
    • incremental writing of single elements (previously, validation of table "subnames" was triggered only on write(); now it is also triggered on write_element()).
  • I now trigger the name validation also on TableModel().validate() and not just on TableModel().parse(). I added tests for that in test_models.py
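To make the table "subname" checks and the 'table' vs 'tables' condition concrete, here is a rough sketch that uses plain dictionaries in place of real AnnData tables. All function names (`name_problem`, `table_name_errors`, `validate_all_elements`) and the data layout are hypothetical stand-ins for illustration; they are not the actual spatialdata code.

```python
from __future__ import annotations

from collections.abc import Iterable, Mapping

# Table attributes whose keys are validated (the list from the PR description, plus layers).
TABLE_ATTRS = ("obs", "obsm", "obsp", "var", "varm", "varp", "uns", "layers")


def name_problem(name: str) -> str | None:
    """Return a description of the first rule `name` breaks, or None if it is valid."""
    if name in (".", ".."):
        return "the full names '.' and '..' are forbidden"
    if name.startswith("__"):
        return "the prefix '__' is forbidden"
    if not name or not all(ch.isalnum() or ch in "_-." for ch in name):
        return "only alphanumeric characters, '_', '-' and '.' are allowed"
    return None


def table_name_errors(table_keys: Mapping[str, Iterable[str]]) -> list[str]:
    """Check the keys ("subnames") of each table attribute."""
    errors = []
    for attr in TABLE_ATTRS:
        seen_lower: set[str] = set()
        for key in table_keys.get(attr, ()):
            problem = name_problem(key)
            if attr in ("obs", "var") and key == "_index":
                problem = "'_index' is forbidden in obs and var"
            if problem is None and key.lower() in seen_lower:
                problem = "differs from another key only in character case"
            seen_lower.add(key.lower())
            if problem is not None:
                errors.append(f"{attr}[{key!r}]: {problem}")
    return errors


def validate_all_elements(elements: Mapping[str, Mapping[str, Mapping]]) -> list[str]:
    """Check element names and, for tables only, their subnames."""
    errors = []
    for element_type, named_elements in elements.items():
        for name, content in named_elements.items():
            problem = name_problem(name)
            if problem is not None:
                errors.append(f"{element_type}/{name}: {problem}")
            # Note the plural: comparing against 'table' would silently skip this branch.
            if element_type == "tables":
                errors.extend(f"{element_type}/{name}/{e}" for e in table_name_errors(content))
    return errors


elements = {
    "images": {"img": {}},
    "tables": {
        "table": {
            "obs": ["_index", "cell_type"],
            "uns": ["__private"],
            "layers": ["counts", "Counts"],
        }
    },
}
for error in validate_all_elements(elements):
    print(error)
```

In this example the table itself has a valid name, but three of its keys are reported: `_index` in obs, `__private` in uns, and `Counts` in layers (a case-only duplicate of `counts`). If the condition compared against 'table' instead, none of them would be reported, which is why a test with a validly named table and invalid subnames is needed to exercise that branch.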

@LucaMarconato
Member

Could you please double-check my code changes? If you agree with them (or after your edits), let's merge 😊

@LucaMarconato
Member

The explanation in the Discussions on how to be able to read datasets with naming problems is great! One minor todo:

Development

Successfully merging this pull request may close these issues.

Naming constraints break compatibility with existing datasets
2 participants