Skip to content

Conversation

pryvkin10x
Copy link
Contributor

@pryvkin10x pryvkin10x commented Aug 1, 2025

  • The filenames comprising the morphology image have changed.
  • There's a new feature type called Protein Expression that ideally would be scaled down by the fixed-point scaling factor used to store the data in the HDF5. (Note: It isn't loaded here because the existing code takes only the "Gene Expression" feature type)

@codecov-commenter
Copy link

codecov-commenter commented Aug 1, 2025

Codecov Report

❌ Patch coverage is 7.50000% with 37 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.81%. Comparing base (7cccdd1) to head (c660343).
⚠️ Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_io/readers/xenium.py 3.12% 31 Missing ⚠️
src/spatialdata_io/readers/_utils/_read_10x_h5.py 14.28% 6 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (7cccdd1) and HEAD (c660343). Click for more details.

HEAD has 1 upload less than BASE
Flag BASE (7cccdd1) HEAD (c660343)
3 2
Additional details and impacted files
@@             Coverage Diff             @@
##             main     #312       +/-   ##
===========================================
- Coverage   49.06%   37.81%   -11.26%     
===========================================
  Files          26       26               
  Lines        2686     2721       +35     
===========================================
- Hits         1318     1029      -289     
- Misses       1368     1692      +324     
Files with missing lines Coverage Δ
src/spatialdata_io/_constants/_constants.py 100.00% <100.00%> (ø)
src/spatialdata_io/readers/_utils/_read_10x_h5.py 17.64% <14.28%> (-67.46%) ⬇️
src/spatialdata_io/readers/xenium.py 19.11% <3.12%> (-53.53%) ⬇️

... and 4 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@pakiessling
Copy link

pakiessling commented Sep 23, 2025

I tried this out with out Xenium protein data and it loads all the images correctly.

I got a warning though:

The "cell_id" column in the cells metadata table does not match the "cell_id" column in the annotation table. This could be due to trying to read a new version that is not supported yet. Please report this issue.

grafik

Not sure what this means, what do the labels even indicate?

@LucaMarconato
Copy link
Member

Thanks for the feedback @pakiessling. The warning you got seems like a false alarm due to an update in anndata and that has been fixed in #314. Please try to merge main into the current PR.

@pryvkin10x
Copy link
Contributor Author

Rebased against main

@LucaMarconato
Copy link
Member

LucaMarconato commented Oct 3, 2025

Hi @pryvkin10x I merged your PR about the test dataset and recreated the artifacts. I re-ran the tests for this PR, which are green now.

@pryvkin10x
Copy link
Contributor Author

Hi @pryvkin10x I merged your PR about the test dataset and recreated the artifacts. I re-ran the tests for this PR, which are green now.

Thank you! Let me know if there's anything else I need to add to this PR before it can be merged it into main.

@LucaMarconato
Copy link
Member

I reviewed the PR—it looks good to me! I kindly ask you for some minor tasks.

Some comments:

  • Thanks for adding the small demo dataset and the tests around that; super helpful!
  • I noticed that the xenium() reader was not ready for Zarr v3 support (coming soon in spatialdata) because of some small API changes. This was independent of Xenium 4.0 and I fixed it with this PR: fix zarr.open (v3) lacks context manager #327
  • As you pointed out, the case feature_types == "Protein Expression" is not triggered by the Xenium reader and not covered by tests. Also, the small test datasets doesn't contain this case. I would consider one of the following: addressing this in this PR, or merging the PR as is and opening an issue to keep track of it for a follow up PR. This is a suggestion on how to cover that case:
    • I would modify the small test dataset to include var columns with feature_types == "Protein Expression"
    • I would add a parameter to _read_10x_h5() to consider also the protein case; or in alternative, I would call the _read_10x_h5() with gex_only=False and then filter out the columns that are not gene expression or protein.
    • I'd plot things to double check that the data is correct and then include an assert for the new info in the test_example_data_index_integrity().

Finally, I prepared a checklist in the contribution guide for streamlining the contribution process and making review easier. Most of the tasks are done already, or are not relevant for this PR (I marked them as can be skipped); I will post this in the comment below, adding a comment for the opened tasks, that I kindly ask you to address.

@LucaMarconato
Copy link
Member

LucaMarconato commented Oct 4, 2025

Checklist from the contribution guide

<--- I will add comments in italics

  • Specification

    • Provide a link to the public specification/changelog of the raw data format. <--- I added a link to the xenium reader
  • Reader implementation

    • Implement reader functions in src/spatialdata_io/readers/ (if no formal spec available, place under experimental/)
    • Place string constants in src/spatialdata_io/_constants/_constants.py
  • Example data

    • Provide (very) small public test dataset(s) (~100kB–10MB, preferred), licensed with a permissive license (e.g. CC BY 4.0) <--- I added a link to the webpage where the license is stated
    • Ensure dataset(s) cover edge cases of the format <--- The feature_types == "Protein Expression" edge case is not covered. As discussed above this could be left for a follow up PR. Also I noticed that there are 2 more tests datasets for 4.0.0. If they are redundant, I would indeed not include them in the tests, but if they cover edge cases, it would be important to include them.
    • can be skipped In addition to the small dataset(s), consider creating scripts for downloading and converting a real dataset in spatialdata-sandbox:
      • can be skipped download.py (fetch raw data → data/)
      • can be skipped to_zarr.py (convert raw data → data.zarr)
      • can be skipped Include requirements.txt (or add PEP 723 metadata to the scripts) for dependencies not included in spatialdata
    • can be skipped If the data is not available online, upload it to the "SpatialData Submissions" Zenodo community
    • can be skipped We do not support private data, with the exception of data that will soon be made available to the public. In such cases, please reach out to us. Then, please provide a private repo with download.py and to_zarr.py (or upload the data privately to the "SpatialData Submissions" Zenodo community)
  • Tests

    • Add test functions for the reader
    • Test multiple versions of the raw format if applicable
    • Verify visualization/alignment with spatialdata-plot or napari-spatialdata * <--- I did this, but leaving this here to check you also did *
    • Add proxy tests for spatial extent correctness (spatialdata.get_extent())
    • Add proxy tests for coloring/annotations of selected elements/features
    • Add tests for auxiliary/helper functions if present * <--- as discussed before, the new case in _read_v3_10x_h5() is untested *
  • Extra

    • Update CLI in src/spatialdata_io/__main__.py if reader/converter is added/modified
    • Consider contributing to the public project board (currently view only, please reach out for edits)
    • Provide feedback on this contribution guide to improve it further

@whtns whtns mentioned this pull request Oct 6, 2025
@spatts14
Copy link

spatts14 commented Oct 7, 2025

Hi,

I am unable to import my xenium data with the new morphology nomenclature.

All my packages are up to date and It still isn't working for me. Is there something I've missed?

# Name                    Version                   Build  Channel
anndata                   0.12.2                   pypi_0    pypi
scanpy                    1.11.4             pyhd8ed1ab_0    conda-forge
spatial-image             1.2.3              pyhd8ed1ab_1    conda-forge
spatialdata               0.5.0              pyhd8ed1ab_0    conda-forge
spatialdata-io            0.3.0              pyhd8ed1ab_0    conda-forge
squidpy                   1.6.5              pyhd8ed1ab_0    conda-forge

This is the error I'm getting

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Cell In[4], [line 1](vscode-notebook-cell:?execution_count=4&line=1)
----> [1](vscode-notebook-cell:?execution_count=4&line=1) sdata = xenium(xenium_path)

File ~/miniforge3/envs/xenium_5k/lib/python3.12/site-packages/spatialdata_io/_utils.py:48, in deprecation_alias.<locals>.deprecation_decorator.<locals>.wrapper(*args, **kwargs)
     46 class_name = f.__qualname__
     47 rename_kwargs(f.__name__, kwargs, aliases, class_name)
---> [48](https://file+.vscode-resource.vscode-cdn.net/Users/sarapatti/PhD_projects/Xenium_5k_analysis_pipeline/~/miniforge3/envs/xenium_5k/lib/python3.12/site-packages/spatialdata_io/_utils.py:48) return f(*args, **kwargs)

File ~/miniforge3/envs/xenium_5k/lib/python3.12/site-packages/spatialdata_io/readers/xenium.py:298, in xenium(path, cells_boundaries, nucleus_boundaries, cells_as_circles, cells_labels, nucleus_labels, transcripts, morphology_mip, morphology_focus, aligned_images, cells_table, n_jobs, imread_kwargs, image_models_kwargs, labels_models_kwargs)
    293     raise ValueError(
    294         "Expected 1 (no segmentation kit) or 4 (segmentation kit) files in the morphology focus directory, "
    295         f"found {len(files)}: {files}"
    296     )
    297 if files != {XeniumKeys.MORPHOLOGY_FOCUS_CHANNEL_IMAGE.value.format(i) for i in range(len(files))}:
--> [298](https://file+.vscode-resource.vscode-cdn.net/Users/sarapatti/PhD_projects/Xenium_5k_analysis_pipeline/~/miniforge3/envs/xenium_5k/lib/python3.12/site-packages/spatialdata_io/readers/xenium.py:298)     raise ValueError(
    299         "Expected files in the morphology focus directory to be named as "
    300         f"{XeniumKeys.MORPHOLOGY_FOCUS_CHANNEL_IMAGE.value.format(0)} to "
    301         f"{XeniumKeys.MORPHOLOGY_FOCUS_CHANNEL_IMAGE.value.format(len(files) - 1)}, found {files}"
    302     )
    303 if len(files) == 1:
    304     channel_names = {
    305         0: XeniumKeys.MORPHOLOGY_FOCUS_CHANNEL_0.value,
    306     }

ValueError: Expected files in the morphology focus directory to be named as morphology_focus_0000.ome.tif to morphology_focus_0003.ome.tif, found {'ch0000_dapi.ome.tif', 'ch0001_atp1a1_cd45_e-cadherin.ome.tif', 'ch0002_18s.ome.tif', 'ch0003_alphasma_vimentin.ome.tif'}```

@pryvkin10x
Copy link
Contributor Author

Verify visualization/alignment with spatialdata-plot or napari-spatialdata <- Checked this
Ensure dataset(s) cover edge cases of the format <- Added tests for this

This should succeed once this PR is merged and the prepare_test_data action is re-run: #330

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

Status: No status

Development

Successfully merging this pull request may close these issues.

5 participants