Add support for multi-modal segmentation #8605

Merged
merged 20 commits into from
Sep 30, 2024

Conversation

jsicherman
Contributor

This adds the ability to read segmentation_method from the cells.zarr.zip for Xenium + multi-modal segmentation datasets. Zarr reading does not seem to be very straightforward with R, though, so some softdepends are needed. An example of this is shown, as well as some demonstrations on how to load other XOA outputs, in a small addition to the spatial vignette.
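For anyone who wants to look inside cells.zarr.zip without a Zarr reader, here is a minimal sketch using only base R. The helper names are hypothetical (they are not part of Seurat), and the metadata filter assumes the store follows the Zarr v2 convention of keeping metadata in `.zattrs`, `.zarray` and `.zgroup` JSON entries.

```r
# Hypothetical helpers, not part of Seurat.

# List the entries of a zipped Zarr store without unpacking it.
list_zarr_entries <- function(zip.path) {
  stopifnot(file.exists(zip.path))
  utils::unzip(zip.path, list = TRUE)$Name
}

# Keep only the metadata entries (assumes the Zarr v2 naming convention).
zarr_metadata_entries <- function(entries) {
  entries[basename(entries) %in% c(".zattrs", ".zarray", ".zgroup")]
}
```

A call such as `zarr_metadata_entries(list_zarr_entries(file.path(data.dir, "cells.zarr.zip")))`, where `data.dir` is a placeholder for the Xenium output folder, would then show which arrays and attribute files the store contains before deciding how to read them.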

I also used this opportunity to clean up Xenium reading a bit, and since we will be moving from .csv.gz outputs to exclusively .parquet outputs, added support for that.
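As a rough illustration of the .csv.gz to .parquet transition (a sketch, not the code in this PR), a reader can prefer the .parquet file and fall back to .csv.gz when only the older output exists. Both helper names are hypothetical, and `arrow` is treated as an optional dependency.

```r
# Hypothetical helpers, not part of Seurat.

# Pick the on-disk file for a Xenium table such as "transcripts" or "cells",
# preferring .parquet and falling back to .csv.gz.
find_xenium_table <- function(data.dir, stem) {
  candidates <- file.path(data.dir, paste0(stem, c(".parquet", ".csv.gz")))
  found <- candidates[file.exists(candidates)]
  if (length(found) == 0) {
    stop("No ", stem, ".parquet or ", stem, ".csv.gz found in ", data.dir)
  }
  found[[1]]
}

# Read the chosen file; arrow is only needed for the .parquet case.
read_xenium_table <- function(path) {
  if (grepl("\\.parquet$", path)) {
    if (!requireNamespace("arrow", quietly = TRUE)) {
      stop("Reading .parquet files requires the 'arrow' package")
    }
    as.data.frame(arrow::read_parquet(path))
  } else {
    read.csv(gzfile(path), stringsAsFactors = FALSE)
  }
}
```

Usage would look like `read_xenium_table(find_xenium_table(data.dir, "transcripts"))`, with `data.dir` standing in for the Xenium output folder.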

@jsicherman
Contributor Author

@AustinHartman would be great to also get a review on this PR. Thanks!

@pmarks
Contributor

pmarks commented Apr 29, 2024

@dcollins15 @rsatija ping on this -- what can we do to move this forward?

@alikhuseynov

any updates on this?

@rsatija
Collaborator

rsatija commented Jul 12, 2024

Sincere apologies for the delay. We've been slow to respond to PRs but are going to prioritize this going forward. We'll get back to you soon but to confirm, @pmarks you are signed off on this request?

@pmarks
Contributor

pmarks commented Jul 12, 2024

@rsatija thanks for taking a look! Yes, should be ready to go.

@rsatija
Collaborator

rsatija commented Jul 22, 2024

Hi @pmarks , sounds great. We're reviewing and just wondering if you could point us to a public dataset for testing, thanks!

@pmarks
Contributor

pmarks commented Jul 22, 2024

@jsicherman
Contributor Author

Thanks all! Just resolved a conflict that popped up on this branch. Please let me know if there's anything else I can do to help get this reviewed!

@jsicherman
Contributor Author

Hi @rsatija @dcollins15, sorry to keep pestering you on this but is there any progress on getting this merged and into a CRAN release? We have quite a few reports of users being blocked in their analyses by the change to transcripts.csv.gz, which would be solved with this PR

Thanks!

@rsatija
Collaborator

rsatija commented Sep 13, 2024

Thanks! Agreed that we need to move forward with this. To me, everything looks good, and I see that you are up-to-date with our develop branch.

@dcollins15 if you or the team see any issues here please let @jsicherman know, and if not, we'll go ahead and merge this in a week from today.

@SciComp8

> Hi @rsatija @dcollins15, sorry to keep pestering you on this but is there any progress on getting this merged and into a CRAN release? We have quite a few reports of users being blocked in their analyses by the change to transcripts.csv.gz, which would be solved with this PR
>
> Thanks!

Seems ReadXenium() from the development version of Seurat is incompatible with newly released Xenium 5K data (https://www.10xgenomics.com/datasets/preview-data-xenium-prime-gene-expression)? Any update on loading new Xenium data? Thank you!

@alikhuseynov

> > Hi @rsatija @dcollins15, sorry to keep pestering you on this but is there any progress on getting this merged and into a CRAN release? We have quite a few reports of users being blocked in their analyses by the change to transcripts.csv.gz, which would be solved with this PR
> > Thanks!
>
> Seems ReadXenium() from the development version of Seurat is incompatible with newly released Xenium 5K data (https://www.10xgenomics.com/datasets/preview-data-xenium-prime-gene-expression)? Any update on loading new Xenium data? Thank you!

You can use this repo https://github.com/10XGenomics/seurat/tree/develop for now

@jsicherman
Contributor Author

> Seems ReadXenium() from the development version of Seurat is incompatible with newly released Xenium 5K data (https://www.10xgenomics.com/datasets/preview-data-xenium-prime-gene-expression)? Any update on loading new Xenium data? Thank you!

@ScienceComputing I just confirmed that Seurat::LoadXenium('/path/to/Xenium_Prime_Human_Lymph_Node_Reactive_FFPE') works as expected if you install from the 10XGenomics/develop branch for now (devtools::install_github('10XGenomics/seurat@develop'))

Contributor

@dcollins15 dcollins15 left a comment

Apologies @jsicherman and @pmarks, I was under the impression that some feedback had been communicated outside of this thread, it wasn't my intent to leave you hanging!

Aside from one last copy-paste error, things seem to be working as expected with the Human Lung Cancer and Human Pancreas datasets. Unfortunately, I'm still unable to parse the older Mouse Brain Coronal Subset data (see comments below).

Once everything is working, there are a few documentation-related tasks to handle before this can be safely merged:

  • stars and (maybe) gmp should be added to the DESCRIPTION under Suggests (i.e., listed as a suggested dependency)
  • The @importFrom utils unzip tag needs to be added to the docstring for ReadXenium
  • The NAMESPACE and man/*.Rd files need to be updated — this can be done using roxygen2::roxygenise or devtools::document
  • An entry should be added to the NEWS.md file describing the changes introduced in this PR. The more detail provided in the changelog, the better for users
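For context on the Suggests item above, a package listed under Suggests is usually checked at the point of use rather than loaded unconditionally. A minimal sketch of that pattern follows; `require_suggested` is a hypothetical helper name, not a function in Seurat.

```r
# Hypothetical helper, not part of Seurat: fail early, with an actionable
# message, when a package listed under Suggests is not installed.
require_suggested <- function(pkg, reason) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    stop(
      "Package '", pkg, "' is required ", reason,
      "; install it with install.packages('", pkg, "')",
      call. = FALSE
    )
  }
  invisible(TRUE)
}
```

A reader function could call, for example, `require_suggested("arrow", "to read .parquet files")` before touching any `arrow::` function, so users without the optional package get a clear message instead of a namespace error.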

After those are taken care of, the final step is to bump the devel version and date in the DESCRIPTION file. Since another change is just about ready to be merged, I expect this will land as v5.1.0.9006 🚀
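The version bump itself is mechanical. Below is a small base-R sketch of incrementing the fourth (development) component and refreshing the Date field. The function names are hypothetical, the 9005 to 9006 step is only an illustration, and `write.dcf` may reflow long fields, so on a real DESCRIPTION a manual edit may be preferable.

```r
# Hypothetical helpers for illustration only.

# Increment the development component of a four-part version string.
bump_devel_version <- function(version) {
  parts <- unlist(strsplit(version, ".", fixed = TRUE))
  stopifnot(length(parts) == 4)
  parts[4] <- as.character(as.integer(parts[4]) + 1L)
  paste(parts, collapse = ".")
}

# Rewrite the Version and Date fields of a DESCRIPTION file in place.
bump_description <- function(path, date = Sys.Date()) {
  desc <- read.dcf(path)
  stopifnot(all(c("Version", "Date") %in% colnames(desc)))
  desc[1, "Version"] <- bump_devel_version(unname(desc[1, "Version"]))
  desc[1, "Date"] <- format(date, "%Y-%m-%d")
  write.dcf(desc, file = path)
  invisible(unname(desc[1, "Version"]))
}
```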

@jsicherman
Contributor Author

Thanks for the thorough review @dcollins15! Will try to put up some fixes on Monday. Should be able to remove the softdepends on gmp, stars and jsonlite, as some things changed in the outputs between when I put this up and now to simplify this integration.

@SciComp8

SciComp8 commented Sep 23, 2024

@ScienceComputing

Thank you so much for the updates. I tried the new dev version of Seurat, but the Xenium 5K data still cannot be loaded normally. Any insights on this issue?

Error: file not found
In addition: Warning message:
In CPL_read_mdim(file, array_name, options, offset, count, step, :
GDAL Error 1: Decompressor blosc not handled

> data.path <- 'Xenium_Prime_Human_Lymph_Node_Reactive_FFPE_outs'
> list.files(data.path)
 [1] "analysis_summary.html"                                            
 [2] "analysis.tar.gz"                                                  
 [3] "analysis.zarr.zip"                                                
 [4] "aux_outputs.tar.gz"                                               
 [5] "cell_boundaries.csv.gz"                                           
 [6] "cell_boundaries.parquet"                                          
 [7] "cell_feature_matrix.h5"                                           
 [8] "cell_feature_matrix.tar.gz"                                       
 [9] "cell_feature_matrix.zarr.zip"                                     
[10] "cells.csv.gz"                                                     
[11] "cells.parquet"                                                    
[12] "cells.zarr.zip"                                                   
[13] "experiment.xenium"                                                
[14] "gene_panel.json"                                                  
[15] "metrics_summary.csv"                                              
[16] "morphology_focus"                                                 
[17] "morphology.ome.tif"                                               
[18] "nucleus_boundaries.csv.gz"                                        
[19] "nucleus_boundaries.parquet"                                       
[20] "transcripts.parquet"                                              
[21] "transcripts.zarr.zip"                                             
[22] "Xenium_Prime_Human_Lymph_Node_Reactive_FFPE_cell_types.csv"       
[23] "Xenium_Prime_Human_Lymph_Node_Reactive_FFPE_gene_groups.csv"      
[24] "Xenium_Prime_Human_Lymph_Node_Reactive_FFPE_he_image.ome.tif"     
[25] "Xenium_Prime_Human_Lymph_Node_Reactive_FFPE_he_imagealignment.csv"

> xenium.obj <- Seurat::LoadXenium(data.dir=data.path)
Error: file not found
In addition: Warning message:
In CPL_read_mdim(file, array_name, options, offset, count, step,  :
  GDAL Error 1: Decompressor blosc not handled

> sessionInfo()
R version 4.4.0 (2024-04-24)
Platform: aarch64-apple-darwin20
Running under: macOS Ventura 13.3

Matrix products: default
BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] progressr_0.14.0   jsonlite_1.8.9     gmp_0.7-5          stars_0.6-6       
[5] sf_1.0-17          abind_1.4-5        Seurat_5.1.0.9004  SeuratObject_5.0.2
[9] sp_2.1-3          

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.16.0      magrittr_2.0.3        
  [4] spatstat.utils_3.1-0   fs_1.6.4               vctrs_0.6.5           
  [7] ROCR_1.0-11            memoise_2.0.1          spatstat.explore_3.2-7
 [10] htmltools_0.5.8.1      usethis_2.2.3          sctransform_0.4.1     
 [13] parallelly_1.37.1      KernSmooth_2.23-22     htmlwidgets_1.6.4     
 [16] ica_1.0-3              plyr_1.8.9             plotly_4.10.4         
 [19] zoo_1.8-12             cachem_1.0.8           igraph_2.0.3          
 [22] mime_0.12              lifecycle_1.0.4        pkgconfig_2.0.3       
 [25] Matrix_1.7-0           R6_2.5.1               fastmap_1.1.1         
 [28] fitdistrplus_1.1-11    future_1.33.2          shiny_1.8.1.1         
 [31] digest_0.6.35          colorspace_2.1-0       patchwork_1.2.0       
 [34] tensor_1.5             RSpectra_0.16-1        irlba_2.3.5.1         
 [37] pkgload_1.3.4          fansi_1.0.6            spatstat.sparse_3.0-3 
 [40] httr_1.4.7             polyclip_1.10-6        compiler_4.4.0        
 [43] proxy_0.4-27           remotes_2.5.0          DBI_1.2.2             
 [46] fastDummies_1.7.3      pkgbuild_1.4.4         MASS_7.3-60.2         
 [49] sessioninfo_1.2.2      classInt_0.4-10        tools_4.4.0           
 [52] units_0.8-5            lmtest_0.9-40          httpuv_1.6.15         
 [55] future.apply_1.11.2    goftest_1.2-3          glue_1.7.0            
 [58] nlme_3.1-164           promises_1.3.0         grid_4.4.0            
 [61] Rtsne_0.17             cluster_2.1.6          reshape2_1.4.4        
 [64] generics_0.1.3         gtable_0.3.5           spatstat.data_3.0-4   
 [67] class_7.3-22           tidyr_1.3.1            data.table_1.15.4     
 [70] utf8_1.2.4             spatstat.geom_3.2-9    RcppAnnoy_0.0.22      
 [73] ggrepel_0.9.5          RANN_2.6.1             pillar_1.9.0          
 [76] stringr_1.5.1          spam_2.10-0            RcppHNSW_0.6.0        
 [79] later_1.3.2            splines_4.4.0          dplyr_1.1.4           
 [82] lattice_0.22-6         survival_3.5-8         deldir_2.0-4          
 [85] tidyselect_1.2.1       miniUI_0.1.1.1         pbapply_1.7-2         
 [88] gridExtra_2.3          scattermore_1.2        devtools_2.4.5        
 [91] matrixStats_1.3.0      stringi_1.8.3          lazyeval_0.2.2        
 [94] codetools_0.2-20       tibble_3.2.1           cli_3.6.2             
 [97] uwot_0.2.2             xtable_1.8-4           reticulate_1.36.1     
[100] munsell_0.5.1          Rcpp_1.0.12            globals_0.16.3        
[103] spatstat.random_3.2-3  png_0.1-8              parallel_4.4.0        
[106] ellipsis_0.3.2         ggplot2_3.5.1          dotCall64_1.1-1       
[109] profvis_0.3.8          urlchecker_1.0.1       listenv_0.9.1         
[112] viridisLite_0.4.2      scales_1.3.0           ggridges_0.5.6        
[115] e1071_1.7-14           leiden_0.4.3.1         purrr_1.0.2           
[118] rlang_1.1.3            cowplot_1.1.3    

@alikhuseynov

CPL_read_mdim

seems to be related ->
r-spatial/stars#663 (comment)
do you have recent GDAL version? eg:

gdalinfo --version
GDAL 3.9.0, released 2024/05/07
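If it helps to script this check, here is a small sketch that parses a `gdalinfo --version` line and compares it against a minimum. The helper names are hypothetical, and the default minimum of 3.9.0 is simply the version quoted above, not a documented requirement of stars or Seurat.

```r
# Hypothetical helpers for illustration only.

# Extract the version from a line such as "GDAL 3.9.2, released 2024/08/13".
parse_gdal_version <- function(line) {
  m <- regmatches(line, regexpr("[0-9]+(\\.[0-9]+)+", line))
  if (length(m) == 0) stop("Could not find a version number in: ", line)
  numeric_version(m)
}

# TRUE when the reported GDAL version is at least `minimum`.
# The default is an assumption taken from the version quoted in this thread.
gdal_at_least <- function(line, minimum = "3.9.0") {
  parse_gdal_version(line) >= numeric_version(minimum)
}
```

Note that the command-line `gdalinfo` is not necessarily the GDAL that R links against. With sf installed, `sf::sf_extSoftVersion()[["GDAL"]]` reports the version sf was built with, which may be the more relevant number for an error raised from `CPL_read_mdim`.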

@SciComp8

> CPL_read_mdim
>
> seems to be related -> r-spatial/stars#663 (comment) do you have recent GDAL version? eg:
>
> gdalinfo --version
> GDAL 3.9.0, released 2024/05/07

Yes, I do have the newly released GDAL. What’s the next step I should take to successfully load the data? Highly appreciate your help and time.

gdalinfo --version
GDAL 3.9.2, released 2024/08/13

@jsicherman
Contributor Author

@ScienceComputing try re-installing (devtools::install_github("10XGenomics/seurat@develop", force = TRUE)) and re-attempting your call. I have updated it to not read the cells.zarr.zip in favor of the cells.csv.gz file, which should naturally resolve this issue.

@dcollins15 let me know if you want me to tackle these on this PR or if that's something you'd rather have your team manage. Changelog below.

> The NAMESPACE and man/*.Rd files need to be updated; this can be done using roxygen2::roxygenise or devtools::document
> An entry should be added to the NEWS.md file describing the changes introduced in this PR. The more detail provided in the changelog, the better for users

  • Improved: more fine-grained control over what parts of a Xenium experiment are loaded in LoadXenium
  • Added: ability to load Xenium nucleus segmentation masks
  • Added: read additional feature types from Xenium experiments
  • Added: read some run metadata (run start time, preservation method, panel used, organism, tissue type, instrument software version and stain kit used)
  • Improved: read cell_feature_matrix.h5 when present in favor of the MEX format files
  • Improved: more robust output/error handling in ReadXenium
  • Added: ability to read segmentation_method directly into Xenium meta.data
  • Added/fixed: ability to read .parquet files using arrow
  • Added: vignette for reading a Xenium dataset with the cell segmentation kit

@YfCem

YfCem commented Sep 25, 2024

Thank you so much for the updates! I've been really looking forward to this to be able to analyze my Xenium data. (multi tissue Mouse panel + 100 custom genes)
This is probably a rookie question, but when I try to load my data I get the following error. Any ideas?

Error in CreateAssayObject():
! No cell names (colnames) names present in the input matrix

list.files(path)
 [1] "analysis"                     "analysis.zarr.zip"           
 [3] "analysis_summary.html"        "aux_outputs"                 
 [5] "cell_boundaries.csv.gz"       "cell_boundaries.parquet"     
 [7] "cell_feature_matrix"          "cell_feature_matrix.h5"      
 [9] "cell_feature_matrix.zarr.zip" "cells.csv"                   
[11] "cells.csv.7z"                 "cells.csv.gz"                
[13] "cells.parquet"                "cells.zarr.zip"              
[15] "experiment.xenium"            "gene_panel.json"             
[17] "metrics_summary.csv"          "morphology.ome.tif"          
[19] "morphology_focus"             "nucleus_boundaries.csv.gz"   
[21] "nucleus_boundaries.parquet"   "transcripts.parquet"         
[23] "transcripts.zarr.zip"     

 xenium.obj <- LoadXenium(path)
  Genome matrix has multiple modalities, returning a list of matrices for this genome
Error in `CreateAssayObject()`:
! No cell names (colnames) names present in the input matrix
Backtrace:
    ▆
 1. └─Seurat::LoadXenium(...)
 2.   └─SeuratObject::CreateAssayObject(counts = data$matrix[[name]])
 3.     └─rlang::abort(message = "No cell names (colnames) names present in the input matrix")
 
 sessionInfo()
R version 4.4.1 (2024-06-14 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 10 x64 (build 19045)

Matrix products: default


locale:
[1] LC_COLLATE=English_United Kingdom.utf8  LC_CTYPE=English_United Kingdom.utf8   
[3] LC_MONETARY=English_United Kingdom.utf8 LC_NUMERIC=C                           
[5] LC_TIME=English_United Kingdom.utf8    

time zone: Europe/Berlin
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] ggplot2_3.5.1      future_1.34.0      Seurat_5.1.0.9004  SeuratObject_5.0.2
[5] sp_2.1-4          

loaded via a namespace (and not attached):
  [1] RColorBrewer_1.1-3     rstudioapi_0.16.0      jsonlite_1.8.9        
  [4] magrittr_2.0.3         spatstat.utils_3.1-0   farver_2.1.2          
  [7] fs_1.6.4               vctrs_0.6.5            ROCR_1.0-11           
 [10] memoise_2.0.1          spatstat.explore_3.3-2 htmltools_0.5.8.1     
 [13] usethis_3.0.0          curl_5.2.2             sctransform_0.4.1     
 [16] parallelly_1.38.0      KernSmooth_2.23-24     htmlwidgets_1.6.4     
 [19] desc_1.4.3             ica_1.0-3              plyr_1.8.9            
 [22] plotly_4.10.4          zoo_1.8-12             cachem_1.1.0          
 [25] igraph_2.0.3           mime_0.12              lifecycle_1.0.4       
 [28] pkgconfig_2.0.3        Matrix_1.7-0           R6_2.5.1              
 [31] fastmap_1.2.0          fitdistrplus_1.2-1     shiny_1.9.1           
 [34] digest_0.6.37          colorspace_2.1-1       patchwork_1.3.0       
 [37] ps_1.8.0               tensor_1.5             RSpectra_0.16-2       
 [40] irlba_2.3.5.1          pkgload_1.4.0          progressr_0.14.0      
 [43] fansi_1.0.6            spatstat.sparse_3.1-0  httr_1.4.7            
 [46] polyclip_1.10-7        abind_1.4-8            compiler_4.4.1        
 [49] remotes_2.5.0          withr_3.0.1            bit64_4.0.5           
 [52] fastDummies_1.7.4      pkgbuild_1.4.4         R.utils_2.12.3        
 [55] MASS_7.3-60.2          sessioninfo_1.2.2      tools_4.4.1           
 [58] lmtest_0.9-40          httpuv_1.6.15          future.apply_1.11.2   
 [61] goftest_1.2-3          R.oo_1.26.0            glue_1.7.0            
 [64] callr_3.7.6            nlme_3.1-164           promises_1.3.0        
 [67] grid_4.4.1             Rtsne_0.17             cluster_2.1.6         
 [70] reshape2_1.4.4         generics_0.1.3         hdf5r_1.3.11          
 [73] gtable_0.3.5           spatstat.data_3.1-2    tzdb_0.4.0            
 [76] R.methodsS3_1.8.2      tidyr_1.3.1            data.table_1.16.0     
 [79] utf8_1.2.4             spatstat.geom_3.3-3    RcppAnnoy_0.0.22      
 [82] ggrepel_0.9.6          RANN_2.6.2             pillar_1.9.0          
 [85] stringr_1.5.1          spam_2.10-0            RcppHNSW_0.6.0        
 [88] later_1.3.2            splines_4.4.1          dplyr_1.1.4           
 [91] lattice_0.22-6         survival_3.6-4         bit_4.0.5             
 [94] deldir_2.0-4           tidyselect_1.2.1       miniUI_0.1.1.1        
 [97] pbapply_1.7-2          gridExtra_2.3          scattermore_1.2       
[100] devtools_2.4.5         matrixStats_1.4.1      stringi_1.8.4         
[103] lazyeval_0.2.2         codetools_0.2-20       tibble_3.2.1          
[106] cli_3.6.3              uwot_0.2.2             arrow_17.0.0.1        
[109] xtable_1.8-4           reticulate_1.39.0      munsell_0.5.1         
[112] processx_3.8.4         Rcpp_1.0.13            globals_0.16.3        
[115] spatstat.random_3.3-2  png_0.1-8              spatstat.univar_3.0-1 
[118] parallel_4.4.1         ellipsis_0.3.2         assertthat_0.2.1      
[121] dotCall64_1.1-1        profvis_0.3.8          urlchecker_1.0.1      
[124] listenv_0.9.1          viridisLite_0.4.2      scales_1.3.0          
[127] ggridges_0.5.6         leiden_0.4.3.1         purrr_1.0.2           
[130] rlang_1.1.4            cowplot_1.1.3

@jsicherman
Contributor Author

@YfCem Were any edits made to the contents of cell_feature_matrix/ in your outs folder? If you do...

fread(file.path(path, 'cell_feature_matrix', 'features.tsv.gz'), header = FALSE)

You should see cell IDs. This error would suggest to me that, for some reason, that file is empty? Though admittedly I haven't dug around in the code to load the MEX outputs too much.

@YfCem

YfCem commented Sep 25, 2024

@jsicherman Thanks for getting back to me so quick! I gave your recommendation a try and this is what I got:

fread(file.path(path, 'cell_feature_matrix', 'features.tsv.gz'), header = FALSE)
                          V1                      V2                        V3
                      <char>                  <char>                    <char>
  1:      ENSMUSG00000109644           0610005C13Rik           Gene Expression
  2:      ENSMUSG00000028441           1110017D15Rik           Gene Expression
  3:      ENSMUSG00000096001           2610528A11Rik           Gene Expression
  4:      ENSMUSG00000040412           5330417C22Rik           Gene Expression
  5:      ENSMUSG00000018451           6330403K07Rik           Gene Expression
 ---                                                                          
537: NegControlCodeword_0537 NegControlCodeword_0537 Negative Control Codeword
538: NegControlCodeword_0538 NegControlCodeword_0538 Negative Control Codeword
539: NegControlCodeword_0539 NegControlCodeword_0539 Negative Control Codeword
540: NegControlCodeword_0540 NegControlCodeword_0540 Negative Control Codeword
541: UnassignedCodeword_0020 UnassignedCodeword_0020       Unassigned Codeword

I'm interpreting this as the file is not empty?
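One thing that may be worth checking as well: in the standard 10x MEX layout, features.tsv.gz holds the feature (gene) rows, as the output above shows, while the cell IDs live in barcodes.tsv.gz. The sketch below counts lines in both files so an empty or missing barcodes file stands out. The helper names are hypothetical, and this is only a diagnostic, not a confirmed explanation of the error.

```r
# Hypothetical helpers for illustration only.

# Number of lines in a gzipped text file, or NA if the file is missing.
count_gz_lines <- function(path) {
  if (!file.exists(path)) return(NA_integer_)
  con <- gzfile(path, "r")
  on.exit(close(con))
  length(readLines(con))
}

# Line counts for the two label files of a MEX directory.
check_mex_dir <- function(mex.dir) {
  c(
    barcodes = count_gz_lines(file.path(mex.dir, "barcodes.tsv.gz")),
    features = count_gz_lines(file.path(mex.dir, "features.tsv.gz"))
  )
}
```

Running `check_mex_dir(file.path(path, "cell_feature_matrix"))` should give a non-zero `barcodes` count if cell names are available from the MEX files.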

Appreciate all the help!

@alikhuseynov

> > CPL_read_mdim
> >
> > seems to be related -> r-spatial/stars#663 (comment) do you have recent GDAL version? eg:
> >
> > gdalinfo --version
> > GDAL 3.9.0, released 2024/05/07
>
> Yes, I do have the newly released GDAL. What’s the next step I should take to successfully load the data? Highly appreciate your help and time.
>
> gdalinfo --version
> GDAL 3.9.2, released 2024/08/13

Probably it is now solved with the latest commits.
However if you want to try out Bioconductor stuff, here would be the example -> SFE_xenium, btw image can be loaded too for visualization and/or some image analysis

@dcollins15
Contributor

Thanks for getting these updates pushed up so quickly, @jsicherman! I'll have another round of comments ready for you tomorrow 👌

To answer your question: ideally, the NAMESPACE, man/*.Rd, and NEWS.md updates should be made before this PR is merged. I'd be happy to take care of the final updates once the other changes are ready, but I don't think I have write access to the 10XGenomics/seurat:develop branch.

I'll also take a stab at refining the changelog you provided to ensure consistency — we appreciate you getting it started 🙏

In general, things are looking sweet! Excited to get this functionality merged and released 🤓

@SciComp8

> @ScienceComputing try re-installing (devtools::install_github("10XGenomics/seurat@develop", force = TRUE)) and re-attempting your call. I have updated it to not read the cells.zarr.zip in favor of the cells.csv.gz file, which should naturally resolve this issue.
>
> @dcollins15 let me know if you want me to tackle these on this PR or if that's something you'd rather have your team manage. Changelog below.
>
> > The NAMESPACE and man/*.Rd files need to be updated; this can be done using roxygen2::roxygenise or devtools::document
> > An entry should be added to the NEWS.md file describing the changes introduced in this PR. The more detail provided in the changelog, the better for users
>
> • Improved: more fine-grained control over what parts of a Xenium experiment are loaded in LoadXenium
> • Added: ability to load Xenium nucleus segmentation masks
> • Added: read additional feature types from Xenium experiments
> • Added: read some run metadata (run start time, preservation method, panel used, organism, tissue type, instrument software version and stain kit used)
> • Improved: read cell_feature_matrix.h5 when present in favor of the MEX format files
> • Improved: more robust output/error handling in ReadXenium
> • Added: ability to read segmentation_method directly into Xenium meta.data
> • Added/fixed: ability to read .parquet files using arrow
> • Added: vignette for reading a Xenium dataset with the cell segmentation kit

Thank you so much for your wonderful updates! It works!

@SciComp8

> > > CPL_read_mdim
> > >
> > > seems to be related -> r-spatial/stars#663 (comment) do you have recent GDAL version? eg:
> > >
> > > gdalinfo --version
> > > GDAL 3.9.0, released 2024/05/07
> >
> > Yes, I do have the newly released GDAL. What’s the next step I should take to successfully load the data? Highly appreciate your help and time.
> >
> > gdalinfo --version
> > GDAL 3.9.2, released 2024/08/13
>
> Probably it is now solved with the latest commits. However if you want to try out Bioconductor stuff, here would be the example -> SFE_xenium, btw image can be loaded too for visualization and/or some image analysis

Yes, the Xenium 5K data works well with your team's latest commits. Thank you sooo much for your amazing help & suggestion. Absolutely, would like to give it a try!

Contributor

@dcollins15 dcollins15 left a comment

LGTM 🚀 thanks for all your patience @jsicherman

I've got one small error-handling nit below; it's non-blocking and easy for us to address in a quick follow-up PR, so I'll let you decide if it's worth adjusting now 😄

Just a heads-up: none of the datasets I've been using appear to have the segmentation_method column in either the cells.csv.gz or cells.parquet files. It's entirely possible I might have pulled down the wrong versions...

The last piece of housekeeping is to bump the version and date in the DESCRIPTION file and then I'll merge this in as v5.1.0.9006 🥳

We can directly visualize cells which were segmented according to each method.

```{r}
ImageDimPlot(xenium.obj, fov = "fov", dark.background = FALSE, group.by = "segmentation_method", cols = c('#ffabc3', '#a9a900', '#a9ceff'))
```
Contributor

Non-blocking but this line is throwing an error because of the missing meta.data column. We need to make some other updates to this vignette before our next release so we can make any required adjustments then.
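One possible way to make the vignette chunk tolerant of datasets without that column is to check the metadata before choosing `group.by`. This is a sketch only; `pick_group_by` is a hypothetical helper, not something in Seurat or in this PR.

```r
# Hypothetical helper for illustration only: return the metadata column to
# group by if it exists, otherwise NULL so the plot uses its default grouping.
pick_group_by <- function(meta, column = "segmentation_method") {
  if (column %in% colnames(meta)) column else NULL
}

# With a Seurat object this might be used as follows (not run here):
#   group.by <- pick_group_by(xenium.obj[[]])
#   ImageDimPlot(xenium.obj, fov = "fov", group.by = group.by)
```

Here `xenium.obj[[]]` is the object's meta.data data frame, so the check works on any table with column names.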

@jsicherman
Contributor Author

Thanks @dcollins15! Appreciate the discussion on the PR 🚀 !

@dcollins15 dcollins15 merged commit 63a7b1a into satijalab:develop Sep 30, 2024
1 check failed