Add Artifact$open() method #117

Open: wants to merge 10 commits into main
7 changes: 7 additions & 0 deletions .github/workflows/R-CMD-check.yaml
@@ -62,6 +62,13 @@ jobs:
           echo "LDFLAGS=-L$OPENBLAS/lib" >> $GITHUB_ENV
           echo "CPPFLAGS=-I$OPENBLAS/include" >> $GITHUB_ENV
 
+      - name: Install {tiledbsoma}
+        if: runner.os == 'Linux'
+        run: |
+          options(repos = c("https://chanzuckerberg.r-universe.dev", getOption("repos")))
+          install.packages("tiledbsoma")
+        shell: Rscript {0}
+
       - name: Install lamindb
         run: |
           pip install lamindb[aws]
6 changes: 6 additions & 0 deletions .github/workflows/pkgdown.yaml
@@ -41,6 +41,12 @@ jobs:
           needs: website
           quarto-version: pre-release
 
+      - name: Install {tiledbsoma}
+        run: |
+          options(repos = c("https://chanzuckerberg.r-universe.dev", getOption("repos")))
+          install.packages("tiledbsoma")
+        shell: Rscript {0}
+
       - name: Install lamindb
         run: |
           pip install lamindb[aws]
6 changes: 6 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,9 @@
+# laminr devel
+
+## NEW FUNCTIONALITY
+
+- Add an `open()` method to the `Artifact` class to connect to TileDB-SOMA stores (PR #117).
+
 # laminr v0.2.0
 
 This release adds support for creating new artifacts in a LaminDB instance.
29 changes: 29 additions & 0 deletions R/Artifact.R
@@ -46,6 +46,35 @@ ArtifactRecord <- R6::R6Class( # nolint object_name_linter
       }
     },
     #' @description
+    #' Return a backed data object. Currently only supports TileDB-SOMA
+    #' artifacts.
+    #'
+    #' @return A [tiledbsoma::SOMACollection] or [tiledbsoma::SOMAExperiment]
+    #' object
+    open = function() {
+      is_tiledbsoma <- private$get_value("suffix") == ".tiledbsoma" ||
    [Review thread on the line above]
    Member: …
    Collaborator Author: Ok. I'll have to see if we have access to that.
+        private$get_value("_accessor") == "tiledbsoma"
+
+      if (!is_tiledbsoma) {
+        cli::cli_abort(
+          "The {.code open} method is only supported for TileDB-SOMA artifacts"
+        )
+      }
+
+      check_requires(
+        "Opening TileDB-SOMA artifacts", "tiledbsoma",
+        extra_repos = "https://chanzuckerberg.r-universe.dev"
+      )
+
+      artifact_uri <- paste0(
    [Review comment on the line above]
    Member: The problem is that it won't work for S3 paths, I think. At least the correct region should be provided; we have additional setup here: https://github.com/laminlabs/lamindb/blob/39e0f529a41d9cbc475e52fe3aa0b8095af21494/lamindb/core/storage/_tiledbsoma.py#L33. But I'm not sure what is available in R.
+        private$get_value("storage")$root,
+        "/",
+        private$get_value("key")
+      )
+
+      tiledbsoma::SOMAOpen(artifact_uri)
+    },
+    #' @description
     #' Print a more detailed description of an `ArtifactRecord`
     #'
     #' @param style Logical, whether the output is styled using ANSI codes
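The check in `open()` combines two `==` comparisons with `||`. If `suffix` or `_accessor` can be `NA` for some records (an assumption; the PR does not say), the comparison does not reduce to a single `TRUE`/`FALSE`: `NA || FALSE` is `NA`, and `if (!NA)` then stops with base R's "missing value where TRUE/FALSE needed" instead of the intended `cli_abort()` message. A minimal sketch, assuming either field may be `NA` or `NULL`, wraps each comparison in `isTRUE()`. `is_tiledbsoma_record()` is a hypothetical helper, not part of laminr.

```r
# Hypothetical helper (not part of laminr): `suffix` and `accessor` stand in
# for private$get_value("suffix") and private$get_value("_accessor").
is_tiledbsoma_record <- function(suffix, accessor) {
  # isTRUE() maps NA, FALSE and zero-length comparisons (e.g. NULL == "x")
  # to FALSE, so the result is always a single TRUE or FALSE
  isTRUE(suffix == ".tiledbsoma") || isTRUE(accessor == "tiledbsoma")
}

is_tiledbsoma_record(".tiledbsoma", NA) # TRUE
is_tiledbsoma_record(NA, "tiledbsoma")  # TRUE
is_tiledbsoma_record(".h5ad", NULL)     # FALSE
```

With this form, a non-TileDB-SOMA record always reaches the `cli_abort()` branch rather than failing inside the `if` condition.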
2 changes: 1 addition & 1 deletion R/Instance.R
@@ -61,7 +61,7 @@ create_instance <- function(instance_settings, is_default = FALSE) {
 
   py_lamin <- NULL
   if (isTRUE(is_default)) {
-    check_requires("Connecting to Python", "reticulate", type = "warning")
+    check_requires("Connecting to Python", "reticulate", alert = "warning")
 
     py_lamin <- tryCatch(
       reticulate::import("lamindb"),
49 changes: 34 additions & 15 deletions R/utils.R
@@ -6,37 +6,56 @@
 #' @param what A message stating what the packages are required for. Used at the
 #' start of the error message e.g. "{what} requires...".
 #' @param requires Character vector of required package names
-#' @param type Type of message to give if packages are missing
+#' @param alert Type of message to give if packages are missing
+#' @param extra_repos Additional repositories that are required to install the
+#' checked packages
 #'
 #' @return Invisibly, Boolean whether or not all packages are available or
 #' raises an error if any are missing and `type = "error"`
 #' @noRd
-check_requires <- function(what, requires, type = c("error", "warning")) {
-  type <- match.arg(type)
+check_requires <- function(what, requires,
+                           alert = c("error", "warning", "message", "none"),
+                           extra_repos = NULL) {
+  alert <- match.arg(alert)
 
   is_available <- map_lgl(requires, requireNamespace, quietly = TRUE)
 
-  msg_fun <- switch(type,
+  msg_fun <- switch(alert,
     error = cli::cli_abort,
-    warning = cli::cli_warn
+    warning = cli::cli_warn,
+    message = cli::cli_inform,
+    none = NULL
   )
 
-  if (any(!is_available)) {
+  if (!any(is_available) && !is.null(msg_fun)) {
     missing <- requires[!is_available]
     missing_str <- paste0("'", paste(missing, collapse = "', '"), "'") # nolint object_usage_linter
-    msg_fun(
-      c(
-        "{what} requires the {.pkg {missing}} package{?s}",
-        "i" = paste(
-          "Install {cli::qty(missing)}{?it/them} using",
-          "{.run install.packages(c({missing_str}))}"
+
+    msg <- "{what} requires the {.pkg {missing}} package{?s}"
+
+    if (!is.null(extra_repos)) {
+      msg <- c(
+        msg,
+        "i" = paste0(
+          "Add repositories using {.run options(repos = c(",
+          paste0("'", paste(extra_repos, collapse = "', '"), "'"),
+          ", getOption('repos'))}, then:"
         )
-      ),
-      call = rlang::caller_env()
       )
+    }
+
+    msg <- c(
+      msg,
+      "i" = paste(
+        "Install {cli::qty(missing)}{?it/them} using",
+        "{.run install.packages(c({missing_str}))}"
+      )
+    )
+
+    msg_fun(msg, call = rlang::caller_env())
   }
 
-  invisible(all(is_available))
+  invisible(any(is_available))
 }
 
 #' Check if we are in a knitr notebook
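The guard in the new `check_requires()` changed from `any(!is_available)` to `!any(is_available)`, and the return value from `all()` to `any()`, while the roxygen block still describes the result as whether all packages are available. The two guards are not equivalent once more than one package is checked: `!any(...)` is `TRUE` only when every package is missing. The base-R sketch below uses a hypothetical availability vector to show the difference; whether alerting only when all packages are missing is intended is not stated in the PR.

```r
# Hypothetical availability vector (named here for readability) for
# requires = c("cli", "a_missing_package")
is_available <- c(cli = TRUE, a_missing_package = FALSE)

old_guard <- any(!is_available) # TRUE: at least one package is missing
new_guard <- !any(is_available) # FALSE: only TRUE when no package is available

old_return <- all(is_available) # FALSE
new_return <- any(is_available) # TRUE

# With a single package the two forms agree, which is all the tests exercise
single <- c(a_missing_package = FALSE)
identical(any(!single), !any(single)) # TRUE
```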
15 changes: 15 additions & 0 deletions tests/testthat/test-utils.R
@@ -1,7 +1,22 @@
 test_that("check_requires works", {
   expect_true(check_requires("Imported packages", "cli"))
+
   expect_error(
     check_requires("Missing packages", "a_missing_package"),
     regexp = "Missing packages requires"
   )
+
+  expect_warning(
+    check_requires("Missing packages", "a_missing_package", alert = "warning"),
+    regexp = "Missing packages requires"
+  )
+
+  expect_message(
+    check_requires("Missing packages", "a_missing_package", alert = "message"),
+    regexp = "Missing packages requires"
+  )
+
+  expect_false(
+    check_requires("Missing packages", "a_missing_package", alert = "none")
+  )
 })
1 change: 1 addition & 0 deletions vignettes/architecture.qmd
@@ -417,6 +417,7 @@ classDiagram
     +...field value accessors...
     +cache(): String
     +load(): AnnData | DataFrame | ...
+    +open(): SOMACollection | SOMAExperiment
     +describe(): NULL
   }
   style Artifact fill:#ffe1c9
3 changes: 3 additions & 0 deletions vignettes/development.qmd
@@ -71,6 +71,8 @@ This document outlines the features of the **{laminr}** package and the roadmap
 * [x] **Delete artifacts**: Delete an existing artifact.
 * [ ] **Manage artifact metadata**: Add, update, and delete artifact metadata.
 * [ ] **Work with collections**: Create, manage, and query collections of artifacts.
+* [ ] **Stream backed artifacts**: Connect to file-backed artifacts (`$open`).
+    - [x] `tiledbsoma`: Stream TileDB-SOMA objects
 
 ### Track notebooks & scripts
 
@@ -141,6 +143,7 @@ A first version of the package that allows users to:
 * Implement data lineage visualization.
 * Introduce data curation features (validation, standardization, annotation).
 * Enhance support for bionty registries and ontology interactions.
+* Connect to TileDB-SOMA artifacts.
 
 ### Future versions
 
45 changes: 26 additions & 19 deletions vignettes/laminr.Rmd
@@ -16,7 +16,6 @@ knitr::opts_chunk$set(
 # actually upload results to the LaminDB instance
 # -> testuser1 is a test account that cannot upload results
 submit_eval <- laminr:::.get_user_settings()$handle != "testuser1"
-submit_eval <- FALSE
 ```
 
 This vignette introduces the basic **{laminr}** workflow.
@@ -149,29 +148,37 @@ DotPlot(seurat_obj, features = unique(markers$gene)) +
   ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90, vjust = 0.5))
 ```
 
-# Slice the tiledbsoma array store
+# Slice a TileDB-SOMA array store
 
-Alternatively to accessing individual CELLxGENE datasets from LaminDB, the **{cellxgene.census}** package can be used to slice the TileDB-SOMA array store for CELLxGENE Census, a concatenated version of most datasets in CELLxGENE.
+When artifacts contain TileDB-SOMA array stores, they can be opened with the `$open()` method and sliced using the [**{tiledbsoma}** package](https://single-cell-data.github.io/TileDB-SOMA/index.html).
 
-```{r slice-tiledbsoma, eval=FALSE}
-library(cellxgene.census)
+```{r slice-tiledbsoma, eval = requireNamespace("tiledbsoma", quietly = TRUE)}
+# Set some environment variables to avoid an issue with {tiledbsoma}
+# https://github.com/chanzuckerberg/cellxgene-census/issues/1261
+Sys.setenv(TILEDB_VFS_S3_REGION = "us-west-2")
    [Review thread on the line above]
    Member: Aha, OK, the region is provided here, I see. But it would probably be much better to provide it via the TileDB context.
    Collaborator Author: This was just a hacky workaround because there is a bug in the {tiledbsoma} package. I'll look into the context stuff, but I'm not sure if we get all the necessary information from the API.
    falexwolf (Member), Nov 27, 2024: I think that workaround is OK for now, given you clearly marked it in the comments!
    Sergei can help find a cleaner solution over the next weeks & months; it's not so urgent yet that this is 100% polished.
    I think this is good to merge!
+Sys.setenv(AWS_DEFAULT_REGION = "us-west-2")
+Sys.setenv(TILEDB_VFS_S3_NO_SIGN_REQUEST = "true")
 
-census <- open_soma()
-
-organism <- "Homo sapiens"
-gene_filter <- "feature_id %in% c('ENSG00000107317', 'ENSG00000106034')"
-cell_filter <- "cell_type == 'sympathetic neuron'"
-cell_columns <- c(
-  "assay", "cell_type", "tissue", "tissue_general", "suspension_type", "disease"
+# Define a filter to select specific cells
+value_filter <- paste(
+  "tissue == 'brain' &&",
+  "cell_type %in% c('microglial cell', 'neuron') &&",
+  "suspension_type == 'cell' &&",
+  "assay == '10x 3\\' v3'"
 )
 
-seurat_obj2 <- get_seurat(
-  census = census,
-  organism = organism,
-  var_value_filter = gene_filter,
-  obs_value_filter = cell_filter,
-  obs_column_names = cell_columns
-)
+# Get the artifact containing the CELLxGENE Census TileDB-SOMA store
+census_artifact <- cellxgene$Artifact$get("FYMewVq5twKMDXVy0001")
+# Open the SOMACollection
+soma_collection <- census_artifact$open()
+# Read metadata for the cells of interest from the obs SOMADataFrame (returns an iterator)
+cell_metadata <- soma_collection$get("census_data")$get("homo_sapiens")$obs$read(value_filter = value_filter)
+# Concatenate the results into an arrow::Table
+cell_metadata <- cell_metadata$concat()
+# Convert to a data.frame
+cell_metadata <- cell_metadata$to_data_frame()
+
+cell_metadata
 ```
 
 # Save the results
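The vignette writes the `value_filter` string by hand, including escaping the single quote in `10x 3' v3` as `\'`. A small base-R sketch of assembling the same clauses programmatically, assuming the quoting convention the vignette already uses; `quote_value()`, `eq_clause()` and `in_clause()` are hypothetical helpers, not part of laminr or {tiledbsoma}.

```r
# Hypothetical helpers for building value_filter strings.
# Wrap each value in single quotes, escaping embedded single quotes as \'
quote_value <- function(x) {
  paste0("'", gsub("'", "\\\\'", x), "'")
}

# column == 'value'
eq_clause <- function(column, value) {
  paste(column, "==", quote_value(value))
}

# column %in% c('a', 'b', ...)
in_clause <- function(column, values) {
  paste0(column, " %in% c(", paste(quote_value(values), collapse = ", "), ")")
}

value_filter <- paste(
  eq_clause("tissue", "brain"),
  in_clause("cell_type", c("microglial cell", "neuron")),
  eq_clause("suspension_type", "cell"),
  eq_clause("assay", "10x 3' v3"),
  sep = " && "
)
```

The resulting string is identical to the hand-written `value_filter` in the vignette; building it this way keeps the quoting rule in one place when filter values come from a character vector.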