Generate new data given labels #53
Comments
Hi Feng, It is not written clearly (I apologize); basically, in step-by-step mode, the final |
Thank you so much for the quick reply. I tried to build a toy example for this, with sce_train for model fitting and sce_test for inference only. I only used sce_test in the last step. There seems to be a shape mismatch. Could you help me check this?
|
Oh I forgot this. |
Thank you for your information. I tested as suggested.
The code stopped at
But I checked |
I made a minimal test script here. Really appreciate your help!
|
Thanks for this demo! I found a bug and we will update the package as soon as possible. I will @ you once we update the package. |
Thank you and look forward to your update. |
Hi Feng, @feng-bao-ucsf I just updated the code. The newest code on GitHub should work. When I was running the demo code you provided, I noticed that one parameter (
MOBSC_newcount <- simu_new(
sce = sce_train,
mean_mat = MOBSC_para$mean_mat,
sigma_mat = MOBSC_para$sigma_mat,
zero_mat = MOBSC_para$zero_mat,
quantile_mat = NULL,
copula_list = MOBSC_copula$copula_list,
n_cores = 2,
family_use = "nb",
input_data = con_data_train$dat,
new_covariate = con_data_test$dat,
filtered_gene = con_data_train$filtered_gene
)
The parameter
Thanks again for your interest in scDesign3. Please let us know if you have any further questions! |
Thank you so much for the quick response. I tested the code using the data from figshare, and it ran perfectly. However, when I tried to run it with my own data, I found there might be some issues for the
Here are my test code and the data:
library(scDesign3)
library(SingleCellExperiment)
library(ggplot2)
library(dplyr)
data <- read.csv("data_intestine.csv", row.names = 1)
meta <- read.csv("meta_intestine.csv", row.names = 1)
meta <- meta[, c("x", "y", "label")]
colnames(meta) <- c("x", "y", "cell_type")
# Create a 'label' column holding cell_type + 1 (the original 'label' column was renamed to 'cell_type' above)
meta$label <- meta$cell_type + 1
sce <- SingleCellExperiment(
assays = list(counts = t(data)),
colData = meta
)
sce_train <- sce[1:10, 1:50]
sce_test <- sce[1:10, 51:80]
set.seed(123)
con_data_train <- construct_data(
sce = sce_train,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
con_data_test <- construct_data(
sce = sce_test,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
MOBSC_marginal <- fit_marginal(
data = con_data_train,
predictor = "gene",
mu_formula = "cell_type",
sigma_formula = "cell_type",
family_use = "nb",
n_cores = 2,
usebam = FALSE,
parallelization = "pbmcmapply"
)
MOBSC_copula <- fit_copula(
sce = sce_train,
assay_use = "counts",
marginal_list = MOBSC_marginal,
family_use = "nb",
copula = "gaussian",
n_cores = 2,
input_data = con_data_train$dat
)
MOBSC_para <- extract_para(
sce = sce_train,
marginal_list = MOBSC_marginal,
n_cores = 1,
family_use = "nb",
new_covariate = con_data_test$newCovariate,
data = con_data_train$dat
)
MOBSC_newcount <- simu_new(
sce = sce_train,
mean_mat = MOBSC_para$mean_mat,
sigma_mat = MOBSC_para$sigma_mat,
zero_mat = MOBSC_para$zero_mat,
quantile_mat = NULL,
copula_list = MOBSC_copula$copula_list,
n_cores = 2,
family_use = "nb",
input_data = con_data_train$dat,
new_covariate = con_data_test$dat,
filtered_gene = con_data_train$filtered_gene
)
The error info is:
|
Dear Dongyuan and Qingyang, I also tried other example datasets from scDesign3. The same error message seems to occur for those as well. Could you help check that? I really appreciate it.
library(scDesign3)
library(SingleCellExperiment)
library(dplyr)
example_sce <- readRDS((url("https://figshare.com/ndownloader/files/40581980")))
# I also tried this dataset
# example_sce <- readRDS((url("https://figshare.com/ndownloader/files/40581965")))
sce_train <- example_sce[1:10, 1:50]
sce_test <- example_sce[1:10, 51:80]
set.seed(123)
con_data_train <- construct_data(
sce = sce_train,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
con_data_test <- construct_data(
sce = sce_test,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = NULL,
other_covariates = NULL,
corr_by = "1"
)
MOBSC_marginal <- fit_marginal(
data = con_data_train,
predictor = "gene",
mu_formula = "cell_type",
sigma_formula = "cell_type",
family_use = "nb",
n_cores = 2,
usebam = FALSE,
parallelization = "pbmcmapply"
)
MOBSC_copula <- fit_copula(
sce = sce_train,
assay_use = "counts",
marginal_list = MOBSC_marginal,
family_use = "nb",
copula = "gaussian",
n_cores = 2,
input_data = con_data_train$dat
)
MOBSC_para <- extract_para(
sce = sce_train,
marginal_list = MOBSC_marginal,
n_cores = 1,
family_use = "nb",
new_covariate = con_data_test$newCovariate,
data = con_data_train$dat
)
MOBSC_newcount <- simu_new(
sce = sce_train,
mean_mat = MOBSC_para$mean_mat,
sigma_mat = MOBSC_para$sigma_mat,
zero_mat = MOBSC_para$zero_mat,
quantile_mat = NULL,
copula_list = MOBSC_copula$copula_list,
n_cores = 2,
family_use = "nb",
input_data = con_data_train$dat,
new_covariate = con_data_test$dat,
filtered_gene = con_data_train$filtered_gene
)
|
Hi Feng, I just uploaded the code. The error was caused by an incorrect data format that results from selecting only one column of a data frame. I tested the newest code with your script above. The data at https://figshare.com/ndownloader/files/40581980 ran with no error.
The data at https://figshare.com/ndownloader/files/40581965 still produces an error, but that one comes from the train/test split rather than from the package. The training data does not contain any CD14+ monocyte cells, while the testing data does; thus, the GAMLSS model can't predict the parameters for an unseen cell type. Once I removed the three CD14+ monocyte cells from the testing data, this dataset also ran with no error. Thanks for letting us know about this bug, and please let us know if you have further questions. |
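For readers following along, the two points in the reply above can be reproduced in base R without scDesign3. This is only a minimal sketch on a made-up metadata table (the `meta` data frame, its `batch` column, and the 4/2 train/test split are assumptions for illustration, not the package's internals): part (1) shows how selecting a single column silently drops a data frame to a vector unless `drop = FALSE` is used, and part (2) shows one way to detect and remove test cells whose cell type never appears in the training data.

```r
# Toy cell metadata; names and values are invented for illustration only.
meta <- data.frame(
  cell_type = c("B", "T", "T", "B", "CD14+ monocyte", "T"),
  batch = c(1, 1, 2, 2, 1, 2),
  stringsAsFactors = FALSE
)

# (1) Selecting a single column with `[` returns a plain vector ...
one_col_vec <- meta[, "cell_type"]
# ... unless drop = FALSE is given, which keeps the data.frame structure.
one_col_df <- meta[, "cell_type", drop = FALSE]

print(is.data.frame(one_col_vec))
print(is.data.frame(one_col_df))

# (2) Split into training and testing cells (an arbitrary split for the demo).
train_meta <- meta[1:4, , drop = FALSE]
test_meta <- meta[5:6, , drop = FALSE]

# Cell types present in the test set but never seen during training.
unseen <- setdiff(unique(test_meta$cell_type), unique(train_meta$cell_type))
print(unseen)

# Keep only test cells whose cell type was seen in training.
keep <- !(test_meta$cell_type %in% unseen)
test_meta_seen <- test_meta[keep, , drop = FALSE]
print(test_meta_seen)
```

With a SingleCellExperiment, the same logical vector could presumably be applied to the columns (cells), for example `sce_test[, keep]`, before calling `construct_data` on the test object.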
Thank you so much for the update. It works perfectly on the test datasets. I am extending the test to additional datasets and will close this issue shortly. |
I ran another test on predicting new samples, this time based on spatial positions. Here is my test code:
library(scDesign3)
library(SingleCellExperiment)
library(dplyr)
example_sce <- readRDS((url("https://figshare.com/ndownloader/files/40582019")))
# example_sce <- readRDS((url("https://figshare.com/ndownloader/files/40581977")))
sce_train <- example_sce[1:10, 1:50]
sce_test <- example_sce[1:10, 51:80]
set.seed(123)
con_data_train <- construct_data(
sce = sce_train,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = c("spatial1", "spatial2"),
other_covariates = NULL,
corr_by = "1"
)
con_data_test <- construct_data(
sce = sce_test,
assay_use = "counts",
celltype = "cell_type",
pseudotime = NULL,
spatial = c("spatial1", "spatial2"),
other_covariates = NULL,
corr_by = "1"
)
MOBSC_marginal <- fit_marginal(
data = con_data_train,
predictor = "gene",
mu_formula = "s(spatial1, spatial2, bs = 'gp', k = 400)",
sigma_formula = "1",
family_use = "nb",
n_cores = 2,
usebam = FALSE,
parallelization = "pbmcmapply"
)
MOBSC_copula <- fit_copula(
sce = sce_train,
assay_use = "counts",
marginal_list = MOBSC_marginal,
family_use = "nb",
copula = "gaussian",
n_cores = 2,
input_data = con_data_train$dat
)
MOBSC_para <- extract_para(
sce = sce_train,
marginal_list = MOBSC_marginal,
n_cores = 1,
family_use = "nb",
new_covariate = con_data_test$newCovariate,
data = con_data_train$dat
)
MOBSC_newcount <- simu_new(
sce = sce_train,
mean_mat = MOBSC_para$mean_mat,
sigma_mat = MOBSC_para$sigma_mat,
zero_mat = MOBSC_para$zero_mat,
quantile_mat = NULL,
copula_list = MOBSC_copula$copula_list,
n_cores = 2,
family_use = "nb",
input_data = con_data_train$dat,
new_covariate = con_data_test$dat,
filtered_gene = con_data_train$filtered_gene
)
The code stops at
Any ideas on this error? Really appreciate your help! |
Hi Dongyuan,
Thanks for sharing this work. I have a question about using the model: can I first train the model on one dataset to learn the relation between cell type and the omics parameters, and then, given a new list of cell types, use the trained model to generate gene data for that list? I did not find this functionality in the examples, but maybe I missed it.
Thank you!
Feng
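As a rough illustration of what "a new list of cell types" could look like as input, the base R sketch below builds a covariate table from a vector of labels. Everything in it is an assumption made for illustration: the label values, the `NewCell` row names, and in particular the constant `corr_group` column, which is meant to mirror `corr_by = "1"` in the scripts in this thread but is not confirmed here as what scDesign3 requires. The thread's own approach is to pass the output of `construct_data` on a test object to `extract_para` and `simu_new`.

```r
# Cell types observed in the training data (hypothetical values).
train_cell_type <- factor(c("B", "T", "T", "B", "NK"))

# New list of labels for which data should be generated (hypothetical).
new_labels <- c("T", "T", "NK", "B", "T")

# Every requested label must have been seen during training; a fitted
# model has no parameters for a cell type it never observed.
stopifnot(all(new_labels %in% levels(train_cell_type)))

# Build the covariate table, reusing the training factor levels so the
# encoding of cell_type matches the one used for fitting.
new_covariate <- data.frame(
  cell_type = factor(new_labels, levels = levels(train_cell_type)),
  corr_group = 1,  # assumption: one shared correlation group, as with corr_by = "1"
  row.names = paste0("NewCell", seq_along(new_labels))
)

print(new_covariate)
```

Reusing the training levels is the design choice that matters here: it makes a mismatch between requested and trained cell types fail early, at the `stopifnot` check, rather than later during prediction.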