Hey,

I'm not sure exactly what you are looking for, but the reason we didn't include csv/h5ad files for each simulation is that we ran many different simulations with varying parameters.

However, the code should be very simple to run. The simulations are generated in main() of utils/benchmark-utils/run_benchmark.py. Although we don't explicitly save the data as csv/h5ad, the important part of the code is:

p = args.p
N = args.n
n_active_pathways = args.n_pathways

n_control_pathways = 5
gene_set_size = 20
k = 5
overlap = 0.3
signal_strength = 15
lam = 500/((gene_set_size)*(gene_set_size - 1))
gene_set_FPR = 0
gene_set_FNR = 0
model_kwargs = dict(a=0.3, c=0.3)
n_components = k + n_active_pathways + n_control_pathways

# simulate data (lam is already defined above)
data2, A_star2, theta_star2 = simulate_base_data(N=N, k=k, p=p, scale=25)
data, base, A_star, gene_sets = create_pathways(
    n_control_pathways,
    n_active_pathways=n_active_pathways,
    gene_set_size=gene_set_size,
    p=p,
    N=N,
    overlap=overlap,
    signal_strength=signal_strength,
)
noisy_gs = noisy_gene_sets(gene_sets, p, gene_set_FNR, gene_set_FPR)
X = data + data2


adj_matrix = np.zeros((p,p))
for gene_set in noisy_gs:
    for i in gene_set:
        for j in gene_set:
            if i!=j:
                adj_matrix[i,j] = 1

I = create_mask(noisy_gs, G_input=len(noisy_gs), D=X.shape[1]).T.astype(int)
terms = np.array([str(i) for i in range(I.shape[1])])

Here you can specify p = # of genes, n = # of cells, and n_pathways = # of active pathways. The outputs you'll likely be interested in are X, adj_matrix and noisy_gs, and it should be easy to save these in whatever format you like.
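
In case it helps, here is a minimal sketch of one way to save those outputs using only the Python standard library. It is not part of the repository: the function name save_simulation, the prefix argument, and the output file names are all made up for illustration, and it assumes X and adj_matrix are 2-D numeric arrays (or nested lists) and that noisy_gs is a list of lists of integer gene indices.

```python
import csv
import json


def save_simulation(X, adj_matrix, noisy_gs, prefix="simulation"):
    """Hypothetical helper: write X and adj_matrix as CSV and noisy_gs as JSON.

    File names are derived from `prefix`; none of this exists in the repo.
    """
    paths = {
        "X": f"{prefix}_X.csv",
        "adj_matrix": f"{prefix}_adj_matrix.csv",
        "gene_sets": f"{prefix}_gene_sets.json",
    }
    for key, matrix in (("X", X), ("adj_matrix", adj_matrix)):
        with open(paths[key], "w", newline="") as fh:
            # One CSV line per matrix row; values are cast to plain floats
            # so NumPy scalars and Python numbers are written the same way.
            csv.writer(fh).writerows([float(v) for v in row] for row in matrix)
    with open(paths["gene_sets"], "w") as fh:
        # Cast indices to plain ints so NumPy integers are JSON-serializable.
        json.dump([[int(g) for g in gene_set] for gene_set in noisy_gs], fh)
    return paths
```

After running the snippet above you would call something like save_simulation(X, adj_matrix, noisy_gs, prefix="sim_p500_n1000"). If you prefer NumPy for the matrices, np.savetxt("X.csv", X, delimiter=",") should work just as well.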

Also note that the simulation snippet above calls functions defined in utils/benchmark-utils/simulation_functions.py.
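
As a side note, the triple loop that fills adj_matrix gets slow for large p or large gene sets. A sketch of an equivalent vectorized version is below. The function name build_adjacency is hypothetical (it is not in the repo), and it assumes each gene set is a collection of integer gene indices in the range 0 to p - 1.

```python
import numpy as np


def build_adjacency(gene_sets, p):
    """Hypothetical helper: p x p matrix with 1 where two distinct genes share a gene set."""
    adj = np.zeros((p, p))
    for gene_set in gene_sets:
        idx = np.asarray(list(gene_set), dtype=int)
        # np.ix_ selects the full block of (i, j) pairs within this gene set.
        adj[np.ix_(idx, idx)] = 1
    # The original loop skips i == j, so clear the diagonal afterwards.
    np.fill_diagonal(adj, 0)
    return adj
```

With this, adj_matrix = build_adjacency(noisy_gs, p) should give the same matrix as the loop in the snippet.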

Simulated data #3
