Simulated data #3

Open
M0hammadL opened this issue Jan 9, 2024 · 5 comments

@M0hammadL

Hi,

Thank you so much for the cool work. I was looking for the simulated data you generated for Fig. 2 of the paper. Could you please point me to the h5ad for that dataset?

@russellkune
Collaborator

Hi, it wasn't saved as an h5ad. Are you able to run "Bassez_Spectra_scHPF_NMF_Slalom_comparison.ipynb"?

@M0hammadL
Author

Thank you, I did not see this. Is there a csv/h5ad available so that we don't have to run the whole notebook?

@russellkune
Collaborator

russellkune commented Jan 9, 2024

I think we can try to generate this for you. Which panel of Figure 2 are you referring to?

@M0hammadL
Author

Thank you so much, we wanted to benchmark it properly.

@mrland99
Contributor

Hey,

I'm not sure exactly what you are looking for, but the reason we didn't include a csv/h5ad for each simulation is that we ran many different simulations with varying parameters.

However, the code should be very simple to run. The simulations are generated in main() of utils/benchmark-utils/run_benchmark.py. Although we don't explicitly save it as csv/h5ad, the important part of the code is:

```python
import numpy as np

# Helper functions called below are defined in the repo;
# `args` holds the parsed command-line arguments inside main().
p = args.p                            # number of genes
N = args.n                            # number of cells
n_active_pathways = args.n_pathways   # number of active pathways

n_control_pathways = 5
gene_set_size = 20
k = 5
overlap = 0.3
signal_strength = 15
lam = 500 / (gene_set_size * (gene_set_size - 1))
gene_set_FPR = 0
gene_set_FNR = 0
model_kwargs = dict(a=0.3, c=0.3)
n_components = k + n_active_pathways + n_control_pathways

# simulate data
data2, A_star2, theta_star2 = simulate_base_data(N=N, k=k, p=p, scale=25)
data, base, A_star, gene_sets = create_pathways(
    n_control_pathways,
    n_active_pathways=n_active_pathways,
    gene_set_size=gene_set_size,
    p=p,
    N=N,
    overlap=overlap,
    signal_strength=signal_strength,
)
noisy_gs = noisy_gene_sets(gene_sets, p, gene_set_FNR, gene_set_FPR)
X = data + data2  # sum of the two simulated matrices

# gene-gene adjacency matrix: connect every pair of distinct genes
# that appear together in at least one gene set
adj_matrix = np.zeros((p, p))
for gene_set in noisy_gs:
    for i in gene_set:
        for j in gene_set:
            if i != j:
                adj_matrix[i, j] = 1

I = create_mask(noisy_gs, G_input=len(noisy_gs), D=X.shape[1]).T.astype(int)
terms = np.array([str(i) for i in range(I.shape[1])])
```

Here you can specify p (number of genes), n (number of cells), and n_pathways (number of active pathways). The outputs you'll likely be interested in are X, adj_matrix, and noisy_gs, and it should be easy to save these in whatever format you like.

Also note that the code above includes functions defined in utils/benchmark-utils/simulation_functions.py.
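
For saving the three outputs mentioned above, something like the following might work as a starting point. This is only a sketch and not code from the repo: `save_simulation` and the file names are made up for illustration, and it assumes X is a cells x genes matrix, adj_matrix is a genes x genes 0/1 matrix, and noisy_gs is a list of lists of gene indices. It sticks to the Python standard library and writes CSV/JSON; writing an h5ad instead would need an extra package such as anndata.

```python
import csv
import json
from pathlib import Path


def save_simulation(X, adj_matrix, gene_sets, out_dir):
    """Hypothetical helper: write simulated outputs as CSV/JSON files.

    Accepts NumPy arrays or nested lists. Returns the output directory.
    """
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)

    # X (assumed cells x genes): one CSV row per cell
    with open(out / "X.csv", "w", newline="") as fh:
        csv.writer(fh).writerows([float(v) for v in row] for row in X)

    # adj_matrix (assumed genes x genes, entries 0 or 1)
    with open(out / "adj_matrix.csv", "w", newline="") as fh:
        csv.writer(fh).writerows([int(v) for v in row] for row in adj_matrix)

    # gene sets: cast indices to plain int so NumPy integers serialize to JSON
    with open(out / "gene_sets.json", "w") as fh:
        json.dump([[int(g) for g in gs] for gs in gene_sets], fh)

    return out
```

After running the simulation code, a call like `save_simulation(X, adj_matrix, noisy_gs, "simulated_data")` should leave three files in that directory (the directory name is arbitrary). For large p the adjacency CSV grows quadratically, so a binary format may be a better fit there.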
