Integrating several samples, vanishing genes :) #30

Dalhte · 2023-08-11T08:36:56Z

Hi there.
First, thanks for this very nice tool :)
I have a small probleme (it's probably trivial but I'm very new to bioinformatic analysis): I have 4 differents 10X multiom ATAC+RNA samples : PG2, PG6 PG24 and PG13. I integrated those using seurat/signac first then I tried to run Multivelo
I treated the different samples separately for the preprocessing steps :
For example for PG2 :

adata_atacPG2 = sc.read_10x_mtx('/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/filtered_feature_bc_matrix/', var_names='gene_symbols', cache=True, gex_only=False)
adata_atacPG2 = adata_atacPG2[:,adata_atacPG2.var['feature_types'] == "Peaks"]
adata_atacPG2 = mv.aggregate_peaks_10x(adata_atacPG2,'/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/atac_peak_annotation.tsv', '/media/david/F/yard/apps/cellranger-arc-2.0.2/PG2/outs/analysis/feature_linkage/feature_linkage.bedpe',verbose=True)
mv.tfidf_norm(adata_atacPG2)

I renamed the cells with unique barcodes:

barcodes = adata_atacPG2.obs.index
barcodesnew = ['PG2_' + bc[0:len(bc)-2] for bc in barcodes]
adata_atacPG2.obs.index = barcodesnew

Having done that on the four samples, I generated a single object by concatenation :

adata_atacPG = adata_atacPG2.concatenate([adata_atacPG6, adata_atacPG24, adata_atacPG13])

Then I processed the RNA and so one. Everything seems to work very nicely BUT I lose some of the genes (and some important ones that is). After investigation, I realized that these genes were lost during the concatenation step, probably because these are specifically present in some of the adata_atacPGXX objects but not in every ones.

How may I tackle this problem ?

Best
David

Dalhte · 2023-08-11T10:42:51Z

Hello again
I solved part of my problem modifying the concat parameters to :

adata_atacPG = adata_atacPG2.concatenate([adata_atacPG6, adata_atacPG24, adata_atacPG13], join='outer')
And retrieved 3000 more genes. But, I'm quite greedy and I would love to have some more (There are some interesting ones still missing). I realized those last few genes are absent from the atac_peak_annotation.tsv files. It's quite surprising as signac LinkPeaks function manage to find linkeage between these gene and ATAC open regions. So my question: Do you know if it might be possible to extract the
-atac_peak_annotation.tsv
and prehaps also the feature_linkage.bedpe files
from Seurat/signac analyis ?
Best
David

Dalhte · 2023-08-11T13:34:19Z

me again, realizing my last question was discussed in previous #29 and #1 post !

Dalhte · 2023-08-12T10:49:10Z

Hi there
I still have some problems I cannot solve this time :
In my RNA anndata object (adatam1), I have 13873 genes
In my ATAC anndata object ( adata_atacPGc), I also have 13873 genes, I checked that they are exactly the same
I just runned the mv.recover_dynamics_chrom on these two objects :

adata_result1 = mv.recover_dynamics_chrom(adatam1, adata_atacPGc,max_iter=5, init_mode="invert",verbose=False,parallel=True,save_plot=False,rna_only=False,fit=True,n_anchors=500,extra_color_key='celltype')
I got :
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████| 13873/13873 [19:44:34<00:00, 5.12s/it]
With the 13873 genes
But :
adata_result1
AnnData object with n_obs × n_vars = 28298 × 8630
I'm loosing more than 5000 genes, some of them with RNA velocity in the RNA objects, and some of them really relevant for what I'm analyzing.
How can I recover these genes ?
Thanks for your help
Best
David

danielee0707 · 2023-08-12T21:31:51Z

Hi David,
We recommend first using scVelo's preprocessing steps to prepare the input data. The "filter_and_normalize" function is able to select genes with minimum counts in the unspliced and spliced matrices. Multivelo also does some filtering steps internally to discard low-quality genes and make sure the velocity inference algorithm will run and converge successfully. These steps are crucial to ensure that low-quality genes won't obstruct downstream analyses such as lineage prediction. Unfortunately, there's no way to turn these steps off currently.

Dalhte closed this as completed Aug 11, 2023

Dalhte reopened this Aug 12, 2023

Dalhte closed this as completed Aug 22, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrating several samples, vanishing genes :) #30

Integrating several samples, vanishing genes :) #30

Dalhte commented Aug 11, 2023

Dalhte commented Aug 11, 2023

Dalhte commented Aug 11, 2023

Dalhte commented Aug 12, 2023

danielee0707 commented Aug 12, 2023

Integrating several samples, vanishing genes :) #30

Integrating several samples, vanishing genes :) #30

Comments

Dalhte commented Aug 11, 2023

Dalhte commented Aug 11, 2023

Dalhte commented Aug 11, 2023

Dalhte commented Aug 12, 2023

danielee0707 commented Aug 12, 2023