Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Ensure that merged object counts are stored as sparse dgCMatrix #289

Open
allyhawkins opened this issue Feb 3, 2025 · 5 comments
Open

Comments

@allyhawkins
Copy link
Member

When working with the merged objects, I noticed that the counts matrix wasn't quite the correct format. It looks we are storing a DelayedMatrix instead of a dgCMatrix. I think we want to match what's typically in a SingleCellExperiment object, which would be a dgCMatrix. I don't think this is super pressing, but we might consider adding a line to the merge_sce_list() function to convert to dgCMatrix before returning.

@sjspielman
Copy link
Member

Do we have a sense of what effect this will have on runtime? I don't really have a sense either way but I can dream that maybe making this sparse could help!

@allyhawkins
Copy link
Member Author

Do we have a sense of what effect this will have on runtime? I don't really have a sense either way but I can dream that maybe making this sparse could help!

I think it will actually slow down any scripts that use this function since the conversion itself can take some time. Working with a dataset that has 13 samples this took 30 seconds or so, and I can imagine it might be longer on bigger objects. But it does decrease the size of the counts matrix. I saw about a 20% reduction in size in the matrix itself when storing as dgCMatrix over DelayedArray.

@sjspielman
Copy link
Member

But it does decrease the size of the counts matrix. I saw about a 20% reduction in size in the matrix itself when storing as dgCMatrix over DelayedArray.

Yeah, I was also thinking about time to write the file, so maybe things even out!

@jashapiro
Copy link
Member

Just noting that we should use the more general CsparseMatrix rather than specifying the lower level representation.

@jashapiro
Copy link
Member

jashapiro commented Feb 10, 2025

We should also maybe test setting delayed=FALSE in the combineCols function call?

Either change will likely increase memory usage for the script, so we may need to be cognizant of how it may affect our ability to merge large objects.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants