(feat): pre-processing functions for dask with sparse chunks #2856

Merged
merged 96 commits into from
Mar 22, 2024

@flying-sheep flying-sheep commented Feb 15, 2024

  • Release notes not necessary because:

Note I did not follow:
https://gist.github.com/Intron7/bbf5058794be7b81d3953ae39c17d8b8

This is because the PR is very simple. I added an axis_sum function that dispatches on dask arrays (with sparse chunks) and handles the needed functionality. That support then propagates up to the functions listed in the release note: scanpy.pp.scale, scanpy.pp.filter_cells, scanpy.pp.filter_genes, and scanpy.pp.highly_variable_genes.
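The dispatch idea described above can be illustrated with a minimal, standard-library-only sketch. This is not scanpy's actual implementation: `SparseRows` and `ChunkedRows` are hypothetical stand-ins for a sparse chunk and a row-chunked dask array, and only the name `axis_sum` comes from the description. It shows one way a single entry point could reduce each chunk by type and then combine the partial results.

```python
from dataclasses import dataclass
from functools import singledispatch


@dataclass
class SparseRows:
    """Hypothetical stand-in for a sparse chunk: one {column: value} dict per row."""
    rows: list
    n_cols: int


@dataclass
class ChunkedRows:
    """Hypothetical stand-in for an array split into row blocks (chunks)."""
    chunks: list  # each chunk is a dense list of rows or a SparseRows


@singledispatch
def axis_sum(x, axis):
    """Sum over axis 0 (per column) or axis 1 (per row)."""
    raise NotImplementedError(f"axis_sum is not implemented for {type(x).__name__}")


@axis_sum.register(list)
def _(x, axis):
    # Dense chunk: a list of equally long rows.
    if axis == 0:
        return [sum(col) for col in zip(*x)]
    return [sum(row) for row in x]


@axis_sum.register(SparseRows)
def _(x, axis):
    # Sparse chunk: only stored entries contribute to the sums.
    if axis == 0:
        totals = [0.0] * x.n_cols
        for row in x.rows:
            for col, value in row.items():
                totals[col] += value
        return totals
    return [sum(row.values()) for row in x.rows]


@axis_sum.register(ChunkedRows)
def _(x, axis):
    # Reduce every chunk with the implementation matching its type.
    partials = [axis_sum(chunk, axis) for chunk in x.chunks]
    if axis == 0:
        # Column sums: add the per-chunk results elementwise.
        return [sum(values) for values in zip(*partials)]
    # Row sums: each chunk owns distinct rows, so concatenate.
    return [value for part in partials for value in part]


dense_chunk = [[1.0, 0.0, 2.0], [0.0, 3.0, 0.0]]
sparse_chunk = SparseRows(rows=[{0: 4.0}, {2: 5.0}], n_cols=3)
array = ChunkedRows(chunks=[dense_chunk, sparse_chunk])

column_sums = axis_sum(array, 0)
row_sums = axis_sum(array, 1)
```

Because callers only ever invoke `axis_sum`, higher-level functions built on it would gain support for the chunked type without changes of their own, which is the propagation effect the description refers to.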

codecov bot commented Feb 15, 2024

Codecov Report

Attention: Patch coverage is 96.71053%, with 5 lines in your changes missing coverage. Please review.

Project coverage is 75.47%. Comparing base (921fcca) to head (937c6db).

❗ Current head 937c6db differs from pull request most recent head b3581ea. Consider uploading reports for the commit b3581ea to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2856      +/-   ##
==========================================
+ Coverage   75.25%   75.47%   +0.22%     
==========================================
  Files         116      116              
  Lines       12788    12896     +108     
==========================================
+ Hits         9623     9733     +110     
+ Misses       3165     3163       -2     
Files                                             Coverage Δ
scanpy/preprocessing/_distributed.py              95.23% <100.00%> (-4.77%) ⬇️
scanpy/preprocessing/_highly_variable_genes.py    95.60% <100.00%> (ø)
scanpy/preprocessing/_utils.py                    54.66% <100.00%> (+5.41%) ⬆️
scanpy/testing/_pytest/params.py                  100.00% <100.00%> (ø)
scanpy/_utils/__init__.py                         74.40% <98.87%> (+5.70%) ⬆️
scanpy/preprocessing/_simple.py                   85.15% <96.15%> (+1.94%) ⬆️
scanpy/preprocessing/_normalization.py            85.71% <85.00%> (-3.32%) ⬇️

@flying-sheep flying-sheep changed the title WIP dask support for var_mean Support calling var_mean on dask arrays containing sparse matrices Feb 19, 2024
@flying-sheep flying-sheep added this to the 1.10.0 milestone Feb 19, 2024
@flying-sheep flying-sheep modified the milestones: 1.10.0, 1.11.0 Feb 23, 2024
@ilan-gold ilan-gold force-pushed the dask-sparse-mean-var branch 3 times, most recently from 987c7d5 to 59b3f65 Compare February 27, 2024 14:06
@ilan-gold ilan-gold changed the title Support calling var_mean on dask arrays containing sparse matrices (feat): pre-processing functions for dask with sparse chunks Feb 27, 2024
@ilan-gold ilan-gold self-assigned this Feb 28, 2024
@ilan-gold ilan-gold requested a review from ivirshup March 8, 2024 06:49
@ilan-gold

pre-commit.ci autofix

@ilan-gold ilan-gold requested a review from ivirshup March 22, 2024 15:41
@ivirshup ivirshup merged commit 4b757d8 into main Mar 22, 2024
11 checks passed
@ivirshup ivirshup deleted the dask-sparse-mean-var branch March 22, 2024 16:22
meeseeksmachine pushed a commit to meeseeksmachine/scanpy that referenced this pull request Mar 22, 2024
ivirshup pushed a commit that referenced this pull request Mar 22, 2024
Development

Successfully merging this pull request may close these issues.

Highly variable genes for sparse dataset in backed mode
4 participants