Refactor anntorchdataset, add correct default dtypes #2250

Merged 8 commits on Aug 28, 2023

Conversation

martinkim0
Contributor

@martinkim0 martinkim0 commented Aug 24, 2023

  • Tests added and passed if fixing a bug or adding a new feature
  • All code checks passed.
  • Added type annotations to new arguments/methods/functions.
  • Added an entry in the latest docs/release_notes/index.md file if fixing a bug or adding a new feature.
  • If the changes are patches for a version, I have added the on-merge: backport to x.x.x label.

Summary

  • Centralize and clean up code that validates getitem_tensors
  • Applies the correct default dtypes to different data registry keys. Previously, everything was fetched as np.float32; now continuous data is fetched as np.float32 and categorical data is fetched as np.int64. This reduces dtype conversion downstream, e.g., an embedding module no longer has to convert indexes to Long.
  • Cleaner way to force batched single observations
  • Correctly applies sparse tensor conversion for backed sparse data now
  • Adds more tests for AnnTorchDataset
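
To illustrate the dtype and single-observation points above, here is a minimal NumPy-only sketch. It is not the scvi-tools implementation: `DEFAULT_DTYPES`, `fetch`, and the `"continuous"`/`"categorical"` labels are hypothetical names chosen for illustration, and plain arrays stand in for the data registry.

```python
import numpy as np

# Hypothetical mapping from field kind to default dtype (illustrative names,
# not scvi-tools identifiers).
DEFAULT_DTYPES = {"continuous": np.float32, "categorical": np.int64}


def fetch(data, kinds, index):
    """Return the rows at `index`, each field cast to its kind's default dtype."""
    # Wrap a scalar index in a list so a single observation keeps its batch axis.
    if isinstance(index, (int, np.integer)):
        index = [index]
    return {
        key: np.asarray(arr[index], dtype=DEFAULT_DTYPES[kinds[key]])
        for key, arr in data.items()
    }


data = {
    "X": np.arange(12, dtype=np.float64).reshape(4, 3),      # continuous values
    "batch": np.array([[0], [1], [1], [0]], dtype=np.int8),  # categorical codes
}
kinds = {"X": "continuous", "batch": "categorical"}

single = fetch(data, kinds, 2)

# Integer codes can index an embedding-style table directly, with no cast.
embedding = np.zeros((2, 5), dtype=np.float32)
rows = embedding[single["batch"][:, 0]]
```

With these assumptions, `single["X"]` has shape `(1, 3)` rather than `(3,)`, so downstream code can treat single observations and minibatches the same way, and the categorical field arrives as an integer type that a lookup table accepts as-is.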

codecov bot commented Aug 24, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05% 🎉

Comparison is base (ed28ad7) 89.09% compared to head (5bc57c6) 89.14%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #2250      +/-   ##
==========================================
+ Coverage   89.09%   89.14%   +0.05%     
==========================================
  Files         145      145              
  Lines       11817    11812       -5     
==========================================
+ Hits        10528    10530       +2     
+ Misses       1289     1282       -7     
Files Changed                          Coverage Δ
scvi/data/_manager.py                  98.52% <ø> (ø)
scvi/dataloaders/_ann_dataloader.py    93.33% <ø> (ø)
scvi/model/base/_training_mixin.py     95.45% <ø> (ø)
scvi/data/_anntorchdataset.py          95.83% <100.00%> (+5.47%) ⬆️
scvi/data/_utils.py                    93.95% <100.00%> (+0.25%) ⬆️

... and 3 files with indirect coverage changes

@martinkim0 martinkim0 changed the title Refactor anntorchdataset Refactor anntorchdataset, add correct default dtypes Aug 28, 2023
@martinkim0 martinkim0 merged commit fce33b2 into main Aug 28, 2023
5 checks passed
@martinkim0 martinkim0 deleted the dataset-dtypes branch August 28, 2023 20:49