Refactor anntorchdataset, add correct default dtypes #2250

martinkim0 · 2023-08-24T18:38:04Z

Closes #xxxx (Replace xxxx with the GitHub issue number)
Tests added and passed if fixing a bug or adding a new feature
All code checks passed.
Added type annotations to new arguments/methods/functions.
Added an entry in the latest docs/release_notes/index.md file if fixing a bug or adding a new feature.
If the changes are patches for a version, I have added the on-merge: backport to x.x.x label.

Summary

Centralize and clean up code that validates getitem_tensors
Now applies the correct default dtypes to different data registry keys. Before, everything was fetched as np.float32; now continuous data is fetched as np.float32 and categorical data is fetched as np.int64. This reduces dtype conversion downstream, e.g., an embedding module no longer has to convert indexes to Long.
Cleaner way to force batched single observations
Correctly applies sparse tensor conversion for backed sparse data now
Adds more tests for AnnTorchDataset
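
As a rough illustration of the dtype defaults and the batched single-observation behavior described above, here is a minimal NumPy-only sketch. It is not the actual AnnTorchDataset code: the key names, the `CATEGORICAL_KEYS` set, and the `default_dtype` and `fetch` helpers are hypothetical and exist only for this example.

```python
import numpy as np

# Hypothetical registry keys treated as categorical (an assumption for this sketch,
# not the real scvi-tools registry).
CATEGORICAL_KEYS = {"batch", "labels"}


def default_dtype(key: str) -> type:
    """Pick int64 for categorical keys and float32 for everything else."""
    return np.int64 if key in CATEGORICAL_KEYS else np.float32


def fetch(data: dict, indexes) -> dict:
    """Fetch rows from each array, casting to the default dtype for its key."""
    # Wrap a single integer index in a list so the result keeps a leading
    # batch dimension instead of collapsing to a 1D row.
    if isinstance(indexes, (int, np.integer)):
        indexes = [indexes]
    return {
        key: np.asarray(arr[indexes], dtype=default_dtype(key))
        for key, arr in data.items()
    }


# Toy data: continuous counts stored as float64, batch codes stored as int8.
data = {
    "X": np.arange(12, dtype=np.float64).reshape(4, 3),
    "batch": np.array([[0], [1], [0], [1]], dtype=np.int8),
}

single = fetch(data, 2)

# Integer codes can index an embedding table directly, with no extra cast.
embedding_table = np.random.default_rng(0).normal(size=(2, 5)).astype(np.float32)
embedded = embedding_table[single["batch"].ravel()]
```

In this sketch, fetching a single observation still returns 2D arrays, and the categorical field arrives as int64, which is the index dtype an embedding lookup expects.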

codecov · 2023-08-24T19:21:13Z

Codecov Report

Patch coverage: 100.00% and project coverage change: +0.05% 🎉

Comparison is base (ed28ad7) 89.09% compared to head (5bc57c6) 89.14%.

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #2250      +/-   ##
==========================================
+ Coverage   89.09%   89.14%   +0.05%     
==========================================
  Files         145      145              
  Lines       11817    11812       -5     
==========================================
+ Hits        10528    10530       +2     
+ Misses       1289     1282       -7

Files Changed	Coverage Δ
scvi/data/_manager.py	`98.52% <ø> (ø)`
scvi/dataloaders/_ann_dataloader.py	`93.33% <ø> (ø)`
scvi/model/base/_training_mixin.py	`95.45% <ø> (ø)`
scvi/data/_anntorchdataset.py	`95.83% <100.00%> (+5.47%)`	⬆️
scvi/data/_utils.py	`93.95% <100.00%> (+0.25%)`	⬆️

... and 3 files with indirect coverage changes


Martin Kim added 2 commits August 24, 2023 11:37

Refactor anntorchdataset

d0562e2

Add sparse tests

387397f

Martin Kim added 3 commits August 24, 2023 12:51

Add more cases

79f62be

Update docs

e330d13

Add release note and tests

5700747

martinkim0 changed the title ~~Refactor anntorchdataset~~ Refactor anntorchdataset, add correct default dtypes Aug 28, 2023

Martin Kim added 3 commits August 28, 2023 11:56

Add one more test

63678e3

Update release note

3f31d6b

Update docs

5bc57c6

martinkim0 merged commit fce33b2 into main Aug 28, 2023
5 checks passed

martinkim0 deleted the dataset-dtypes branch August 28, 2023 20:49
