This repository has been archived by the owner on Apr 1, 2022. It is now read-only.

A PyTorch DataLoader that uses a loky-backend multiprocessing context, and a Dataset with cache support.

lumo-tech/lumo_data


lumo.data

# pip install lumo_data
pip install git+https://github.com/sailist/lumo_data

How to use

Loky-backend DataLoader

A new DataLoader that uses a loky-backend multiprocessing context and a new Fetcher.

# from torch.utils.data import DataLoader
from lumo_data import DataLoader

loader = DataLoader(dataset=..., batch_size=..., num_workers=...)
for batch in loader:
    ...

Notifiable Dataset

Override the `notify` method to load and cache the data for the next batch in a single operation.

from lumo_data import Dataset, DataLoader


class CachedDataset(Dataset):
    ...

    def notify(self, ids):
        self.cache = []
        # Sometimes loading a whole batch at once is faster than
        # loading each sample individually. That is the purpose of `notify`.
        chunk = load_chunk_data_method(ids)
        for sample in chunk:
            self.cache.append(sample)

    def __getitem__(self, item):
        return self.cache.pop(0)


class NocacheDataset(Dataset):
    ...

    # you can also use this one like the original Dataset
    def __getitem__(self, item):
        return self.data[item]


# The DataLoader uses a new fetcher, which ensures that the `notify` method is called.
loader = DataLoader(CachedDataset(), num_workers=4)
for batch in loader:
    ...
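
To make the notify-then-fetch flow above concrete, here is a minimal, dependency-free sketch. It does not import `lumo_data` or PyTorch, and it is not the library's actual fetcher: `ChunkCachedDataset`, `fetch_batch`, and the `bulk_loads` counter are hypothetical names introduced only for illustration, and the assumption is that `notify` receives the indices of a batch before the samples are read.

```python
class ChunkCachedDataset:
    """Stand-in for a dataset with a `notify` hook (hypothetical, no lumo_data import)."""

    def __init__(self, data):
        self.data = list(data)
        self.cache = []
        self.bulk_loads = 0  # counts how many bulk reads were performed

    def __len__(self):
        return len(self.data)

    def notify(self, ids):
        # One bulk read for the whole batch instead of one read per sample.
        self.bulk_loads += 1
        self.cache = [self.data[i] for i in ids]

    def __getitem__(self, item):
        # Samples are returned in the order the indices were given to `notify`.
        return self.cache.pop(0)


def fetch_batch(dataset, ids):
    """Hypothetical fetcher: announce the batch indices, then read each sample."""
    notify = getattr(dataset, "notify", None)
    if callable(notify):
        notify(ids)
    return [dataset[i] for i in ids]


dataset = ChunkCachedDataset(range(100, 110))
batch = fetch_batch(dataset, [3, 1, 4])
print(batch, dataset.bulk_loads)
```

Because `__getitem__` pops from the front of the cache, this pattern relies on samples being requested in the same order as the indices passed to `notify`. A dataset without a `notify` method passes through `fetch_batch` unchanged and behaves like an ordinary indexable dataset.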

Dataset Builder

Reference

TODO

See also
