Sample notebooks are available in the `sample notebooks/` folder.
The loading speed of hard drives is well below the processing speed of modern GPUs. This is problematic for machine learning algorithms, especially for medical imaging datasets whose instances are large.
For example, consider the following case: we have a dataset of 500 whole-slide images (WSIs), each approximately 100000x100000 pixels. We want the dataloader to repeatedly do the following steps (a naive implementation is sketched after this list):
- randomly select one of those huge images (i.e., WSIs).
- crop and return a random 224x224 patch from the selected image.
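For concreteness, here is what that loop looks like when implemented naively with the `openslide` library. This is not PyDmed code; `wsi_paths` and `naive_random_patch` are hypothetical names. Note that every single patch pays the full cost of opening and reading a huge file from disk.

```python
import random

import openslide  # reads whole-slide images; assumed to be installed


def naive_random_patch(wsi_paths, patch_size=224):
    """Return one random patch, re-reading a WSI from disk on every call."""
    path = random.choice(wsi_paths)    # step 1: randomly select a WSI
    slide = openslide.OpenSlide(path)  # slow: touches the huge file on disk
    width, height = slide.dimensions
    x = random.randint(0, width - patch_size)
    y = random.randint(0, height - patch_size)
    # step 2: crop a random patch at full resolution (level 0)
    patch = slide.read_region((x, y), 0, (patch_size, patch_size))
    slide.close()
    return patch.convert("RGB")
```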
PyDmed solves this issue.
The following two classes are pretty much the whole API of PyDmed.
`BigChunk`
: a relatively big chunk from a patient. It can be, e.g., a 5000x5000 patch from a huge whole-slide image.

`SmallChunk`
: a small data chunk collected from a big chunk. It can be, e.g., a 224x224 patch cropped from a 5000x5000 big chunk. In the figure below, `SmallChunk`s are the small blue patches.
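The point of the split is that once a `BigChunk` sits in RAM, collecting a `SmallChunk` is just a cheap array slice. The sketch below shows the idea in plain numpy; it is not PyDmed's actual API, and `collect_small_chunk` is a hypothetical name.

```python
import numpy as np

rng = np.random.default_rng()

# Stand-in for a 5000x5000 BigChunk already loaded into RAM.
big_chunk = rng.integers(0, 256, size=(5000, 5000, 3), dtype=np.uint8)


def collect_small_chunk(big_chunk, size=224):
    """Crop a random SmallChunk from an in-RAM BigChunk (fast, no disk I/O)."""
    height, width = big_chunk.shape[:2]
    y = rng.integers(0, height - size)
    x = rng.integers(0, width - size)
    return big_chunk[y:y + size, x:x + size]


# Collecting a whole minibatch of SmallChunks costs almost nothing.
small_chunks = [collect_small_chunk(big_chunk) for _ in range(32)]
```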
The figure below illustrates the idea of PyDmed. As long as some `BigChunk`s are loaded into RAM, we can quickly collect some `SmallChunk`s and pass them to the GPU(s). As illustrated below, `BigChunk`s are loaded from disk and replaced from time to time.
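This schedule can be pictured as a producer/consumer loop. The toy sketch below models it with a background thread; it is a simplification under stated assumptions, not PyDmed's internals, and all names in it are illustrative.

```python
import threading
import time

import numpy as np

rng = np.random.default_rng()
CHUNK, PATCH = 5000, 224


def load_big_chunk():
    """Stand-in for the slow disk read of a 5000x5000 region of a WSI."""
    time.sleep(0.5)  # simulated disk latency
    return rng.integers(0, 256, size=(CHUNK, CHUNK, 3), dtype=np.uint8)


pool = [load_big_chunk() for _ in range(2)]  # BigChunks currently in RAM
lock = threading.Lock()
stop = threading.Event()


def refresher():
    """Background producer: replace one in-RAM BigChunk at a time."""
    while not stop.is_set():
        fresh = load_big_chunk()  # the slow part happens off the main loop
        with lock:
            pool[int(rng.integers(len(pool)))] = fresh


threading.Thread(target=refresher, daemon=True).start()

for _ in range(100):  # consumer: e.g., a training loop feeding the GPU
    with lock:
        big = pool[int(rng.integers(len(pool)))]
    y, x = rng.integers(0, CHUNK - PATCH, size=2)
    small_chunk = big[y:y + PATCH, x:x + PATCH]  # fast: served from RAM
stop.set()
```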
We regularly check for possible issues and update PyDmed. Please check the "Issues" page if you face any problems running PyDmed. If you can't find your issue there, please open a new one so we can improve PyDmed.
PyDmed is now available as a local Python package. To use PyDmed, you need the folder `PyDmed/` (obtained by, e.g., cloning this repo). Afterwards, the folder has to be added to `sys.path`, as done in sample notebook 1 or in the sample Colab notebook; a minimal sketch follows below.
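For example (the path below is a placeholder, and `pydmed` as the import name is an assumption; the sample notebooks show the exact imports):

```python
import sys

# Make the cloned repo importable; replace the placeholder with your own path.
sys.path.append("/path/to/PyDmed")

import pydmed  # assumed import name; see sample notebook 1 for actual usage
```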
To cite PyDmed, please cite the following paper:
    @inproceedings{akbarnejad2021deep,
      title={Deep Fisher Vector Coding For Whole Slide Image Classification},
      author={Akbarnejad, Amir and Ray, Nilanjan and Bigras, Gilbert},
      booktitle={2021 IEEE 18th International Symposium on Biomedical Imaging (ISBI)},
      pages={243--246},
      year={2021},
      organization={IEEE}
    }