TorchIO + PyTorch Lightning when using a Queue. #602
-
Hello, this is an important but difficult question. A few hints though: this may explain a few more full volumes staying in memory, although in your example there is no dead time to simulate GPU use, so the "use" of the data from the queue is almost instantaneous. I just realized there is a more plausible explanation. The question is also difficult because, once you add some transforms, more memory will be needed, and how much will depend on the transform.
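For context, a minimal sketch (not part of the original reply) of the kind of `Queue` setup being discussed, with a `sleep()` standing in for the GPU "dead time" mentioned above. The subject paths, patch size and queue parameters are illustrative assumptions, not values taken from this comment.

```python
import time

import torchio as tio
from torch.utils.data import DataLoader

# Illustrative subjects; replace 'subject.nii.gz' with real images.
subjects = [tio.Subject(t1=tio.ScalarImage('subject.nii.gz')) for _ in range(10)]
# Any transform keeps extra copies of the volume alive while patches are extracted.
dataset = tio.SubjectsDataset(subjects, transform=tio.RandomAffine())

queue = tio.Queue(
    dataset,
    max_length=40,           # patches kept in the queue
    samples_per_volume=5,    # patches extracted per loaded volume
    sampler=tio.UniformSampler(patch_size=128),
    num_workers=8,           # each worker holds at least one full volume
)
# The DataLoader wrapping a Queue must use num_workers=0.
loader = DataLoader(queue, batch_size=4, num_workers=0)

for batch in loader:
    time.sleep(1)  # "dead time" simulating a GPU training step
```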
-
I am also not sure if it is the way to go, but I tested another way to monitor memory usage:

```python
import resource
import time

import pytorch_lightning as pl  # assumed import, not shown in the original snippet


class DummyModule(pl.LightningModule):
    def configure_optimizers(self):
        pass

    def training_step(self, *args, **kwargs):
        # pdb.set_trace()  # Use inspect_mem() here.
        time.sleep(1)  # simulate the time a real training step would take
        # ru_maxrss is the peak resident set size, reported in kilobytes on Linux.
        main_memory = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1000
        child_memory = resource.getrusage(resource.RUSAGE_CHILDREN).ru_maxrss / 1000
        print(f'max memory peak: {main_memory + child_memory} MB')
```

Now, how much memory does one subject take? When I run my version I see that it starts at 1062 MB and grows with iterations; testing with 50 epochs I see a maximum of 21000 MB, so roughly twice the expected size... The assumption that one worker only needs one full subject in memory may be wrong. I am not sure what exactly happens, but I do see differences when changing `samples_per_volume`: more samples per volume needs a little more memory. I do not see much difference between sleeping 1 second and not sleeping at all in the training step.
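For orientation (not part of the original reply), a quick back-of-the-envelope calculation of what those buffers should cost, using the tensor shapes and counts from the question below and assuming float32 data:

```python
# Rough expected memory, assuming float32 (4 bytes per voxel).
bytes_per_voxel = 4
patch = 128 * 128 * 128 * bytes_per_voxel    # one [1, 128, 128, 128] patch
volume = 181 * 217 * 181 * bytes_per_voxel   # one Colin27 volume [1, 181, 217, 181]

print(f'one patch:  {patch / 1e6:.1f} MB')                        # ~8.4 MB
print(f'one volume: {volume / 1e6:.1f} MB')                       # ~28.4 MB
print(f'40 patches (max_length):  {40 * patch / 1e6:.0f} MB')     # ~336 MB
print(f'108 patches (observed):   {108 * patch / 1e6:.0f} MB')    # ~906 MB
print(f'8 volumes (num_workers):  {8 * volume / 1e6:.0f} MB')     # ~227 MB
```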
-
Hey @fepegar! Sorry for coming back to answer this so late! This post summarizes all the discussion I previously mentioned. I created a small snippet for you to try out (I know you are busy with your PhD, no worries!).

```python
from multiprocessing import Manager

import matplotlib.pyplot as plt
import numpy as np
import psutil
import torchio as tio
from torch.utils.data import DataLoader
from tqdm import tqdm


class SubjectsDataset(tio.SubjectsDataset):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Only change: keep the subjects in a Manager list shared across workers.
        manager = Manager()
        self._subjects = manager.list(self._subjects)


if __name__ == "__main__":
    n_subjects = 10000000
    subjects = [tio.Subject(image=tio.ScalarImage(path="")) for _ in range(n_subjects)]
    data = tio.SubjectsDataset(subjects, load_getitem=False)
    mem_used = [psutil.virtual_memory().used / 1024 ** 3]
    dl = DataLoader(data, batch_size=50, shuffle=True, pin_memory=False, num_workers=8)
    for i, item in tqdm(enumerate(dl), total=n_subjects / dl.batch_size):
        if i % 1000 == 0:
            mem = psutil.virtual_memory()
            mem_used.append(mem.used / 1024 ** 3)
    plt.plot(np.array(mem_used))
    plt.savefig("memory_used.png")
```

To try out the suggestion you can simply change `tio.SubjectsDataset` to the custom `SubjectsDataset` defined at the top of the snippet. Note that I was able to reproduce this on a Linux VM but not on a MacBook. This could happen because Windows and macOS handle multiprocessing with spawn, whereas Linux forks the main process and each worker inherits a copy of the subjects list. Another thing to note is that the number of subjects here is very big; probably no existing medical imaging dataset is that large. I haven't checked whether the same happens with fewer subjects but more images or a custom reader, i.e. more complex subjects instead of the dummy ones in the snippet.
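A one-line usage illustration (not from the original post) of how the swap described above would look, assuming the same `subjects` list:

```python
# Instead of the stock dataset ...
# data = tio.SubjectsDataset(subjects, load_getitem=False)
# ... use the Manager-backed subclass defined in the snippet above:
data = SubjectsDataset(subjects, load_getitem=False)
```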
-
Hi, I wanted to ask about the expected behavior of the `Queue` when it is used with PyTorch Lightning. I am trying to debug some `worker killed...` errors which I suspect come from running out of memory. I created a simple script (not entirely sure it is the right way) to inspect the tensors that are in memory.

From what I understand of the `Queue` behavior, the `Counter` should print at most 40 tensors of shape [1, 128, 128, 128] (corresponding to the `max_length`) and at most 8 tensors of shape [1, 181, 217, 181] (corresponding to the `num_workers` and the shape of the Colin27 images).

When I run `inspect_mem` on the trace I always get 108 tensors of shape [1, 128, 128, 128] and a variable number of tensors of shape [1, 181, 217, 181], sometimes 0, 15, 18 or even 21.

Do these numbers make sense? Is there something in PyTorch Lightning + TorchIO that makes the Queue keep more data in memory?
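The script mentioned above is not included in the thread; purely for illustration, a rough sketch of what such an `inspect_mem` helper might look like, counting live torch tensors by shape with `gc` and a `Counter`:

```python
import gc
from collections import Counter

import torch


def inspect_mem():
    # Count currently allocated torch tensors, grouped by shape.
    shapes = Counter()
    for obj in gc.get_objects():
        try:
            if torch.is_tensor(obj):
                shapes[tuple(obj.shape)] += 1
        except Exception:
            continue
    for shape, count in shapes.most_common():
        print(f'{count} tensors of shape {list(shape)}')
```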