nnUNetv2 inference on AWS Lambda - RuntimeError: Background workers died. #2311
SohaSpecifix asked this question in Q&A · Unanswered · 0 replies
Hi All,
I have trained my nnUNetv2 model on my system, and I use the following code for inference:
import torch
from batchgenerators.utilities.file_and_folder_operations import join
from nnunetv2.inference.predict_from_raw_data import nnUNetPredictor

predictor = nnUNetPredictor(
    tile_step_size=1,
    use_gaussian=True,
    use_mirroring=False,
    perform_everything_on_device=True,  # device replaced with gpu when tested on cloud
    device=torch.device('cuda', 0),
    verbose=True,
    verbose_preprocessing=True,
    allow_tqdm=True
)

# initializes the network architecture, loads the checkpoint
# (nnUNet_trained_models points to my folder of trained nnU-Net models)
predictor.initialize_from_trained_model_folder(
    join(nnUNet_trained_models, 'Dataset015_ulna/nnUNetTrainer__nnUNetPlans__3d_fullres'),
    use_folds=(0,),
    checkpoint_name='checkpoint_final.pth',
)

# variant 1: give input and output folders
predictor.predict_from_files(
    "data/nifti/input",
    "data/nifti/output",
    save_probabilities=False, overwrite=True,
    num_processes_preprocessing=1, num_processes_segmentation_export=1,
    folder_with_segs_from_prev_stage=None, num_parts=1, part_id=0
)
This code works well on my system (GeForce RTX 3080 Ti with 16 GB VRAM and 32 GB RAM; I manually increased the swap size to a maximum of 100 GB, so 132 GB in total). But when I deploy it on AWS, even on the largest instance I had access to (p3.16xlarge with 488 GB RAM and 128 GB of GPU memory), I still get this error:
RuntimeError: Background workers died. Look for the error message further up! If there is none then your RAM was full and the worker was killed by the OS. Use fewer workers or get more RAM in that case!
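One workaround I am considering (not yet verified on the cloud setup) is to skip predict_from_files and its background preprocessing/export workers, and instead call predict_single_npy_array, which as far as I understand runs everything in the main process. A minimal sketch, reusing the predictor object from above; the input file name is just a placeholder:

from nnunetv2.imageio.simpleitk_reader_writer import SimpleITKIO

# read one case (the list holds all channels/modalities of that case)
img, props = SimpleITKIO().read_images(["data/nifti/input/case_0000.nii.gz"])  # placeholder path

# intended to run preprocessing, prediction and resampling in the main process,
# so no background worker processes are spawned that the OS could kill
seg = predictor.predict_single_npy_array(img, props, None, None, False)

Would this be a reasonable way to avoid the worker crash, or does it just move the memory problem into the main process?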
Any suggestion is appreciated!
(Would this be resolved if I trained my model with nnU-Net v1 instead of v2?)