Dataset class usage in jupyter lab #20

Open
manerotoni opened this issue Feb 24, 2024 · 8 comments
Labels: bug (Something isn't working)

@manerotoni (Collaborator)

Hi,
I spotted an issue in the notebooks when using Python 3.11 (and maybe other versions) on my Windows machine. When the Dataset class (CustomDataset) is defined in the notebook itself (e.g. in torch_infection_classifier.ipynb), an error is raised on

x, y = next(iter(train_loader))

The error shown in the command window is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'CustomDataset' on <module '__main__' (built-in)>

A few Google searches indicate that the class declaration in the notebook is somehow not visible to the worker processes, and that there are incompatibilities between multiprocessing and interactive mode (Jupyter notebooks).
See https://discuss.pytorch.org/t/issue-with-pretrained-resnet-fixed/109637/4 and https://stackoverflow.com/questions/73763151/multiprocessing-error-self-reduction-pickle-loadfrom-parent-attributeerror
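
For context, here is a minimal sketch of the setup that triggers this; the class body and tensor shapes are illustrative, not the actual course code:

import torch
from torch.utils.data import Dataset, DataLoader

# Defined in a notebook cell, i.e. in the interactive __main__
class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], 0

train_dataset = CustomDataset(torch.randn(16, 3, 64, 64))

# num_workers > 0 spawns worker processes; on Windows these use "spawn",
# which re-imports __main__ to unpickle the dataset. In a Jupyter kernel
# the class does not exist in that re-imported __main__, hence the
# AttributeError above.
train_loader = DataLoader(train_dataset, batch_size=4, num_workers=2)
x, y = next(iter(train_loader))  # fails as in the traceback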

@manerotoni added the bug label on Feb 24, 2024
@manerotoni (Collaborator, Author)

Fortunately it works with 3.8, the Python version used on BAND. I also tried putting the class definition in a separate file, and then it works nicely.

Maybe after the course we should make sure to fix this. I am not sure whether this is a bug in the multiprocessing module or intended behavior.
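
For reference, a sketch of the file-based variant described above; my_datasets.py is a made-up file name and the class body is illustrative:

# my_datasets.py -- the class now lives in an importable module, so
# spawned worker processes can re-import it by module path.
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], 0

And in the notebook:

# Import instead of defining the class in a notebook cell:
from my_datasets import CustomDataset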

@manerotoni (Collaborator, Author)

I now see a similar problem with 3.8.10 (the identical Python version as on BAND).

It is all a little strange, as there must be some Windows/package issue. I have been using very similar code in other projects and never had this error.

I will move my changes to BAND now to wrap up the course work.

@constantinpape (Contributor)

Good to know that this issue exists on windows.

This is most likely because multiprocessing works differently on Windows. We probably need some workaround that imports the dataset from utils.py for that case.
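
A rough sketch of what such a workaround could look like; the platform check is my assumption, and it presupposes that utils.py actually defines the same CustomDataset:

import platform
from torch.utils.data import Dataset

if platform.system() == "Windows":
    # Windows starts DataLoader workers with "spawn", which re-imports
    # __main__, so the class has to come from an importable module.
    from utils import CustomDataset
else:
    # On platforms that fork the workers (e.g. Linux), the in-notebook
    # definition keeps working.
    class CustomDataset(Dataset):
        def __init__(self, data):
            self.data = data

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], 0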

@constantinpape (Contributor)

P.S.: I can't really fix this myself, since I don't have access to a Windows machine. We can see how to address this after the course.

@manerotoni changed the title from "Dataset class in python 3.11" to "Dataset class usage in jupyter lab" on Mar 6, 2024
@manerotoni (Collaborator, Author)

I'll just add this link: https://bobswinkels.com/posts/multiprocessing-python-windows-jupyter/
Basically, the best option would be to move the class into a separate Python file and import it from there.
It is a little unfortunate that the error only appears in the command window and not within the Jupyter notebook itself.

@manerotoni (Collaborator, Author) commented Mar 21, 2024

I found out why in my case the loader did not cause problems before: if you set num_workers = 0 (use the main process only, which is the default), then it does not complain that it cannot find the Dataset class.

For the sake of cross-OS usability I would remove the multi-worker option. For the course it is not crucial.

At the moment, displaying images with num_workers > 0 is also really slow. Not sure why; maybe because all the worker processes need to be started first. In fact the time increases with more workers. I am not sure whether training is faster once the workers are all running in the background.
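
A sketch of the corresponding DataLoader call; batch size and shuffle flag are illustrative:

from torch.utils.data import DataLoader

# num_workers=0 loads batches in the main process: nothing is spawned
# and nothing is pickled, so the notebook-defined class is found.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          num_workers=0)
x, y = next(iter(train_loader))  # works on Windows inside Jupyter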

@constantinpape (Contributor)

@manerotoni: great that you figured this out. Let's set the number of workers to 0. This is indeed not crucial at all here.
(It can make a difference for more complex pipelines but I don't expect a big difference here at all.)

Do you want to create a PR to fix this?

> At the moment, displaying images with num_workers > 0 is also really slow. Not sure why; maybe because all the worker processes need to be started first.

Yes, this is slower because all the worker processes need to start up first.

@manerotoni (Collaborator, Author)

I will do a PR.
