Dataset class usage in jupyter lab #20

Open
manerotoni opened this issue Feb 24, 2024 · 8 comments
Labels: bug (Something isn't working)

@manerotoni (Collaborator)

Hi,
I spotted an issue in the notebooks when using Python 3.11 (and maybe other versions) on my Windows machine. When the Dataset class (CustomDataset) is defined in the notebook itself (e.g. in torch_infection_classifier.ipynb), an error is raised on

x, y = next(iter(train_loader))

The error shown in the command window is:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 122, in spawn_main
    exitcode = _main(fd, parent_sentinel)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\apoliti\Miniconda3\envs\dl-for-micro-2\Lib\multiprocessing\spawn.py", line 132, in _main
    self = reduction.pickle.load(from_parent)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: Can't get attribute 'CustomDataset' on <module '__main__' (built-in)>

A few Google searches indicate that the class declaration in the notebook is somehow not visible to the worker processes, and that there are incompatibilities between multiprocessing and interactive mode (Jupyter notebooks).
See https://discuss.pytorch.org/t/issue-with-pretrained-resnet-fixed/109637/4 and https://stackoverflow.com/questions/73763151/multiprocessing-error-self-reduction-pickle-loadfrom-parent-attributeerror
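
For context, here is a minimal sketch of the setup that triggers this; the class body and tensor shapes are illustrative, not the actual course code:

import torch
from torch.utils.data import Dataset, DataLoader

# Defined in a notebook cell, i.e. in the interactive __main__
class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], 0

train_dataset = CustomDataset(torch.randn(16, 3, 64, 64))

# num_workers > 0 spawns worker processes; on Windows these use "spawn",
# which re-imports __main__ to unpickle the dataset. In a Jupyter kernel
# the class does not exist in that re-imported __main__, hence the
# AttributeError above.
train_loader = DataLoader(train_dataset, batch_size=4, num_workers=2)
x, y = next(iter(train_loader))  # fails as in the traceback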

@manerotoni added the bug label on Feb 24, 2024
@manerotoni (Collaborator, Author)

Fortunately it works with 3.8, the Python version used on BAND. I also tried putting the class definition in a separate file, and then it works nicely.

Maybe after the course we should make sure to fix this. I am not sure whether this is a bug in the multiprocessing module or intended behavior.
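
For reference, a sketch of the file-based variant described above; my_datasets.py is a made-up file name and the class body is illustrative:

# my_datasets.py -- the class now lives in an importable module, so
# spawned worker processes can re-import it by module path.
from torch.utils.data import Dataset

class CustomDataset(Dataset):
    def __init__(self, data):
        self.data = data

    def __len__(self):
        return len(self.data)

    def __getitem__(self, idx):
        return self.data[idx], 0

And in the notebook:

# Import instead of defining the class in a notebook cell:
from my_datasets import CustomDataset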

@manerotoni (Collaborator, Author)

I now see a similar problem with 3.8.10 (the identical Python version as on BAND).

It is all a little strange, as there must be some Windows/package issue. I have been using very similar code in other projects and never had this error.

I will move my changes to BAND now to wrap up the course work.

@constantinpape (Contributor)

Good to know that this issue exists on windows.

This is most likely because multiprocessing works differently on Windows. We probably need some workaround that imports the dataset from utils.py for that case.
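
A rough sketch of what such a workaround could look like; the platform check is my assumption, and it presupposes that utils.py actually defines the same CustomDataset:

import platform
from torch.utils.data import Dataset

if platform.system() == "Windows":
    # Windows starts DataLoader workers with "spawn", which re-imports
    # __main__, so the class has to come from an importable module.
    from utils import CustomDataset
else:
    # On platforms that fork the workers (e.g. Linux), the in-notebook
    # definition keeps working.
    class CustomDataset(Dataset):
        def __init__(self, data):
            self.data = data

        def __len__(self):
            return len(self.data)

        def __getitem__(self, idx):
            return self.data[idx], 0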

@constantinpape (Contributor)

P.S.: I can't really fix this myself, since I don't have access to a Windows machine. We can see how to address this after the course.

@manerotoni changed the title from "Dataset class in python 3.11" to "Dataset class usage in jupyter lab" on Mar 6, 2024
@manerotoni (Collaborator, Author)

I'll just add this link: https://bobswinkels.com/posts/multiprocessing-python-windows-jupyter/
Basically, the best option would be to move the class into a separate Python file and import it from there.
It is a little unfortunate that the error only appears in the command window and not within the Jupyter notebook itself.

@manerotoni (Collaborator, Author) commented Mar 21, 2024

I found out why in my case the loader did not cause problems before: if you set num_workers = 0 (use the main process only, which is the default), then it does not complain that it cannot find the Dataset class.

For the sake of cross-OS usability I would remove the multi-worker option. For the course it is not crucial.

At the moment, displaying images with num_workers > 0 is also really slow. Not sure why; maybe because all the worker processes need to be started first. In fact the time increases with more workers. I am not sure whether training is faster once the workers are all running in the background.
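
A sketch of the corresponding DataLoader call; batch size and shuffle flag are illustrative:

from torch.utils.data import DataLoader

# num_workers=0 loads batches in the main process: nothing is spawned
# and nothing is pickled, so the notebook-defined class is found.
train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True,
                          num_workers=0)
x, y = next(iter(train_loader))  # works on Windows inside Jupyter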

@constantinpape (Contributor)

@manerotoni: great that you figured this out. Let's set the number of workers to 0. This is indeed not crucial at all here.
(It can make a difference for more complex pipelines but I don't expect a big difference here at all.)

Do you want to create a PR to fix this?

> At the moment, displaying images with num_workers > 0 is also really slow. Not sure why; maybe because all the worker processes need to be started first.

Yes, this is slower because all the worker processes need to start up first.

@manerotoni (Collaborator, Author)

I will do a PR.
