pycuda mem_get_ipc_handle() error on Windows 10 #32

Open
goffredogiordano opened this issue Apr 7, 2017 · 6 comments

Comments

@goffredogiordano

I would like to run the theano_alexnet training from this useful GitHub project.

My computer is a native Windows 10 64-bit machine with an Intel Core i7. I use WinPython-64bit-3.4.4.4QT5 from WinPython 3.4.4.3, Visual Studio 2015 Community Edition Update 3, CUDA 8.0.44 (64-bit), cuDNN v5.1 (August 10, 2016) for CUDA 8.0, Git source control based on the MinGW compiler, and OpenBLAS 0.2.14. The main Python libraries are Theano 0.9.0beta1, SciPy 0.19.0, Keras 1.2.2, Lasagne 0.2.dev1, NumPy 1.11.1, hickle 2.0.4, h5py 2.6.0, pycuda, pylearn2, and zeromq. I received help from the Theano Google group.

I have successfully pre-processed a subset of the ImageNet data using the script generate_data.sh, which generated all of the expected folders and files. The subset is compressed into 195 .hkl (hickle) files for validation (each about 50 MB) in the folder Validation_Alexnet_b256_b_256.0, and into the files 0000_0.hkl, 0000_1.hkl, ... 0194_0.hkl, 0194_1.hkl (each about 25 MB) in the folder Validation_Alexnet_b256_b_128.0. The training folder contains no files.

When I try to run train.py, I get these errors:

C:\deep_learning\alexnet>python train.py
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

Using gpu device 0: GeForce GT 740M (CNMeM is enabled with initial size: 80.0% of memory, cuDNN 5105)
WARNING (theano.sandbox.cuda): The cuda backend is deprecated and will be removed in the next release (v0.10). Please switch to the gpuarray backend. You can get more information about how to switch at this URL:
https://github.com/Theano/Theano/wiki/Converting-to-the-new-gpu-back-end%28gpuarray%29

... building the model

conv (cudnn) layer with shape_in: (3, 227, 227, 256)
Process Process-1:
Traceback (most recent call last):
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 254, in _bootstrap
self.run()
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\multiprocessing\process.py", line 93, in run
self._target(*self._args, **self._kwargs)
File "C:\deep_learning\alexnet\train.py", line 52, in train_net
model = AlexNet(config)
File "C:\deep_learning\alexnet\alex_net.py", line 62, in __init__
lib_conv=lib_conv,
File "./lib\layers.py", line 168, in __init__
dnn.dnn_conv(img=input_shuffled[:, :self.channel / 2,
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\var.py", line 540, in __getitem__
return theano.tensor.subtensor.advanced_subtensor(self, *args)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
node = self.make_node(*inputs, **kwargs)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2140, in make_node
index = tuple(map(as_index_variable, index))
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\subtensor.py", line 2081, in as_index_variable
return make_slice(idx)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\gof\op.py", line 604, in __call__
node = self.make_node(*inputs, **kwargs)
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 39, in make_node
list(map(as_int_none_variable, inp)),
File "C:\deep_learning\WinPython-64bit-3.4.4.4Qt5\python-3.4.4.amd64\lib\site-packages\theano\tensor\type_other.py", line 20, in as_int_none_variable
raise TypeError('index must be integers')
TypeError: index must be integers


PyCUDA ERROR: The context stack was not empty upon module cleanup.

A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

Could someone help me find out what is wrong?
Thanks in advance for expert help and your time.
Greetings,
Goffredo

@hma02
Contributor

hma02 commented Apr 7, 2017

@goffredogiordano

The main error here is the TypeError, not the PyCUDA error. The PyCUDA error appears when the code exits without proper context cleanup. I'm not sure why the title of this issue mentions "mem_get_ipc_handle", though.

The TypeError looks like a Windows/Theano-related issue, as the code does not show this error on Linux.

I will ask @nouiz and @abergeron about this and set up an issue there.
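
To illustrate the point about context cleanup, here is a minimal sketch of popping a context even when model setup raises. It does not reproduce what train.py does. `active_context`, `FakeContext`, and `build_model` are hypothetical names, and `FakeContext` is only a stand-in so the sketch runs without a GPU. With real PyCUDA the object would presumably come from something like `pycuda.driver.Device(0).make_context()`, which is an assumption about how the script creates its context.

```python
from contextlib import contextmanager


@contextmanager
def active_context(ctx):
    """Yield ctx and always pop it afterwards, even if the body raises."""
    try:
        yield ctx
    finally:
        # Runs on both normal exit and exceptions, so the context stack
        # is empty again by the time the interpreter shuts down.
        ctx.pop()


class FakeContext:
    """Hypothetical stand-in for a CUDA context (assumption, not PyCUDA)."""

    def __init__(self):
        self.popped = False

    def pop(self):
        self.popped = True


def build_model(ctx):
    # Mimics the failure from the traceback: an exception during setup.
    raise TypeError("index must be integers")


fake = FakeContext()
try:
    with active_context(fake) as ctx:
        build_model(ctx)
except TypeError as exc:
    print("model setup failed:", exc)

print("context popped:", fake.popped)
```

With this pattern the TypeError would still be reported, but the secondary "context stack was not empty" abort should not follow it.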

@abergeron

It's not possible to use CUDA IPC handles on Windows; this only works on Linux. It's a limitation that comes from CUDA, so we can't do anything about it.
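
A script could guard against this limitation before attempting any IPC calls. The following is only a sketch based on the statement above (IPC handles work on Linux only, for the CUDA version discussed here). The function names and the strategy strings `"cuda_ipc"` and `"host_copy"` are hypothetical and do not come from theano_alexnet or PyCUDA.

```python
import sys


def cuda_ipc_available(platform=None):
    """Return True only on Linux, following the limitation described above."""
    platform = sys.platform if platform is None else platform
    return platform.startswith("linux")


def choose_exchange_strategy(platform=None):
    """Pick a (hypothetical) strategy name for sharing data between processes."""
    if cuda_ipc_available(platform):
        return "cuda_ipc"   # share GPU memory through IPC handles
    return "host_copy"      # fall back to copying through host memory


print(choose_exchange_strategy("win32"))
print(choose_exchange_strategy("linux"))
```

Failing early with a clear message, or falling back to another exchange path, is easier to diagnose than an error raised deep inside a worker process.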

@nouiz

nouiz commented Apr 7, 2017 via email

@goffredogiordano
Author

Thank you to everyone who is helping me. @nouiz, I would like to know whether anyone has resolved this problem, but if it is related to IPC handles on Windows, as @abergeron suggested, I think there is currently no solution.
Thanks.

If anyone finds a solution, please let me know.

@nouiz

nouiz commented Apr 8, 2017 via email

@goffredogiordano
Author

I have resolved my issues, thanks also to the Theano Google group.
Because I am using Python 3.x, in layers.py I changed line 56 to center_margin = int((image_shape[2] - cropsize) / 2). Then I changed lines 104 to 107 to:

self.filter_shape[0] = self.filter_shape[0] // 2
self.filter_shape[3] = self.filter_shape[3] // 2
self.image_shape[0] = self.image_shape[0] // 2
self.image_shape[3] = self.image_shape[3] // 2

and line 125 to input[:self.channel // 2, :, :, :])
line 133 to input[self.channel // 2:, :, :, :])
line 168 to dnn.dnn_conv(img=input_shuffled[:, :int(self.channel / 2),
and line 179 to dnn.dnn_conv(img=input_shuffled[:, self.channel // 2:,
I then had some TypeError problems related to the dtype constructor (I referred to http://deeplearning.net/software/theano/library/tensor/basic.html), and I resolved those remaining errors by modifying line 26 of alex_net.py: y = T.ivector('y')
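
The changes above all come down to the same Python 3 behaviour: `/` is true division and returns a float even for two integers, and a float is not accepted as a slice index. The sketch below shows this with a plain list rather than a Theano tensor. The value of `channel` is a made-up number chosen only for illustration.

```python
channel = 96  # hypothetical channel count, not taken from the model config
data = list(range(channel))

half_float = channel / 2   # true division: a float in Python 3
half_int = channel // 2    # floor division: stays an int

# A float slice bound is rejected, much like the Theano index check
# in the traceback rejects a non-integer index.
float_error = None
try:
    data[:half_float]
except TypeError as exc:
    float_error = exc

# Integer bounds split the data into two halves as intended.
first_half = data[:half_int]
second_half = data[half_int:]

print(type(half_float).__name__, type(half_int).__name__)
print("float slice rejected:", float_error is not None)
print(len(first_half), len(second_half))
```

For non-negative integers such as these shapes, `int(x / 2)` and `x // 2` give the same result, so either form used in the edits above works. `//` is the more direct spelling.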
