Inline operations while calling a torch class #10
Ok. That's a pretty comprehensive bug report. Thanks! :-) Question: which version of Python are you using, i.e. 2.7, 3.4 or 3.5? |
I guess Python 2.7, right? Because of this line:
(Python 3 would be:
)
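(An illustrative guess, since the quoted lines did not survive into this copy: the tell-tale would be a Python-2-style print statement such as `print 'This is the value:', a`, where Python 3 needs the function form `print('This is the value:', a)`.)
|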
You are right, Python 2.7 |
So, using python 2, I get the following output:
Leaving aside the output of |
Your output from 'write' is right, I get something looking like this:
This is the value:
This is the value: [ 0.00000000e+00 -1.58456325e+29 4.35138757e-27 -2.00048709e+00 ...
As you can see, the values after setting the tensor from an inline operation are wrong, at least some of them. |
Ok. Some difference in our systems or installation. Let's compare systems and so on first... mine is:
You? |
(hmmm... I'm actually using a different version of torch too... let me double-check on a different torch distro...) |
(using mainstream torch distro, same results as my earlier results above) |
Mine is quite different.
|
I'll try on an Ubuntu 14.04 in a while too |
ok. biggest difference here is: "Mac OS X". Why? Because on Mac, pytorch uses |
Ohh ok, I noticed that... I'll let you know if I can reproduce the problem on the Ubuntu machine. |
(the switch is here by the way, just in case you were curious: https://github.com/hughperkins/pytorch/blob/master/src/nnWrapper.jinja2.cpp#L56-L60
|
(on linux, can switch at build time like this:
) |
Nice, I'll try both builds, one with luajit and the other with lua. |
hmmm, on linux, with lua, results are still ok. After building with the config set to use lua, I run like this:
... which shows the correct results ... and then verify in
|
Your numpy is older than mine. Any reason for using an older numpy? I can downgrade my numpy if necessary. I have to go now. Will look again tomorrow... |
No reason for that numpy, I'll update mine and repeat the experiments. |
I tried on an Ubuntu machine with numpy version 1.10 and it works properly. |
Hmmm. That makes debugging challenging :-P. Can you double-check that you are indeed linking with
... and see if this changes anything? |
(you can also remove libluajit.[something] from ~/torch/install/lib too, and check stuff still runs. If it doesn't, you're probably linking to libluajit, and that has memory issues on Mac
) |
Building with ... The pop operation from the Torch class is still not working as expected, though. |
Yes. It was never intended that
Can you type
Yes, agreed. Will look into this. Can you post the answer to |
Sorry for the late response, |
Oh, I never fixed the ... For the
Run it like:
|
well... for the memory thing, yeah it is a memory issue, but I'm not quite sure what a good solution would be. We can wipe the array to zeros by doing:
output:
so, we basically allocate a new numpy array of zeros, and that wipes the existing array. Probably the torch tensors should add a reference to the passed-in numpy arrays somehow, so they don't get freed/reused.
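Here is a runnable sketch of the mechanism being described, in pure numpy/ctypes; the raw pointer stands in for the Lua-side tensor, which keeps only an address into the numpy buffer rather than a Python reference. (Reading freed memory like this is undefined behaviour, which is exactly the bug.)

```python
import ctypes
import numpy as np

def borrow_pointer(arr):
    # Keep only the raw address and length -- no Python reference
    # to `arr` survives this call, mimicking the Lua-side tensor.
    return arr.ctypes.data, arr.size

# The array below is an inline temporary: it is freed as soon as the call returns.
addr, n = borrow_pointer(np.arange(6, dtype=np.float32) + 100)

# A fresh same-sized allocation is likely to reuse the freed block...
wiper = np.zeros(6, dtype=np.float32)

# ...so dereferencing the stale pointer now often shows zeros instead of 100..105.
view = np.ctypeslib.as_array((ctypes.c_float * n).from_address(addr))
print(view)
```
|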
I guess the np array should be stored in the python torch tensor object somewhere probably, to hold a reference. However I'm not quite sure how to do this for now... What I've tried: In PyTorch.jinja2.pyx
This complained that nparray is not in tensor, so I modified Tensor.jinja2.pxd, to add this, as an
This then builds and runs (with a very aggressive
However, it seems not to affect the lifetime: the above test code still prints zeros. I tried also putting into the Tensor.jinja2.pyx:
... however this fails at runtime, with ... Thoughts?
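For reference, the idea being attempted here, sketched in plain Python (the real change would live in the Cython Tensor class; the names below are illustrative):

```python
import ctypes
import numpy as np

class TensorWrapper:
    """Stands in for the Cython PyTorchTensor (illustrative name)."""
    def __init__(self, nparray):
        # Holding this reference pins the numpy buffer for as long
        # as the wrapper itself stays alive.
        self._nparray = nparray
        self.addr = nparray.ctypes.data  # raw pointer handed to the C side

t = TensorWrapper(np.arange(6, dtype=np.float32) + 100)  # inline temporary
wiper = np.zeros(6, dtype=np.float32)
view = np.ctypeslib.as_array((ctypes.c_float * 6).from_address(t.addr))
print(view)  # prints 100..105: the buffer survived, because `t` is still alive
```

Note this only helps while the Python wrapper itself stays alive; as later comments in this thread show, if the wrapper is destroyed once ownership passes to Lua, the numpy reference goes with it.
|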
Can I leave you to look at what options we can use for incrementing the reference? I'm trying to get a paper on cltorch published out to arxiv (perhaps 'dumped' onto arxiv is a better term). And there is only one of me :-D |
After running the shell script I got ... Regarding the memory issue when pulling from torch to python: this is interesting actually, allocating a new numpy array wipes the tensor... Thanks once more for the help and time! |
Note to self: so that I have the code if I get a moment to check this issue, it is here: https://gist.github.com/hughperkins/fdf8e27daec983e69767ef6bbaf1db9f Current output:
|
Note to self: relevant similar issue, and commit/solution:
|
Note to self: the draft fix from earlier, that doesn't work...: https://github.com/hughperkins/pytorch/compare/trying-fix-inline-issue?expand=1 |
Unit test (failing) added in c76846d |
Added a bunch of debugging in ea08d23. Output: https://gist.github.com/hughperkins/98243072eafe24b8b9f919a306fe9d07 |
Seems like the numpy array inside the PyTorchTensor is having its lifetime linked to that of the parent PyTorchTensor correctly, but the PyTorchTensor itself is being destroyed, i.e. it is not being incref'd based on it now being owned by the underlying lua object. |
So, it looks like:
(via
)
Obviously simply storing an integer in the lua registry won't in itself lock the lifetime of the PyTorchTensor python object to any lua objects...
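One shape a fix could take (a hedged sketch, not existing project API): a Python-side registry that holds a strong reference for as long as the Lua side claims ownership. Getting the release hook right on Lua gc is the hard part.

```python
# Module-level map of handle -> tensor; holding the value here is the
# "incref". All names below are illustrative, not existing API.
_owned_by_lua = {}

def on_handed_to_lua(py_tensor):
    handle = id(py_tensor)
    _owned_by_lua[handle] = py_tensor  # keep the Python object alive
    return handle                      # e.g. the integer stored in the lua registry

def on_released_by_lua(handle):
    _owned_by_lua.pop(handle, None)    # allow normal gc again
```
|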
This is kind of non-trivial to fix (we'd probably need to start thinking about gc and stuff, if we start holding a bunch of python-side references to python objects that have been passed lua-side). So, since most stuff works without it, I think I shall leave it for now. |
Hi,
First of all, thanks for pytorch, it's awesome work and really useful!
I just found that when doing inline operations (i.e. on a numpy array) and passing the result to a method on Torch to, let's say, store it as a class value, something goes wrong. I guess this has to do with memory references while passing data from one side to the other.
I made a simple example code reproducing the issue.
I have a python main.py file that imports, creates and uses a Torch class.
The Torch class has an 'init' method which does nothing, a 'set' method which sets a tensor of the class to a given numpy array, and a 'write' method that prints the mentioned tensor.
The main.py file only creates the class and sets/prints the torch tensor inside the Torch class: first by creating a numpy array, modifying it and storing the value in a separate variable, and then by modifying the numpy array inline while passing it to torch.
The problem is, when passing the numpy array created inline in the call, the values the Torch class stores are not right.
The way this doesn't happen is:
Running the example will be way more clarifying than my explanation.
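For a rough idea, a hedged sketch of the main.py pattern being described (the loading call follows the style of hughperkins/pytorch's PyTorchHelpers.load_lua_class; the file, class and method names are just taken from the description above):

```python
import numpy as np
import PyTorchHelpers

# Load the Lua class described above (init/set/write); filename is illustrative.
TorchClass = PyTorchHelpers.load_lua_class('torch_class.lua', 'TorchClass')
obj = TorchClass()

# Case 1: keep the modified array in a variable -- works, because `a`
# keeps the buffer alive while the Lua side reads/stores it.
a = np.random.rand(4).astype(np.float32) + 1
obj.set(a)
obj.write()

# Case 2: modify the array inline in the call -- the temporary is freed
# right after set() returns, and the stored tensor ends up reading
# recycled memory, so write() prints wrong values.
obj.set(np.random.rand(4).astype(np.float32) + 1)
obj.write()
```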
Not sure if this is supposed to work this way; I just thought that, in case you are not aware, this might help you and others.
I attach the two mentioned pieces of code in a zip file.
code.zip