Memory Issue #10

Open · MatteoTomassetti opened this issue Feb 21, 2017 · 7 comments

@MatteoTomassetti
Hi,
Thank you for sharing your code publicly, but I'm having some memory issues when running it on AWS.

I'm spinning up a g2.2xlarge instance on AWS and trying to run your code on only the first 1000 lines of news.2011.en.shuffled.

Have you ever gotten an error message like the one below? If so, is there a way to change the parameters to avoid it, or should I select another type of AWS instance?

Just for completeness, these are the parameters I was trying to test:

NUMBER_OF_ITERATIONS = 20000
EPOCHS_PER_ITERATION = 5
RNN = recurrent.LSTM
INPUT_LAYERS = 2
OUTPUT_LAYERS = 2
AMOUNT_OF_DROPOUT = 0.3
BATCH_SIZE = 500
HIDDEN_SIZE = 700
INITIALIZATION = "he_normal" # : Gaussian initialization scaled by fan_in (He et al., 2014)
MAX_INPUT_LEN = 40
MIN_INPUT_LEN = 3
INVERTED = True
AMOUNT_OF_NOISE = 0.2 / MAX_INPUT_LEN
NUMBER_OF_CHARS = 100 # 75

And this is the error that I'm getting:

Iteration 1
Train on 3376 samples, validate on 376 samples
Epoch 1/5
Traceback (most recent call last):
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: Cuda error: GpuElemwise node_m71c627ae87c918771aac75471af66509_0 Add: out of memory.
    n_blocks=30 threads_per_block=256
   Call: kernel_Add_node_m71c627ae87c918771aac75471af66509_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], o0_data, local_ostr[0][0], local_ostr[0][1])


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 10, in main_news
  File "<stdin>", line 8, in iterate_training
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/models.py", line 672, in fit
    initial_epoch=initial_epoch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 1196, in fit
    initial_epoch=initial_epoch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/engine/training.py", line 891, in _fit_loop
    outs = f(ins_batch)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/keras/backend/theano_backend.py", line 959, in __call__
    return self.function(*inputs)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 898, in __call__
    storage_map=getattr(self.fn, 'storage_map', None))
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/gof/link.py", line 325, in raise_with_op
    reraise(exc_type, exc_value, exc_trace)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/six.py", line 685, in reraise
    raise value.with_traceback(tb)
  File "/home/ubuntu/anaconda3/lib/python3.5/site-packages/theano/compile/function_module.py", line 884, in __call__
    self.fn() if output_subset is None else\
RuntimeError: Cuda error: GpuElemwise node_m71c627ae87c918771aac75471af66509_0 Add: out of memory.
    n_blocks=30 threads_per_block=256
   Call: kernel_Add_node_m71c627ae87c918771aac75471af66509_0_Ccontiguous<<<n_blocks, threads_per_block>>>(numEls, local_dims[0], local_dims[1], i0_data, local_str[0][0], local_str[0][1], i1_data, local_str[1][0], local_str[1][1], o0_data, local_ostr[0][0], local_ostr[0][1])

Apply node that caused the error: GpuElemwise{add,no_inplace}(GpuDot22.0, GpuDimShuffle{x,0}.0)
Toposort index: 207
Inputs types: [CudaNdarrayType(float32, matrix), CudaNdarrayType(float32, row)]
Inputs shapes: [(20000, 700), (1, 700)]
Inputs strides: [(700, 1), (0, 1)]
Inputs values: ['not shown', 'not shown']
Outputs clients: [[GpuReshape{3}(GpuElemwise{add,no_inplace}.0, MakeVector{dtype='int64'}.0)]]

HINT: Re-running with most Theano optimization disabled could give you a back-trace of when this node was created. This can be done with by setting the Theano flag 'optimizer=fast_compile'. If that does not work, Theano optimizations can be disabled with 'optimizer=None'.
HINT: Use the Theano flag 'exception_verbosity=high' for a debugprint and storage map footprint of this apply node.
@parth126 commented Feb 27, 2017

@MatteoTomassetti This is an out-of-memory issue. I believe reducing the batch size to 25-50 should solve it.
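
For illustration, that would just be a change to the configuration in the original post (32 is one arbitrary value in the suggested range, and the fit() call in the comment is only an example of where BATCH_SIZE ends up being used):

BATCH_SIZE = 32  # was 500; each training step now keeps far fewer sequences in GPU memory

# BATCH_SIZE is later handed to Keras fit(), e.g.:
# model.fit(X_train, y_train, batch_size=BATCH_SIZE, nb_epoch=EPOCHS_PER_ITERATION)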

@MajorTal (Owner) commented Feb 28, 2017 via email

@MatteoTomassetti (Author) commented Feb 28, 2017

Thanks @parth126 and @MajorTal! I was wondering, based on your experience, what average running time should I expect for one epoch when training on the entire news.2011.en.shuffled dataset?
My problem is that when I ran the code for just one epoch and extrapolated the time it would take to reach 20,000 iterations, I was left with years of training!

@MajorTal (Owner)

I just moved to news.2013.en.shuffled (much larger) - I'll update the code to reflect that.
It is so large that I split the epochs into mini-epochs that each cover about 1% of the data (because I save the model after each epoch, and because it is taking so long...).
These mini-epochs are configured to run for about 30 minutes each.
After about 2 hours you already see meaningful results (about 85% accuracy).
I used this AMI to train the system: https://aws.amazon.com/marketplace/pp/B06VSPXKDX
on an AWS EC2 p2.xlarge instance (currently at $0.90 per hour).
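
(For illustration, a minimal sketch of what such a mini-epoch loop could look like; the corpus path, the vectorize helper, and the checkpoint filename are hypothetical and only stand in for the repository's actual routines:)

# Hypothetical mini-epoch loop: train on roughly 1% of the corpus at a time
# and checkpoint after every slice, so long runs can be inspected early.
FRACTION_PER_MINI_EPOCH = 0.01

with open("news.2013.en.shuffled") as corpus_file:
    lines = corpus_file.read().splitlines()

slice_size = max(1, int(len(lines) * FRACTION_PER_MINI_EPOCH))

for iteration in range(NUMBER_OF_ITERATIONS):
    start = (iteration * slice_size) % len(lines)
    chunk = lines[start:start + slice_size]
    # 'vectorize' stands in for whatever routine turns text lines into the
    # (noisy input, clean target) tensors the model expects.
    X_chunk, y_chunk = vectorize(chunk)
    model.fit(X_chunk, y_chunk,
              batch_size=BATCH_SIZE,
              nb_epoch=1,            # 'epochs=1' in Keras 2
              validation_split=0.1)
    model.save("keras_spell_iter_{:05d}.h5".format(iteration))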

@FMFluke commented Nov 16, 2017

@MatteoTomassetti @MajorTal I was running this exact code with the default news.2013.en.shuffled dataset (I changed almost nothing except updating the Keras API calls to a newer version and adapting the code to be Python 3 compatible). After almost 2 days of training (at a reasonable speed, using Azure with a K80), the accuracy is stuck at about 47-48%. I also noticed that while it was able to fix many spelling mistakes, it always repeats the last character or just adds trailing periods to the prediction, and the prediction is therefore marked as wrong. Do you have any idea what could be happening? I have been looking around and could not find a good answer.

@MajorTal (Owner) commented Nov 17, 2017 via email

@FMFluke commented Nov 18, 2017

OK, but then how did you make the model exclude those periods when calculating the accuracy? How exactly did you strip them off?
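
(For illustration only - not the author's confirmed method - one common way to exclude trailing padding when scoring, assuming '.' or a space is what trails the predictions:)

def strip_trailing_pad(text, pad_chars=". "):
    # Hypothetical helper: drop trailing padding/period characters before scoring.
    return text.rstrip(pad_chars)

def sequence_accuracy(predictions, targets):
    # Count a prediction as correct only if it matches the target exactly
    # after both have had their trailing padding stripped.
    correct = sum(strip_trailing_pad(p) == strip_trailing_pad(t)
                  for p, t in zip(predictions, targets))
    return correct / float(len(targets))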
