GPU memory usage when doing beam search #314

tanyuqian · 2020-07-06T03:35:11Z

A BART model (https://arxiv.org/pdf/1910.13461.pdf) is implemented here: https://github.com/tanyuqian/texar-pytorch/tree/master/examples/bart

It passes the tests for text classification (MNLI) and summarization (CNN/DM) with greedy decoding, but running CNN/DM with beam search on a single GTX 1080Ti fails because it runs out of GPU memory, even with batch_size=1, beam_width=2, max_decoding_length=140.
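
As a rough sanity check on the settings above, the sketch below estimates how large a decoder self-attention key/value cache would be at the last decoding step. It is only a back-of-envelope calculation under stated assumptions: BART-large dimensions (12 decoder layers, hidden size 1024), float32 values, and a cache that holds keys and values for every generated position. The function name `decoder_cache_bytes` is hypothetical and is not part of Texar. Model weights, activations, and encoder-side tensors are not counted.

```python
def decoder_cache_bytes(batch_size, beam_width, max_decoding_length,
                        num_layers, hidden_size, bytes_per_value=4):
    """Rough size in bytes of a self-attention key/value cache at the final step.

    Assumption: each layer caches one key and one value tensor of shape
    (batch_size * beam_width, max_decoding_length, hidden_size).
    """
    # Beam search keeps beam_width hypotheses per example, so the
    # effective batch dimension is batch_size * beam_width.
    effective_batch = batch_size * beam_width
    # Factor of 2 accounts for keys plus values.
    per_layer_values = 2 * effective_batch * max_decoding_length * hidden_size
    return num_layers * per_layer_values * bytes_per_value


# Settings from this issue, with assumed BART-large dimensions.
estimate = decoder_cache_bytes(1, 2, 140, num_layers=12, hidden_size=1024)
print(f"{estimate / 2**20:.2f} MiB")  # prints "26.25 MiB"
```

Under these assumptions the cache is in the tens of MiB, far below the 11 GB on a GTX 1080Ti, so it may be worth checking whether something other than the cache grows during beam search.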

A script that reproduces this issue is here: https://github.com/tanyuqian/texar-pytorch/blob/master/examples/bart/bart_cnn.py (run it after downloading the CNN/DM data as described in the README)

Note that in this fork, two more hyperparameters are added in TransformerDecoder ('normalize_before' and 'final_layer_norm'): https://github.com/tanyuqian/texar-pytorch/blob/master/texar/torch/modules/decoders/transformer_decoders.py#L290

gpengzhi added the "question" (Further information is requested) and "topic: modules" (Issue about built-in Texar modules) labels on Oct 15, 2020
