unsupervised learning - TSDAE #894
Comments
Looks like some issue with CUDA. I don't know how to fix it.
Hi ReySadeghi, could you please run on CPU and see whether there is still a problem?
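For reference, a minimal sketch of how a run can be forced onto the CPU for this kind of check. The model construction below mirrors the script discussed later in this thread; the device argument of SentenceTransformer and the CUDA_VISIBLE_DEVICES variable are standard, everything else is illustrative:

import os
os.environ['CUDA_VISIBLE_DEVICES'] = ''  # hide all GPUs before torch is imported
from sentence_transformers import SentenceTransformer, models
model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
# Passing device='cpu' also works even when GPUs are visible.
model = SentenceTransformer(modules=[word_embedding_model, pooling_model], device='cpu')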
Hi, in one case I tried I got this error: and in the other cases that I tried: RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED
Could you please paste the whole training script here and also the whole log?
training script:

from sentence_transformers import SentenceTransformer, LoggingHandler
import nltk

vocab=[]
vocab=vocab[:10000]

model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model.tokenizer.add_tokens(vocab)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)

train_sentences=[]
train_sentences=train_sentences[:2000000]

train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)
model.fit(
..................................................

the error:
lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
Does it work when you use bert-base-uncased? Also check that you have a recent version of PyTorch and transformers.
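A quick way to print the installed versions (a minimal sketch; these are the standard version attributes of the three packages):

import torch
import transformers
import sentence_transformers
print('torch:', torch.__version__)
print('transformers:', transformers.__version__)
print('sentence-transformers:', sentence_transformers.__version__)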
I edited it, actually the model name is 'HooshvareLab/bert-fa-base-uncased'.
Thanks for reporting this issue!
Thanks. Please inform me when the bug is fixed.
Hi, ReySadeghi. The bug has been fixed since commit 022b2dd, so please git clone the latest version and install it from source.
@kwang2049
Traceback (most recent call last):
...
/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [126,0,0] Assertion ...
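For what it's worth, that indexSelectLargeIndex assertion usually fires when an embedding lookup receives an index outside the table, e.g. IDs of newly added tokens while the embedding matrix was never resized. A minimal sketch of such a check, assuming word_embedding_model is built as in the scripts in this thread:

vocab_size = len(word_embedding_model.tokenizer)
embedding_rows = word_embedding_model.auto_model.get_input_embeddings().weight.shape[0]
print('tokenizer size:', vocab_size, 'embedding rows:', embedding_rows)
if vocab_size > embedding_rows:
    # Token IDs >= embedding_rows would trigger the device-side assert on the GPU.
    word_embedding_model.auto_model.resize_token_embeddings(vocab_size)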
Are you using the same script? Please try the code below:

from sentence_transformers import SentenceTransformer
from sentence_transformers import models, datasets, losses
from torch.utils.data import DataLoader
model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)
existing_word = list(word_embedding_model.tokenizer.vocab.keys())[1000]
vocab = ['<new_word_1>', '<new_word_2>', '<سلامسلام>', existing_word]
print('Before:', word_embedding_model.auto_model.embeddings)
word_embedding_model.tokenizer.add_tokens(vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))
print('Now:', word_embedding_model.auto_model.embeddings)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
train_sentences=[
'A sentence containing <new_word_1> and <new_word_2>.',
'A sentence containing only <new_word_2>.',
'A sentence containing <سلامسلام>',
f'A sentence containing {existing_word}'
]
train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)
model.fit(
train_objectives=[(train_dataloader, train_loss)],
epochs=3,
weight_decay=0,
scheduler='constantlr',
optimizer_params={'lr': 3e-5},
show_progress_bar=True
)

This works fine on my server. If this does not work on your side, then I think it is either because you have the wrong version of the SBERT repo (the test above passes with sentence-transformers==1.1.1) or because of a CUDA problem. And if this also works on your side, then I think it is related to one of the new words/tokens, and you can do this to locate it: iterate over all the new words, create a sentence containing each of them, and fit the TSDAE model on each one. The run may throw an exception at a certain point; if that happens, please tell us which word it is.
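A rough sketch of that per-token isolation loop, assuming vocab holds the new words and model, model_name, datasets, losses and DataLoader are set up as in the snippet above (the sentence template is only illustrative):

for word in vocab:
    probe_sentences = ['A sentence containing ' + word]
    probe_dataset = datasets.DenoisingAutoEncoderDataset(probe_sentences)
    probe_dataloader = DataLoader(probe_dataset, batch_size=1, shuffle=False)
    probe_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)
    try:
        model.fit(train_objectives=[(probe_dataloader, probe_loss)], epochs=1, show_progress_bar=False)
    except Exception as exc:
        # The word that triggers the failure is the one to inspect.
        print('Failed on token:', repr(word), '->', exc)
        break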
Yes, I used the latest version of SBERT and used the same script, but I still got the error! I got this warning too, could this cause the problem? /lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
Could you please run the code snippet mentioned above?
Yeah, it's solved.
@nreimers |
@kwang2049 @nreimers
AssertionError: Non-consecutive added token '#سلام' found. Should have index 100005 but has index 100006 in saved vocabulary.
@nreimers |
Train loss is not computed & plotted during training |
Hi @ReySadeghi, I cannot reproduce it: I found it can successfully load the SBERT checkpoint with added tokens. Before a more detailed conversation, could you please run this check (to see whether the assertion error still appears without TSDAE):

from sentence_transformers import SentenceTransformer
from sentence_transformers import models
model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)
existing_word = list(word_embedding_model.tokenizer.vocab.keys())[1000]
vocab = ['<new_word_1>', '<new_word_2>', '<سلامسلام>', existing_word, '<new_subword111>', '<new_subword222>']
print('Before:', word_embedding_model.auto_model.embeddings)
word_embedding_model.tokenizer.add_tokens(vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))
print('Now:', word_embedding_model.auto_model.embeddings)
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])
train_sentences=[
'A sentence containing <new_word_1> and <new_word_2>.',
'A sentence containing only <new_word_2>.',
'A sentence containing <سلامسلام>',
f'A sentence containing {existing_word}',
'A sentence containing <new_subword111>xxx, my<new_subword222>yyyu'
]
model.save('sbert_tokens_added')
model = SentenceTransformer('sbert_tokens_added')
print([model[0].tokenizer.tokenize(sentence) for sentence in train_sentences])

If running this new snippet also reports the error, I think it might be related to your transformers version. And if this works well, you can change the vocab above to your own added tokens and check again.
I tried this and it was OK, but actually I think the problem was due to some tokens that weren't valid UTF-8; when I removed them, the problem was solved.
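For completeness, a small sketch of that kind of filtering, assuming the new tokens come from a plain-text file (the file name vocab.txt is hypothetical); lines that do not decode as UTF-8 are skipped before the tokens are added:

clean_vocab = []
with open('vocab.txt', 'rb') as f:  # hypothetical vocabulary file, one token per line
    for raw_line in f:
        try:
            token = raw_line.decode('utf-8').strip()
        except UnicodeDecodeError:
            continue  # drop tokens that are not valid UTF-8
        if token:
            clean_vocab.append(token)
word_embedding_model.tokenizer.add_tokens(clean_vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))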
Hi,
I used the TSDAE method to pretrain a BERT model on a corpus of sentences and I got this error:
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling `cublasCreate(handle)`
I then ran CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM] to trace the error and got this:
RuntimeError: CUDA error: device-side assert triggered
Any help?
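As an aside, the same CUDA_LAUNCH_BLOCKING flag can also be set from inside the script instead of on the command line (a minimal sketch; it has to run before anything touches CUDA):

import os
os.environ['CUDA_LAUNCH_BLOCKING'] = '1'  # synchronous kernel launches, so the failing op is reported where it happens
import torch  # import torch only after the variable is set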