unsupervised learning -tsda #894

Open
ReySadeghi opened this issue Apr 25, 2021 · 21 comments

Comments

@ReySadeghi

Hi,
I used the TSDAE method to pretrain a BERT model on a corpus of sentences and got this error:

RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)

I then ran CUDA_LAUNCH_BLOCKING=1 python [YOUR_PROGRAM] to trace the error and got this:

RuntimeError: CUDA error: device-side assert triggered

any help?
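
CUDA kernels are launched asynchronously, so the Python line shown in a traceback is often not the call that actually failed; CUDA_LAUNCH_BLOCKING=1 forces synchronous launches so the error surfaces closer to its real source. As a minimal sketch (the helper name `enable_sync_cuda_errors` is made up for illustration and is not part of any library), the same variable can be set from inside the script, provided this happens before anything initializes CUDA:

```python
import os


def enable_sync_cuda_errors() -> None:
    """Ask CUDA to launch kernels synchronously so errors point at the failing call.

    Hypothetical helper, not a library function. It should run before the first
    CUDA call (safest: before importing torch); set later, it may have no effect.
    """
    os.environ["CUDA_LAUNCH_BLOCKING"] = "1"


enable_sync_cuda_errors()
# import torch  # import GPU libraries only after the variable is set
```

Synchronous launches slow training down considerably, so this is only worth enabling while debugging.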

@nreimers
Member

This looks like an issue with CUDA. I don't know how to fix it.

@kwang2049
Member

Hi ReySadeghi, could you please run on CPU and see whether there is still a problem?

@ReySadeghi
Author

Hi, in one case I tried it and got this error:
IndexError: list index out of range

In the other cases I tried, RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED
still occurs.

@kwang2049
Member

Could you please paste here the whole training script and also the whole log?

@ReySadeghi
Author

ReySadeghi commented Apr 28, 2021

training script:

from sentence_transformers import SentenceTransformer, LoggingHandler
from sentence_transformers import models, util, datasets, evaluation, losses
from torch.utils.data import DataLoader

import nltk

# Read the extra vocabulary, one token per line.
vocab = []
with open('vocab30k.txt', mode='r', encoding="utf8", errors='ignore') as file2:
    for line2 in file2:
        line2 = line2.split('\n')[0]
        line2 = line2.strip()
        vocab.append(line2)

vocab = vocab[:10000]

model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)

# Add the new tokens and grow the encoder's embedding table to match.
word_embedding_model.tokenizer.add_tokens(vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))

pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

# Read the training sentences, one sentence per line.
train_sentences = []
with open('fa5M_shuffeled.txt', mode='r', encoding="utf8", errors='ignore') as file2:
    for line2 in file2:
        line2 = line2.split('\n')[0]
        line2 = line2.strip()
        train_sentences.append(line2)

train_sentences = train_sentences[:2000000]

train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)

train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)

train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    weight_decay=0,
    scheduler='constantlr',
    optimizer_params={'lr': 3e-5},
    show_progress_bar=True
)

..................................................
My CUDA version: 11.3

the Error:

lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.
warnings.warn(msg)
Some weights of the model checkpoint at HooshvareLab/bert-fa-base-uncased were not used when initializing BertLMHeadModel: ['cls.seq_relationship.weight', 'cls.seq_relationship.bias']

  • This IS expected if you are initializing BertLMHeadModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
  • This IS NOT expected if you are initializing BertLMHeadModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).
    Some weights of BertLMHeadModel were not initialized from the model checkpoint at HooshvareLab/bert-fa-base-uncased and are newly initialized: ['bert.encoder.layer.0.crossattention.self.query.weight', 'bert.encoder.layer.0.crossattention.self.query.bias', 'bert.encoder.layer.0.crossattention.self.key.weight', 'bert.encoder.layer.0.crossattention.self.key.bias', 'bert.encoder.layer.0.crossattention.self.value.weight', 'bert.encoder.layer.0.crossattention.self.value.bias', 'bert.encoder.layer.0.crossattention.output.dense.weight', 'bert.encoder.layer.0.crossattention.output.dense.bias', 'bert.encoder.layer.0.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.0.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.1.crossattention.self.query.weight', 'bert.encoder.layer.1.crossattention.self.query.bias', 'bert.encoder.layer.1.crossattention.self.key.weight', 'bert.encoder.layer.1.crossattention.self.key.bias', 'bert.encoder.layer.1.crossattention.self.value.weight', 'bert.encoder.layer.1.crossattention.self.value.bias', 'bert.encoder.layer.1.crossattention.output.dense.weight', 'bert.encoder.layer.1.crossattention.output.dense.bias', 'bert.encoder.layer.1.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.1.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.2.crossattention.self.query.weight', 'bert.encoder.layer.2.crossattention.self.query.bias', 'bert.encoder.layer.2.crossattention.self.key.weight', 'bert.encoder.layer.2.crossattention.self.key.bias', 'bert.encoder.layer.2.crossattention.self.value.weight', 'bert.encoder.layer.2.crossattention.self.value.bias', 'bert.encoder.layer.2.crossattention.output.dense.weight', 'bert.encoder.layer.2.crossattention.output.dense.bias', 'bert.encoder.layer.2.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.2.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.3.crossattention.self.query.weight', 'bert.encoder.layer.3.crossattention.self.query.bias', 
'bert.encoder.layer.3.crossattention.self.key.weight', 'bert.encoder.layer.3.crossattention.self.key.bias', 'bert.encoder.layer.3.crossattention.self.value.weight', 'bert.encoder.layer.3.crossattention.self.value.bias', 'bert.encoder.layer.3.crossattention.output.dense.weight', 'bert.encoder.layer.3.crossattention.output.dense.bias', 'bert.encoder.layer.3.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.3.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.4.crossattention.self.query.weight', 'bert.encoder.layer.4.crossattention.self.query.bias', 'bert.encoder.layer.4.crossattention.self.key.weight', 'bert.encoder.layer.4.crossattention.self.key.bias', 'bert.encoder.layer.4.crossattention.self.value.weight', 'bert.encoder.layer.4.crossattention.self.value.bias', 'bert.encoder.layer.4.crossattention.output.dense.weight', 'bert.encoder.layer.4.crossattention.output.dense.bias', 'bert.encoder.layer.4.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.4.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.5.crossattention.self.query.weight', 'bert.encoder.layer.5.crossattention.self.query.bias', 'bert.encoder.layer.5.crossattention.self.key.weight', 'bert.encoder.layer.5.crossattention.self.key.bias', 'bert.encoder.layer.5.crossattention.self.value.weight', 'bert.encoder.layer.5.crossattention.self.value.bias', 'bert.encoder.layer.5.crossattention.output.dense.weight', 'bert.encoder.layer.5.crossattention.output.dense.bias', 'bert.encoder.layer.5.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.5.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.6.crossattention.self.query.weight', 'bert.encoder.layer.6.crossattention.self.query.bias', 'bert.encoder.layer.6.crossattention.self.key.weight', 'bert.encoder.layer.6.crossattention.self.key.bias', 'bert.encoder.layer.6.crossattention.self.value.weight', 'bert.encoder.layer.6.crossattention.self.value.bias', 'bert.encoder.layer.6.crossattention.output.dense.weight', 
'bert.encoder.layer.6.crossattention.output.dense.bias', 'bert.encoder.layer.6.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.6.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.7.crossattention.self.query.weight', 'bert.encoder.layer.7.crossattention.self.query.bias', 'bert.encoder.layer.7.crossattention.self.key.weight', 'bert.encoder.layer.7.crossattention.self.key.bias', 'bert.encoder.layer.7.crossattention.self.value.weight', 'bert.encoder.layer.7.crossattention.self.value.bias', 'bert.encoder.layer.7.crossattention.output.dense.weight', 'bert.encoder.layer.7.crossattention.output.dense.bias', 'bert.encoder.layer.7.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.7.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.8.crossattention.self.query.weight', 'bert.encoder.layer.8.crossattention.self.query.bias', 'bert.encoder.layer.8.crossattention.self.key.weight', 'bert.encoder.layer.8.crossattention.self.key.bias', 'bert.encoder.layer.8.crossattention.self.value.weight', 'bert.encoder.layer.8.crossattention.self.value.bias', 'bert.encoder.layer.8.crossattention.output.dense.weight', 'bert.encoder.layer.8.crossattention.output.dense.bias', 'bert.encoder.layer.8.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.8.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.9.crossattention.self.query.weight', 'bert.encoder.layer.9.crossattention.self.query.bias', 'bert.encoder.layer.9.crossattention.self.key.weight', 'bert.encoder.layer.9.crossattention.self.key.bias', 'bert.encoder.layer.9.crossattention.self.value.weight', 'bert.encoder.layer.9.crossattention.self.value.bias', 'bert.encoder.layer.9.crossattention.output.dense.weight', 'bert.encoder.layer.9.crossattention.output.dense.bias', 'bert.encoder.layer.9.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.9.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.10.crossattention.self.query.weight', 
'bert.encoder.layer.10.crossattention.self.query.bias', 'bert.encoder.layer.10.crossattention.self.key.weight', 'bert.encoder.layer.10.crossattention.self.key.bias', 'bert.encoder.layer.10.crossattention.self.value.weight', 'bert.encoder.layer.10.crossattention.self.value.bias', 'bert.encoder.layer.10.crossattention.output.dense.weight', 'bert.encoder.layer.10.crossattention.output.dense.bias', 'bert.encoder.layer.10.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.10.crossattention.output.LayerNorm.bias', 'bert.encoder.layer.11.crossattention.self.query.weight', 'bert.encoder.layer.11.crossattention.self.query.bias', 'bert.encoder.layer.11.crossattention.self.key.weight', 'bert.encoder.layer.11.crossattention.self.key.bias', 'bert.encoder.layer.11.crossattention.self.value.weight', 'bert.encoder.layer.11.crossattention.self.value.bias', 'bert.encoder.layer.11.crossattention.output.dense.weight', 'bert.encoder.layer.11.crossattention.output.dense.bias', 'bert.encoder.layer.11.crossattention.output.LayerNorm.weight', 'bert.encoder.layer.11.crossattention.output.LayerNorm.bias']
    You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
    The following encoder weights were not tied to the decoder ['bert/pooler']
    Iteration: 0%| | 0/500000 [00:00<?, ?it/s]
    Epoch: 0%| | 0/3 [00:00<?, ?it/s]
    Traceback (most recent call last):
    File "finetune_tsda.py", line 53, in <module>
    show_progress_bar=True
    File "/usr/local/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py", line 567, in fit
    loss_value = loss_model(features, labels)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/sentence_transformers/losses/DenoisingAutoEncoderLoss.py", line 90, in forward
    reps = self.encoder(source_features)['sentence_embedding'] # (bsz, hdim)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
    input = module(input)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/sentence_transformers/models/Transformer.py", line 38, in forward
    output_states = self.auto_model(**trans_features, return_dict=False)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 969, in forward
    past_key_values_length=past_key_values_length,
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 209, in forward
    embeddings = self.dropout(embeddings)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
    result = self.forward(*input, **kwargs)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/dropout.py", line 58, in forward
    return F.dropout(input, self.p, self.training, self.inplace)
    File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 973, in dropout
    else _VF.dropout(input, p, training))
    RuntimeError: CUDA error: device-side assert triggered

@nreimers
Member

Does it work when you use bert-base-uncased?

Also check that you have a recent version of Pytorch and transformers
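
One quick way to see which versions are actually installed in the running environment is sketched below. It assumes Python 3.8 or newer for `importlib.metadata` (on Python 3.7, as used in this thread, the `importlib_metadata` backport provides the same `version` function); the helper name `installed_version` is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Print the versions of the packages relevant to this issue.
for name in ("torch", "transformers", "sentence-transformers"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Running this with the same interpreter that runs the training script also helps catch the case where a package was upgraded in a different environment.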

@ReySadeghi
Author

ReySadeghi commented Apr 28, 2021

I edited it, actually the model name is 'HooshvareLab/bert-fa-base-uncased'.

@kwang2049
Member

Thanks for reporting this issue!
We have located the bug: when one adds tokens to the encoder's lookup table, the _tie_encoder_decoder_weights function ties the weights between encoder and decoder and thus resets the encoder's lookup table to the original one (since the decoder is initialized from the original checkpoint). We have found a solution and will fix it soon. The future version will initialize the decoder from encoder.config._name_or_path if tie_encoder_decoder=True and will contain more checks.
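
The effect described above can be illustrated with a toy model in plain Python. This is only a sketch of the mechanism, not the library's implementation: `LookupTable`, `tie`, and the sizes are made up for illustration, and real embedding tables are tensors rather than lists.

```python
class LookupTable:
    """Toy stand-in for an embedding matrix: one row per token id."""

    def __init__(self, num_rows: int, dim: int = 2):
        self.rows = [[0.0] * dim for _ in range(num_rows)]

    def resize(self, new_num_rows: int) -> None:
        # Grow the table by appending zero rows for the new token ids.
        dim = len(self.rows[0])
        while len(self.rows) < new_num_rows:
            self.rows.append([0.0] * dim)

    def lookup(self, token_id: int):
        return self.rows[token_id]  # IndexError if token_id >= number of rows


def tie(encoder: LookupTable, decoder: LookupTable) -> None:
    """Toy weight tying: the encoder ends up sharing the decoder's rows."""
    encoder.rows = decoder.rows


ORIGINAL_VOCAB = 5  # made-up size of the checkpoint's vocabulary
NEW_TOKENS = 3      # made-up number of added tokens

encoder = LookupTable(ORIGINAL_VOCAB)
encoder.resize(ORIGINAL_VOCAB + NEW_TOKENS)  # analogous to resize_token_embeddings
decoder = LookupTable(ORIGINAL_VOCAB)        # decoder built from the original checkpoint

new_token_id = ORIGINAL_VOCAB                # first id handed to an added token
encoder.lookup(new_token_id)                 # fine: the table has 8 rows

tie(encoder, decoder)                        # the encoder's table is back to 5 rows
try:
    encoder.lookup(new_token_id)
    lookup_failed = False
except IndexError:
    lookup_failed = True
print("lookup failed after tying:", lookup_failed)
```

In the toy model, the tokenizer still hands out ids for the added tokens, but after tying there is no row left for them, so the lookup goes out of range.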

@ReySadeghi
Author

Thanks. Please let me know when the bug is fixed.

@kwang2049
Member

Hi ReySadeghi, the bug has been fixed as of commit 022b2dd. Please git clone the latest version and run pip install -e . to try it :).

@ReySadeghi
Author

@kwang2049
Hi, I tried the latest version. Running on CPU is OK, but on GPU I got this error:

Traceback (most recent call last):
File "finetune_tsda.py", line 53, in <module>
show_progress_bar=True
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py", line 567, in fit
loss_value = loss_model(features, labels)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/losses/DenoisingAutoEncoderLoss.py", line 90, in forward
reps = self.encoder(source_features)['sentence_embedding'] # (bsz, hdim)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/models/Transformer.py", line 38, in forward
output_states = self.auto_model(**trans_features, return_dict=False)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 981, in forward
return_dict=return_dict,
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 575, in forward
output_attentions,
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 461, in forward
past_key_value=self_attn_past_key_value,
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 394, in forward
output_attentions,
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 253, in forward
mixed_query_layer = self.query(hidden_states)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/linear.py", line 91, in forward
return F.linear(input, self.weight, self.bias)
File "/usr/local/lib/python3.7/site-packages/torch/nn/functional.py", line 1676, in linear
output = input.matmul(weight.t())
RuntimeError: CUDA error: CUBLAS_STATUS_ALLOC_FAILED when calling cublasCreate(handle)
....................................................
I then ran "CUDA_LAUNCH_BLOCKING=1 python3.7 script.py" to get a more precise stack trace and got:

] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [125,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [126,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
/pytorch/aten/src/THC/THCTensorIndex.cu:272: indexSelectLargeIndex: block: [171,0,0], thread: [127,0,0] Assertion `srcIndex < srcSelectDimSize` failed.
Epoch: 0%| | 0/6 [00:00<?, ?it/s]
Traceback (most recent call last):
File "finetune_tsda.py", line 53, in <module>
show_progress_bar=True
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/SentenceTransformer.py", line 567, in fit
loss_value = loss_model(features, labels)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/losses/DenoisingAutoEncoderLoss.py", line 90, in forward
reps = self.encoder(source_features)['sentence_embedding'] # (bsz, hdim)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/container.py", line 117, in forward
input = module(input)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/sentence_transformers/models/Transformer.py", line 38, in forward
output_states = self.auto_model(**trans_features, return_dict=False)
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 969, in forward
past_key_values_length=past_key_values_length,
File "/usr/local/lib/python3.7/site-packages/torch/nn/modules/module.py", line 722, in _call_impl
result = self.forward(*input, **kwargs)
File "/usr/local/lib/python3.7/site-packages/transformers/models/bert/modeling_bert.py", line 204, in forward
embeddings = inputs_embeds + token_type_embeddings
RuntimeError: CUDA error: device-side assert triggered
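
The assertion srcIndex < srcSelectDimSize comes from an index-select kernel and fires when an index is not smaller than the size of the dimension being indexed. In a script like this one, a likely candidate is a token id that is larger than the embedding table. A minimal sketch for checking a batch on the CPU before it reaches the GPU is shown below; the helper name and the numbers are made up, and with real objects the inputs would presumably come from the tokenizer's input_ids and from auto_model.get_input_embeddings().num_embeddings.

```python
from typing import Iterable, List, Tuple


def find_out_of_range_ids(
    batch_input_ids: Iterable[Iterable[int]], num_embeddings: int
) -> List[Tuple[int, int, int]]:
    """Return (row, position, token_id) for every id the table cannot hold."""
    bad = []
    for row, ids in enumerate(batch_input_ids):
        for pos, token_id in enumerate(ids):
            if token_id < 0 or token_id >= num_embeddings:
                bad.append((row, pos, token_id))
    return bad


# Made-up batch: one id (100003) lies beyond a table with 100000 rows.
batch = [[2, 17, 100003, 4], [2, 99999, 4]]
offenders = find_out_of_range_ids(batch, num_embeddings=100000)
print(offenders)
```

An empty result means every id fits the table, which would point away from the vocabulary and toward a different cause.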

@kwang2049
Member

Are you using the same script? Please try the code below:

from sentence_transformers import SentenceTransformer
from sentence_transformers import models, datasets, losses
from torch.utils.data import DataLoader


model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)

existing_word = list(word_embedding_model.tokenizer.vocab.keys())[1000]
vocab = ['<new_word_1>', '<new_word_2>', '<سلامسلام>', existing_word]

print('Before:', word_embedding_model.auto_model.embeddings)
word_embedding_model.tokenizer.add_tokens(vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))
print('Now:', word_embedding_model.auto_model.embeddings)

pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

train_sentences=[
    'A sentence containing <new_word_1> and <new_word_2>.', 
    'A sentence containing only <new_word_2>.', 
    'A sentence containing <سلامسلام>', 
    f'A sentence containing {existing_word}'
]

train_dataset = datasets.DenoisingAutoEncoderDataset(train_sentences)
train_dataloader = DataLoader(train_dataset, batch_size=4, shuffle=True)
train_loss = losses.DenoisingAutoEncoderLoss(model, decoder_name_or_path=model_name, tie_encoder_decoder=True)

model.fit(
    train_objectives=[(train_dataloader, train_loss)],
    epochs=3,
    weight_decay=0,
    scheduler='constantlr',
    optimizer_params={'lr': 3e-5},
    show_progress_bar=True
)

This works fine on my server. If it does not work on your side, I think the cause is either an outdated version of the SBERT repo (the test above passes for me with sentence-transformers==1.1.1) or a CUDA problem.

If it does work on your side, I think the problem is related to a specific new word/token. To locate it, iterate over all the new words, create a sentence containing each one, and fit the TSDAE model on each of them. If an exception is thrown at some point, please tell us which word triggers it.
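
The search suggested above can be sketched as a small loop. Here `find_failing_tokens` and `fake_fit` are hypothetical names: in practice `fit_on_sentences` would wrap the dataset, dataloader, loss, and model.fit calls from the snippet above, while the stand-in below only simulates a failure. It is probably best run on CPU, since a device-side assert tends to leave the CUDA context unusable for the rest of the process.

```python
from typing import Callable, List, Sequence, Tuple


def find_failing_tokens(
    new_tokens: Sequence[str],
    fit_on_sentences: Callable[[List[str]], None],
) -> List[Tuple[str, str]]:
    """Fit on one sentence per token and collect the tokens that raise."""
    failures = []
    for token in new_tokens:
        sentence = f"A sentence containing {token}"
        try:
            fit_on_sentences([sentence])
        except Exception as exc:  # keep going so every bad token is reported
            failures.append((token, repr(exc)))
    return failures


def fake_fit(sentences: List[str]) -> None:
    """Stand-in for real training: fails on one made-up token."""
    if any("<bad_token>" in s for s in sentences):
        raise IndexError("index out of range")


failures = find_failing_tokens(["<ok_1>", "<bad_token>", "<ok_2>"], fake_fit)
print(failures)
```

With 10k tokens, fitting one model per token is slow; splitting the token list in halves and recursing into the failing half would be a reasonable refinement.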

@ReySadeghi
Author

ReySadeghi commented May 16, 2021

Yes, I used the latest version of SBERT and the same script, but I still got the error!

I got this warning too; could it be causing the problem?

/lib/python3.7/site-packages/pandas/compat/__init__.py:120: UserWarning: Could not import the lzma module. Your installed Python is incomplete. Attempting to use lzma compression will result in a RuntimeError.

@kwang2049
Member

Could you please run the code snippet mentioned above?
Your warning seems to have nothing to do with the SBERT repo, since the pandas package is not required.

@ReySadeghi
Author

Yeah, it's solved.
Sorry, the latest version had not been installed correctly.
Thanks.

@ReySadeghi
Author

ReySadeghi commented May 17, 2021

@nreimers
Does the code support running on multiple GPUs?

@ReySadeghi
Author

ReySadeghi commented May 23, 2021

@kwang2049 @nreimers
Hi, I ran the code snippet mentioned above to add 10k new tokens. After one epoch of training, when I tried to use the saved model to vectorize sentences, I got this error:

AssertionError: Non-consecutive added token '#سلام' found. Should have index 100005 but has index 100006 in saved vocabulary.

@ReySadeghi
Author

ReySadeghi commented May 23, 2021

@nreimers
Hi, I tried the TSDAE code to train my model, but it doesn't report the training loss during training.

@nreimers
Member

The training loss is not computed or plotted during training.

@kwang2049
Member

@kwang2049 @nreimers
hi, I ran the code snippet mentioned above to add 10k new tokens, after 1 epoch training , when I want to use saved model to vectorize sentences, I got this error:

AssertionError: Non-consecutive added token '#نوید_افکاری' found. Should have index 100005 but has index 100006 in saved vocabulary.

Hi @ReySadeghi, I cannot reproduce it: the SBERT checkpoint with added tokens loads successfully for me. Before we go into more detail, could you please run the following check, to see whether the assertion error still occurs without TSDAE:

from sentence_transformers import SentenceTransformer
from sentence_transformers import models


model_name = 'HooshvareLab/bert-fa-base-uncased'
word_embedding_model = models.Transformer(model_name, max_seq_length=250)

# Mix new tokens with one token that is already in the vocabulary.
existing_word = list(word_embedding_model.tokenizer.vocab.keys())[1000]
vocab = ['<new_word_1>', '<new_word_2>', '<سلامسلام>', existing_word, '<new_subword111>', '<new_subword222>']

# Add the tokens, then resize the embedding matrix to the new vocabulary size.
print('Before:', word_embedding_model.auto_model.embeddings)
word_embedding_model.tokenizer.add_tokens(vocab)
word_embedding_model.auto_model.resize_token_embeddings(len(word_embedding_model.tokenizer))
print('Now:', word_embedding_model.auto_model.embeddings)

# Use CLS pooling on top of the transformer.
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode_mean_tokens=False, pooling_mode_cls_token=True, pooling_mode_max_tokens=False)
model = SentenceTransformer(modules=[word_embedding_model, pooling_model])

train_sentences = [
    'A sentence containing <new_word_1> and <new_word_2>.',
    'A sentence containing only <new_word_2>.',
    'A sentence containing <سلامسلام>',
    f'A sentence containing {existing_word}',  # comma was missing here, which merged this entry with the next one
    'A sentence containing <new_subword111>xxx, my<new_subword222>yyyu'
]

# Save and reload the model, then tokenize with the reloaded tokenizer.
model.save('sbert_tokens_added')
model = SentenceTransformer('sbert_tokens_added')
print([model[0].tokenizer.tokenize(sentence) for sentence in train_sentences])

If this snippet also raises the error, I think it might be related to your transformers version. If it works, replace the vocab variable above with your new token list and try again.
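
Judging by its message, the assertion quoted above fires when the saved indices of the added tokens do not continue one by one after the base vocabulary. Below is a minimal sketch of that kind of check, assuming the added tokens are available as a plain token-to-index mapping. The function name `find_index_gaps` and the sample data are hypothetical and not part of transformers or sentence-transformers; the real loader may behave differently in detail.

```python
def find_index_gaps(added_tokens, base_vocab_size):
    """Return (token, expected_index, actual_index) for every added token
    whose index does not directly follow the previous one.

    added_tokens: dict mapping token string -> index (assumed layout).
    base_vocab_size: size of the vocabulary before any tokens were added.
    """
    problems = []
    expected = base_vocab_size
    # Walk the added tokens in index order.
    for token, index in sorted(added_tokens.items(), key=lambda item: item[1]):
        if index != expected:
            problems.append((token, expected, index))
        # Continue from the actual index so each break is reported only once.
        expected = index + 1
    return problems


# Hypothetical data: index 100002 is missing, so '<c>' is reported.
sample = {'<a>': 100000, '<b>': 100001, '<c>': 100003, '<d>': 100004}
for token, expected, actual in find_index_gaps(sample, base_vocab_size=100000):
    print(f"{token!r}: expected index {expected}, found {actual}")
```

A gap like this suggests that one of the new tokens was dropped or merged with another when the tokenizer was saved or reloaded, so the token just before the reported one is a good place to start looking.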

@ReySadeghi
Author

I tried this and it worked. I think the problem was actually caused by some tokens that were not valid UTF-8; once I removed them, the problem was solved.
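
For anyone hitting the same error, here is a minimal sketch of one way to drop such tokens before calling add_tokens. It assumes the vocabulary file has one token per line. The helper name `load_clean_vocab` is hypothetical. Unlike opening the file with errors='ignore', which silently deletes the bad bytes and can leave mangled or duplicate tokens, this reads raw bytes and skips any line that does not decode as UTF-8.

```python
def load_clean_vocab(path):
    """Read one token per line and return (tokens, skipped_line_numbers).

    Lines that are not valid UTF-8 are skipped and their line numbers
    recorded. Empty lines and duplicate tokens are dropped; order is kept.
    """
    tokens = []
    seen = set()
    skipped = []
    with open(path, 'rb') as handle:
        for line_number, raw in enumerate(handle, start=1):
            try:
                # Strict decoding: raises on any invalid byte sequence.
                token = raw.decode('utf-8').strip()
            except UnicodeDecodeError:
                skipped.append(line_number)
                continue
            if token and token not in seen:
                seen.add(token)
                tokens.append(token)
    return tokens, skipped


# Usage (file name taken from the training script earlier in this thread):
# vocab, skipped = load_clean_vocab('vocab30k.txt')
# print(f'kept {len(vocab)} tokens, skipped lines: {skipped}')
```

The skipped line numbers make it easy to inspect the offending entries in the original file instead of losing them silently.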
