Thank you for your contribution! I have encountered some issues.

1. Full train

Here is my training script:

Why do I get an OOM (Out of Memory) error? My GPU is an 80 GB A800, and the model is only 7B with a batch size of 1. I believe this configuration should not cause an OOM.
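For reference, a back-of-the-envelope estimate suggests a single 80 GB GPU is actually tight for full fine-tuning of a 7B model. This is a rough sketch assuming standard AdamW with mixed precision; the byte counts are textbook approximations, not numbers from this thread:

```python
# Rough per-parameter memory for full fine-tuning with AdamW in mixed
# precision (bf16 weights/grads plus fp32 master weights and optimizer state).
# All byte counts are standard approximations, not measured values.
n_params = 7e9
bytes_per_param = (
    2 +   # bf16 weights
    2 +   # bf16 gradients
    4 +   # fp32 master copy of weights
    4 +   # fp32 Adam first moment
    4     # fp32 Adam second moment
)
print(f"~{n_params * bytes_per_param / 1e9:.0f} GB before activations")  # ~112 GB
```

Under these assumptions the optimizer state alone exceeds 80 GB even at batch size 1, which is why full fine-tuning of 7B models typically relies on ZeRO/FSDP sharding across GPUs, an 8-bit optimizer, or LoRA.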
2. LoRA train

To be able to perform training at all, I used the --lora option. However, after training, the saved checkpoint is 24 GB, while the original model was only 14 GB:

I would like to know why this is the case. Additionally, I received the following warning when loading:
Some weights of the model checkpoint at /mnt/data1/zmj/embedding_model/gritlm-main/gritlm/output/7-2_lora were not used when initializing MistralForCausalLM: ['model.base_model.model.embed_tokens.weight', 'model.base_model.model.layers.0.input_layernorm.weight', 'model.base_model.model.layers.0.mlp.down_proj.weight', 'model.base_model.model.layers.0.mlp.gate_proj.weight',...]
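The key names in that warning (`model.base_model.model....`) are the wrapped names PEFT writes, which a plain `MistralForCausalLM.from_pretrained` does not recognize, so those tensors are silently skipped. A minimal sketch of loading such a checkpoint through PEFT instead, assuming the adapter was saved with the `peft` library; the paths below are placeholders:

```python
from peft import PeftModel
from transformers import AutoModelForCausalLM

# Placeholder paths -- substitute your own base model and adapter directory.
base = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")
model = PeftModel.from_pretrained(base, "output/7-2_lora")  # attaches the LoRA weights
model = model.merge_and_unload()  # optional: fold the adapters into the base weights
```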
3. attn

After reading the paper, I understand that you used bidirectional attention for training the embedding task. Why, then, does the example script you provided for the embedding task use --attn cccc?

I look forward to your response.
2. I have not tried LoRA, but it looks to me like your checkpoint was saved in FP32, which doubles the size. The warning is problematic because it means your weights are not loaded.
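The FP32 explanation checks out on a quick size calculation. A sketch; the ~7.2B parameter count for Mistral-7B is an assumed ballpark, not a number from this thread:

```python
# Serialized size of a ~7B-parameter model at different precisions.
# The parameter count below is an assumed ballpark for Mistral-7B.
n_params = 7.24e9
print(f"bf16/fp16: {n_params * 2 / 1e9:.1f} GB")  # ~14.5 GB, matching the 14 GB base model
print(f"fp32:      {n_params * 4 / 1e9:.1f} GB")  # ~29.0 GB, same ballpark as the 24 GB save
```

The observed 24 GB sits between the two figures, which would also be consistent with only part of the state being upcast; either way, casting back to bf16 before `save_pretrained`, or saving only the LoRA adapter via `save_pretrained` on the PEFT-wrapped model, keeps the checkpoint small.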