
[Feature] If i want to start Fine-tuning, must the flashattention be installed? #776

Closed
snowattitude opened this issue Dec 20, 2024 · 15 comments

Comments

@snowattitude

snowattitude commented Dec 20, 2024

Motivation

GPUS=2 PER_DEVICE_BATCH_SIZE=1 sh ./****2_5_2b_lora.sh

Error

FlashAttention is not installed.
FlashAttention is not installed.
flash-attention package not found, consider installing for better performance: No module named 'flash_attn'.
Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'.
flash-attention package not found, consider installing for better performance: No module named 'flash_attn'.
Current flash-attenton does not support window_size. Either upgrade or use attn_implementation='eager'.

@snowattitude snowattitude changed the title [Feature] If i want to start Fine-tuning, the flashattention must be installed? [Feature] If i want to start Fine-tuning, must the flashattention be installed? Dec 20, 2024
@Weiyun1025
Collaborator

It is not required; however, installing it will significantly improve the training speed.

@snowattitude
Author

How can I turn flash-attention off? I can't find where to disable it.
[screenshot]

@snowattitude
Author

It is not required; however, installing it will significantly improve the training speed.

Does this code require an Ampere GPU?

@Weiyun1025
Collaborator

you can try to set _attn_implementation to eager in the config.
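For reference, a minimal sketch of that edit in the checkpoint's config.json (the key sits under llm_config, as in the config posted later in this thread; treating this key as the switch the training code reads is an assumption):

    "llm_config": {
        ...
        "attn_implementation": "eager",
        ...
    }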

@snowattitude
Author

you can try to set _attn_implementation to eager in the config.

My config file is the same as yours.

@snowattitude
Author

{
  "_commit_hash": null,
  "architectures": [
    "InternVLChatModel"
  ],
  "auto_map": {
    "AutoConfig": "configuration_internvl_chat.InternVLChatConfig",
    "AutoModel": "modeling_internvl_chat.InternVLChatModel",
    "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel"
  },
  "downsample_ratio": 0.5,
  "dynamic_image_size": true,
  "force_image_size": 448,
  "llm_config": {
    "_name_or_path": "internlm/internlm2_5-1_8b-chat",
    "add_cross_attention": false,
    "architectures": [
      "InternLM2ForCausalLM"
    ],
    "attn_implementation": "flash_attention_2",
    "auto_map": {
      "AutoConfig": "configuration_internlm2.InternLM2Config",
      "AutoModel": "modeling_internlm2.InternLM2ForCausalLM",
      "AutoModelForCausalLM": "modeling_internlm2.InternLM2ForCausalLM",
      "AutoModelForSequenceClassification": "modeling_internlm2.InternLM2ForSequenceClassification"
    },
    "bad_words_ids": null,
    "begin_suppress_tokens": null,
    "bias": false,
    "bos_token_id": 1,
    "chunk_size_feed_forward": 0,
    "cross_attention_hidden_size": null,
    "decoder_start_token_id": null,
    "diversity_penalty": 0.0,
    "do_sample": false,
    "early_stopping": false,
    "encoder_no_repeat_ngram_size": 0,
    "eos_token_id": 2,
    "exponential_decay_length_penalty": null,
    "finetuning_task": null,
    "forced_bos_token_id": null,
    "forced_eos_token_id": null,
    "hidden_act": "silu",
    "hidden_size": 2048,
    "id2label": {
      "0": "LABEL_0",
      "1": "LABEL_1"
    },
    "initializer_range": 0.02,
    "intermediate_size": 8192,
    "is_decoder": false,
    "is_encoder_decoder": false,
    "label2id": {
      "LABEL_0": 0,
      "LABEL_1": 1
    },
    "length_penalty": 1.0,
    "max_length": 20,
    "max_position_embeddings": 32768,
    "min_length": 0,
    "model_type": "internlm2",
    "no_repeat_ngram_size": 0,
    "num_attention_heads": 16,
    "num_beam_groups": 1,
    "num_beams": 1,
    "num_hidden_layers": 24,
    "num_key_value_heads": 8,
    "num_return_sequences": 1,
    "output_attentions": false,
    "output_hidden_states": false,
    "output_scores": false,
    "pad_token_id": 2,
    "prefix": null,
    "pretraining_tp": 1,
    "problem_type": null,
    "pruned_heads": {},
    "remove_invalid_values": false,
    "repetition_penalty": 1.0,
    "return_dict": true,
    "return_dict_in_generate": false,
    "rms_norm_eps": 1e-05,
    "rope_scaling": {
      "factor": 2.0,
      "type": "dynamic"
    },
    "rope_theta": 1000000,
    "sep_token_id": null,
    "suppress_tokens": null,
    "task_specific_params": null,
    "temperature": 1.0,
    "tf_legacy_loss": false,
    "tie_encoder_decoder": false,
    "tie_word_embeddings": false,
    "tokenizer_class": null,
    "top_k": 50,
    "top_p": 1.0,
    "torch_dtype": "bfloat16",
    "torchscript": false,
    "transformers_version": "4.37.2",
    "typical_p": 1.0,
    "use_bfloat16": true,
    "use_cache": true,
    "vocab_size": 92553
  },
  "max_dynamic_patch": 12,
  "min_dynamic_patch": 1,
  "model_type": "internvl_chat",
  "ps_version": "v2",
  "select_layer": -1,
  "template": "internvl2_5",
  "torch_dtype": "bfloat16",
  "use_backbone_lora": 0,
  "use_llm_lora": 0,
  "use_thumbnail": true,
  "vision_config": {
    "architectures": [
      "InternVisionModel"
    ],
    "attention_dropout": 0.0,
    "drop_path_rate": 0.0,
    "dropout": 0.0,
    "hidden_act": "gelu",
    "hidden_size": 1024,
    "image_size": 448,
    "initializer_factor": 1.0,
    "initializer_range": 0.02,
    "intermediate_size": 4096,
    "layer_norm_eps": 1e-06,
    "model_type": "intern_vit_6b",
    "norm_type": "layer_norm",
    "num_attention_heads": 16,
    "num_channels": 3,
    "num_hidden_layers": 24,
    "output_attentions": false,
    "output_hidden_states": false,
    "patch_size": 14,
    "qk_normalization": false,
    "qkv_bias": true,
    "return_dict": true,
    "torch_dtype": "bfloat16",
    "transformers_version": "4.37.2",
    "use_bfloat16": false,
    "use_flash_attn": false
  }
}

@popoyaya

popoyaya commented Jan 2, 2025

+1. How can I fine-tune the model without flash-attn?

@popoyaya

popoyaya commented Jan 2, 2025

How can I turn flash-attention off? I can't find where to disable it. [screenshot]

+1. How can I fine-tune the model without flash-attn?

@genkerizer

[screenshot]
You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

@popoyaya

popoyaya commented Jan 5, 2025

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

@snowattitude
Author

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

I can't see the screenshot in your question, but you should install flash-attn from a prebuilt wheel (a .whl file).
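Concretely, a hedged sketch of the two usual install routes (the wheel filename below is only an example; pick one matching your Python, CUDA, and torch versions from the flash-attention GitHub releases page):

# Route 1: build from PyPI (needs nvcc from a CUDA toolkit on the machine);
# 2.3.6 is the version the InternVL install docs suggest, as far as I know.
pip install flash-attn==2.3.6 --no-build-isolation

# Route 2: install a prebuilt wheel so no compilation is needed.
# The filename is hypothetical; download the matching one from
# https://github.com/Dao-AILab/flash-attention/releases
pip install flash_attn-2.3.6+cu118torch2.1cxx11abiFALSE-cp310-cp310-linux_x86_64.whl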

@genkerizer

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

I can't see the screenshot in your question, but you should install flash-attn from a prebuilt wheel (a .whl file).

[screenshot]

@genkerizer

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

To install flash_attn, please pull an NVIDIA devel image; see: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags

Example: nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04
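As a concrete sketch of that approach (assuming Docker with the NVIDIA Container Toolkit is set up; /path/to/InternVL is a placeholder for your local clone):

# Pull the devel image suggested above and run it with GPU access,
# mounting the repo into the container.
docker pull nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04
docker run --gpus all -it --rm \
    -v /path/to/InternVL:/workspace/InternVL \
    nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04 bash

# Inside the container: install Python/pip and a PyTorch build matching the CUDA
# version; flash-attn can then compile because the devel image ships nvcc.
pip install flash-attn==2.3.6 --no-build-isolation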

@popoyaya

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

To install flash_attn, please pull an NVIDIA devel image; see: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags

Example: nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04

Thank you so much! That helps a lot!

@popoyaya

[screenshot] You need to install flash_attn after removing the code (shown in the screenshot) at InternVL/internvl_chat/internvl/train/internvl_chat_finetune.py.

Thanks! However, I encounter an error when installing flash-attn and I am unable to resolve it. Is there a fine-tuning method that does not require flash-attn?

To install flash_attn, please pull an NVIDIA devel image; see: https://catalog.ngc.nvidia.com/orgs/nvidia/containers/cuda/tags

Example: nvcr.io/nvidia/cuda:12.6.3-cudnn-devel-ubuntu20.04

I have another question regarding the use of LoRA. Do I only need to set --freeze_backbone False? However, after training I found "use_backbone_lora": 0 in config.json. What else should I do if I want to fine-tune the visual encoder as well?

Thank you in advance for your help!
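A hedged sketch of the idea, assuming internvl_chat_finetune.py accepts use_backbone_lora / use_llm_lora launch arguments matching the config keys shown earlier (the value being the LoRA rank, with 0 meaning disabled); the flag names and values here are an assumption to verify against the script:

# Hypothetical excerpt of the LoRA launch command; the backbone LoRA rank is what
# would enable LoRA on the vision encoder in addition to the language model.
torchrun --nproc_per_node=${GPUS} internvl/train/internvl_chat_finetune.py \
    --freeze_backbone True \
    --use_backbone_lora 16 \
    --use_llm_lora 16 \
    "${OTHER_ARGS[@]}"  # placeholder for the remaining data/output/deepspeed arguments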
