Finetune Problem: Various Fine-tuning Issues #270
Replies: 42 comments 81 replies
-
Hello! I'm glad to have discovered CogVLM. I have the following questions about fine-tuning:
-
Hi, I am fine-tuning by following the example.
My own arguments are: (_name_or_path='/mntnlp/common_base_model/cogvqa/cogagent', architectures=['CogAgentForCausalLM'], attention_dropout=0.1, auto_map={'AutoConfig': 'configuration_cogagent.CogAgentConfig', 'AutoModelForCausalLM': 'modeling_cogagent.CogAgentForCausalLM'}, batch_from_same_dataset=False, batch_size=4, bf16=False, block_size=10000, bos_token_id=1, checkpoint_activations=False, checkpoint_num_layers=1, checkpoint_skip_layers=0, cross_compute_hidden_size=1024, cross_hidden_size=1024, cross_image_pix=1120, cross_image_size=1120, cuda=True, deepscale=False, deepscale_config=None, deepspeed=True, deepspeed_activation_checkpointing=False, deepspeed_config={'train_micro_batch_size_per_gpu': 4, 'gradient_accumulation_steps': 1, 'gradient_clipping': 0.1, 'fp16': {'enabled': False, 'loss_scale': 0, 'loss_scale_window': 200, 'hysteresis': 2, 'min_loss_scale': 0.01}, 'bf16': {'enabled': False}, 'optimizer': {'type': 'AdamW', 'params': {'lr': 0.0001, 'weight_decay': 0.01}}}, deepspeed_mpi=False, device=0, distributed_backend='nccl', drop_path=0.0, eos_token_id=2, epochs=None, eva_args={'model_parallel_size': 1}, eval_batch_size=None, eval_interval=None, eval_iters=100, exit_interval=None, experiment_name='finetune-/mntnlp/common_base_model/cogvqa', fp16=False, from_pretrained='/mntnlp/common_base_model/cogvqa', gradient_accumulation_steps=1, hidden_act='silu', hidden_dropout=0.1, hidden_size=4096, hidden_size_per_attention_head=None, ignore_pad_token_for_loss=True, image_length=256, initializer_range=0.02, inner_hidden_size=None, input_source='interactive', intermediate_size=11008, iterable_dataset=False, layer_range=None, layernorm_epsilon=1e-05, layernorm_order='pre', length_penalty=0.0, load=None, local_rank=0, local_tokenizer='/mntnlp/common_base_model/vicuna_v1.5_7b', log_interval=50, lora_rank=50, lr=0.0001, lr_decay_iters=None, lr_decay_ratio=0.1, lr_decay_style='cosine', make_vocab_size_divisible_by=128, master_ip='127.0.0.1', master_port='16666', max_inference_batch_size=12, max_length=400, max_position_embeddings=2048, max_sequence_length=512, min_tgt_length=0, mode='finetune', model_parallel_size=1, no_load_rng=False, no_repeat_ngram_size=0, no_save_rng=False, num_attention_heads=32, num_beams=1, num_hidden_layers=32, num_layers=6, num_multi_query_heads=0, num_workers=1, out_seq_length=256, output_path='./samples', pad_token_id=0, pre_seq_len=8, prefetch_factor=4, rank=0, resume_dataloader=True, rms_norm_eps=1e-05, save=None, save_args=False, save_interval=5000, seed=1234, skip_init=False, split='1000,1,1', strict_eval=False, summary_dir='', temperature=1.0, template_version='chat', test_data=None, tie_word_embeddings=False, tokenizer_type='fake', top_k=0, top_p=0.0, torch_dtype='bfloat16', train_data=['./archive_split/train'], train_data_weights=None, train_iters=2000, transformers_version='4.36.0.dev0', use_cache=True, use_gpu_initialization=False, use_lora=True, use_ptuning=False, use_qlora=False, valid_data=['./archive_split/valid'], version='chat', vision_config={'dropout_prob': 0.0, 'hidden_act': 'gelu', 'hidden_size': 1792, 'image_size': 224, 'in_channels': 3, 'intermediate_size': 15360, 'layer_norm_eps': 1e-06, 'num_heads': 16, 'num_hidden_layers': 63, 'num_positions': 257, 'patch_size': 14}, vit_checkpoint_activations=False, vocab_size=32000, warmup=0.02, weight_decay=0.01, with_id=False, world_size=1, zero_stage=0). How should inner_hidden_size be set by default? Thanks.
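A small sketch of the usual fallback (an assumption based on the common SAT-style convention, not a quote of CogVLM's own model code, so check modeling_cogagent.py to confirm): leaving inner_hidden_size=None typically means the MLP width is derived from hidden_size, while this LLaMA-style config already carries intermediate_size=11008 for its gated MLP, so None is normally the intended setting.

```python
# Hypothetical illustration of the common fallback when inner_hidden_size is None.
# The 4 * hidden_size rule is an assumption (SAT-style convention), not verified
# against CogAgent's actual MLP, which is configured via intermediate_size=11008.
hidden_size = 4096
intermediate_size = 11008   # LLaMA-style gated-MLP width from the config dump
inner_hidden_size = None    # the value asked about

effective_mlp_width = (
    inner_hidden_size if inner_hidden_size is not None else 4 * hidden_size
)
print(effective_mlp_width)  # 16384 under the 4x convention
```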
-
enable = ["encoder", "cross_attention", "linear_proj", 'mlp.vision', 'rotary.vision', 'eoi', 'boi', 'vit']
-
This may be somewhat of a duplicate question, so here are my questions:
-
I am trying to fine-tune the CogAgent model on 8x3090 GPUs with MP_SIZE=4, but system memory (500 GB) blows up during the model-loading stage. Is there any way to reduce this memory usage?
-
I ran into the same problem as in #268. Apart from MP_SIZE and NUM_GPUS_PER_WORKER I did not modify any parameters, and I am fine-tuning from the weights pulled via the official SAT. At …
-
I'd like to ask: during fine-tuning, can the per-layer parameters of the backend be unfrozen and trained as well? If so, how exactly is that done?
-
Following the demo example, after downloading the images and setting the paths I ran bash finetune_cogvlm_lora.sh and got the error below. I'd like to know what causes it.
-
Hello!
-
Is there currently any code for LoRA or QLoRA fine-tuning of the HF model? Why can't the provided finetune script fine-tune a locally downloaded HF model?
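For the HF checkpoint specifically, here is a minimal sketch of what attaching LoRA via the peft library could look like. This is not an official recipe: the model id, dtype, rank, and especially target_modules are assumptions and should be checked against the module names printed by print(model).

```python
# Sketch: attach LoRA adapters to a locally downloaded HF CogVLM checkpoint with peft.
# Assumptions (not from the official repo): model path/id and target_modules names.
import torch
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained(
    "THUDM/cogvlm-chat-hf",          # or a local path to the downloaded HF model
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
)

lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["query_key_value"],  # hypothetical; verify against print(model)
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # sanity-check how many parameters are trainable
```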
-
Are there plans to open-source CogAgent's pretraining data and the QA-format fine-tuning data?
-
bash finetune_demo/finetune_cogvlm_lora.sh
-
The file saved at the final step of fine-tuning CogVLM is almost twice the size of the base model file (60+ GB). Is this because of float precision? How can I save a smaller file that can be used directly for inference?
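A size near twice the bf16 base weights usually points to tensors being written out in fp32 (and possibly extra training state). A rough sketch of shrinking it, assuming the saved file is a plain torch checkpoint whose weights sit under a "module" key (common for DeepSpeed/SAT checkpoints; the key and file name may differ in your setup):

```python
# Sketch: shrink a saved fine-tuning checkpoint by keeping only the weights
# and casting floating-point tensors back to bfloat16.
import torch

ckpt = torch.load("mp_rank_00_model_states.pt", map_location="cpu")
state_dict = ckpt.get("module", ckpt)  # fall back to the dict itself if no "module" key

slim = {
    name: (t.to(torch.bfloat16) if torch.is_tensor(t) and t.is_floating_point() else t)
    for name, t in state_dict.items()
}
torch.save(slim, "mp_rank_00_model_states_bf16.pt")
```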
-
The model is saved periodically during training. Can this be disabled? If so, how?
-
Problem: I ran finetune_cogvlm_demo.py on a few images for 100 steps and found that the loss stays at 0 and the validation pred text never changes. Compared with the original settings I only changed batch_size=1 and MP_num=1; everything else is untouched. Because VRAM is insufficient, the trainable parameters do not include LoRA, only the ViT MLP and ptuning. What could the problem be?
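Not an official diagnosis, but one common cause of a loss that is exactly 0 is that every target token in the batch ends up masked to the ignore index (-100), so the cross-entropy has nothing to average over. A hedged sanity check on a single batch, assuming the collator exposes a "labels" tensor (adjust the key to whatever your data_collator actually returns):

```python
# Sketch: check whether any label positions actually contribute to the loss.
# Assumption: the batch is a dict with a "labels" tensor and the ignore index is -100.
import torch

def count_supervised_tokens(batch: dict, ignore_index: int = -100) -> int:
    labels = batch["labels"]
    return int((labels != ignore_index).sum())

# Example with a dummy fully-masked batch: if this prints 0 for your real batches,
# the loss will be 0 regardless of which parameters are trainable.
dummy = {"labels": torch.full((1, 16), -100)}
print(count_supervised_tokens(dummy))  # 0 -> nothing is supervised
```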
-
How do I apply LoRA fine-tuning again on top of a model that was already LoRA fine-tuned? I LoRA fine-tuned the base model and saved a CKPT; now I want to load that CKPT, attach LoRA layers to it, and fine-tune on another dataset. How should this be configured?
-
Running finetune_cogagent_lora.sh on 8x A100 runs out of GPU memory...
-
Has anyone tried fine-tuning CogAgent with quant4 quantization? I get a dimension error: File "/root/miniconda3/envs/CogVLM/lib/python3.10/site-packages/sat/model/finetune/lora2.py", line 97, in __init__
-
Could you explain which modules these entries refer to: enable = ["encoder", "cross_attention", "linear_proj", 'mlp.vision', 'rotary.vision', 'eoi', 'boi', 'vit']? If I want to freeze modules, is it enough to just change this list?
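As far as I can tell, these strings are matched against parameter names and act as substring filters: parameters whose names contain one of them stay trainable, everything else is frozen. A minimal sketch of that pattern (my paraphrase of how the demo's disable_untrainable_params-style logic typically works, not a verbatim copy of the repo code; confirm the exact matching rule in finetune_demo/finetune_cogvlm_demo.py):

```python
# Sketch: freeze every parameter whose name does not match one of the "enable" substrings.
import torch.nn as nn

enable = ["encoder", "cross_attention", "linear_proj",
          "mlp.vision", "rotary.vision", "eoi", "boi", "vit"]

def freeze_untargeted(model: nn.Module, enable_patterns: list[str]) -> None:
    for name, param in model.named_parameters():
        trainable = any(pat.lower() in name.lower() for pat in enable_patterns)
        param.requires_grad_(trainable)

# Usage: freeze_untargeted(model, enable). Removing an entry from `enable`
# freezes the corresponding module on the next call, so editing this list is
# indeed the main switch, assuming nothing later re-enables gradients.
```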
-
How do I continue fine-tuning from a checkpoint that has not been merged?
-
For multi-turn dialogue, how do I concatenate multiple question-answer pairs? How should dataset.py be modified?
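No official multi-turn recipe is given in this thread, but conceptually the earlier turns just need to be flattened into one prompt string that follows the same template the single-turn data uses. A hedged sketch, assuming a "Question: ... Answer: ..." phrasing for the chat template (verify the exact wording against what dataset.py already emits for single-turn samples):

```python
# Sketch: flatten a multi-turn conversation into a single (prompt, answer) pair,
# supervising only the final answer. The template wording is an assumption.
def build_multi_turn_sample(turns: list[tuple[str, str]]) -> tuple[str, str]:
    """turns = [(q1, a1), ..., (qN, aN)]; earlier turns become context."""
    history, (last_q, last_a) = turns[:-1], turns[-1]
    prompt = ""
    for q, a in history:
        prompt += f"Question: {q} Answer: {a} "
    prompt += f"Question: {last_q} Answer:"
    return prompt, last_a

prompt, label = build_multi_turn_sample(
    [("What is in the image?", "A cat."), ("What color is it?", "Black.")]
)
print(prompt)  # context turns followed by the final question
```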
-
The information returned by the data_collator used in the finetune_cogagent_demo code is as follows: however, the parameters accepted by the CogAgent model's forward method are not these, so I'm curious what processing happens in between.
-
How do I merge a model LoRA fine-tuned with 4-way model parallelism into a single model? I loaded the open-source cogvlm-base-490 model, did LoRA fine-tuning with MP_SIZE=4, and saved a CKPT consisting of 4 files (mp_rank_00_model_states.pt through mp_rank_03_model_states.pt). How can I merge it into an MP_SIZE=1 model stored as a single file? Thanks.
-
I have a problem but no idea what is wrong. Here is the log: (cogvlm) dhu_mbzhao_1@deeplearning-v191204-deeplearn: Could someone help me?
-
What does this mean? Does this log message impact performance? Keyword arguments {'add_special_tokens': False} not recognized.
-
My test results are as follows; is this considered good or bad? [2024-07-19 09:38:21,589] [INFO] [RANK 0] validation loss at the end of training for test data | loss: 0.000000E+00 | PPL: 1.000000E+00 acc 5.319865E-02 | acc_w/o_case 5.319865E-02 |
-
While fine-tuning CogAgent, I found that during eval the pred output is the model's answer, but the corresponding label is the question that was asked. Is this normal?
-
When I use my own dataset, the loss is very low after 1000 iterations, but the actual dialogue quality is poor and incorrect detections appear frequently. Is this a dataset problem or a hyperparameter problem? I am LoRA fine-tuning the base-490 model.
-
Please post any questions about model fine-tuning here; they will be answered by community members and the maintainers in their spare time.