
Prediction becomes empty, therefore the loss becomes NaN. #34

Open

zero90169 opened this issue Nov 16, 2024 · 1 comment

zero90169 commented Nov 16, 2024

I've tried fine-tuning the llm4decompile-6.7b model on my dataset, and the results are impressive.
My dataset uses the following format:
{'instruction': 'MY_CUSTOMIZE_QUESTION', 'input': '', 'output': 'MY_CUSTOMIZE_ANSWER'}

and each record is formatted into a prompt like this:

{{ bos }}
user: data[idx]['instruction']
{{ eos }}
assistant:
classification: data[idx]['output']
{{ eos }}
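
For reference, a minimal sketch of that formatting step (build_prompt is a placeholder name; in practice the bos/eos strings come from the tokenizer's special tokens):

def build_prompt(record, bos, eos):
    # Assemble the chat-style prompt shown above from one dataset record.
    # bos/eos are typically tokenizer.bos_token and tokenizer.eos_token.
    return (
        f"{bos}\n"
        f"user: {record['instruction']}\n"
        f"{eos}\n"
        "assistant:\n"
        f"classification: {record['output']}\n"
        f"{eos}"
    )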

Everything works fine and the evaluation results are satisfactory.

However, everything goes wrong when I try to fine-tune the 9B model.
I changed the part of my code that loads the model from 'llm4decompile-6.7b' to 'llm4decompile-9b' while keeping everything else the same.
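
Concretely, the only change is the checkpoint path (a minimal sketch assuming the standard transformers loading call; the paths below are placeholders for the actual checkpoints):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Only the checkpoint path changes between the two runs.
model_path = "llm4decompile-9b"  # previously "llm4decompile-6.7b"; placeholder paths
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")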

The model predictions become empty after a few update steps, and the loss becomes NaN due to the empty output.

Model predictions at the first step:
Decoded Predictions: ['" on the  provided the followingE"s" section... "]

Model predictions after a few steps:
Decoded Predictions: ['', '', '', '']

This issue is really bothering me; any advice would be greatly appreciated.

Package versions:
accelerate==1.0.1
bitsandbytes==0.42.0
deepspeed==0.15.2
datasets==2.17.0
evaluate==0.4.1
gpustat==1.1.1
huggingface-hub==0.23.2
hydra-core==1.3.2
icecream==2.1.3
Jinja2==3.1.2
jsonlines==4.0.0
langchain==0.1.0
langchain-core==0.1.8
loguru==0.7.2
mlflow==2.9.0
openai==1.40.0
packaging>=23.2
pandas==2.0.3
peft==0.11.0
pendulum==2.1.2
pyarrow==14.0.0
pysnooper==1.2.0
PyYAML==6.0
retrying==1.3.4
scikit-learn==1.3.2
seaborn==0.13.0
tokenizers==0.20.3
torch==2.4.0
torchvision==0.19.0
vllm==0.6.3.post1
transformers==4.45.2
trl==0.11.0
wandb==0.16.0
flash-attn==2.6.2

albertan017 (Owner) commented Nov 19, 2024

The 9B model is based on Yi-Coder, while the training script is from DeepSeek-Coder. We did not test the 9B model with this script; we recommend using LLaMA Factory to tune the 9B model.
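
Roughly, tuning with LLaMA Factory means registering the alpaca-style dataset (instruction/input/output) in data/dataset_info.json and then running the CLI with an SFT config. This is only a sketch: the dataset name, checkpoint path, and template value below are placeholders, and the keys should be checked against the LLaMA Factory docs for your version.

# sft_9b.yaml (hypothetical config for illustration)
model_name_or_path: llm4decompile-9b
stage: sft
do_train: true
finetuning_type: lora
dataset: my_custom_dataset      # name registered in data/dataset_info.json
template: default               # choose the template matching the 9B (Yi-Coder-based) chat format
output_dir: saves/llm4decompile-9b-sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 2
bf16: true

Then launch training with: llamafactory-cli train sft_9b.yaml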
