
Prediction becomes empty, therefore the loss becomes NaN. #34

Open

zero90169 opened this issue Nov 16, 2024 · 1 comment

zero90169 commented Nov 16, 2024

I've tried fine-tuning the llm4decompile-6.7b model on my dataset, and the results are impressive.
My dataset uses the following format:
{'instruction': 'MY_CUSTOMIZE_QUESTION', 'input': '', 'output': 'MY_CUSTOMIZE_ANSWER'}

and each record is formatted into a prompt like this:

{{ bos }}
user: data[idx]['instruction']
{{ eos }}
assistant:
classification: data[idx]['output']
{{ eos }}
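
For reference, a minimal sketch of that formatting step (build_prompt is a placeholder name; in practice the bos/eos strings come from the tokenizer's special tokens):

def build_prompt(record, bos, eos):
    # Assemble the chat-style prompt shown above from one dataset record.
    # bos/eos are typically tokenizer.bos_token and tokenizer.eos_token.
    return (
        f"{bos}\n"
        f"user: {record['instruction']}\n"
        f"{eos}\n"
        "assistant:\n"
        f"classification: {record['output']}\n"
        f"{eos}"
    )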

Everything works fine and the evaluation results are satisfactory.

However, everything goes wrong when I try to fine-tune the 9B model.
I changed the part of my code that loads the model from 'llm4decompile-6.7b' to 'llm4decompile-9b' while keeping everything else the same.
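
Concretely, the only change is the checkpoint path (a minimal sketch assuming the standard transformers loading call; the paths below are placeholders for the actual checkpoints):

from transformers import AutoModelForCausalLM, AutoTokenizer

# Only the checkpoint path changes between the two runs.
model_path = "llm4decompile-9b"  # previously "llm4decompile-6.7b"; placeholder paths
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype="auto")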

The model predictions become empty after a few update steps, and the loss becomes NaN due to the empty output.

Model predictions at the first step:
Decoded Predictions: ['" on the  provided the followingE"s" section... "]

Model predictions after a few steps:
Decoded Predictions: ['', '', '', '']

This issue is really bothering me; any advice would be greatly appreciated.

Package versions:
accelerate==1.0.1
bitsandbytes==0.42.0
deepspeed==0.15.2
datasets==2.17.0
evaluate==0.4.1
gpustat==1.1.1
huggingface-hub==0.23.2
hydra-core==1.3.2
icecream==2.1.3
Jinja2==3.1.2
jsonlines==4.0.0
langchain==0.1.0
langchain-core==0.1.8
loguru==0.7.2
mlflow==2.9.0
openai==1.40.0
packaging>=23.2
pandas==2.0.3
peft==0.11.0
pendulum==2.1.2
pyarrow==14.0.0
pysnooper==1.2.0
PyYAML==6.0
retrying==1.3.4
scikit-learn==1.3.2
seaborn==0.13.0
tokenizers==0.20.3
torch==2.4.0
torchvision==0.19.0
vllm==0.6.3.post1
transformers==4.45.2
trl==0.11.0
wandb==0.16.0
flash-attn==2.6.2

albertan017 (Owner) commented Nov 19, 2024

The 9B model is based on Yi-Coder, while the training script is from DeepSeek-Coder. We did not test the 9B model with this script; we recommend using LLaMA Factory to tune the 9B model.
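
Roughly, tuning with LLaMA Factory means registering the alpaca-style dataset (instruction/input/output) in data/dataset_info.json and then running the CLI with an SFT config. This is only a sketch: the dataset name, checkpoint path, and template value below are placeholders, and the keys should be checked against the LLaMA Factory docs for your version.

# sft_9b.yaml (hypothetical config for illustration)
model_name_or_path: llm4decompile-9b
stage: sft
do_train: true
finetuning_type: lora
dataset: my_custom_dataset      # name registered in data/dataset_info.json
template: default               # choose the template matching the 9B (Yi-Coder-based) chat format
output_dir: saves/llm4decompile-9b-sft
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-5
num_train_epochs: 2
bf16: true

Then launch training with: llamafactory-cli train sft_9b.yaml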
