I've tried fine-tuning the llm4decompile-6.7b model on my dataset, and the result is impressive.
My dataset uses the following format: {'instruction': 'MY_CUSTOMIZE_QUESTION', 'input': '', 'output': 'MY_CUSTOMIZE_ANSWER'}
and each record is rendered into the prompt like this:
{{ bos }}
user: data[idx]['instruction']
{{ eos }}
assistant:
classification: data[idx]['output']
{{ eos }}
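For concreteness, the rendering step above can be sketched as a small Python function (a minimal sketch; `format_example`, `bos`, and `eos` are placeholder names I'm using here, not the actual tokenizer's special-token strings):

```python
def format_example(example, bos, eos):
    """Render one {'instruction', 'input', 'output'} record into the
    chat-style training prompt described above."""
    return (
        f"{bos}\n"
        f"user: {example['instruction']}\n"
        f"{eos}\n"
        f"assistant:\n"
        f"classification: {example['output']}\n"
        f"{eos}"
    )

# Example usage with placeholder special tokens:
sample = {"instruction": "MY_CUSTOMIZE_QUESTION", "input": "", "output": "MY_CUSTOMIZE_ANSWER"}
print(format_example(sample, bos="<bos>", eos="<eos>"))
```

In practice `bos`/`eos` would come from the tokenizer (`tokenizer.bos_token` / `tokenizer.eos_token`), which is one thing worth double-checking when swapping base models, since Yi-Coder and DeepSeek-Coder use different special tokens.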
Everything works totally fine and the evaluation results are satisfactory.
However, everything goes wrong when I try to fine-tune the 9B model.
I changed the part of my code that loads the model from 'llm4decompile-6.7b' to 'llm4decompile-9b' while keeping everything else the same.
The model predictions become empty after a few update steps, and the loss becomes NaN due to the empty outputs.
Model predictions at the first step:
Decoded Predictions: ['" on the provided the followingE"s" section... "]
Model predictions after a few steps:
Decoded Predictions: ['', '', '', '']
This issue is really bothering me; any advice would be greatly appreciated.
The 9B model is based on Yi-Coder, while the training script is from DeepSeek-Coder. We did not test the 9B model with that script; we recommend using LLaMA-Factory to tune the 9B model.
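Since the dataset is already in Alpaca-style {'instruction', 'input', 'output'} format, which LLaMA-Factory supports natively, a LLaMA-Factory SFT run could be configured roughly like this (a sketch only: the model path, dataset name, output directory, and hyperparameters are placeholders to adapt, not values from this thread):

```yaml
### model
model_name_or_path: path/to/llm4decompile-9b   # placeholder local or Hub path

### method
stage: sft
do_train: true
finetuning_type: lora

### dataset
dataset: my_custom_dataset    # registered in data/dataset_info.json (alpaca format)
cutoff_len: 2048

### output
output_dir: saves/llm4decompile-9b-sft

### train
per_device_train_batch_size: 1
gradient_accumulation_steps: 8
learning_rate: 1.0e-4
num_train_epochs: 3
bf16: true    # bf16 rather than fp16 can also help avoid NaN losses on some models
```

Check the field names against the current LLaMA-Factory documentation before using this, as its config schema evolves between releases.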