We released Storm-7B, the first open-source language model comparable to the GPT-4 series on the AlpacaEval 2.0 leaderboard, ranking 3rd in length-controlled win rate.
The recipe for this model is simple: 1) fine-tune from Openchat-3.5-0106; 2) apply iterative DPO training, a variant of DPO in which the language model iteratively learns from the preferences of a trained reward model. We will release our technical report and code as soon as possible.
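Since the training code isn't released yet, the objective being iterated can only be sketched. Below is a minimal, dependency-free illustration of the standard DPO loss on a single preference pair (the function name, `beta` value, and the outer-loop comments are assumptions, not Storm-7B's actual implementation):

```python
import math

def dpo_loss(policy_logp_chosen: float, policy_logp_rejected: float,
             ref_logp_chosen: float, ref_logp_rejected: float,
             beta: float = 0.1) -> float:
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin)).

    Each argument is the summed token log-probability of the chosen or
    rejected response under the trainable policy or the frozen reference.
    """
    chosen_margin = policy_logp_chosen - ref_logp_chosen
    rejected_margin = policy_logp_rejected - ref_logp_rejected
    logits = beta * (chosen_margin - rejected_margin)
    return -math.log(1.0 / (1.0 + math.exp(-logits)))

# Iterative DPO, schematically (one round): sample responses from the current
# policy, rank them with the trained reward model to form (chosen, rejected)
# pairs, minimize dpo_loss over those pairs, then repeat with the updated
# policy as the new sampling model.
```

At initialization (policy equals reference) every margin is zero, so the loss is log 2; the loss falls as the policy raises the chosen response's likelihood relative to the rejected one.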
Slight doubt whether this qualifies as an instruction-tuned model; no details available yet.
https://huggingface.co/jieliu/Storm-7B