We are working on open-sourcing the DISC-Law-Eval benchmark, including the evaluation dataset and the scripts, but they are not ready yet. Sorry about that.
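Regarding the evaluation, could you describe the specific evaluation method?
For example, for the single-choice questions in the objective part, how is the few-shot setting configured? Are four example items concatenated before the current question?
Also, for the model's answer, is it counted as correct if it contains only the option letter, or must it also include the full text of the option?
For the subjective questions, could you share the GPT-3.5 evaluation prompt template?
Thanks a lot!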