diff --git a/evals/evaluation/HELMET/README.md b/evals/evaluation/HELMET/README.md
index 4cb23e49..21eef3d7 100644
--- a/evals/evaluation/HELMET/README.md
+++ b/evals/evaluation/HELMET/README.md
@@ -1,8 +1,8 @@
-# HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly HELMET
+# HELMET: How to Evaluate Long-context Language Models Effectively and Thoroughly HELMET [[Paper](https://arxiv.org/abs/2410.02694)]
 
-HELMET HELMET (How to Evaluate Long-context Models Effectively and Thoroughly) is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks.
+HELMET HELMET (How to Evaluate Long-context Models Effectively and Thoroughly) is a comprehensive benchmark for long-context language models covering seven diverse categories of tasks.
 The datasets are application-centric and are designed to evaluate models at different lengths and levels of complexity.
 Please check out the paper for more details, and this repo will detail how to run the evaluation.
diff --git a/evals/evaluation/HELMET/assets/logo.jpeg b/evals/evaluation/HELMET/assets/logo.jpeg
deleted file mode 100644
index fb40ece2..00000000
Binary files a/evals/evaluation/HELMET/assets/logo.jpeg and /dev/null differ
diff --git a/evals/evaluation/HELMET/assets/logo.png b/evals/evaluation/HELMET/assets/logo.png
new file mode 100644
index 00000000..3c9d1a08
Binary files /dev/null and b/evals/evaluation/HELMET/assets/logo.png differ