diff --git a/.spellcheck-en-custom.txt b/.spellcheck-en-custom.txt
index 4ebd275a..0154a3b5 100644
--- a/.spellcheck-en-custom.txt
+++ b/.spellcheck-en-custom.txt
@@ -34,6 +34,7 @@ Dropdown
 env
 EP
 Eval
+eval
 Excalidraw
 exfiltrate
 exfiltrating
@@ -52,6 +53,7 @@ Inferencing
 instructlab
 ISA
 JIT
+JSON
 Jupyter
 KAGGLE
 Kaggle
@@ -63,8 +65,8 @@ LLM
 llms
 LLVM
 lora
-md
 Markdownlint
+md
 Mergify
 Merlinite
 mimimum
@@ -72,10 +74,11 @@ Miniforge
 Mixtral
 MLX
 mlx
+MMLU
 NVidia
 Nvidia
-ollama
 Ollama
+ollama
 orchestrator
 ots
 Pareja
@@ -104,12 +107,12 @@ RX
 safetensors
 Salawu
 SDG
-Sigstore
 sdg
 sexualized
 SHA
 Shivchander
 Signoff
+Sigstore
 Srivastava
 subdirectory
 Sudalairaj
diff --git a/docs/backend/eval-repo.md b/docs/backend/eval-repo.md
new file mode 100644
index 00000000..94ce0c28
--- /dev/null
+++ b/docs/backend/eval-repo.md
@@ -0,0 +1,44 @@
+# New Repository Proposal: eval
+
+## Summary
+
+This document proposes a new repository under the `instructlab` GitHub organization:
+
+- `instructlab/eval`
+
+## Background
+
+The `instructlab/instructlab` repository currently includes no real implementation
+of Evaluation as described by the [LAB paper](https://arxiv.org/abs/2403.01081). The
+closest existing implementation in `instructlab/instructlab` is the `ilab test` command.
+
+As of this writing, `ilab test` is only implemented for macOS with M-series chips. It uses
+a JSON Lines file and a LoRA adapter to compare the output of a given model before and
+after LoRA training with MLX, hence the macOS M-series dependency.
+
+We want to build out an implementation closer to the evaluation described in the paper,
+using higher-level evaluation schemes such as
+[Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300) (MMLU).
+We propose a new repository to house this code and publish a new Python library
+called `instructlab-eval`. The reasoning for a new repository and library includes:
+
+- We expect multiple consumers of this code. The `ilab` CLI is one, but we also envision
+building a REST API around it to help support scaling out this functionality on a cluster.
+- We expect there is broader community interest in an open-source library and service for
+evaluation. We envision this library could support other evaluation techniques over time.
+
+## Maintainers
+
+The initial team of maintainers for this repository will be a copy of the
+`Backend Maintainers` GitHub team.
+
+## Alternatives Considered
+
+### Add to `instructlab/instructlab`
+
+We could add this code to the existing `instructlab/instructlab` repository.
+
+The primary argument against this approach is that we expect the scope of an
+`instructlab-eval` library to expand beyond what would be run by the `ilab` CLI.
+We instead envision a different community of contributors organizing around
+Evaluation specifically.
diff --git a/docs/sdg-repo.md b/docs/backend/sdg-repo.md
similarity index 100%
rename from docs/sdg-repo.md
rename to docs/backend/sdg-repo.md