Proposal for new Evaluation repo
Signed-off-by: Nathan Weinberg <[email protected]>
nathan-weinberg committed Jun 10, 2024
1 parent 340e9ac commit 88ed642
Showing 3 changed files with 50 additions and 3 deletions.
9 changes: 6 additions & 3 deletions .spellcheck-en-custom.txt
@@ -34,6 +34,7 @@ Dropdown
env
EP
Eval
eval
Excalidraw
exfiltrate
exfiltrating
@@ -52,6 +53,7 @@ Inferencing
instructlab
ISA
JIT
JSON
Jupyter
KAGGLE
Kaggle
@@ -63,19 +65,20 @@ LLM
llms
LLVM
lora
md
Markdownlint
md
Mergify
Merlinite
mimimum
Miniforge
Mixtral
MLX
mlx
MMLU
NVidia
Nvidia
ollama
Ollama
ollama
orchestrator
ots
Pareja
@@ -104,12 +107,12 @@ RX
safetensors
Salawu
SDG
Sigstore
sdg
sexualized
SHA
Shivchander
Signoff
Sigstore
Srivastava
subdirectory
Sudalairaj
44 changes: 44 additions & 0 deletions docs/backend/eval-repo.md
@@ -0,0 +1,44 @@
# New Repository Proposal: eval

## Summary

This document proposes a new repository under the `instructlab` GitHub organization:

- `instructlab/eval`

## Background

The `instructlab/instructlab` repository currently includes no real implementation
of Evaluation as described by the [LAB paper](https://arxiv.org/abs/2403.01081). The
closest existing implementation in `instructlab/instructlab` is the `ilab test` command.

As of this writing, `ilab test` is only implemented for macOS with M-series chips. It uses
a JSON Lines file and a LoRA adapter to compare the output of a given model before and after
LoRA training with MLX, hence the macOS M-series dependency.
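
For illustration, here is a minimal sketch of that before/after comparison pattern. It assumes
the `mlx_lm` Python package, placeholder model and adapter paths, and a hypothetical
`prompts.jsonl` file with one `{"user": ...}` object per line; the real `ilab test`
implementation may differ in its details.

```python
# Illustrative sketch only -- not the actual `ilab test` implementation.
# Assumes the mlx-lm package; argument names such as adapter_path may vary
# between mlx-lm releases. Model, adapter, and file paths are placeholders.
import json

from mlx_lm import load, generate

# Load the base model once without the adapter, and once with the LoRA adapter applied.
base_model, tokenizer = load("path/to/base-model-mlx")
tuned_model, _ = load("path/to/base-model-mlx", adapter_path="path/to/lora-adapter")

with open("prompts.jsonl", encoding="utf-8") as f:
    for line in f:
        prompt = json.loads(line)["user"]  # hypothetical field name
        before = generate(base_model, tokenizer, prompt=prompt, max_tokens=256)
        after = generate(tuned_model, tokenizer, prompt=prompt, max_tokens=256)
        print(f"### {prompt}\n--- before:\n{before}\n--- after:\n{after}\n")
```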

We want to build out an implementation closer to the evaluation described in the paper,
using higher-level evaluation schemes such as
[Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300) (MMLU);
a sketch of one possible approach appears after the list below.
We propose a new repository to house this code that publishes a new Python library
called `instructlab-eval`. The reasoning for a new repository and library includes:

- We expect multiple consumers of this code. The `ilab` CLI is one, but we also envision
building a REST API around it to help support scaling out this functionality on a cluster.
- We expect there is broader community interest in an open-source library and service for
evaluation. We envision this library supporting other evaluation techniques over time.
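
As a sketch of the kind of higher-level scheme mentioned above, one possible (but not
committed) approach would be to wrap an existing benchmark runner such as EleutherAI's
`lm-evaluation-harness`; the package, task, and path names below are assumptions for
illustration only.

```python
# Sketch of how an instructlab-eval style library might run MMLU.
# Assumes EleutherAI's lm-evaluation-harness (pip install lm-eval) and a
# local Hugging Face format model; this is not a committed design.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                             # Hugging Face transformers backend
    model_args="pretrained=path/to/model",  # placeholder model path
    tasks=["mmlu"],                         # MMLU task group
    num_fewshot=5,                          # conventional 5-shot MMLU setting
)

# Report the accuracy recorded for each task or task group.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```

Wrapping a runner like this behind a small Python API is one way the same code could serve
both the `ilab` CLI and a future REST service.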

## Maintainers

The initial team of maintainers for this repository will be a copy of the
`Backend Maintainers` GitHub team.

## Alternatives Considered

### Add to `instructlab/instructlab`

We could add this code to the existing `instructlab/instructlab` repository.

The primary argument against this approach is that we expect the scope of an
`instructlab-eval` library to grow beyond what would be run by the
`ilab` CLI. We instead envision a different community of contributors organizing
around Evaluation specifically.
File renamed without changes.
