Proposal for new Evaluation repo
Signed-off-by: Nathan Weinberg <[email protected]>
nathan-weinberg committed Jun 10, 2024
1 parent 340e9ac commit 88ed642
Showing 3 changed files with 50 additions and 3 deletions.
9 changes: 6 additions & 3 deletions .spellcheck-en-custom.txt
@@ -34,6 +34,7 @@ Dropdown
env
EP
Eval
eval
Excalidraw
exfiltrate
exfiltrating
@@ -52,6 +53,7 @@ Inferencing
instructlab
ISA
JIT
JSON
Jupyter
KAGGLE
Kaggle
@@ -63,19 +65,20 @@ LLM
llms
LLVM
lora
md
Markdownlint
md
Mergify
Merlinite
mimimum
Miniforge
Mixtral
MLX
mlx
MMLU
NVidia
Nvidia
ollama
Ollama
ollama
orchestrator
ots
Pareja
@@ -104,12 +107,12 @@ RX
safetensors
Salawu
SDG
Sigstore
sdg
sexualized
SHA
Shivchander
Signoff
Sigstore
Srivastava
subdirectory
Sudalairaj
44 changes: 44 additions & 0 deletions docs/backend/eval-repo.md
@@ -0,0 +1,44 @@
# New Repository Proposal: eval

## Summary

This document proposes a new repository under the `instructlab` GitHub organization:

- `instructlab/eval`

## Background

The `instructlab/instructlab` repository currently includes no real implementation
of Evaluation as described by the [LAB paper](https://arxiv.org/abs/2403.01081). The
closest existing implementation in `instructlab/instructlab` is the `ilab test` command.

As of this writing, `ilab test` is only implemented for macOS with M-series chips. It uses
a JSON Lines file and a LoRA adapter to compare the output of a given model before and after
LoRA training with MLX, hence the macOS M-series dependency.
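
For illustration, here is a minimal sketch of that before/after comparison pattern. It assumes
the `mlx_lm` Python package, placeholder model and adapter paths, and a hypothetical
`prompts.jsonl` file with one `{"user": ...}` object per line; the real `ilab test`
implementation may differ in its details.

```python
# Illustrative sketch only -- not the actual `ilab test` implementation.
# Assumes the mlx-lm package; argument names such as adapter_path may vary
# between mlx-lm releases. Model, adapter, and file paths are placeholders.
import json

from mlx_lm import load, generate

# Load the base model once without the adapter, and once with the LoRA adapter applied.
base_model, tokenizer = load("path/to/base-model-mlx")
tuned_model, _ = load("path/to/base-model-mlx", adapter_path="path/to/lora-adapter")

with open("prompts.jsonl", encoding="utf-8") as f:
    for line in f:
        prompt = json.loads(line)["user"]  # hypothetical field name
        before = generate(base_model, tokenizer, prompt=prompt, max_tokens=256)
        after = generate(tuned_model, tokenizer, prompt=prompt, max_tokens=256)
        print(f"### {prompt}\n--- before:\n{before}\n--- after:\n{after}\n")
```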

We want to build out an implementation closer to the evaluation described in the paper,
using higher-level evaluation schemes such as
[Massive Multitask Language Understanding](https://arxiv.org/abs/2009.03300) (MMLU);
a sketch of one possible approach appears after the list below.
We propose a new repository to house this code that publishes a new Python library
called `instructlab-eval`. The reasoning for a new repository and library includes:

- We expect multiple consumers of this code. The `ilab` CLI is one, but we also envision
building a REST API around it to help support scaling out this functionality on a cluster.
- We expect there is broader community interest in an open-source library and service for
evaluation. We envision this library supporting other evaluation techniques over time.
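
As a sketch of the kind of higher-level scheme mentioned above, one possible (but not
committed) approach would be to wrap an existing benchmark runner such as EleutherAI's
`lm-evaluation-harness`; the package, task, and path names below are assumptions for
illustration only.

```python
# Sketch of how an instructlab-eval style library might run MMLU.
# Assumes EleutherAI's lm-evaluation-harness (pip install lm-eval) and a
# local Hugging Face format model; this is not a committed design.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",                             # Hugging Face transformers backend
    model_args="pretrained=path/to/model",  # placeholder model path
    tasks=["mmlu"],                         # MMLU task group
    num_fewshot=5,                          # conventional 5-shot MMLU setting
)

# Report the accuracy recorded for each task or task group.
for task, metrics in results["results"].items():
    print(task, metrics.get("acc,none"))
```

Wrapping a runner like this behind a small Python API is one way the same code could serve
both the `ilab` CLI and a future REST service.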

## Maintainers

The initial team of maintainers for this repository will be a copy of the
`Backend Maintainers` GitHub team.

## Alternatives Considered

### Add to `instructlab/instructlab`

We could add this code to the existing `instructlab/instructlab` repository.

The primary argument against this approach is that we expect the scope of an
`instructlab-eval` library to grow beyond what would be run by the
`ilab` CLI. We instead envision a different community of contributors organizing
around Evaluation specifically.
File renamed without changes.
