[sharktank] Evaluation - Add Perplexity test (#233)
Add Perplexity test for LLM evaluation
archana-ramalingam authored Oct 16, 2024
1 parent 1430182 commit e30d0af
Showing 9 changed files with 1,279 additions and 13 deletions.
60 changes: 60 additions & 0 deletions .github/workflows/eval_test.yaml
@@ -0,0 +1,60 @@
name: Evaluation Tests

on:
  workflow_dispatch:
  schedule:
    # Weekdays nightly at 07:00 UTC = 23:00 PST / 00:00 PDT.
    - cron: "0 7 * * 1-5"

concurrency:
  # A PR number if a pull request and otherwise the commit hash. This cancels
  # queued and in-progress runs for the same PR (presubmit) or commit
  # (postsubmit). The workflow name is prepended to avoid conflicts between
  # different workflows.
  group: ${{ github.workflow }}-${{ github.event.number || github.sha }}
  cancel-in-progress: true

jobs:
  test_perplexity:
    name: "Evaluation Tests - perplexity"
    strategy:
      matrix:
        version: [3.11]
        os: [ubuntu-latest, windows-latest]
      fail-fast: false
    runs-on: ${{matrix.os}}
    defaults:
      run:
        shell: bash
    env:
      PIP_CACHE_DIR: "${{ github.workspace }}/.pip-cache"
    steps:
      - name: "Setting up Python"
        id: setup_python
        uses: actions/setup-python@v3
        with:
          python-version: ${{matrix.version}}

      - name: "Checkout Code"
        uses: actions/checkout@v3

      - name: Cache Pip Packages
        uses: actions/cache@v4
        id: cache-pip
        with:
          path: ${{ env.PIP_CACHE_DIR }}
          key: pip-${{ steps.setup_python.outputs.python-version }}-${{ hashFiles('*requirements.txt') }}

      - name: Install pip deps
        run: |
          python -m pip install --no-compile --upgrade pip
          # Note: We install in three steps in order to satisfy requirements
          # from non default locations first. Installing the PyTorch CPU
          # wheels saves multiple minutes and a lot of bandwidth on runner setup.
          pip install --no-compile -r pytorch-cpu-requirements.txt
          pip install --no-compile -f https://iree.dev/pip-release-links.html --src deps \
            -e "git+https://github.com/iree-org/iree-turbine.git#egg=iree-turbine"
          pip install --no-compile -r requirements.txt -e sharktank/ shortfin/
      - name: Run perplexity test
        run: pytest sharktank/tests/evaluate/perplexity_test.py
10 changes: 10 additions & 0 deletions docs/model_cookbook.md
@@ -257,6 +257,16 @@ iree-run-module \
--parameters=model=/tmp/open_llama_3b_v2/open-llama-3b-v2-f16.gguf
```

## Evaluation pipeline

Run perplexity test:

```bash
python -m sharktank.evaluate.perplexity \
--gguf-file=llama8b_f16.gguf \
--tokenizer-config-json=tokenizer_config.json
```
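
As background for the command above, perplexity is the exponential of the mean negative log-likelihood the model assigns to each next token. The following is a minimal sketch of that formula only, assuming raw per-position logits are available; it is not the implementation in `sharktank.evaluate.perplexity`, and the names `log_softmax` and `perplexity` are hypothetical.

```python
import math
from typing import Sequence


def log_softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable log-softmax over one vocabulary-sized row."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - log_z for x in logits]


def perplexity(logits: Sequence[Sequence[float]], token_ids: Sequence[int]) -> float:
    """Perplexity of a single prompt (hypothetical helper, not sharktank API).

    Assumes logits[i] holds the model's scores for the token that follows
    position i, so row i is scored against token_ids[i + 1].
    """
    if len(token_ids) < 2:
        raise ValueError("need at least two tokens to score a prompt")
    nll = 0.0
    for i in range(len(token_ids) - 1):
        # Accumulate the negative log-probability of the actual next token.
        nll -= log_softmax(logits[i])[token_ids[i + 1]]
    # Average over the scored positions, then exponentiate.
    return math.exp(nll / (len(token_ids) - 1))
```

A model that is uniformly unsure over a vocabulary of size V scores a perplexity of V under this definition; lower values indicate that the model predicts the text better.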

## Generating data for llama models

```bash
1 change: 1 addition & 0 deletions requirements.txt
@@ -7,6 +7,7 @@ onnx==1.15.0
huggingface-hub==0.22.2
transformers==4.40.0
sentencepiece==0.2.0
datasets==3.0.0

# It is expected that you have installed a PyTorch version/variant specific
# to your needs, so we only include a minimum version spec.
12 changes: 12 additions & 0 deletions sharktank/sharktank/evaluate/data/eval_prompts.txt
@@ -0,0 +1,12 @@
Robert Boulter is an English film, television and theatre actor.
Robert Boulter had a guest-starring role on the television series "The Bill" in 2000.
Du Fu was a prominent Chinese poet of the Tang dynasty.
Along with Li Bai (Li Po), Du Fu is frequently called the greatest of the Chinese poets.
The Ise-class battleships were a pair of dreadnought battleships built for the Imperial Japanese Navy (IJN) during World War I.
Originally intended to be repeats of the preceding Fusō class, the Ise-class battleships were redesigned before construction began. Both ships carried supplies for the survivors of the Great Kantō earthquake in 1923.
They were modernized in 1934-37 with improvements to their armour and machinery and a rebuilt superstructure in the pagoda mast style. Afterwards they played a minor role in the Second Sino-Japanese War.
Richard Gale "Dick" Rifenburg (August 21, 1926-December 5, 1994) was an American football player and a pioneering television broadcaster for the forerunner to WIVB-TV in Buffalo.
Rifenburg played college football for the University of Michigan Wolverines in 1944 and from 1946 to 1948. He was a consensus selection at end on the 1948 College Football All-America Team.
Rifenburg played professionally in the National Football League (NFL) with the Detroit Lions for one season in 1950. After retiring from football he settled in Buffalo and became a sports broadcaster.
An oxaziridine is an organic molecule that features a three-membered heterocycle containing oxygen, nitrogen, and carbon. In their largest application, oxazidines are intermediates in the industrial production of hydrazine.
Oxaziridine derivatives are also used as specialized reagents in organic chemistry for a variety of oxidations, including alpha hydroxylation of enolates, epoxidation and aziridination of olefins, and other heteroatom transfer reactions.