server : fix temperature + disable some tests (ggerganov#7409)
* server : fix temperature

* server : disable tests relying on parallel determinism

* ci : change server Debug -> RelWithDebInfo
ggerganov authored May 20, 2024
1 parent 6bf9b66 commit 3bc10cb
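The "fix temperature" bullet pairs with the first hunk of results.feature, where the "consistent results with same seed" scenario moves from 0.0 to 1.0 temperature. The commit does not state the reason. A likely one, assumed here, is that temperature 0.0 selects greedy decoding, where the random seed is never consulted, so the scenario would pass even if seeds were ignored. The Python sketch below illustrates this with a hypothetical `sample_token` helper and a toy logit vector. It is not llama.cpp's sampler.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Hypothetical sampler: greedy at temperature <= 0, else softmax(logits / T)."""
    if temperature <= 0.0:
        # Greedy decoding: the RNG is never used, so the seed cannot matter.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    m = max(scaled)
    # Unnormalised, numerically stable softmax weights.
    weights = [math.exp(x - m) for x in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]

def generate(logits, temperature, seed, n=20):
    """Draw n tokens from a fixed toy distribution using a seeded RNG."""
    rng = random.Random(seed)
    return [sample_token(logits, temperature, rng) for _ in range(n)]

logits = [2.0, 1.5, 1.0, 0.5]  # toy values, not real model output

# T = 1.0: the same seed reproduces the same draws, so a same-seed check is meaningful.
assert generate(logits, 1.0, seed=42) == generate(logits, 1.0, seed=42)

# T = 0.0: every seed yields the argmax token, so a same-seed check passes trivially.
assert generate(logits, 0.0, seed=1) == generate(logits, 0.0, seed=2) == [0] * 20
```

At T = 1.0 the draws depend on the seed, which is exactly what the scenario is meant to exercise.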
Showing 2 changed files with 9 additions and 15 deletions.
7 changes: 1 addition & 6 deletions .github/workflows/server.yml
@@ -33,13 +33,10 @@ jobs:
     strategy:
       matrix:
         sanitizer: [ADDRESS, THREAD, UNDEFINED]
-        build_type: [Debug]
+        build_type: [RelWithDebInfo]
         include:
           - build_type: Release
             sanitizer: ""
-          - build_type: Debug
-            sanitizer: THREAD
-            disabled_on_pr: true
       fail-fast: false # While -DLLAMA_SANITIZE_THREAD=ON is broken

     steps:
@@ -103,10 +100,8 @@ jobs:
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON ;
cmake --build build --config ${{ matrix.build_type }} -j $(nproc) --target server
- name: Tests
id: server_integration_tests
if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}
run: |
cd examples/server/tests
PORT=8888 ./tests.sh
17 changes: 8 additions & 9 deletions examples/server/tests/features/results.feature
@@ -13,7 +13,7 @@ Feature: Results

Scenario Outline: consistent results with same seed
Given <n_slots> slots
-And 0.0 temperature
+And 1.0 temperature
Then the server is starting
Then the server is healthy

@@ -27,7 +27,8 @@ Feature: Results
Examples:
| n_slots |
| 1 |
-| 2 |
+# FIXME: unified KV cache nondeterminism
+# | 2 |

Scenario Outline: different results with different seed
Given <n_slots> slots
@@ -73,14 +74,13 @@ Feature: Results
Examples:
| n_parallel | temp |
| 1 | 0.0 |
-| 2 | 0.0 |
-| 4 | 0.0 |
| 1 | 1.0 |
-# FIXME: These tests fail on master.
-# Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+# FIXME: unified KV cache nondeterminism
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
# and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
# and https://github.com/ggerganov/llama.cpp/pull/7347 .
+# | 2 | 0.0 |
+# | 4 | 0.0 |
# | 2 | 1.0 |
# | 4 | 1.0 |

@@ -108,12 +108,11 @@ Feature: Results
Examples:
| n_slots | n_kv | n_predict | n_parallel |
| 4 | 1024 | 1 | 1 |
-| 4 | 1024 | 1 | 4 |
-# FIXME: These tests fail on master.
-# Problems: unified KV cache (except for CPU backend with LLAMA_NO_LLAMAFILE=1), SIMD nondeterminism.
+# FIXME: unified KV cache nondeterminism
# See https://github.com/ggerganov/whisper.cpp/issues/1941#issuecomment-1986923227
# and https://github.com/ggerganov/llama.cpp/pull/6122#discussion_r1531405574
# and https://github.com/ggerganov/llama.cpp/pull/7347 .
+# | 4 | 1024 | 1 | 4 |
# | 4 | 1024 | 100 | 1 |
# This test still fails even the above patches; the first token probabilities are already different.
# | 4 | 1024 | 100 | 4 |
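Every row disabled here runs more than one slot or parallel request, and the comments attribute the failures to the unified KV cache and SIMD nondeterminism. A generic mechanism behind this kind of failure is that floating-point addition is not associative. If the order in which partial sums are combined depends on how many sequences share a batch, the logits can differ in the last bits, and a near-tie between two tokens can flip even under greedy decoding. The Python sketch below is a contrived illustration of that mechanism only: the two-lane reduction, the input values, and the toy logits are hypothetical and do not reproduce llama.cpp's kernels.

```python
# Sketch: the same inputs reduced in two different orders can round differently.

def reduce_sequential(values):
    """Scalar loop: accumulate strictly left to right."""
    acc = 0.0
    for v in values:
        acc += v
    return acc

def reduce_two_lanes(values):
    """Hypothetical 2-lane SIMD-style reduction: even and odd indices are
    accumulated separately, then the two partial sums are combined."""
    lanes = [0.0, 0.0]
    for i, v in enumerate(values):
        lanes[i % 2] += v
    return lanes[0] + lanes[1]

def greedy_pick(logits):
    """Index of the largest logit (greedy, i.e. temperature-0, decoding)."""
    return max(range(len(logits)), key=lambda i: logits[i])

# Contrived inputs chosen so that rounding differs between the two orders.
values = [1e16, 1.0, -1e16, 1.0]
a = reduce_sequential(values)  # the first +1.0 is absorbed by 1e16
b = reduce_two_lanes(values)   # 1e16 and -1e16 cancel inside one lane

# Token 0's logit depends on the reduction order; token 1's is fixed at 1.5.
print(a, b)                   # 1.0 2.0
print(greedy_pick([a, 1.5]))  # 1
print(greedy_pick([b, 1.5]))  # 0
```

Real differences are usually far smaller than in this example, but a first-token probability that already differs, as the last comment in the hunk notes, is enough for two runs with the same seed to diverge.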
