Skip to content

Commit

Permalink
update README, docker-compose.yaml and change performance graph colours
Browse files Browse the repository at this point in the history
  • Loading branch information
StefanBogdan committed Nov 11, 2024
1 parent 57750d5 commit 521e7da
Show file tree
Hide file tree
Showing 3 changed files with 18 additions and 30 deletions.
34 changes: 11 additions & 23 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -7,7 +7,7 @@ This repo contains a tool for benchmarking Weaviate performance.
* 📊 results and context can be found in the [Weaviate documentation](https://weaviate.io/developers/weaviate/current/benchmarks/)
* 💬 discuss the results on our [Slack channel](https://join.slack.com/t/weaviate/shared_invite/zt-goaoifjr-o8FuVz9b1HLzhlUfyfddhw) or [Twitter](https://twitter.com/weaviate_io)

### ANN benchmark
## ANN benchmark

There are two components you will need to run for the benchmarks:

Expand All @@ -16,42 +16,30 @@ There are two components you will need to run for the benchmarks:

You can run both as containers on the same machine via Docker compose.

For replicating our benchmarks we recommend setting up two separate machines.
For replicating our benchmarks we recommend setting the following machine:

| Machine description | CPU type | CPUs | Memory | Disk size | Disk type | Misc. |
| Machine name | CPU type | CPUs | Memory | Disk size | Disk type | Misc. |
| --- | --- | --- | --- | --- | --- | --- |
| Machine to run Weaviate | c2 | 30 | 120GB | 500GB | SSD | [Ubuntu 22.04 with Docker-compose](https://gist.github.com/bobvanluijt/04f6d97916244a7de59fead84ef63cd4) |
| Machine to run benchmark script | N2 | 8 | 64GB | 500GB | SSD | [Ubuntu 22.04 with Docker-compose](https://gist.github.com/bobvanluijt/04f6d97916244a7de59fead84ef63cd4) |
| `n4-highmem-16` | N4 | 16 | 128GB | 512GB | Hyperdisk Balanced | Debian 12 (bookworm) with [Docker and Compose V2](https://gist.github.com/StefanBogdan/821d18bbc5f18978643adff508749cf0) |

#### Prepare the Weaviate machine
### Run tests

Clone this repo and cd into it `$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking`

Run the following command to spin up Weaviate: `$ docker-compose up weaviate -d`

Copy the IP address and amount of CPU cores this machine has.

#### Prepare the benchmark machine

Check if the Weaviate machine is available: `curl http://{IP OF WEAVIATE INSTANCE}/v1/meta`. Note that the instance runs on port `8080`, e.g., `http://10.128.15.12:8080/v1/meta`.
You will also need to allow port `50051` for gRPC and `21121` for metrics. You can verify this via `nc -zv localhost 50051`.

Clone this repo and cd into it `$ git clone https://github.com/semi-technologies/weaviate-benchmarking && cd weaviate-benchmarking`
Clone this repo and cd into it `$ git clone https://github.com/weaviate/weaviate-benchmarking && cd weaviate-benchmarking`

Download the files into a datasets folder as outlined below.

```sh
mkdir datasets && \
curl -o ./datasets/dbpedia-100k-openai-ada002-angular.hdf5 https://storage.googleapis.com/ann-datasets/custom/dbpedia-100k-openai-ada002-angular.hdf5 \
curl -o ./datasets/deep-image-96-angular.hdf5 https://ann-benchmarks.com/deep-image-96-angular.hdf5 && \
curl -o ./datasets/mnist-784-euclidean.hdf5 https://ann-benchmarks.com/mnist-784-euclidean.hdf5 && \
curl -o ./datasets/gist-960-euclidean.hdf5 https://ann-benchmarks.com/gist-960-euclidean.hdf5
curl -o ./datasets/dbpedia-openai-1000k-angular.hdf5 https://storage.googleapis.com/ann-datasets/ann-benchmarks/dbpedia-openai-1000k-angular.hdf5 && \
curl -o ./datasets/snowflake-msmarco-arctic-embed-m-v1.5-angular.hdf5 https://storage.googleapis.com/ann-datasets/custom/snowflake-msmarco-arctic-embed-m-v1.5-angular.hdf5 && \
curl -o ./datasets/sift-128-euclidean.hdf5 http://ann-benchmarks.com/sift-128-euclidean.hdf5 && \
curl -o ./datasets/sphere-10M-meta-dpr.hdf5 https://storage.googleapis.com/ann-datasets/custom/sphere-10M-meta-dpr.hdf5
```

Run a single performance test on an [ann-benchmarks](https://ann-benchmarks.com/) hdf5 dataset.

```sh
DATASET=./datasets/dbpedia-100k-openai-ada002-angular.hdf5 DISTANCE=cosine docker compose run benchmarker
DATASET=./datasets/dbpedia-openai-1000k-angular.hdf5 DISTANCE=cosine docker compose up --abort-on-container-exit
```

For more details on additional configuration options see the help options.
Expand Down
6 changes: 2 additions & 4 deletions benchmarker/scripts/python/performance-graphs.py
Original file line number Diff line number Diff line change
Expand Up @@ -4,8 +4,6 @@
import glob
import json
import argparse
import seaborn as sns
import matplotlib.ticker as tkr
import matplotlib.pyplot as plt
import pandas as pd

Expand Down Expand Up @@ -40,8 +38,8 @@ def create_plot(results_df: pd.DataFrame, mode='light'):

# Set custom colors for limits
color_map = {
100: '#098f73',
10: '#2b17e7'
100: '#61bd73',
10: '#fc3988'
}

# Configure plot style based on mode
Expand Down
8 changes: 5 additions & 3 deletions docker-compose.yml
Original file line number Diff line number Diff line change
Expand Up @@ -7,8 +7,8 @@ services:
dockerfile: Dockerfile
command: >
/app/benchmarker ann-benchmark
-v ${DATASET:-./datasets/dbpedia-100k-openai-ada002-angular.hdf5 }
-d ${DISTANCE:-cosine}
--vectors ${DATASET:-./datasets/dbpedia-100k-openai-ada002-angular.hdf5 }
--distance ${DISTANCE:-cosine}
--grpcOrigin ${GRPC_ORIGIN:-weaviate:50051}
--httpOrigin ${HTTP_ORIGIN:-weaviate:8080}
volumes:
Expand All @@ -22,12 +22,14 @@ services:
- '8080'
- --scheme
- http
image: docker.io/semitechnologies/weaviate:1.25.8
image: docker.io/semitechnologies/weaviate:1.27.1
ports:
- 8080:8080
- 50051:50051
- 2112:2112
restart: on-failure:0
volumes:
- "$PWD/weaviate-data:/var/lib/weaviate"
environment:
QUERY_DEFAULTS_LIMIT: 25
AUTHENTICATION_ANONYMOUS_ACCESS_ENABLED: 'true'
Expand Down

0 comments on commit 521e7da

Please sign in to comment.