diff --git a/README.md b/README.md
index db6c78dfe..dd233d466 100644
--- a/README.md
+++ b/README.md
@@ -36,6 +36,12 @@ Lorax is a framework that allows users to serve over a hundred fine-tuned models
 - ✅ **Production Readiness** reliably stable, Lorax supports Prometheus metrics and distributed tracing with Open Telemetry
 - 🤯 **Free Commercial Use:** Apache 2.0 License. Enough said 😎.
 
+
+
+
+
+
+
 ## 🏠 Optimized architectures
 
 - 🦙 [Llama V2](https://huggingface.co/meta-llama)
@@ -56,7 +62,7 @@ or
 The easiest way of getting started is using the official Docker container:
 
 ```shell
-model=mistralai/Mistral-7B-v0.1
+model=mistralai/Mistral-7B-Instruct-v0.1
 volume=$PWD/data # share a volume with the Docker container to avoid downloading weights every run
 
 docker run --gpus all --shm-size 1g -p 8080:80 -v $volume:/data ghcr.io/huggingface/lorax-inference:0.9.4 --model-id $model
@@ -73,14 +79,14 @@ You can then query the model using either the `/generate` or `/generate_stream`
 ```shell
 curl 127.0.0.1:8080/generate \
     -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"adapter_id":"some/adapter"}}' \
+    -d '{"inputs": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", "parameters": {"adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"}}' \
     -H 'Content-Type: application/json'
 ```
 
 ```shell
 curl 127.0.0.1:8080/generate_stream \
     -X POST \
-    -d '{"inputs":"What is Deep Learning?","parameters":{"adapter_id":"some/adapter"}}' \
+    -d '{"inputs": "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?", "parameters": {"adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"}}' \
     -H 'Content-Type: application/json'
 ```
 
@@ -94,10 +100,12 @@ pip install lorax-client
 from lorax import Client
 
 client = Client("http://127.0.0.1:8080")
-print(client.generate("What is Deep Learning?", adapter_id="some/adapter").generated_text)
+prompt = "Natalia sold clips to 48 of her friends in April, and then she sold half as many clips in May. How many clips did Natalia sell altogether in April and May?"
+
+print(client.generate(prompt, adapter_id="vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k").generated_text)
 
 text = ""
-for response in client.generate_stream("What is Deep Learning?", adapter_id="some/adapter"):
+for response in client.generate_stream(prompt, adapter_id="vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"):
     if not response.token.special:
         text += response.token.text
 print(text)
@@ -109,4 +117,4 @@ You can consult the OpenAPI documentation of the `lorax` REST API using the `/do
 
 ### 🛠️ Local install
 
-MAGDY AND WAEL TODO
\ No newline at end of file
+MAGDY AND WAEL TODO
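
The updated curl example in this diff maps one-to-one onto a plain HTTP POST. As a minimal sketch of the same `/generate` call without the `lorax-client` package — assuming a server started with the `docker run` command above is listening on `127.0.0.1:8080`, and a TGI-style JSON response carrying a top-level `generated_text` field — the request can be issued with `requests`:

```python
# Sketch: replays the /generate curl example from the diff using plain
# `requests`. Assumes the LoRAX server from the `docker run` command is
# listening on 127.0.0.1:8080 and returns TGI-style JSON with a top-level
# "generated_text" field.
import requests

prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)

resp = requests.post(
    "http://127.0.0.1:8080/generate",
    json={
        "inputs": prompt,
        "parameters": {
            "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"
        },
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["generated_text"])
```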
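
Likewise, the `/generate_stream` route streams tokens incrementally. The sketch below mirrors the `client.generate_stream()` loop from the Python example, under the assumption (not confirmed by the diff itself) that the route emits server-sent events as `data:{...}` lines whose JSON payload contains a `token` object with `text` and `special` fields:

```python
# Sketch: consumes /generate_stream as server-sent events, mirroring the
# client.generate_stream() loop in the diff. The `data:` line format and the
# token payload shape are assumptions, not confirmed by this diff.
import json

import requests

prompt = (
    "Natalia sold clips to 48 of her friends in April, and then she sold "
    "half as many clips in May. How many clips did Natalia sell altogether "
    "in April and May?"
)

with requests.post(
    "http://127.0.0.1:8080/generate_stream",
    json={
        "inputs": prompt,
        "parameters": {
            "adapter_id": "vineetsharma/qlora-adapter-Mistral-7B-Instruct-v0.1-gsm8k"
        },
    },
    stream=True,
    timeout=120,
) as resp:
    resp.raise_for_status()
    text = ""
    for line in resp.iter_lines():
        # Skip keep-alive blank lines and anything that is not an SSE data event.
        if not line or not line.startswith(b"data:"):
            continue
        payload = json.loads(line[len(b"data:"):])
        token = payload.get("token") or {}
        # Accumulate only real tokens, as in the README's streaming loop.
        if not token.get("special"):
            text += token.get("text", "")

print(text)
```

Both routes accept the same `parameters` object, so the adapter swap made in this diff applies identically to the one-shot and streaming calls.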