From 9a5a578b686473a42ddc41af5fbb6fd5b09da948 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Lucas=20Guimar=C3=A3es?=
Date: Wed, 19 Jun 2024 05:32:01 -0300
Subject: [PATCH] Fix typo at README.md (#1621)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Fixed typos and improved some phrases.

Signed-off-by: Lucas Guimarães
---
 workflows/chatbot/inference/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/workflows/chatbot/inference/README.md b/workflows/chatbot/inference/README.md
index ba5da39484a..1d9598953a1 100644
--- a/workflows/chatbot/inference/README.md
+++ b/workflows/chatbot/inference/README.md
@@ -49,7 +49,7 @@ numactl -m -C python generate.py \
 To enable FP32 inference, you can add the parameter `--dtype "float32"`. To check the statistical information of inference, you can add the parameter `--return_stats`.
 
 ## LLama2 INT8 Inference
-[Llama2](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) int8 inference demonstrates in [int8_llama2](https://github.com/intel/intel-extension-for-transformers/tree/int8_llama2/workflows/chatbot/inference) branch and need install Intel-extension-for-pytorch [llm_feature_branch](https://github.com/intel/intel-extension-for-pytorch/tree/llm_feature_branch) branch. Please follow the [README.md](https://github.com/intel/intel-extension-for-transformers/blob/81a4484dcc93f09d7609e6896fe3fbc22756975b/workflows/chatbot/inference/README.md) to setup the environments and make quantization.
+[Llama2](https://huggingface.co/meta-llama/Llama-2-7b-chat-hf) int8 inference demonstrates in [int8_llama2](https://github.com/intel/intel-extension-for-transformers/tree/int8_llama2/workflows/chatbot/inference) branch and need install Intel-extension-for-pytorch [llm_feature_branch](https://github.com/intel/intel-extension-for-pytorch/tree/llm_feature_branch) branch. Please follow the [README.md](https://github.com/intel/intel-extension-for-transformers/blob/81a4484dcc93f09d7609e6896fe3fbc22756975b/workflows/chatbot/inference/README.md) to set up the environments and make quantization.
 
 # Inference on Habana Gaudi
 
@@ -107,7 +107,7 @@ python ../utils/gaudi_spawn.py --use_deepspeed --world_size 8 generate.py \
 
 Habana supports HPU graph mode for inference speedup, which is available for bloom, gpt2, opt, gptj, gpt_neox, mpt, llama. You can use the parameter `use_hpu_graphs` to speed up the inference.
 
-you can use '--peft_model_path' to apply you peft finetuned output model during generation.
+you can use '--peft_model_path' to apply your peft finetuned output model during generation.
 
 ```bash
 python ../utils/gaudi_spawn.py --use_deepspeed --world_size 8 generate.py \
@@ -122,7 +122,7 @@
 
 # Additional Notes
 
-Here are the explanations of parameters in generate.py:
+Here are the explanations of the parameters in generate.py:
 `--temperature`: Controls the diversity of generated text. Lower values result in more deterministic outputs. The default value is 0.1.
 `--top_p`: During text generation, only consider tokens with cumulative probability up to this value. This parameter helps to avoid extremely low probability tokens. The default value is 0.75.
 `--top_k`: The number of highest probability vocabulary tokens to consider for each step of text generation. The default value is 40.
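
The `--temperature`, `--top_p`, and `--top_k` parameters documented in the last hunk are standard sampling controls. As a rough illustration only, and assuming `generate.py` forwards them to Hugging Face Transformers' `generate()` (an assumption, not shown in this diff), the quoted defaults would correspond to something like:

```python
# Illustrative sketch only: how the documented defaults (temperature=0.1,
# top_p=0.75, top_k=40) map onto Hugging Face Transformers sampling.
# The model id comes from the Llama2 link in the hunk above; the prompt is made up.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)

inputs = tokenizer("Tell me about Intel Xeon Scalable processors.", return_tensors="pt")
outputs = model.generate(
    **inputs,
    do_sample=True,    # temperature/top_p/top_k only take effect when sampling
    temperature=0.1,   # lower values give more deterministic output
    top_p=0.75,        # nucleus sampling: keep tokens up to 75% cumulative probability
    top_k=40,          # keep only the 40 highest-probability tokens per step
    max_new_tokens=128,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```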
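
Similarly, for the `--peft_model_path` option touched in the second hunk, applying a PEFT fine-tuned adapter on top of the base model is typically a wrapping step like the sketch below; the adapter path is hypothetical, and how `generate.py` wires the flag internally is assumed rather than shown here.

```python
# Minimal sketch, assuming --peft_model_path points at a PEFT/LoRA adapter directory
# produced by fine-tuning, which gets applied on top of the base model.
from peft import PeftModel
from transformers import AutoModelForCausalLM

base_model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
peft_model_path = "./peft_finetuned_output"  # hypothetical path, for illustration only
model = PeftModel.from_pretrained(base_model, peft_model_path)
model.eval()  # the wrapped model is then used for generation as usual
```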