Format code (#20)
* rename func

Signed-off-by: Sun, Xuehao <[email protected]>

* update version

Signed-off-by: Sun, Xuehao <[email protected]>

* format code

Signed-off-by: Sun, Xuehao <[email protected]>

---------

Signed-off-by: Sun, Xuehao <[email protected]>
XuehaoSun authored Mar 28, 2024
1 parent b33fea6 commit 8e4fefa
Showing 102 changed files with 24,438 additions and 11,055 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -85,7 +85,7 @@ repos:
]

- repo: https://github.com/pre-commit/mirrors-prettier
rev: "v3.1.0" # Use the sha / tag you want to point at
rev: "v4.0.0-alpha.8" # Use the sha / tag you want to point at
hooks:
- id: prettier
args: [--print-width=120]
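Since this commit bumps the prettier mirror pinned in `.pre-commit-config.yaml`, the hook can be exercised locally with the standard pre-commit CLI; a minimal sketch, using the hook id from the config above:

```bash
# Re-run the prettier hook across the whole repo after bumping its rev
pre-commit run prettier --all-files
```
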
20 changes: 10 additions & 10 deletions CODE_OF_CONDUCT.md
@@ -17,23 +17,23 @@ diverse, inclusive, and healthy community.
Examples of behavior that contributes to a positive environment for our
community include:

-* Demonstrating empathy and kindness toward other people
-* Being respectful of differing opinions, viewpoints, and experiences
-* Giving and gracefully accepting constructive feedback
-* Accepting responsibility and apologizing to those affected by our mistakes,
+- Demonstrating empathy and kindness toward other people
+- Being respectful of differing opinions, viewpoints, and experiences
+- Giving and gracefully accepting constructive feedback
+- Accepting responsibility and apologizing to those affected by our mistakes,
and learning from the experience
-* Focusing on what is best not just for us as individuals, but for the overall
+- Focusing on what is best not just for us as individuals, but for the overall
community

Examples of unacceptable behavior include:

-* The use of sexualized language or imagery, and sexual attention or advances of
+- The use of sexualized language or imagery, and sexual attention or advances of
any kind
-* Trolling, insulting or derogatory comments, and personal or political attacks
-* Public or private harassment
-* Publishing others' private information, such as a physical or email address,
+- Trolling, insulting or derogatory comments, and personal or political attacks
+- Public or private harassment
+- Publishing others' private information, such as a physical or email address,
without their explicit permission
-* Other conduct which could reasonably be considered inappropriate in a
+- Other conduct which could reasonably be considered inappropriate in a
professional setting

## Enforcement Responsibilities
20 changes: 15 additions & 5 deletions ChatQnA/README.md
@@ -1,6 +1,7 @@
This ChatQnA use case performs RAG using LangChain, Redis vectordb, and Text Generation Inference on Intel Gaudi2. The Intel Gaudi2 accelerator supports both training and inference for deep learning models, in particular for LLMs. Please visit [Habana AI products](https://habana.ai/products) for more details.

# Environment Setup

To use [🤗 text-generation-inference](https://github.com/huggingface/text-generation-inference) on Habana Gaudi/Gaudi2, please follow these steps:

## Prepare Docker
@@ -20,36 +21,41 @@ bash ./serving/tgi_gaudi/build_docker.sh
## Launch TGI Gaudi Service

### Launch a local server instance on 1 Gaudi card:

```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh
```

For gated models such as `LLAMA-2`, you will have to pass `-e HUGGING_FACE_HUB_TOKEN=<token>` to the docker run command above with a valid Hugging Face Hub read token.

-Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token ans export `HUGGINGFACEHUB_API_TOKEN` environment with the token.
+Please follow this link [huggingface token](https://huggingface.co/docs/hub/security-tokens) to get the access token and export the `HUGGINGFACEHUB_API_TOKEN` environment variable with the token.

```bash
export HUGGINGFACEHUB_API_TOKEN=<token>
```
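As a sketch of how that token might reach a manually started container (the flags mirror the docker run example later in this README, including its `$volume` data mount; the gated model id here is illustrative):

```bash
# Illustrative only: pass a Hugging Face read token into a manually started TGI Gaudi container
docker run -p 8080:80 \
  -e HUGGING_FACE_HUB_TOKEN=$HUGGINGFACEHUB_API_TOKEN \
  -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all \
  -e OMPI_MCA_btl_vader_single_copy_mechanism=none \
  --cap-add=sys_nice --ipc=host \
  ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id meta-llama/Llama-2-7b-chat-hf
```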

### Launch a local server instance on 8 Gaudi cards:

```bash
bash ./serving/tgi_gaudi/launch_tgi_service.sh 8
```

### Customize TGI Gaudi Service

The `./serving/tgi_gaudi/launch_tgi_service.sh` script accepts three parameters:

- num_cards: The number of Gaudi cards to be utilized, ranging from 1 to 8. The default is set to 1.
- port_number: The port number assigned to the TGI Gaudi endpoint, with the default being 8080.
- model_name: The model name utilized for LLM, with the default set to "Intel/neural-chat-7b-v3-3".

You have the flexibility to customize these parameters according to your specific needs. Additionally, you can set the TGI Gaudi endpoint by exporting the environment variable `TGI_LLM_ENDPOINT`:

```bash
export TGI_LLM_ENDPOINT="http://xxx.xxx.xxx.xxx:8080"
```
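For example, the three script parameters described above can be combined in one invocation; a sketch assuming the script takes them positionally in the order listed (the port value is illustrative):

```bash
# Illustrative: 8 cards, custom port 8088, explicit model name
bash ./serving/tgi_gaudi/launch_tgi_service.sh 8 8088 "Intel/neural-chat-7b-v3-3"
```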

## Launch Redis

```bash
docker compose -f langchain/docker/docker-compose-redis.yml up -d
```
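To verify the vector database came up before populating it, the compose status can be checked; a minimal sanity check, not part of the original steps:

```bash
# Confirm the Redis service from the compose file is up and running
docker compose -f langchain/docker/docker-compose-redis.yml ps
```
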
@@ -93,6 +99,7 @@ export SAFETY_GUARD_ENDPOINT="http://xxx.xxx.xxx.xxx:8088"
```

## Start the Backend Service

Make sure the TGI-Gaudi service is running and that data has been populated into Redis, then launch the backend service:

```bash
@@ -102,7 +109,8 @@ nohup python app/server.py &

## Start the Frontend Service

-Navigate to the "ui" folder and execute the following commands to start the fronend GUI:
+Navigate to the "ui" folder and execute the following commands to start the frontend GUI:

```bash
cd ui
sudo apt-get install npm && \
@@ -122,19 +130,21 @@ sudo yum install -y nodejs
Update the `DOC_BASE_URL` environment variable in the `.env` file by replacing the IP address '127.0.0.1' with the actual IP address.
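A one-liner sketch of that substitution, assuming the loopback address appears verbatim in `.env` and using an illustrative host IP:

```bash
# Replace the loopback placeholder with the machine's actual IP (192.168.1.2 is illustrative)
sed -i 's/127.0.0.1/192.168.1.2/g' .env
```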

Run the following command to install the required dependencies:

```bash
npm install
```

Start the development server by executing the following command:

```bash
nohup npm run dev &
```

This will initiate the frontend service and launch the application.


# Enable TGI Gaudi FP8 for higher throughput (Optional)

The TGI Gaudi setup utilizes BFLOAT16 optimization by default. If you aim to achieve higher throughput, you can enable FP8 quantization on the TGI Gaudi. According to our test results, FP8 quantization yields approximately a 1.8x performance gain over BFLOAT16. Please follow the steps below to enable FP8 quantization.

## Prepare Metadata for FP8 Quantization
@@ -156,13 +166,13 @@ After finishing the above commands, the quantization metadata will be generated.
docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/hqt_output data/
docker cp 262e04bbe466:/usr/src/optimum-habana/examples/text-generation/quantization_config/maxabs_quant.json data/
```
-Then modify the `dump_stats_path` to "/data/hqt_output/measure" and update `dump_stats_xlsx_path` to /data/hqt_output/measure/fp8stats.xlsx" in maxabs_quant.json file.
+
+Then modify the `dump_stats_path` to "/data/hqt_output/measure" and update `dump_stats_xlsx_path` to "/data/hqt_output/measure/fp8stats.xlsx" in the maxabs_quant.json file.
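
A sketch of that edit using `sed`, assuming both keys appear in the copied `data/maxabs_quant.json` as ordinary JSON string values:

```bash
# Point the quantization stats paths at the copied measurement output (adjust paths if yours differ)
sed -i 's#"dump_stats_path": *"[^"]*"#"dump_stats_path": "/data/hqt_output/measure"#' data/maxabs_quant.json
sed -i 's#"dump_stats_xlsx_path": *"[^"]*"#"dump_stats_xlsx_path": "/data/hqt_output/measure/fp8stats.xlsx"#' data/maxabs_quant.json
```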

## Restart the TGI Gaudi server with all the metadata mapped

```bash
docker run -p 8080:80 -e QUANT_CONFIG=/data/maxabs_quant.json -v $volume:/data --runtime=habana -e HABANA_VISIBLE_DEVICES=all -e OMPI_MCA_btl_vader_single_copy_mechanism=none --cap-add=sys_nice --ipc=host ghcr.io/huggingface/tgi-gaudi:1.2.1 --model-id Intel/neural-chat-7b-v3-3
```
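
Once the container is up, TGI's standard `/generate` endpoint can serve as a smoke test that the FP8 model is responding (port 8080 matches the mapping above; the prompt is illustrative):

```bash
# Send one generation request to the freshly started TGI Gaudi server
curl http://localhost:8080/generate \
  -X POST \
  -H 'Content-Type: application/json' \
  -d '{"inputs":"What is deep learning?","parameters":{"max_new_tokens":64}}'
```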

Now the TGI Gaudi will launch the FP8 model by default. Please note that currently only Llama2 series and Mistral series models support FP8 quantization.
2 changes: 1 addition & 1 deletion ChatQnA/benchmarking/README.md
@@ -1 +1 @@
Will update soon.
24 changes: 18 additions & 6 deletions ChatQnA/benchmarking/client.py
@@ -15,37 +15,49 @@
# See the License for the specific language governing permissions and
# limitations under the License.

-import requests
-import json
import argparse
import concurrent.futures
+import json
import random
+
+import requests


def extract_qText(json_data):
    try:
-        file = open('devtest.json')
+        file = open("devtest.json")
        data = json.load(file)
        json_data = json.loads(json_data)
        json_data["inputs"] = data[random.randint(0, len(data) - 1)]["qText"]
        return json.dumps(json_data)
    except (json.JSONDecodeError, KeyError, IndexError):
        return None


def send_request(url, json_data):
-    headers = {'Content-Type': 'application/json'}
+    headers = {"Content-Type": "application/json"}
    response = requests.post(url, data=json_data, headers=headers)
    print(f"Question: {json_data} Response: {response.status_code} - {response.text}")


def main(url, json_data, concurrency):
    with concurrent.futures.ThreadPoolExecutor(max_workers=concurrency) as executor:
-        future_to_url = {executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency*2)}
+        future_to_url = {
+            executor.submit(send_request, url, extract_qText(json_data)): url for _ in range(concurrency * 2)
+        }
        for future in concurrent.futures.as_completed(future_to_url):
            _ = future_to_url[future]


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="Concurrent client to send POST requests")
    parser.add_argument("--url", type=str, default="http://localhost:12345", help="URL to send requests to")
-    parser.add_argument("--json_data", type=str, default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}', help="JSON data to send")
+    parser.add_argument(
+        "--json_data",
+        type=str,
+        default='{"inputs":"Which NFL team won the Super Bowl in the 2010 season?","parameters":{"do_sample": true}}',
+        help="JSON data to send",
+    )
    parser.add_argument("--concurrency", type=int, default=100, help="Concurrency level")
    args = parser.parse_args()
    main(args.url, args.json_data, args.concurrency)
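
A usage sketch for the reformatted client, assuming it is run from `ChatQnA/benchmarking` with a `devtest.json` question file alongside it (the script opens that file per request) and an endpoint listening at the default URL:

```bash
# Drive 10 concurrent POST requests against the default endpoint (values illustrative)
python client.py --url http://localhost:12345 --concurrency 10
```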