diff --git a/README.md b/README.md
index 2c85f3b47..78291df4c 100644
--- a/README.md
+++ b/README.md
@@ -99,8 +99,8 @@ In summary:
 | High Level APIs | Sample use | Arguments |
 |-----------------|------------|-------------------|
-| QEfficient.cloud.infer | [click here](#1-use-qefficientcloudinfer) | <li>model_name : $\color{green} {Mandatory}$</li> <li>num_cores : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li> <li>batch_size : Optional [Default-1]</li> <li>prompt_len : Optional [Default-32]</li> <li>ctx_len : Optional [Default-128]</li> <li>mxfp6 : Optional</li> <li>hf_token : Optional</li> <li>cache_dir : Optional ["cache_dir" in current working directory]</li> <li>prompt : Optinoal [Default-"My name is"]</li> |
-| QEfficient.cloud.execute | [click here](#2-use-of-qefficientcloudexcute) | <li>model_name : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li> <li>qpc_path : $\color{green} {Mandatory}$</li> <li>prompt : Optional [Default-"My name is"]</li> <li>cache_dir : Optional ["cache_dir" in current working directory]</li> <li>hf_token : Optional</li> |
+| QEfficient.cloud.infer | [click here](#1-use-qefficientcloudinfer) | <li>model_name : $\color{green} {Mandatory}$</li> <li>num_cores : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li> <li>batch_size : Optional [Default-1]</li> <li>prompt_len : Optional [Default-32]</li> <li>ctx_len : Optional [Default-128]</li> <li>mxfp6 : Optional</li> <li>hf_token : Optional</li> <li>cache_dir : Optional ["cache_dir" in current working directory]</li> <li>prompt : Optional</li> <li>prompts_txt_file_path : Optional</li> <li>*only one argument, prompt or prompts_txt_file_path should be passed*</li> |
+| QEfficient.cloud.execute | [click here](#2-use-of-qefficientcloudexcute) | <li>model_name : $\color{green} {Mandatory}$</li> <li>device_group : $\color{green} {Mandatory}$</li> <li>qpc_path : $\color{green} {Mandatory}$</li> <li>cache_dir : Optional ["cache_dir" in current working directory]</li> <li>hf_token : Optional</li> <li>prompt : Optional</li> <li>prompts_txt_file_path : Optional</li> <li>*only one argument, prompt or prompts_txt_file_path should be passed*</li> |
 
 ### 1. Use QEfficient.cloud.infer
 
@@ -116,7 +116,7 @@ This is the single e2e python api in the library, which takes model_card name as
 ```bash
 python -m QEfficient.cloud.infer --help
 python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device_group [0] --prompt "My name is" --mos 1 --aic_enable_depth_first
 
-# If executing for batch size>1, pass path of txt file with input prompts, Example below
+# If executing for batch size > 1, pass the path of a txt file with input prompts; see the example below. A sample txt file (prompts.txt) is present in the examples folder.
 python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 3 --prompt_len 32 --ctx_len 128 --num_cores 16 --device_group [0] --prompts_txt_file_path examples/prompts.txt --mxfp6 --mos 1 --aic_enable_depth_first
 ```
@@ -128,7 +128,7 @@ Once we have compiled the QPC, we can now use the precompiled QPC in execute API
 python -m QEfficient.cloud.execute --model_name gpt2 --qpc_path qeff_models/gpt2/qpc_16cores_1BS_32PL_128CL_1devices_mxfp6/qpcs/ --prompt "Once upon a time in" --device_group [0]
 ```
 
-We can also enable MQ, just based on the number of devices. Based on the "--device_group" as input it will create TS config on the fly. If "--device-group [0,1]" it will create TS config for 2 devices and use it for compilation, if "--device-group 0" then TS compilation is skipped and single soc execution is enabled.
+We can also enable MQ (multi-Qranium) execution based simply on the number of devices: from the "--device-group" input, a TS (tensor-slicing) config is created on the fly. If "--device-group [0,1]" is passed, a TS config for 2 devices is created and used for compilation; if "--device-group [0]" is passed, TS compilation is skipped and single-SoC execution is enabled.
 
 ```bash
 python -m QEfficient.cloud.infer --model_name Salesforce/codegen-2B-mono --batch_size 1 --prompt_len 32 --ctx_len 128 --mxfp6 --num_cores 16 --device-group [0,1] --prompt "def fibonacci(n):" --mos 2 --aic_enable_depth_first
@@ -145,7 +145,7 @@ python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 1 --prompt_len 3
 
 | High Level APIs | Single SoC | Tensor Slicing |
 |-----------------|------------|-------------------|
-| QEfficient.cloud.infer | python -m QEfficient.cloud.infer --model_name $\color{green} {model}$ --batch_size 8 --prompt_len 128 --ctx_len 1024 --num_cores 16 --device-group [0] --prompt "My name is" --mxfp6 --hf_token $\color{green}{xyz}$ --mos 1 --aic_enable_depth_first | python -m QEfficient.cloud.infer --model_name $\color{green}{model}$ --batch_size 8 --prompt_len 128 --ctx_len 1024--num_cores 16 --device-group [0,1,2,3] --prompt "My name is" --mxfp6 --hf_token $\color{green}{xyz}$ --mos 4 --aic_enable_depth_first |
+| QEfficient.cloud.infer | python -m QEfficient.cloud.infer --model_name $\color{green} {model}$ --batch_size 1 --prompt_len 128 --ctx_len 1024 --num_cores 16 --device-group [0] --prompt "My name is" --mxfp6 --hf_token $\color{green}{xyz}$ --mos 1 --aic_enable_depth_first | python -m QEfficient.cloud.infer --model_name $\color{green}{model}$ --batch_size 1 --prompt_len 128 --ctx_len 1024 --num_cores 16 --device-group [0,1,2,3] --prompt "My name is" --mxfp6 --hf_token $\color{green}{xyz}$ --mos 4 --aic_enable_depth_first |
 | QEfficient.cloud.execute | python -m QEfficient.cloud.execute --model_name $\color{green}{model}$ --device_group [0] --qpc_path $\color{green}{path}$ --prompt "My name is" --hf_token $\color{green}{xyz}$ | python -m QEfficient.cloud.execute --model_name $\color{green}{model}$ --device_group [0,1,2,3] --qpc_path $\color{green}{path}$ --prompt "My name is" --hf_token $\color{green}{xyz}$ |
 
 :memo: Replace $\color{green}{model}$, $\color{green}{path}$ and $\color{green}{xyz}$ with the preferred model card name, qpc path and hf token respectively.
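
Since only one of `prompt` and `prompts_txt_file_path` may be passed, the batch-size-3 flow in the diff above needs a prompts file. A minimal sketch, assuming the file holds one prompt per line (the layout used by `examples/prompts.txt` in the repo); the file name `my_prompts.txt` is illustrative:

```bash
# Hypothetical prompts file: one prompt per line, assumed to hold at least
# as many prompts as --batch_size (here 3).
cat > my_prompts.txt <<'EOF'
My name is
The capital of France is
def fibonacci(n):
EOF

# Same infer invocation as in the diff above, pointed at the local file.
python -m QEfficient.cloud.infer --model_name gpt2 --batch_size 3 --prompt_len 32 --ctx_len 128 --num_cores 16 --device_group [0] --prompts_txt_file_path my_prompts.txt --mxfp6 --mos 1 --aic_enable_depth_first
```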
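
And one hypothetical substitution of the :memo: placeholders in the Single SoC execute column, reusing the QPC directory produced by the earlier gpt2 infer example; the `hf_...` value stands in for a real Hugging Face token (only needed for gated or private model cards):

```bash
# model -> gpt2; path -> QPC directory from the earlier infer run;
# xyz -> your Hugging Face access token (placeholder shown here).
python -m QEfficient.cloud.execute --model_name gpt2 --device_group [0] --qpc_path qeff_models/gpt2/qpc_16cores_1BS_32PL_128CL_1devices_mxfp6/qpcs/ --prompt "My name is" --hf_token hf_XXXXXXXXXXXXXXXXXXXXXXXX
```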