Refactor model conversion (#296)
* split deploy.py

* fix get_cuda_tensor

* deploy qwen_awq

* fix lint

* add docstring

* fix

* support baichuan/baichuan-awq

* parameterizing size_per_head

* remove try/except

* limit input model_format

* add quant_path param

* remove old deploy.py

* fix path

* fix transformer layer range when loading bins

* fix qwen init

* split & save log

* relative import

* update get_config

* WeightFileMgr -> Reader

* rename

* update

* fix init_layer_id

* rename llama.py -> meta_llama.py, hf.py -> llama.py

* reduce code

* update arg description

* fix meta llama

* manually cleanup meta model params
irexyc authored Nov 3, 2023
Parent: 1bbc6e0 · Commit: 823ad84
Showing 17 changed files with 1,743 additions and 1,050 deletions.
lmdeploy/cli/cli.py: 7 additions & 3 deletions
```diff
@@ -28,8 +28,12 @@ def convert(self,
             model_name (str): The name of the to-be-deployed model, such as
                 llama-7b, llama-13b, vicuna-7b and etc.
             model_path (str): The directory path of the model
-            model_format (str): The format of the model, fb or hf. 'fb' stands
-                for META's llama format, and 'hf' means huggingface format.
+            model_format (str): the format of the model, should choose from
+                ['llama', 'hf', 'awq', None]. 'llama' stands for META's llama
+                format, 'hf' means huggingface llama format, and 'awq' means
+                llama(hf) model quantized by lmdeploy/lite/quantization/awq.py.
+                the default value is None, which means the model_format will be
+                inferred based on model_name
             tokenizer_path (str): The path of tokenizer model.
             dst_path (str): The destination path that saves outputs.
             tp (int): The number of GPUs used for tensor parallelism, which
@@ -38,7 +42,7 @@ def convert(self,
             group_size (int): A parameter used in AWQ to quantize fp16 weights
                 to 4 bits.
         """
-        from lmdeploy.serve.turbomind.deploy import main as convert
+        from lmdeploy.turbomind.deploy.converter import main as convert
 
         convert(model_name,
                 model_path,
```
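For reference, a minimal sketch of driving the relocated converter entrypoint directly from Python. The import path is the one introduced by this commit, and the argument names mirror the docstring above; the concrete values and the exact keyword signature are illustrative assumptions, not part of this diff.

```python
# Minimal sketch: calling the converter at its new module location.
# Import path taken from this diff; argument names follow the docstring
# above. All concrete values below are illustrative assumptions.
from lmdeploy.turbomind.deploy.converter import main as convert

convert('llama-7b',               # model_name, e.g. llama-7b / vicuna-7b
        '/path/to/hf/llama-7b',   # model_path: directory of the model
        model_format='hf',        # one of ['llama', 'hf', 'awq', None];
                                  # None infers the format from model_name
        dst_path='./workspace',   # destination path that saves outputs
        tp=1)                     # number of GPUs for tensor parallelism
```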