Refactor model conversion (#296)
* split deploy.py

* fix get_cuda_tensor

* deploy qwen_awq

* fix lint

* add docstring

* fix

* support baichuan/baichuan-awq

* parameterizing size_per_head

* remove try/except

* limit input model_format

* add quant_path param

* remove old deploy.py

* fix path

* fix transformer layer range when loading bins

* fix qwen init

* split & save log

* relative import

* update get_config

* WeightFileMgr -> Reader

* rename

* update

* fix init_layer_id

* rename llama.py -> meta_llama.py, hf.py -> llama.py

* reduce code

* update arg description

* fix meta llama

* manually cleanup meta model params
irexyc authored Nov 3, 2023
Parent: 1bbc6e0 · Commit: 823ad84
Showing 17 changed files with 1,743 additions and 1,050 deletions.
lmdeploy/cli/cli.py: 7 additions & 3 deletions
```diff
@@ -28,8 +28,12 @@ def convert(self,
             model_name (str): The name of the to-be-deployed model, such as
                 llama-7b, llama-13b, vicuna-7b and etc.
             model_path (str): The directory path of the model
-            model_format (str): The format of the model, fb or hf. 'fb' stands
-                for META's llama format, and 'hf' means huggingface format.
+            model_format (str): the format of the model, should choose from
+                ['llama', 'hf', 'awq', None]. 'llama' stands for META's llama
+                format, 'hf' means huggingface llama format, and 'awq' means
+                llama(hf) model quantized by lmdeploy/lite/quantization/awq.py.
+                the default value is None, which means the model_format will be
+                inferred based on model_name
             tokenizer_path (str): The path of tokenizer model.
             dst_path (str): The destination path that saves outputs.
             tp (int): The number of GPUs used for tensor parallelism, which
@@ -38,7 +42,7 @@ def convert(self,
             group_size (int): A parameter used in AWQ to quantize fp16 weights
                 to 4 bits.
         """
-        from lmdeploy.serve.turbomind.deploy import main as convert
+        from lmdeploy.turbomind.deploy.converter import main as convert
 
         convert(model_name,
                 model_path,
```
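For reference, a minimal sketch of driving the relocated converter entrypoint directly from Python. The import path is the one introduced by this commit, and the argument names mirror the docstring above; the concrete values and the exact keyword signature are illustrative assumptions, not part of this diff.

```python
# Minimal sketch: calling the converter at its new module location.
# Import path taken from this diff; argument names follow the docstring
# above. All concrete values below are illustrative assumptions.
from lmdeploy.turbomind.deploy.converter import main as convert

convert('llama-7b',               # model_name, e.g. llama-7b / vicuna-7b
        '/path/to/hf/llama-7b',   # model_path: directory of the model
        model_format='hf',        # one of ['llama', 'hf', 'awq', None];
                                  # None infers the format from model_name
        dst_path='./workspace',   # destination path that saves outputs
        tp=1)                     # number of GPUs for tensor parallelism
```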