Some fixes in Chapter 7 #75

Open · wants to merge 3 commits into base: `main`
33 changes: 18 additions & 15 deletions Chinese_Version/ch_7_Finetune/7_1_Finetune_Llama2-7B.md
@@ -45,6 +45,7 @@ source /opt/intel/oneapi/setvars.sh
For Intel GPUs, you should specifically set `optimize_model=False` in the `from_pretrained` function. Once you have obtained the low-precision model, move it to the device with `to('xpu')`.

```python
import torch
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "meta-llama/Llama-2-7b-hf",
load_in_low_bit="nf4",
optimize_model=False,
@@ -89,34 +90,34 @@ model = get_peft_model(model, config)
>
> More explanation of the `LoraConfig` parameters can be found in the [Transformer LoRA Guides](https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft).
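
For context, the `model = get_peft_model(model, config)` call shown in the hunk header above is normally preceded by a `LoraConfig` definition. The sketch below is illustrative only: the `bigdl.llm.transformers.qlora` import path and all hyperparameter values are assumptions based on typical BigDL-LLM QLoRA examples, not part of this diff.

```python
# Illustrative sketch, not part of the diff: a typical QLoRA setup for Llama 2.
# The import path and all hyperparameter values here are assumptions.
from bigdl.llm.transformers.qlora import get_peft_model, prepare_model_for_kbit_training
from peft import LoraConfig

model = prepare_model_for_kbit_training(model)      # prepare the quantized model for training
config = LoraConfig(
    r=8,                                            # LoRA rank
    lora_alpha=32,                                  # scaling factor
    target_modules=["q_proj", "k_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, config)               # wrap the base model with LoRA adapters
```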

### 7.1.2.3 Load Dataset
### 7.1.2.3 Load Tokenizer

We load the common dataset [english quotes](https://huggingface.co/datasets/Abirate/english_quotes) to fine-tune our model on famous English quotes.
A tokenizer enables the tokenization and detokenization processes in LLM training and inference. You can use the [Huggingface Transformers](https://huggingface.co/docs/transformers/index) API to load the tokenizer required for LLM inference; it works seamlessly with models loaded by BigDL-LLM. For Llama 2, the corresponding tokenizer class is `LlamaTokenizer`.

```python
from datasets import load_dataset
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf", trust_remote_code=True)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
```

> **Note**
>
> If you have already downloaded the `.jsonl` file from [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes/blob/main/quotes.jsonl), you can use `data = load_dataset("json", data_files="path/to/your/.jsonl/file")` to specify the local path instead of loading from the Huggingface repo id with `data = load_dataset("Abirate/english_quotes")`.
> If you have already downloaded the Llama 2 (7B) model, you can specify `pretrained_model_name_or_path` as the local model path.

### 7.1.2.4 Load Tokenizer
### 7.1.2.4 Load Dataset

A tokenizer enables the tokenization and detokenization processes in LLM training and inference. You can use the [Huggingface Transformers](https://huggingface.co/docs/transformers/index) API to load the tokenizer required for LLM inference; it works seamlessly with models loaded by BigDL-LLM. For Llama 2, the corresponding tokenizer class is `LlamaTokenizer`.
We load the common dataset [english quotes](https://huggingface.co/datasets/Abirate/english_quotes) to fine-tune our model on famous English quotes.

```python
from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf", trust_remote_code=True)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
from datasets import load_dataset
data = load_dataset("Abirate/english_quotes")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
```

> **Note**
>
> If you have already downloaded the Llama 2 (7B) model, you can specify `pretrained_model_name_or_path` as the local model path.
> If you have already downloaded the `.jsonl` file from [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes/blob/main/quotes.jsonl), you can use `data = load_dataset("json", data_files="path/to/your/.jsonl/file")` to specify the local path instead of loading from the Huggingface repo id with `data = load_dataset("Abirate/english_quotes")`.
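
As a quick, optional check (not part of the diff), you can inspect one mapped record to confirm that the tokenizer added `input_ids` alongside the original `quote` field; the field names follow from the `load_dataset` and `map` calls above.

```python
# Optional check, assuming the dataset has been mapped with the tokenizer as above.
sample = data["train"][0]
print(sample["quote"])           # the original quote text
print(sample["input_ids"][:10])  # first ten token ids produced by the tokenizer
```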

### 7.1.2.5 Run the Training

@@ -175,8 +176,10 @@ result = trainer.train()
### 7.1.3.1 Load the Pre-trained Model

```python
from bigdl.llm.transformers import AutoModelForCausalLM
base_model_path = "meta-llama/Llama-2-7b-hf"
base_model = AutoModelForCausalLM.from_pretrained(
base_model,
base_model_path,
torch_dtype=torch.float16,
device_map={"": "cpu"},
)
@@ -234,7 +237,7 @@ Using pad_token, but it is not set yet.
Finally, we can save the merged model to a specified local path (in our case, `./outputs/checkpoint-200-merged`).

```python
output_path = ./outputs/checkpoint-200-merged
output_path = "./outputs/checkpoint-200-merged"
lora_model_sd = lora_model.state_dict()
deloreanized_sd = {
k.replace("base_model.model.", ""): v
34 changes: 18 additions & 16 deletions ch_7_Finetune/7_1_Finetune_Llama2-7B.md
@@ -46,6 +46,7 @@ With BigDL-LLM optimization, you can load the model with `bigdl.llm.transformers
For Intel GPUs, once you have the model in low precision, **move it to the device with `to('xpu')`**.

```python
import torch
from bigdl.llm.transformers import AutoModelForCausalLM
model = AutoModelForCausalLM.from_pretrained(pretrained_model_name_or_path = "meta-llama/Llama-2-7b-hf",
load_in_low_bit="nf4",
@@ -93,7 +94,20 @@ model = get_peft_model(model, config)
> More explanation about `LoraConfig` parameters can be found in [Transformer LoRA Guides](https://huggingface.co/docs/peft/conceptual_guides/lora#common-lora-parameters-in-peft).
>

### 7.1.2.3 Load Dataset
### 7.1.2.3 Load Tokenizer
A tokenizer enables the tokenization and detokenization processes in LLM training and inference. You can use the [Huggingface transformers](https://huggingface.co/docs/transformers/index) API to load the tokenizer directly. It can be used seamlessly with models loaded by BigDL-LLM. For Llama 2, the corresponding tokenizer class is `LlamaTokenizer`.

```python
from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf", trust_remote_code=True)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
```
> **Note**
>
> If you have already downloaded the Llama 2 (7B) model, you could set `pretrained_model_name_or_path` to the local model path.
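
As an optional sanity check (not part of the original tutorial), the snippet below assumes the tokenizer configured above and simply confirms that left padding with `pad_token_id = 0` behaves as expected on a small batch; the two example quotes are illustrative.

```python
# Optional check, assuming the tokenizer configured above: confirm left padding.
batch = tokenizer(
    ["Be yourself; everyone else is already taken.", "So many books, so little time."],
    padding=True,
    return_tensors="pt",
)
print(batch["input_ids"].shape)   # (2, longest_sequence_length_in_batch)
print(batch["input_ids"])         # the shorter quote is left-padded with id 0
```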

### 7.1.2.4 Load Dataset

A common dataset, [english quotes](https://huggingface.co/datasets/Abirate/english_quotes), is loaded to fine-tune our model on famous quotes.
```python
@@ -107,19 +121,6 @@ data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
> The dataset path here defaults to the Huggingface repo id.
> If you have already downloaded the `.jsonl` file from [Abirate/english_quotes](https://huggingface.co/datasets/Abirate/english_quotes/blob/main/quotes.jsonl), you could use `data = load_dataset("json", data_files= "path/to/your/.jsonl/file")` to specify the local path instead of `data = load_dataset("Abirate/english_quotes")`.

### 7.1.2.4 Load Tokenizer
A tokenizer enables the tokenization and detokenization processes in LLM training and inference. You can use the [Huggingface transformers](https://huggingface.co/docs/transformers/index) API to load the tokenizer directly. It can be used seamlessly with models loaded by BigDL-LLM. For Llama 2, the corresponding tokenizer class is `LlamaTokenizer`.

```python
from transformers import LlamaTokenizer
tokenizer = LlamaTokenizer.from_pretrained(pretrained_model_name_or_path="meta-llama/Llama-2-7b-chat-hf", trust_remote_code=True)
tokenizer.pad_token_id = 0
tokenizer.padding_side = "left"
```
> **Note**
>
> If you have already downloaded the Llama 2 (7B) model, you could set `pretrained_model_name_or_path` to the local model path.

### 7.1.2.5 Run the Training

You can then start the training process by setting up the `trainer` with existing tools in the HF ecosystem. Here we set `warmup_steps` to 20 to accelerate the training process.
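
The `TrainingArguments` themselves are collapsed in this diff. The sketch below shows one typical way to wire up the `Trainer`; apart from `warmup_steps=20`, which is stated above, every value here is an illustrative assumption rather than the tutorial's exact setting.

```python
# Illustrative sketch; hyperparameters other than warmup_steps are assumptions.
import transformers

trainer = transformers.Trainer(
    model=model,
    train_dataset=data["train"],
    args=transformers.TrainingArguments(
        per_device_train_batch_size=4,
        gradient_accumulation_steps=4,
        warmup_steps=20,        # stated in the tutorial text above
        max_steps=200,          # assumed; consistent with the checkpoint-200 path used later
        learning_rate=2e-4,
        output_dir="outputs",
    ),
    data_collator=transformers.DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
model.config.use_cache = False  # avoid a warning while gradient checkpointing is enabled
result = trainer.train()
```
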
@@ -177,8 +178,9 @@ After finetuning the model, you could merge the QLoRA weights back into the base

```python
from bigdl.llm.transformers import AutoModelForCausalLM
base_model_path = "meta-llama/Llama-2-7b-hf"
base_model = AutoModelForCausalLM.from_pretrained(
base_model,
base_model_path,
torch_dtype=torch.float16,
device_map={"": "cpu"},
)
@@ -238,7 +240,7 @@ Using pad_token, but it is not set yet.
```
Finally, we can save the fine-tuned model to a specified local path (in our case, `./outputs/checkpoint-200-merged`).
```python
output_path = ./outputs/checkpoint-200-merged
output_path = "./outputs/checkpoint-200-merged"
lora_model_sd = lora_model.state_dict()
deloreanized_sd = {
k.replace("base_model.model.", ""): v
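for k, v in lora_model_sd.items()
if "lora" not in k
}
# NOTE: the three lines just above and the save call below are an illustrative
# completion of the lines collapsed in this diff, based on common LoRA-merge
# export scripts (an assumption, not the file's exact contents): keep only the
# non-LoRA tensors and write the merged weights to the output path.
base_model.save_pretrained(output_path, state_dict=deloreanized_sd)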