Commit
Merge pull request #34 from bofenghuang/next
Update to version 2.2
Showing 75 changed files with 6,531 additions and 71,140 deletions.
data/chat/converted_alpaca_data_cleaned_fr_52k.jsonl (51,708 changes: 0 additions & 51,708 deletions)
This file was deleted.
data/chat/converted_dolly_bactrian_fr_15k.jsonl (15,003 changes: 0 additions & 15,003 deletions)
This file was deleted.
Large diffs for several other changed files are not rendered by default; at least one more file was deleted.
Three of the changed files are empty.
@@ -0,0 +1,76 @@
#!/usr/bin/env bash
# Copyright 2023 Bofeng Huang

# Train chat models using QLoRA (int4)

export WANDB_PROJECT="llm-sft-chat"
export OMP_NUM_THREADS="1"
export TOKENIZERS_PARALLELISM="false"
export BITSANDBYTES_NOWELCOME="1"
# export CUDA_VISIBLE_DEVICES="0"

# Model
model_name_or_path=tiiuae/falcon-7b
# model_name_or_path=tiiuae/falcon-40b

# Dataset
# Customize dataset here
train_file=data/chat/oasst_20230412_fr_top1.jsonl
model_max_length=2048

# Outdir
run_name=falcon-7b-sft-chat-qlora
output_dir=outputs/$run_name

# Might need to adjust the batch size and other hyperparameters by yourself
per_device_train_batch_size=8
per_device_eval_batch_size=4
gradient_accumulation_steps=8

# Further optimization
# DeepSpeed Stage 2
# --deepspeed vigogne/configs/ds_config_zero2_no_offload.json \

torchrun \
    vigogne/cli/train_sft.py \
    --model_name_or_path $model_name_or_path \
    --tokenizer_use_fast false \
    --tokenizer_padding_side "right" \
    --add_special_tokens '{"bos_token":">>ABSTRACT<<","pad_token":"<|endoftext|>"}' \
    --train_file $train_file \
    --output_dir $output_dir \
    --overwrite_output_dir \
    --run_name $run_name \
    --processor_style "vigogne_chat_v3" \
    --model_max_length $model_max_length \
    --eval_split_ratio "0.01" \
    --preprocessing_num_workers "8" \
    --dataloader_num_workers "1" \
    --adapter "qlora" \
    --load_in_4bit \
    --optim "paged_adamw_32bit" \
    --lora_r "64" \
    --lora_alpha "16" \
    --lora_dropout "0.05" \
    --lora_target_all_linear_layers \
    --do_merge_lora \
    --num_train_epochs "3" \
    --per_device_train_batch_size $per_device_train_batch_size \
    --per_device_eval_batch_size $per_device_eval_batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --learning_rate "1e-4" \
    --warmup_ratio "0.03" \
    --lr_scheduler_type "cosine" \
    --weight_decay "0" \
    --fp16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters false \
    --log_level "info" \
    --logging_steps "1" \
    --logging_first_step \
    --save_strategy "steps" \
    --save_steps "100" \
    --save_total_limit "3" \
    --evaluation_strategy "steps" \
    --eval_steps "100" \
    --report_to "tensorboard" "wandb"
examples/train/train_sft_chat_lora_int8.sh → examples/train/llama2/train_sft_chat.sh (53 changes: 30 additions & 23 deletions)
@@ -1,58 +1,65 @@
#!/usr/bin/env bash
# Copyright 2023 Bofeng Huang

export WANDB_PROJECT="llm-sft-chat-fr"
# Train chat models using full fine-tuning + DeepSpeed Stage 3

export WANDB_PROJECT="llm-sft-chat"
export OMP_NUM_THREADS="1"
export TOKENIZERS_PARALLELISM="false"
export BITSANDBYTES_NOWELCOME="1"
export CUDA_VISIBLE_DEVICES="0,1,2,3"

train_file=/path/to/train/chat/file.jsonl
# Model
model_name_or_path=meta-llama/Llama-2-7b-hf

mode=chat
# Dataset
# Customize dataset here
train_file=data/chat/oasst_20230412_fr_top1.jsonl
model_max_length=2048

model_name_or_path=meta-llama/Llama-2-7b-hf
output_dir=outputs/llama-2-7b-sft-chat-lora-int8
# Outdir
run_name=llama-2-7b-sft-chat-fullfinetune
output_dir=outputs/$run_name

per_device_train_batch_size=8
# Might need to adjust the batch size and other hyperparameters by yourself
per_device_train_batch_size=4
per_device_eval_batch_size=2
gradient_accumulation_steps=4

# Might need to adjust the batch size and other hyperparameters by yourself
torchrun \
    --nproc_per_node 4 \
    vigogne/train/train_sft.py \
    vigogne/cli/train_sft.py \
    --deepspeed vigogne/configs/ds_config_zero3_no_offload.json \
    --model_name_or_path $model_name_or_path \
    --tokenizer_use_fast false \
    --tokenizer_padding_side "right" \
    --train_file $train_file \
    --output_dir $output_dir \
    --overwrite_output_dir \
    --mode $mode \
    --run_name $run_name \
    --processor_style "vigogne_chat_v3" \
    --model_max_length $model_max_length \
    --eval_split_ratio "0.01" \
    --preprocessing_num_workers "8" \
    --dataloader_num_workers "1" \
    --pack_into_block \
    --block_size "2048" \
    --load_in_8bit \
    --lora_r "64" \
    --lora_alpha "16" \
    --lora_dropout "0.05" \
    --target_modules "q_proj" "v_proj" "k_proj" "o_proj" "gate_proj" "up_proj" "down_proj" \
    --num_train_epochs "3" \
    --per_device_train_batch_size $per_device_train_batch_size \
    --per_device_eval_batch_size $per_device_eval_batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --num_train_epochs "3" \
    --learning_rate "1e-4" \
    --optim "adamw_bnb_8bit" \
    --learning_rate "2.5e-5" \
    --warmup_ratio "0.03" \
    --lr_scheduler_type "cosine" \
    --weight_decay "0" \
    --torch_compile \
    --fp16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters false \
    --log_level "info" \
    --logging_steps "10" \
    --logging_first_step true \
    --logging_steps "1" \
    --logging_first_step \
    --save_strategy "steps" \
    --save_steps "100" \
    --save_total_limit "3" \
    --report_to "tensorboard" "wandb" \
    --do_train
    --evaluation_strategy "steps" \
    --eval_steps "100" \
    --report_to "tensorboard" "wandb"
...les/train/train_sft_instruct_lora_int8.sh → examples/train/llama2/train_sft_chat_lora.sh (60 changes: 39 additions & 21 deletions)
@@ -1,58 +1,76 @@
#!/usr/bin/env bash
# Copyright 2023 Bofeng Huang

export WANDB_PROJECT="llm-sft-instruct-fr"
# Train chat models using LoRA

export WANDB_PROJECT="llm-sft-chat"
export OMP_NUM_THREADS="1"
export TOKENIZERS_PARALLELISM="false"
export BITSANDBYTES_NOWELCOME="1"
export CUDA_VISIBLE_DEVICES="0,1,2,3"
# export CUDA_VISIBLE_DEVICES="0"

train_file=/path/to/train/instruct/file.jsonl
# Model
model_name_or_path=meta-llama/Llama-2-7b-hf

mode=instruct
# Dataset
# Customize dataset here
train_file=data/chat/oasst_20230412_fr_top1.jsonl
model_max_length=2048

model_name_or_path=meta-llama/Llama-2-7b-hf
output_dir=outputs/llama-2-7b-sft-instruct-lora-int8
# Outdir
run_name=llama-2-7b-sft-chat-lora
output_dir=outputs/$run_name

# Might need to adjust the batch size and other hyperparameters by yourself
per_device_train_batch_size=8
gradient_accumulation_steps=4
per_device_eval_batch_size=4
gradient_accumulation_steps=8

# Further optimization
# DeepSpeed Stage 2
# --deepspeed vigogne/configs/ds_config_zero2_no_offload.json \
# LLM.int8()
# --load_in_8bit \
# 8bit optimizer
# --optim "adamw_bnb_8bit" \

# Might need to adjust the batch size and other hyperparameters by yourself
torchrun \
    --nproc_per_node 4 \
    vigogne/train/train_sft.py \
    vigogne/cli/train_sft.py \
    --model_name_or_path $model_name_or_path \
    --tokenizer_use_fast false \
    --tokenizer_padding_side "right" \
    --train_file $train_file \
    --output_dir $output_dir \
    --overwrite_output_dir \
    --mode $mode \
    --run_name $run_name \
    --processor_style "vigogne_chat_v3" \
    --model_max_length $model_max_length \
    --eval_split_ratio "0.01" \
    --preprocessing_num_workers "8" \
    --dataloader_num_workers "1" \
    --pack_into_block \
    --block_size "2048" \
    --load_in_8bit \
    --adapter "lora" \
    --lora_r "64" \
    --lora_alpha "16" \
    --lora_dropout "0.05" \
    --target_modules "q_proj" "v_proj" "k_proj" "o_proj" "gate_proj" "up_proj" "down_proj" \
    --lora_target_modules "q_proj" "v_proj" "k_proj" "o_proj" "gate_proj" "up_proj" "down_proj" \
    --do_merge_lora \
    --num_train_epochs "3" \
    --per_device_train_batch_size $per_device_train_batch_size \
    --per_device_eval_batch_size $per_device_eval_batch_size \
    --gradient_accumulation_steps $gradient_accumulation_steps \
    --num_train_epochs "3" \
    --learning_rate "1e-4" \
    --warmup_ratio "0.03" \
    --lr_scheduler_type "cosine" \
    --weight_decay "0" \
    --torch_compile \
    --fp16 \
    --gradient_checkpointing \
    --ddp_find_unused_parameters false \
    --log_level "info" \
    --logging_steps "10" \
    --logging_first_step true \
    --logging_steps "1" \
    --logging_first_step \
    --save_strategy "steps" \
    --save_steps "100" \
    --save_total_limit "3" \
    --report_to "tensorboard" "wandb" \
    --do_train
    --evaluation_strategy "steps" \
    --eval_steps "100" \
    --report_to "tensorboard" "wandb"