feat: Add aya models #2335

Merged 2 commits on Oct 25, 2024

Aunali321
Contributor

PR type

  • Bug Fix
  • New Feature
  • Document Updates
  • More Models or Datasets Support

PR information

Add new Aya models by Cohere for AI.

Experiment results

The models do not exist on ModelScope yet; they may be added soon. This is my first PR and I don't understand Chinese, so apologies if I have made a mistake.

@Aunali321
Contributor Author

Here's a small training run using aya-expanse-8b.
Config:

USE_HF=1 \
HF_HUB_ENABLE_HF_TRANSFER=1 \
swift rlhf \
    --rlhf_type kto \
    --model_type aya-expanse-8b \
    --beta 0.1 \
    --desirable_weight 1.0 \
    --undesirable_weight 1.0 \
    --model_revision master \
    --sft_type lora \
    --tuner_backend peft \
    --template_type AUTO \
    --dtype AUTO \
    --output_dir output \
    --dataset Cossale/informal-to-professional-kto \
    --train_dataset_sample -1 \
    --num_train_epochs 1 \
    --max_length 8192 \
    --check_dataset_strategy warning \
    --lora_rank 32 \
    --lora_alpha 64 \
    --lora_dropout_p 0.05 \
    --lora_target_modules ALL \
    --gradient_checkpointing true \
    --batch_size 1 \
    --weight_decay 0.1 \
    --learning_rate 2e-4 \
    --use_dora True \
    --neftune_noise_alpha 5 \
    --gradient_accumulation_steps 4 \
    --max_grad_norm 0.5 \
    --warmup_ratio 0.03 \
    --eval_steps 100 \
    --save_steps 100 \
    --save_total_limit 2 \
    --logging_steps 10 \
    --use_flash_attn true
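
As a quick sanity check on the config above, the effective batch size per optimizer step and the nominal LoRA scaling factor can be derived from the flags. This is only a minimal sketch: it assumes a single device and the standard LoRA scaling of alpha / rank, and the variable names are illustrative rather than part of swift.

```shell
#!/usr/bin/env bash
# Values copied from the flags in the config above.
batch_size=1
gradient_accumulation_steps=4
lora_rank=32
lora_alpha=64

# Samples contributing to each optimizer step (assumption: one device).
effective_batch=$((batch_size * gradient_accumulation_steps))

# Nominal LoRA scaling applied to the adapter update: alpha / rank.
lora_scale=$((lora_alpha / lora_rank))

echo "effective_batch=${effective_batch}"
echo "lora_scale=${lora_scale}"
```

With more than one device, the effective batch size would be multiplied by the device count.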

Merging LoRA:

USE_HF=1 \
swift export \
    --model_type aya-expanse-8b \
    --ckpt_dir '/root/llm-finetuning-setup/swift/output/aya-expanse-8b/v2-20241024-170858/checkpoint-35' \
    --merge_lora true 

Training:

image

Result:

image

@Jintao-Huang
Collaborator

thanks ❤️

@Jintao-Huang
Collaborator

https://github.com/modelscope/ms-swift/blob/main/CONTRIBUTING.md#code-standards-and-development-approach

Please check the lint. 😊

@Aunali321
Contributor Author

Fixed the lint; please re-run the checks. Thanks.

@Jintao-Huang
Collaborator

modelscope model:

@Jintao-Huang Jintao-Huang merged commit 825ec45 into modelscope:main Oct 25, 2024
2 checks passed