Skip to content

Commit

Permalink
update with revised training size
Browse files Browse the repository at this point in the history
  • Loading branch information
djliden committed Mar 8, 2024
1 parent 42d53d2 commit e8008ab
Showing 1 changed file with 1 addition and 1 deletion.
Original file line number Diff line number Diff line change
Expand Up @@ -726,7 +726,7 @@
"\n",
"#### Split the dataset into training and validation\n",
"\n",
"Here we also limit to a training subset of 5,000 examples. This is based on the [LIMIT](https://www.databricks.com/blog/limit-less-more-instruction-tuning) paper, which found that a small number of high-quality examples is sufficient for instruction-tuning. Note that, under ideal circumstances, we would choose more *domain-specific* examples with a variety of different formats. Given that we are not tailoring this fine-tuning job for a specific domain, we will just choose 5,000 random examples from the SlimOrca dataset."
"Here we also limit to a training subset of 10,000 examples. This is based on the [LIMIT](https://www.databricks.com/blog/limit-less-more-instruction-tuning) paper, which found that a small number of high-quality examples is sufficient for instruction-tuning. Under ideal circumstances, we would choose more *domain-specific* examples with a variety of different formats. Given that we are not tailoring this fine-tuning job for a specific domain, we will just choose 10,000 random examples from the SlimOrca dataset. We could almost certainly get by with fewer examples, especially if those examples were selected for quality and tailored to the specific tasks we want the model to succeed at."
]
},
{
Expand Down

0 comments on commit e8008ab

Please sign in to comment.