The repeat time parameter in the dataset configuration file #763

Closed
ChenJian7578 opened this issue Dec 17, 2024 · 3 comments

@ChenJian7578

Is there any benefit to model fine-tuning when the `repeat_time` parameter in the dataset configuration file is set to a value greater than 1?

@yuecao0119
Collaborator

Thank you very much for your question.
As I understand it, `repeat_time` addresses two issues: high-quality data is usually scarce, and the amount of data varies considerably between domains. Setting `repeat_time` adjusts how many samples from each dataset are actually used, and therefore the mixing ratio across datasets, so that high-quality data is used more effectively. In addition, raising `repeat_time` for a small dataset increases its share of the overall training mix, which lets the model adapt quickly to that specific data.
I hope this helps.
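
To make the mixing-ratio point concrete, here is a minimal sketch. It assumes that a dataset contributes roughly `length * repeat_time` samples per epoch; the helper names and the dataset names `general_vqa` and `domain_small` are hypothetical and are not taken from the project's code.

```python
# Hypothetical helpers: they assume a dataset contributes about
# length * repeat_time samples per epoch. Not the project's actual code.

def effective_samples(length: int, repeat_time: float) -> int:
    """Approximate number of samples a dataset contributes per epoch."""
    return int(length * repeat_time)


def mixing_ratio(datasets: dict) -> dict:
    """Share of each dataset in the combined training mix."""
    sizes = {
        name: effective_samples(cfg["length"], cfg["repeat_time"])
        for name, cfg in datasets.items()
    }
    total = sum(sizes.values())
    return {name: size / total for name, size in sizes.items()}


# Made-up example: one large general dataset, one small domain dataset.
baseline = {
    "general_vqa": {"length": 100_000, "repeat_time": 1},
    "domain_small": {"length": 5_000, "repeat_time": 1},
}
boosted = {
    "general_vqa": {"length": 100_000, "repeat_time": 1},
    "domain_small": {"length": 5_000, "repeat_time": 4},
}

baseline_ratio = mixing_ratio(baseline)
boosted_ratio = mixing_ratio(boosted)
print(f"domain_small share, repeat_time=1: {baseline_ratio['domain_small']:.3f}")
print(f"domain_small share, repeat_time=4: {boosted_ratio['domain_small']:.3f}")
```

Under these assumptions, repeating the small dataset four times raises its share of the mix from about 5% to about 17% without touching the larger dataset.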

@ChenJian7578
Author

Can it be understood that, when I have only one custom dataset, the `repeat_time` parameter is equivalent to the `num_train_epochs` argument?

@czczup
Member

czczup commented Dec 17, 2024

> Can it be understood that, when I have only one custom dataset, the `repeat_time` parameter is equivalent to the `num_train_epochs` argument?

Yes, you are right.
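
A rough way to see the equivalence for a single dataset is to count how many times samples are visited in total. The sketch below assumes a dataset contributes `length * repeat_time` samples per epoch and that the final partial batch of each epoch is kept; the function names and the numbers are hypothetical.

```python
import math

# Hypothetical helpers: they assume length * repeat_time samples per epoch
# and that the last partial batch of each epoch is kept. Not the project's code.

def total_sample_visits(length: int, repeat_time: int, num_train_epochs: int) -> int:
    """Total number of sample visits over the whole run."""
    return int(length * repeat_time) * num_train_epochs


def total_batches(length: int, repeat_time: int, num_train_epochs: int,
                  global_batch_size: int) -> int:
    """Total number of batches over the whole run."""
    per_epoch = math.ceil(int(length * repeat_time) / global_batch_size)
    return per_epoch * num_train_epochs


# One dataset of 10,000 samples, trained in two different ways.
visits_repeat = total_sample_visits(10_000, repeat_time=3, num_train_epochs=1)
visits_epochs = total_sample_visits(10_000, repeat_time=1, num_train_epochs=3)

batches_repeat = total_batches(10_000, 3, 1, global_batch_size=128)
batches_epochs = total_batches(10_000, 1, 3, global_batch_size=128)

print(visits_repeat, visits_epochs)
print(batches_repeat, batches_epochs)
```

The total number of sample visits is the same in both settings. The batch counts can differ by a few because each epoch ends with its own partial batch, and shuffling order or per-epoch events such as checkpointing may also differ, so the equivalence holds for the amount of data seen rather than step for step.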

czczup closed this as completed on Dec 21, 2024