Data-Processing-Toolkit-for-LLMs


Overview

The Data Processing Toolkit for LLMs, published by Zhejiang Lab, is a set of tools for collecting and processing data used to train LLMs. It is designed to address the challenges of data preparation across the diverse domains involved in LLM training. The project aims to help researchers prepare data more efficiently and reduce the cost of dataset construction.

The data processing toolkit released in the current version includes:

Acknowledgement

If you use this toolkit in your research, please cite it as follows:

@misc{ZJ2024DataProcessesToolkit,
 author = {Zhejiang Lab},
 title = {Data Processing Toolkit for LLMs},
 year = {2024},
 howpublished = {\url{https://github.com/zhejianglab/Data-Processing-Toolkit-for-LLMs}},
 note = {Accessed: 2024-09-14}
}

If you have published research that uses this toolkit, please let us know. We will maintain a list of related publications to facilitate communication among researchers.

Contact us

If you have any problems using the toolkit, please contact us via email at Zhejiang Lab

© 2024 Research Center for Intelligent Equipment of Zhejiang Lab
