📔 For more detailed usage information, please refer to the Yuan2.0 Paper.
Yuan2.0 is a new-generation fundamental large language model developed by IEIT System. We have published all three models: Yuan 2.0-102B, Yuan 2.0-51B, and Yuan 2.0-2B, and we provide the relevant scripts for pretraining, fine-tuning, and inference services for other developers. Yuan2.0 is built on Yuan1.0 and utilizes a wider range of high-quality pre-training data and instruction fine-tuning datasets to enhance the model's understanding of semantics, mathematics, reasoning, code, knowledge, and other aspects.
Use of the source code in this repository requires compliance with the Apache 2.0 open-source license agreement. The Yuan2.0 models support commercial use and do not require additional authorization; please understand and comply with the "Yuan 2.0 Model License Agreement". Do not use the open-source models and code, or derivatives of this open-source project, for any purpose that may cause harm to the country or society, or for any service that has not undergone security assessment and filing. Although we have taken measures to ensure the compliance and accuracy of the data during training, the models have a huge number of parameters and are affected by probability and randomness factors, so we cannot guarantee the accuracy of their output, and the models can easily be misled by input instructions. This project assumes no responsibility for data security or public-opinion risks, nor for any risks or liabilities arising from the models or code being misled, abused, improperly disseminated, or improperly exploited. You are solely responsible for the risks and consequences arising from the use, copying, distribution, and modification of the models in this open-source project.
See the detailed documentation here: Quickstart.
We strongly recommend using the latest release of the Docker images we provide here.
You can launch an instance of the Yuan 2.0 container with the following Docker commands:
docker load < ./yuan_v2.0.tar
docker run --gpus all -it --rm -v /path/to/yuan_2.0:/workspace/yuan_2.0 -v /path/to/dataset:/workspace/dataset -v /path/to/checkpoints:/workspace/checkpoints yuan_v2.0:latest
We have provided the data preprocessing script. See the documentation here.
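As a rough sketch, a preprocessing run might be invoked as below. The script path `tools/preprocess_data.py`, its flags, and the file names are assumptions based on the Megatron-LM-style layout this repo follows; verify them against the documentation above.

```bash
# Hypothetical invocation -- script path and flags assumed from the
# Megatron-LM-style layout of this repo; verify against the docs above.
python tools/preprocess_data.py \
    --input /workspace/dataset/raw_corpus.jsonl \
    --output-prefix /workspace/dataset/yuan_corpus \
    --workers 8
```

In the Megatron-LM convention, a step like this produces the binary index files that the pretraining scripts then consume.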
We've provided several scripts for pretraining in the example directory. The details can be found in the documentation here.
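For instance, a pretraining run launched from inside the container might look like the following; the script name is illustrative (check the example directory for the actual files), and the paths assume the volume mounts from the Docker command above.

```bash
# Illustrative only -- the exact script name under examples/ may differ.
cd /workspace/yuan_2.0
bash examples/pretrain_yuan2.0_2B.sh
```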
We have also provided the supervised fine-tuning script. See the documentation here.
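Analogously, a supervised fine-tuning run might be started like this; the script name is again an assumption based on the repo layout, so check the documentation above for the real entry point.

```bash
# Hypothetical script name -- see the fine-tuning documentation above.
cd /workspace/yuan_2.0
bash examples/pretrain_yuan2.0_2B_sft.sh
```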
🥇🥇🥇 We have provided Yuan2.0 supervised fine-tuned checkpoints. The checkpoint files can be downloaded through the following links:
Model | Seq Len | Download Link |
---|---|---|
Yuan2.0-102B-hf | 4K | ModelScope \| HuggingFace \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-51B-hf | 4K | ModelScope \| HuggingFace \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-2B-hf | 8K | ModelScope \| HuggingFace \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-2B-Janux-hf New | 8K | ModelScope \| HuggingFace \| OpenXlab \| Baidu Netdisk \| WiseModel |

Model | Seq Len | Download Link |
---|---|---|
Yuan2.0-102B | 4K | ModelScope \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-51B | 4K | ModelScope \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-2B | 8K | ModelScope \| OpenXlab \| Baidu Netdisk \| WiseModel |
Yuan2.0-2B-Janux New | 8K | ModelScope \| OpenXlab \| Baidu Netdisk \| WiseModel |
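One way to fetch a checkpoint, shown here for the HF-format 2B model, is the `huggingface-cli download` command; the repository id `IEITYuan/Yuan2-2B-hf` is an assumption, so verify it against the HuggingFace link in the table above.

```bash
# Repo id assumed -- confirm it via the HuggingFace link in the table.
pip install -U "huggingface_hub[cli]"
huggingface-cli download IEITYuan/Yuan2-2B-hf --local-dir /workspace/checkpoints/Yuan2-2B-hf
```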
The Yuan2.0-2B model supports sequence lengths up to 8192 tokens, while the Yuan2.0-51B and Yuan2.0-102B models support sequence lengths up to 4096 tokens. You can set the `--max-position-embeddings` and `--seq-length` values according to your device memory.
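For example, to run the 2B model at its full context length, a launch script could pass the two flags like this (an illustrative excerpt, not a complete argument list):

```bash
# Illustrative excerpt of launch arguments: run Yuan2.0-2B at its
# maximum 8192-token context; lower both values if you run out of memory.
GPT_ARGS="--seq-length 8192 \
          --max-position-embeddings 8192"
```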
We provide evaluation scripts for HumanEval, AGIEval-Math, GSM-8K, and TruthfulQA so that users can reproduce our results; a sketch of a typical run follows the table below. We conducted performance tests on Yuan 2.0 models of different sizes, the details of which can be found in our paper.
Model | GSM8K | AGIEval-GK-Math-QA | AGIEval-GK-Math-Cloze | HumanEval | TruthfulQA |
---|---|---|---|---|---|
GPT-4 | 92% | 47.0% | 16.1% | 86.6% | 59% |
ChatGPT | 68.6%* | 36.5% | 7.3% | 66.5%* | 34%* |
Llama2 | 56.8% | - | - | 29.9% | - |
Yuan2.0-102B | 76.6% | 38.7% | 13.5% | 67.1% | 58% |
Yuan2.0-102B-SC | 86.2% | 45.5% | 15.2% | 77.4% | - |
* ChatGPT was evaluated in November 2023 using exactly the same input data as Yuan 2.0.
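A hypothetical evaluation run might look like the following; the script name and location are assumptions, so check the evaluation documentation for the actual entry points.

```bash
# Hypothetical -- the actual evaluation scripts may be named differently.
cd /workspace/yuan_2.0
bash examples/eval_humaneval_2B.sh
```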
For inference efficiency, the Yuan2.0-51B and Yuan2.0-102B models need to be converted into model files that use only tensor parallelism before starting the inference service. The details can be found in the documentation here.
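Such a conversion typically re-partitions the checkpoint; the sketch below assumes a Megatron-style conversion tool, and the script name and flags are hypothetical.

```bash
# Hypothetical conversion: repartition the checkpoint so that it uses
# tensor parallelism only (no pipeline parallelism) before inference.
python tools/convert_checkpoint.py \
    --load-dir /workspace/checkpoints/Yuan2.0-102B \
    --save-dir /workspace/checkpoints/Yuan2.0-102B-tp8 \
    --target-tensor-parallel-size 8 \
    --target-pipeline-parallel-size 1
```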
You can call the model through the inference service by sending a request to it. The details can be found in the documentation here.
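If the service exposes an HTTP endpoint, a request could look like the `curl` call below; the port, route, and JSON schema are assumptions to check against the inference documentation above.

```bash
# Hypothetical request -- port, path, and payload schema are assumed.
curl -X POST http://127.0.0.1:8000/yuan \
    -H "Content-Type: application/json" \
    -d '{"ques_list": [{"id": "000", "ques": "Write quicksort in Python."}]}'
```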