From 4d31484ba2f6229627e420df4295acea655a0169 Mon Sep 17 00:00:00 2001
From: nlpzhezhao
Date: Sat, 9 Mar 2024 20:08:19 +0800
Subject: [PATCH] change readme

---
 README.md    | 21 +++++++++++----------
 README_ZH.md | 10 +++++-----
 2 files changed, 16 insertions(+), 15 deletions(-)

diff --git a/README.md b/README.md
index d6a12ac..5012672 100644
--- a/README.md
+++ b/README.md
@@ -7,7 +7,7 @@
 
 
-Pre-training has become an essential part for NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on general-domain corpus and fine-tuning on downstream task. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-training models, and provides interfaces for users to further extend upon. With UER-py, we build a model zoo which contains pre-trained models of different properties. **See the Wiki for [Full Documentation](https://github.com/dbiir/UER-py/wiki)**.
+Pre-training has become an essential part of NLP tasks. UER-py (Universal Encoder Representations) is a toolkit for pre-training on a general-domain corpus and fine-tuning on downstream tasks. UER-py maintains model modularity and supports research extensibility. It facilitates the use of existing pre-trained models and provides interfaces for users to extend them further. With UER-py, we build a model zoo which contains pre-trained models with different properties. **See the [UER-py project Wiki](https://github.com/dbiir/UER-py/wiki) for full documentation**.

@@ -160,19 +160,20 @@ UER-py is organized as follows:
 ```
 UER-py/
 |--uer/
-|    |--embeddings/ # contains embeddings
-|    |--encoders/ # contains encoders such as RNN, CNN,
-|    |--decoders/ # contains decoders
-|    |--targets/ # contains targets such as language modeling, masked language modeling
-|    |--layers/ # contains frequently-used NN layers, such as embedding layer, normalization layer
-|    |--models/ # contains model.py, which combines embedding, encoder, and target modules
+|    |--embeddings/ # contains modules of embedding component
+|    |--encoders/ # contains modules of encoder component such as RNN, CNN, Transformer
+|    |--decoders/ # contains modules of decoder component
+|    |--targets/ # contains modules of target component such as language modeling, masked language modeling
+|    |--layers/ # contains frequently-used NN layers
+|    |--models/ # contains model.py, which combines modules of different components
 |    |--utils/ # contains frequently-used utilities
 |    |--model_builder.py
 |    |--model_loader.py
 |    |--model_saver.py
+|    |--opts.py
 |    |--trainer.py
 |
-|--corpora/ # contains corpora for pre-training
+|--corpora/ # contains pre-training data
 |--datasets/ # contains downstream tasks
 |--models/ # contains pre-trained models, vocabularies, and configuration files
 |--scripts/ # contains useful scripts for pre-training models
@@ -184,7 +185,7 @@ UER-py/
 |--README.md
 |--README_ZH.md
 |--requirements.txt
-|--logo.jpg
+|--LICENSE
 ```
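The `models/model.py` comment in the tree above ("combines modules of different components") is the heart of the modular design. The snippet below is a minimal sketch of that composition pattern, not the actual UER-py source; the class and argument names are illustrative assumptions.

```python
import torch.nn as nn


class Model(nn.Module):
    """Illustrative composition of decoupled components (hypothetical names)."""

    def __init__(self, embedding, encoder, target):
        super().__init__()
        self.embedding = embedding  # token / position / segment embeddings
        self.encoder = encoder      # e.g. an RNN, CNN, or Transformer stack
        self.target = target        # e.g. a masked language modeling objective

    def forward(self, src, tgt, seg):
        emb = self.embedding(src, seg)        # (batch, seq_len, hidden)
        hidden = self.encoder(emb, seg)       # contextualized representations
        return self.target(hidden, tgt)       # loss and statistics for the objective
```

Swapping in a different encoder or target module is what yields pre-trained models with different properties.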
@@ -214,7 +215,7 @@ UER-py has been used in winning solutions of many NLP competitions. In this sect
 
 ## Contact information
-For communication related to this project, please contact Zhe Zhao (helloworld@ruc.edu.cn; nlpzhezhao@tencent.com) or Yudong Li (liyudong123@hotmail.com) or Cheng Hou (chenghoubupt@bupt.edu.cn) or Wenhang Shi (wenhangshi@ruc.edu.cn).
+For communication related to this project, please contact Zhe Zhao (helloworld@alu.ruc.edu.cn; nlpzhezhao@tencent.com) or Yudong Li (liyudong123@hotmail.com) or Cheng Hou (chenghoubupt@bupt.edu.cn) or Wenhang Shi (wenhangshi@ruc.edu.cn).
 
 This work is instructed by my enterprise mentors __Qi Ju__, __Xuefeng Yang__, __Haotang Deng__ and school mentors __Tao Liu__, __Xiaoyong Du__.
 
diff --git a/README_ZH.md b/README_ZH.md
index 6a54930..6896796 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -7,7 +7,7 @@
 
 
-Pre-training has become an essential part of NLP tasks and has brought significant improvements to a large number of them. UER-py (Universal Encoder Representations) is a toolkit for pre-training on a general-domain corpus and fine-tuning on downstream tasks. UER-py follows a modular design. By combining modules, users can quickly and accurately reproduce existing pre-trained models and use the provided interfaces to develop further pre-trained models. With UER-py, we have built a model zoo containing pre-trained models with different properties (e.g. based on different encoders and targets). Users can choose suitable pre-trained models from it according to the requirements of their specific tasks. **See the [full documentation](https://github.com/dbiir/UER-py/wiki/主页) on this project's Wiki**.
+Pre-training has become an essential part of NLP tasks and has brought significant improvements to a large number of them. UER-py (Universal Encoder Representations) is a toolkit for pre-training on a general-domain corpus and fine-tuning on downstream tasks. UER-py follows a modular design. By combining modules, users can quickly and accurately reproduce existing pre-trained models and use the provided interfaces to develop further pre-trained models. With UER-py, we have built a model zoo containing pre-trained models with different properties (e.g. based on different corpora, encoders, and targets). Users can choose suitable pre-trained models from it according to the requirements of their specific tasks. **See [this project's Wiki](https://github.com/dbiir/UER-py/wiki/主页) for full documentation**.
@@ -36,10 +36,10 @@ UER-py has the following advantages:
 - __Reproducibility__ UER-py has been tested on many datasets and matches the performance of the original implementations of pre-trained models (e.g. BERT, GPT-2, ELMo, T5)
 - __Modularity__ UER-py is built on a decoupled, modular framework. The framework is divided into components such as Embedding, Encoder, and Target. The components have clear interfaces, and each component contains a rich set of modules. Different modules can be combined to build pre-trained models with different properties
-- __Model training__ UER-py supports CPU, single-machine single-GPU, single-machine multi-GPU, and multi-machine multi-GPU training modes
-- __Model zoo__ We maintain and continuously release pre-trained models. Users can choose suitable pre-trained models from the zoo according to the requirements of their specific tasks
+- __Model training__ UER-py supports single-machine CPU, single-machine GPU, and multi-machine multi-GPU training modes
+- __Model zoo__ We maintain and release pre-trained models. Users can choose suitable pre-trained models from the zoo according to the requirements of their specific tasks
 - __SOTA results__ UER-py supports a comprehensive range of downstream tasks, including text classification, text pair classification, sequence labeling, and reading comprehension, and provides several competition-winning solutions
-- __Pre-training related functions__ UER-py provides abundant pre-training related functions and optimizations, including feature extraction, nearest-neighbor retrieval, pre-trained model conversion, model ensemble, and text generation
+- __Pre-training related functions__ UER-py provides abundant pre-training related functions, including feature extraction, nearest-neighbor retrieval, pre-trained model conversion, model ensemble, and text generation
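The __Model training__ bullet above lists three training modes. As a rough illustration of the mechanism behind them in plain PyTorch (this is not UER-py's own trainer code; the helper below is an assumption made for illustration):

```python
import os

import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP


def wrap_for_training(model: torch.nn.Module) -> torch.nn.Module:
    """Pick a training mode based on the environment (illustrative helper)."""
    if not torch.cuda.is_available():
        return model                 # single-machine CPU training
    if "WORLD_SIZE" not in os.environ:
        return model.cuda()          # single-machine, single-GPU training
    # Multi-GPU / multi-machine: one process per GPU, gradients synchronized by DDP.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)
    return DDP(model.cuda(local_rank), device_ids=[local_rank])
```

In the distributed case the same script is launched once per GPU, for example with `torchrun`, and gradient averaging keeps the model replicas in sync.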
@@ -75,7 +75,7 @@ doc2-sent1
 
 doc3-sent1
 doc3-sent2
 ```
-The book review corpus is obtained by removing the labels from the book review classification dataset. We split each review in the middle so that it forms a document of two sentences; see *book_review_bert.txt* in the *corpora* folder for details.
+The book review corpus is obtained by removing the labels from the book review sentiment classification dataset. We split each review in the middle so that it forms a document of two sentences; see *book_review_bert.txt* in the *corpora* folder for details.
 
 The format of the classification dataset is as follows:
 ```
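The paragraph above describes how *book_review_bert.txt* is derived from the labeled data: drop the label and split each review in the middle into a two-sentence document, one sentence per line, with a blank line between documents. Below is a small conversion sketch; the file names and the tab-separated `label<TAB>text` layout are assumptions, since the exact classification format is not reproduced in this hunk.

```python
def review_to_document(review: str) -> str:
    # Split the review roughly in the middle (by character count, as a
    # simplification) to obtain a "document" of two sentences.
    half = len(review) // 2
    return review[:half].strip() + "\n" + review[half:].strip() + "\n"


with open("book_review.tsv", encoding="utf-8") as src, \
     open("book_review_bert.txt", "w", encoding="utf-8") as dst:
    for line in src:
        parts = line.rstrip("\n").split("\t")
        if len(parts) != 2:
            continue  # skip the header or malformed lines
        _label, text = parts              # drop the label, keep the review text
        dst.write(review_to_document(text) + "\n")  # blank line between documents
```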