Merge pull request #4805 from FederatedAI/develop-1.11.1
Merge 1.11.1 into master
dylan-fan committed Apr 20, 2023
2 parents 5ac0567 + 50be383 commit 5fa5522
Showing 38 changed files with 33,044 additions and 112 deletions.
7 changes: 7 additions & 0 deletions RELEASE.md
@@ -1,3 +1,10 @@
## Release 1.11.1
### Major Features and Improvements
> FederatedML
* Support Homo Graph Neural Network
* PSI-DH protocol enhancement: use Oakley MODP modulus groups (see the sketch below)
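The Oakley MODP groups are the fixed prime-modulus Diffie-Hellman groups defined in RFC 2409 and RFC 3526. As a minimal sketch of the commutative-exponentiation idea behind DH-based PSI, here is an illustration with a toy prime standing in for a named MODP group; this is an assumption-laden illustration, not FATE's implementation:

```python
import hashlib
import secrets

# Illustrative prime only; the actual enhancement uses the fixed
# Oakley MODP modulus groups from RFC 2409 / RFC 3526.
P = 2**255 - 19

def hash_to_group(item: str) -> int:
    """Map an identifier into the multiplicative group mod P."""
    return int.from_bytes(hashlib.sha256(item.encode()).digest(), "big") % P

a = secrets.randbelow(P - 3) + 2  # party A's private exponent
b = secrets.randbelow(P - 3) + 2  # party B's private exponent

alice = ["id1", "id2", "id3"]
bob = ["id2", "id3", "id4"]

# Round 1: each party blinds its own identifiers with its secret.
alice_blind = {pow(hash_to_group(x), a, P) for x in alice}
bob_blind = {pow(hash_to_group(x), b, P) for x in bob}

# Round 2: each party re-blinds the other's set. H(x)^(a*b) mod P is
# identical regardless of exponentiation order, so only the overlap matches.
both_ab = {pow(v, b, P) for v in alice_blind}
both_ba = {pow(v, a, P) for v in bob_blind}
print(len(both_ab & both_ba))  # -> 2
```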


## Release 1.11.0
### Major Features and Improvements
> FederatedML
@@ -187,7 +187,7 @@ wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/${version}/
scp *.tar.gz app@192.168.0.2:/data/projects/install
scp *.tar.gz app@192.168.0.3:/data/projects/install
```
-Note: This document requires FATE version >= 1.7.0; replace ${version} with the actual version number, e.g. 1.11.0, without the v character.
+Note: This document requires FATE version >= 1.7.0; replace ${version} with the actual version number, e.g. 1.11.1, without the v character.

### 5.2 Operating system parameter checking

@@ -183,7 +183,7 @@ wget https://webank-ai-1251170195.cos.ap-guangzhou.myqcloud.com/fate/${version}/
scp *.tar.gz app@192.168.0.2:/data/projects/install
scp *.tar.gz app@192.168.0.3:/data/projects/install
```
-Note: this guide requires FATE version >= 1.7.0; replace ${version} with e.g. 1.11.0, without the v character.
+Note: this guide requires FATE version >= 1.7.0; replace ${version} with e.g. 1.11.1, without the v character.
### 5.2 Operating system parameter check

**Run as the app user on the target servers (192.168.0.1, 192.168.0.2, 192.168.0.3)**
2 changes: 1 addition & 1 deletion deploy/standalone-deploy/README.md
@@ -41,7 +41,7 @@ export version={FATE version for this deployment}
Example:

```bash
-export version=1.11.0
+export version=1.11.1
```

### 2.2 Pulling images
4 changes: 2 additions & 2 deletions deploy/standalone-deploy/README.zh.md
@@ -35,13 +35,13 @@
Set the environment variables required for deployment (note: variables set this way are valid only for the current terminal session; if you open a new session, e.g. by logging in again or opening a new window, set them again)

```bash
-export version={FATE version for this deployment, e.g. 1.11.0}
+export version={FATE version for this deployment, e.g. 1.11.1}
```

Example:

```bash
-export version=1.11.0
+export version=1.11.1
```

### 2.2 Pulling images
2 changes: 2 additions & 0 deletions doc/federatedml_component/README.md
@@ -62,6 +62,8 @@ provide:
| [Hetero SSHE Logistic Regression](logistic_regression.md) | HeteroSSHELR | Build hetero logistic regression model without arbiter | Table, values are Instances | Table, values are Instances | | SSHE LR Model |
| [Hetero SSHE Linear Regression](linear_regression.md) | HeteroSSHELinR | Build hetero linear regression model without arbiter | Table, values are Instances | Table, values are Instances | | SSHE LinR Model |
| [Positive Unlabeled Learning](positive_unlabeled.md) | PositiveUnlabeled | Build positive unlabeled learning model | Table, values are Instances | Table, values are Instances | | |
| [FATE-LLM](fate_llm.md) | FATE_LLM | Federated Large Language Model | Torch DataSet | | PreTrained Large Language Model | FineTuned Large Language Model |


## Secure Protocol

1 change: 1 addition & 0 deletions doc/federatedml_component/README.zh.md
@@ -52,6 +52,7 @@ The Federatedml module includes federated implementations of many common machine learning algorithms. All mod…
| [Hetero SSHE Logistic Regression](logistic_regression.md) | HeteroSSHELR | Two-party hetero logistic regression (no trusted third party) | Table, values are Instances | Table, values are Instances | | SSHE LR Model |
| [Hetero SSHE Linear Regression](linear_regression.md) | HeteroSSHELinR | Two-party hetero linear regression (no trusted third party) | Table, values are Instances | Table, values are Instances | | SSHE LinR Model |
| [Positive Unlabeled Learning](positive_unlabeled.md) | PositiveUnlabeled | Build positive unlabeled (PU) learning model | Table, values are Instances | Table, values are Instances | | |
| [FATE-LLM](fate_llm.md) | FATE_LLM | Federated large language model | Torch DataSet | | PreTrained Large Language Model | FineTuned Large Language Model |


## Secure Protocol
42 changes: 42 additions & 0 deletions doc/federatedml_component/fate_llm.md
@@ -0,0 +1,42 @@
# FATE-LLM
FATE-LLM is a framework for federated training of large language models; it also provides multiple parameter-efficient fine-tuning strategies[1][2] for industrial applications.

## Features
The current version supports the following features:
* Integration of various large language models for federated learning, including BERT, ALBERT, RoBERTa, GPT-2, BART, DeBERTa, DistilBERT, etc.
These models are widely used in natural language understanding and generation tasks and can meet the needs of different application scenarios[3][4][5].
* Integration of multiple parameter-efficient tuning methods: Bottleneck Adapters (including the Houlsby, Pfeiffer, and Parallel schemes), Invertible Adapters, LoRA, IA3, and Compacter (see the sketch below).
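As a concrete illustration of one of these methods, here is a minimal LoRA sketch using the Hugging Face `peft` library; this is an assumption for illustration only (FATE-LLM integrates adapters through its own trainer), and the printed numbers are approximate:

```python
from transformers import AutoModelForSequenceClassification
from peft import LoraConfig, get_peft_model

# Freeze the GPT-2 backbone and inject low-rank adapters into the
# attention projection; only the adapter weights remain trainable.
base = AutoModelForSequenceClassification.from_pretrained("gpt2", num_labels=2)
lora = LoraConfig(r=8, lora_alpha=16, lora_dropout=0.1,
                  target_modules=["c_attn"], task_type="SEQ_CLS")
model = get_peft_model(base, lora)
model.print_trainable_parameters()
# e.g. roughly 0.3M trainable out of ~124M total parameters
```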

## Experiment Data

### Model Parameter Sizes
The current version of FATE-LLM supports a range of classic large language models, with parameter counts from tens of millions to 1.5 billion.
The following table lists the parameter counts of the commonly used versions of the models we support:
![llm model parameters](../images/llm_model_parameter_amount.png)

### Trainable Parameter Sizes of Parameter-Efficient Methods
To give users a more intuitive feel for how much parameter-efficient methods reduce federated training and transmission costs in FATE-LLM,
we take GPT-2 as an example and show the number of parameters involved in the federated training and transmission process (a rough sizing helper follows the figure).
![parameter_efficient](../images/parameter_efficient_of_gpt-2.png)
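To relate these counts to communication cost, here is a small hedged helper (the name `trainable_megabytes` is hypothetical, and a PyTorch model such as the peft-wrapped GPT-2 above is assumed):

```python
def trainable_megabytes(model) -> float:
    # Only trainable (adapter) parameters are aggregated and transmitted
    # each federated round; the frozen backbone never leaves the party.
    n = sum(p.numel() for p in model.parameters() if p.requires_grad)
    return n * 4 / 2**20  # float32 parameters -> MiB

# e.g. print(trainable_megabytes(model)) for the LoRA-wrapped GPT-2 above
```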

### Training Time Improvement
We compare the training time of different adapter methods against fine-tuning the complete model in a homo (horizontal)
federated learning scenario, on a text sentiment classification task using the IMDB dataset:
- Scenario: homo (horizontal) federated learning
- Task type: text sentiment classification
- Participants: two client parties involved in model building and one server for aggregation
- Data & basic parameters: IMDB dataset, 25,000 samples, batch_size=64, padding_length=200 (see the tokenization sketch below)
- Environment: each modeling party uses 2x V100 32GB GPUs; the experiments are conducted in a local area network environment
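For reference, the padding_length=200 setting corresponds to tokenizing every review to a fixed length, roughly as follows; this is a hedged sketch with the Hugging Face tokenizer and made-up sample strings, and FATE-LLM's own data pipeline may differ:

```python
from transformers import AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
tok.pad_token = tok.eos_token  # GPT-2 defines no pad token by default

# Fixed-length batches as in the experiment: truncate/pad to 200 tokens.
batch = tok(["an absolute delight to watch", "two hours I will never get back"],
            padding="max_length", truncation=True, max_length=200,
            return_tensors="pt")
print(batch["input_ids"].shape)  # torch.Size([2, 200])
```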

The table below compares the per-epoch training time (in seconds) of the various adapters against fine-tuning the complete model.
It can be observed that federating an adapter together with a frozen language model significantly reduces training time.

![GPT-2 Training Time Improvement](../images/gpt-2_training_time_improvement.png)


## References
[1] Cai D, Wu Y, Wang S, et al. AutoFedNLP: An efficient FedNLP framework[J]. arXiv preprint arXiv:2205.10162, 2022.
[2] Zhang Z, Yang Y, Dai Y, et al. When Federated Learning Meets Pre-trained Language Models' Parameter-Efficient Tuning Methods[J]. arXiv preprint arXiv:2212.10025, 2022.
[3] Zhou C, Li Q, Li C, et al. A comprehensive survey on pretrained foundation models: A history from BERT to ChatGPT[J].
[4] Devlin J, Chang M W, Lee K, et al. BERT: Pre-training of deep bidirectional transformers for language understanding[J]. arXiv preprint arXiv:1810.04805, 2018.
[5] Radford A, Wu J, Child R, et al. Language models are unsupervised multitask learners[J]. OpenAI blog, 2019, 1(8): 9.
Binary file added doc/images/gpt-2_training_time_improvement.png
Binary file added doc/images/llm_model_parameter_amount.png
Binary file added doc/images/parameter_efficient_of_gpt-2.png
