Merge pull request #4812 from FederatedAI/develop-1.11.1

Update documents

dylan-fan committed Apr 21, 2023
2 parents 5fa5522 + faa4da8 commit 9432615
Showing 21 changed files with 93 additions and 608 deletions.
1 change: 1 addition & 0 deletions README.md
@@ -39,6 +39,7 @@ Deploying FATE to multiple nodes to achieve scalability, reliability and manageability
- [Train & Predict Hetero SecureBoost with FATE-Pipeline](./doc/tutorial/pipeline/pipeline_tutorial_hetero_sbt.ipynb)
- [Build & Customize NN models with FATE-Pipeline](./doc/tutorial/pipeline/nn_tutorial/README.md)
- [Run Job with DSL json conf](doc/tutorial/dsl_conf/dsl_conf_tutorial.md)
- [FATE-LLM Training Guides](doc/tutorial/fate_llm/README.md)
- [More Tutorials...](doc/tutorial)

## Related Repositories (Projects)
1 change: 1 addition & 0 deletions README_zh.md
@@ -36,6 +36,7 @@ FATE supports multiple deployment modes, and users can choose according to their own situation. [
- [Train and predict a vertical (hetero) SBT task with FATE-Pipeline](./doc/tutorial/pipeline/pipeline_tutorial_hetero_sbt.ipynb)
- [Build horizontal and vertical neural network models with FATE-Pipeline](doc/tutorial/pipeline/nn_tutorial/README.md)
- [Run jobs with DSL json conf](doc/tutorial/dsl_conf/dsl_conf_tutorial.md)
- [FATE-LLM training tutorials](doc/tutorial/fate_llm/README.md)
- [More tutorials](doc/tutorial)

## Related Repositories
16 changes: 2 additions & 14 deletions doc/federatedml_component/intersect.md
@@ -40,13 +40,6 @@ finding common even ids.
With RSA intersection, participants can get their intersection ids
securely and efficiently.
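The mechanism behind this is the blind RSA signature: the host signs the guest's blinded hashed ids, so signed values can be compared without ever exposing raw ids. Below is a toy sketch of that idea with deliberately insecure parameters — illustrative names only, not FATE's implementation:

```python
# Toy sketch of blind-RSA PSI; insecure key size, for illustration only.
import hashlib
import random

def h(x: str, n: int) -> int:
    """First hash: map an id into Z_n."""
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % n

def h2(v: int) -> str:
    """Second hash: applied before comparing signed values."""
    return hashlib.sha256(str(v).encode()).hexdigest()

# Host's RSA key (toy primes; real deployments use >=2048-bit moduli)
p, q, e = 32749, 65521, 65537
n = p * q
d = pow(e, -1, (p - 1) * (q - 1))

host_ids = {"alice", "bob", "carol"}
guest_ids = {"bob", "carol", "dave"}

# Host signs its own hashed ids and publishes second hashes of the signatures
host_table = {h2(pow(h(y, n), d, n)) for y in host_ids}

# Guest blinds each hashed id with a random factor r (assumes gcd(r, n) == 1)
blinds = {x: random.randrange(2, n) for x in guest_ids}
blinded = {x: (h(x, n) * pow(r, e, n)) % n for x, r in blinds.items()}

# Host signs the blinded values without learning the underlying ids
signed = {x: pow(c, d, n) for x, c in blinded.items()}

# Guest unblinds (leaving H(x)^d mod n) and matches against the host's table
matched = {x for x, s in signed.items()
           if h2((s * pow(blinds[x], -1, n)) % n) in host_table}
print(matched)  # {'bob', 'carol'}
```

Only blinded values cross the network, and publishing `h2` of the signatures rather than the signatures themselves keeps the host's table from leaking reusable signed values.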

## RAW Intersection

This mode implements the simple intersection method in which a
participant sends all its ids to another participant, and the other
participant finds their common ids. Finally, the joining role will send
the intersection ids to the sender.

## DH Intersection

This mode implements secure intersection based on symmetric encryption
@@ -88,7 +81,7 @@ Intersection supports cache.
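The underlying trick in the DH mode is commutative exponentiation: an id hashed and then raised to both parties' secret keys yields the same value regardless of order, so doubly-masked values match exactly when the ids do. A minimal sketch under toy parameters (not FATE's implementation, which uses proper hash-to-group mapping and stronger moduli):

```python
# Toy DH-PSI sketch: commutative exponentiation over a prime field.
import hashlib

P = 2**127 - 1                 # toy Mersenne-prime modulus, illustration only
a, b = 91234567, 87654321      # secret exponents of guest and host

def hash_to_group(x: str) -> int:
    return int.from_bytes(hashlib.sha256(x.encode()).digest(), "big") % P

guest_list = sorted({"bob", "carol", "dave"})
host_ids = {"alice", "bob", "carol"}

# Round 1: each side exponentiates its own hashed ids and sends them over.
guest_once = [pow(hash_to_group(x), a, P) for x in guest_list]  # guest -> host
host_once = [pow(hash_to_group(y), b, P) for y in host_ids]     # host -> guest

# Round 2: each side exponentiates the other's values with its own key.
guest_twice = [pow(v, b, P) for v in guest_once]  # host returns these in order
host_twice = {pow(v, a, P) for v in host_once}    # guest computes locally

# Guest matches doubly-masked values: H(id)^(a*b) coincide iff the ids do.
intersection = {x for x, v in zip(guest_list, guest_twice) if v in host_twice}
print(intersection)  # {'bob', 'carol'}
```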

## Multi-Host Intersection

RSA, RAW, and DH intersection support multi-host scenario. It means a
RSA and DH intersection support the multi-host scenario: a
guest can perform intersection with more than one host simultaneously
and get the common ids among all participants.
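Conceptually, the guest keeps only the ids that appear in every pairwise result; a tiny sketch with hypothetical per-host outputs:

```python
from functools import reduce

# Hypothetical pairwise PSI outputs: ids the guest shares with each host.
per_host = [
    {"id2", "id3", "id7"},   # guest ∩ host_0
    {"id3", "id7", "id9"},   # guest ∩ host_1
]
common_ids = reduce(set.intersection, per_host)
print(common_ids)  # {'id3', 'id7'}
```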

@@ -155,14 +148,13 @@ And for Host:

## Feature

Below lists features of each ECDH, RSA, DH, and RAW intersection methods.
Below are the features of the ECDH, RSA, and DH intersection methods.

| Intersect Methods | PSI | Match-ID Support | Multi-Host | Exact-Cardinality | Estimated Cardinality | Preprocessing | Cache |
|-------------------|-----|------------------|------------|-------------------|-----------------------|---------------|-------|
| ECDH | [✓](../../examples/pipeline/intersect/pipeline-intersect-ecdh.py) | ✓ | [✓](../../examples/pipeline/intersect/pipeline-intersect-ecdh-multi) | [✓](../../examples/dsl/v2/intersect/test_intersect_job_ecdh_exact_cardinality_conf.json) | ✗ | [✓](../../examples/pipeline/intersect/pipeline-intersect-ecdh-w-preprocess.py) | [✓](../../examples/pipeline/intersect/pipeline-intersect-ecdh-cache.py) |
| RSA | [✓](../../examples/pipeline/intersect/pipeline-intersect-rsa.py) | [✓](../../examples/pipeline/match_id_test/pipeline-hetero-lr.py) | [✓](../../examples/pipeline/intersect/pipeline-intersect-multi-rsa.py) | ✗ | [✓](../../examples/pipeline/intersect/pipeline-intersect-rsa-cardinality.py) | [✓](../../examples/pipeline/intersect/pipeline-intersect-dh-w-preprocess.py) | [✓](../../examples/pipeline/intersect/pipeline-intersect-rsa-cache.py) |
| DH | [✓](../../examples/pipeline/intersect/pipeline-intersect-dh.py) | ✓ | [✓](../../examples/pipeline/intersect/pipeline-intersect-dh-multi.py) | [✓](examples/dsl/v2/intersect/test_intersect_job_dh_exact_cardinality_conf.json) | ✗ | [✓](../../examples/pipeline/intersect/pipeline-intersect-rsa-w-preprocess.py) | [✓](../../examples/pipeline/intersect/pipeline-intersect-dh-cache.py) |
| RAW | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | ✗ |

All three methods support:

@@ -180,10 +172,6 @@ RSA, DH, ECDH intersection methods also support:

1. PSI with cache

RAW intersection supports the following extra feature:

1. base64 encoding may be used for all hashing methods.

Cardinality Computation:

1. Setting `cardinality_method` to `rsa` produces an estimated intersection cardinality;
5 changes: 5 additions & 0 deletions doc/tutorial/README.zh.md
@@ -8,6 +8,8 @@
- [`Hetero SecureBoost` training and prediction with `Pipeline`](pipeline/pipeline_tutorial_hetero_sbt.ipynb)
- [Building neural network models with `Pipeline`](pipeline/nn_tutorial/README.md)
- [`Hetero SecureBoost` training and prediction with `Match ID` using `Pipeline`](pipeline/pipeline_tutorial_match_id.ipynb)
- [Uploading data with `Meta` and training `Hetero SecureBoost`](pipeline/pipeline_tutorial_uploading_data_with_meta.ipynb)
- [Intersection on specified columns when multiple match-ID columns exist](pipeline/pipeline_tutorial_multiple_id_columns.ipynb)

Submitting jobs without `Pipeline` is also supported; users need to prepare job configuration files in `json` format:

@@ -22,3 +24,6 @@
Run multiple jobs with `FATE-Test`:

- [FATE-Test Tutorial](fate_test_tutorial.md)

Merge multi-party models and export them in sklearn/LightGBM or PMML format:
- [Model merge and export](./model_merge.md)
GPT2-example.ipynb
@@ -5,15 +5,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Federated GPT-2 Tuning with Parameter Efficient methods in FATE-1.11"
"# Federated GPT-2 Tuning with Parameter Efficient methods in FATE-LLM"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we will demonstrate how to efficiently train federated large language models using the FATE 1.11 framework. In FATE-1.11, we introduce the \"pellm\"(Parameter Efficient Large Language Model) module, specifically designed for federated learning with large language models. We enable the implementation of parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. In this tutorial we particularlly focus on GPT-2, and we will also emphasize the use of the Adapter mechanism for fine-tuning GPT-2, which enables us to effectively reduce communication volume and improve overall efficiency.\n",
"In this tutorial, we will demonstrate how to efficiently train federated large language models using the FATE-LLM framework. In FATE-LLM, we introduce the \"pellm\"(Parameter Efficient Large Language Model) module, specifically designed for federated learning with large language models. We enable the implementation of parameter-efficient methods in federated learning, reducing communication overhead while maintaining model performance. In this tutorial we particularlly focus on GPT-2, and we will also emphasize the use of the Adapter mechanism for fine-tuning GPT-2, which enables us to effectively reduce communication volume and improve overall efficiency.\n",
"\n",
"By following this tutorial, you will learn how to leverage the FATE framework to rapidly fine-tune federated large language models, such as GPT-2, with ease and efficiency."
]
@@ -600,7 +600,7 @@
" padding_side=\"left\", return_input_ids=False, pad_token='<|endoftext|>')\n",
"# TrainerParam\n",
"trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, \n",
" data_loader_worker=8, secure_aggregate=False)\n",
" data_loader_worker=8, secure_aggregate=True)\n",
"\n",
"\n",
"nn_component = HomoNN(name='nn_0', model=model)\n",
@@ -660,7 +660,7 @@
"outputs": [],
"source": [
"trainer_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, \n",
" data_loader_worker=8, secure_aggregate=False, cuda=0)"
" data_loader_worker=8, secure_aggregate=True, cuda=0)"
]
},
{
@@ -690,11 +690,11 @@
"outputs": [],
"source": [
"client_0_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, \n",
" data_loader_worker=8, secure_aggregate=False, cuda=[0, 1, 2, 3])\n",
" data_loader_worker=8, secure_aggregate=True, cuda=[0, 1, 2, 3])\n",
"client_1_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, \n",
" data_loader_worker=8, secure_aggregate=False, cuda=[0, 3, 4])\n",
" data_loader_worker=8, secure_aggregate=True, cuda=[0, 3, 4])\n",
"server_param = TrainerParam(trainer_name='fedavg_trainer', epochs=1, batch_size=8, \n",
" data_loader_worker=8, secure_aggregate=False)\n",
" data_loader_worker=8, secure_aggregate=True)\n",
"\n",
"# set parameter for client 1\n",
"nn_component.get_party_instance(role='guest', party_id=guest_0).component_param(\n",
GPT2-multi-task.ipynb
@@ -5,15 +5,15 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"# Multi-Task Federated Learning with GPT-2 using FATE-1.11"
"# Multi-Task Federated Learning with GPT-2 using FATE-LLM"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"In this tutorial, we will explore the implementation of multi-task federated learning with LM: GPT-2 using the FATE-1.11 framework. FATE-1.11 provides the \"pellm\" module for efficient federated learning. It is specifically designed for large language models in a federated setting.\n",
"In this tutorial, we will explore the implementation of multi-task federated learning with LM: GPT-2 using the FATE-LLM framework. FATE-LLM provides the \"pellm\" module for efficient federated learning. It is specifically designed for large language models in a federated setting.\n",
"\n",
"Multi-task learning involves training a model to perform multiple tasks simultaneously. In this tutorial, we will focus on two tasks - sentiment classification and named entity recognition (NER) - and show how they can be combined with GPT-2 in a federated learning setting. We will use the IMDB sentiment analysis dataset and the CoNLL-2003 NER dataset for our tasks.\n",
"\n",
Expand Down Expand Up @@ -699,7 +699,7 @@
"dataset_param = DatasetParam(dataset_name='multitask_ds', take_limits=50, tokenizer_name_or_path=model_path)\n",
"# TrainerParam\n",
"trainer_param = TrainerParam(trainer_name='multi_task_fedavg', epochs=1, batch_size=4, \n",
" data_loader_worker=8, secure_aggregate=False)\n",
" data_loader_worker=8, secure_aggregate=True)\n",
"loss = t.nn.CustLoss(loss_module_name='multi_task_loss', class_name='MultiTaskLoss', task_weights=[0.5, 0.5])\n",
"\n",
"\n",
5 changes: 5 additions & 0 deletions doc/tutorial/fate_llm/README.md
@@ -0,0 +1,5 @@
# Usage
Here we provide tutorials for FATE-LLM training:

- [FATE-LLM example with GPT-2](GPT2-example.ipynb)
- [FATE-LLM Multi-Task GPT-2: Classification and NER Tagging](GPT2-multi-task.ipynb)
5 changes: 2 additions & 3 deletions doc/tutorial/pipeline/nn_tutorial/README.md
Original file line number Diff line number Diff line change
@@ -66,10 +66,9 @@ In order to show you how to develop your own Trainer, here we try to develop a simple trainer

Here we offer some advanced examples of using the FATE-NN framework.

## Fed-PELLM(Parameter Efficient Large Language Model) Training
## FATE-LLM(Federated Large Language Models) Training

- [Federated PELLM example with GPT-2](./GPT2-example.ipynb)
- [Federated Multi-Task GPT-2: Classification and NER Tagging](./GPT2-multi-task.ipynb)
- [FATE-LLM Training Guides](../../fate_llm/README.md)

## Resnet classification (Homo-NN)

68 changes: 27 additions & 41 deletions examples/dsl/v2/intersect/README.md
@@ -4,103 +4,89 @@ This section introduces the dsl and conf for usage of different types of tasks.

#### Intersection Task.

1. RAW Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_raw_conf.json

2. RAW Intersection with SM3 Hashing:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_raw_sm3_conf.json

3. RSA Intersection:
1. RSA Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_rsa_conf.json

4. RSA Intersection with Random Base Fraction set to 0.5:
2. RSA Intersection with Random Base Fraction set to 0.5:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_rsa_fraction_conf.json

5. RSA Intersection with Calculation Split:
3. RSA Intersection with Calculation Split:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_rsa_split_conf.json

6. RSA Multi-hosts Intersection:
4. RSA Multi-hosts Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_rsa_multi_host_conf.json

This dsl is an example of a guest running intersection with two hosts using RSA intersection. It can also be used with more than two hosts.

7. RAW Multi-hosts Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_raw_multi_host_conf.json

This dsl is an example of a guest running intersection with two hosts using raw intersection. It can also be used with more than two hosts.

8. DH Intersection:
5. DH Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_dh_conf.json

9. DH Multi-host Intersection:
6. DH Multi-host Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_dh_multi_conf.json

10. ECDH Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_ecdh_conf.json
7. ECDH Intersection:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_ecdh_conf.json

11. ECDH Intersection with Preprocessing:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_ecdh_w_preprocess_conf.json
8. ECDH Intersection with Preprocessing:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_ecdh_w_preprocess_conf.json

12. RSA Intersection with Cache:
- dsl: test_intersect_job_cache_dsl.json
- runtime_config : test_intersect_job_rsa_cache_conf.json
9. RSA Intersection with Cache:
- dsl: test_intersect_job_cache_dsl.json
- runtime_config : test_intersect_job_rsa_cache_conf.json

13. DH Intersection with Cache:
10. DH Intersection with Cache:
- dsl: test_intersect_job_cache_dsl.json
- runtime_config : test_intersect_job_dh_cache_conf.json

14. ECDH Intersection with Cache:
11. ECDH Intersection with Cache:
- dsl: test_intersect_job_cache_dsl.json
- runtime_config : test_intersect_job_ecdh_cache_conf.json

15. RSA Intersection with Cache Loader:
12. RSA Intersection with Cache Loader:
- dsl: test_intersect_job_cache_loader_dsl.json
- runtime_config : test_intersect_job_rsa_cache_loader_conf.json

16. Estimated Intersect Cardinality:
13. Estimated Intersect Cardinality:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_rsa_cardinality_conf.json

17. Exact Intersect Cardinality with ECDH:
14. Exact Intersect Cardinality with ECDH:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_ecdh_exact_cardinality_conf.json

18. Exact Intersect Cardinality with DH:
15. Exact Intersect Cardinality with DH:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_dh_exact_cardinality_conf.json

19. DH Intersection with Preprocessing:
16. DH Intersection with Preprocessing:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_dh_w_preprocess_conf.json

20. RSA Intersection with Preprocessing:
17. RSA Intersection with Preprocessing:
- dsl: test_intersect_job_dsl.json
- runtime_config : test_intersect_job_rsa_w_preprocess_conf.json

21. ECDH Intersection with Cache Loader:
18. ECDH Intersection with Cache Loader:
- dsl: test_intersect_job_cache_loader_dsl.json
- runtime_config : test_intersect_job_ecdh_cache_loader_conf.json

22. Exact Multi-host Intersect Cardinality with ECDH:
19. Exact Multi-host Intersect Cardinality with ECDH:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_ecdh_multi_exact_cardinality_conf.json

23. Exact Multi-host Intersect Cardinality with DH:
20. Exact Multi-host Intersect Cardinality with DH:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_dh_multi_exact_cardinality_conf.json

24. Exact Multi-host Intersect with ECDH:
21. Exact Multi-host Intersect with ECDH:
- dsl: test_intersect_job_dsl.json
- runtime_config: "test_intersect_job_ecdh_multi_conf.json

Expand Down
12 changes: 0 additions & 12 deletions examples/dsl/v2/intersect/intersect_testsuite.json
@@ -26,14 +26,6 @@
}
],
"tasks": {
"raw_intersect": {
"conf": "./test_intersect_job_raw_conf.json",
"dsl": "./test_intersect_job_dsl.json"
},
"raw_intersect_sm3": {
"conf": "./test_intersect_job_raw_sm3_conf.json",
"dsl": "./test_intersect_job_dsl.json"
},
"rsa_intersect": {
"conf": "./test_intersect_job_rsa_conf.json",
"dsl": "./test_intersect_job_dsl.json"
@@ -54,10 +46,6 @@
"conf": "./test_intersect_job_rsa_w_preprocess_conf.json",
"dsl": "./test_intersect_job_dsl.json"
},
"raw_intersect_multi_host": {
"conf": "./test_intersect_job_raw_multi_host_conf.json",
"dsl": "./test_intersect_job_dsl.json"
},
"dh_intersect": {
"conf": "./test_intersect_job_dh_conf.json",
"dsl": "./test_intersect_job_dsl.json"