Merge branch 'master' into feature/multi_task_add_dynamic_weight

alibaba · Aug 29, 2024 · e1ca0ed · e1ca0ed
2 parents 53d9ba2 + dd64fd9
commit e1ca0ed
Showing 42 changed files with 935 additions and 83 deletions.
diff --git a/README.md b/README.md
@@ -63,7 +63,7 @@ Running Platform:
 - [DSSM](docs/source/models/dssm.md) / [MIND](docs/source/models/mind.md) / [DropoutNet](docs/source/models/dropoutnet.md) / [CoMetricLearningI2I](docs/source/models/co_metric_learning_i2i.md) / [PDN](docs/source/models/pdn.md)
 - [W&D](docs/source/models/wide_and_deep.md) / [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [DCN](docs/source/models/dcn.md) / [FiBiNet](docs/source/models/fibinet.md) / [MaskNet](docs/source/models/masknet.md) / [PPNet](docs/source/models/ppnet.md) / [CDN](docs/source/models/cdn.md)
 - [DIN](docs/source/models/din.md) / [BST](docs/source/models/bst.md) / [CL4SRec](docs/source/models/cl4srec.md)
-- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [PLE](docs/source/models/ple.md)
+- [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md) / [DBMTL](docs/source/models/dbmtl.md) / [AITM](docs/source/models/aitm.md) / [PLE](docs/source/models/ple.md)
 - [HighwayNetwork](docs/source/models/highway.md) / [CMBF](docs/source/models/cmbf.md) / [UNITER](docs/source/models/uniter.md)
 - More models in development
 

diff --git a/docs/images/models/aitm.jpg b/docs/images/models/aitm.jpg
diff --git a/docs/source/benchmark.md b/docs/source/benchmark.md
@@ -9,6 +9,7 @@
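 - This is the Taobao display-ad click-through-rate (CTR) prediction dataset; it contains user features, ad features, and behavior logs. [Tianchi competition link](https://tianchi.aliyun.com/dataset/dataDetail?dataId=56)
 - Training table: pai_online_project.easyrec_demo_taobao_train_data
 - Test table: pai_online_project.easyrec_demo_taobao_test_data
+- pai_online_project is a publicly readable MaxCompute project that holds a few tables for testing; no access request is needed.
 - Resources used for the test on PAI: 2 parameter servers and 9 workers, one of which performs evaluation: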
   ```json
   {"ps":{"count":2,

diff --git a/docs/source/component/backbone.md b/docs/source/component/backbone.md
@@ -1111,13 +1111,14 @@ Results on the MovieLens-1M dataset:
 
 ## 2. Feature crossing components
 
-| Class          | Function                            | Description                        | Example |
-| -------------- | ----------------------------------- | ---------------------------------- | ------- |
-| FM             | second-order crossing               | component of the DeepFM model      | [Case 2](#deepfm) |
-| DotInteraction | second-order inner-product crossing | component of the DLRM model        | [Case 4](#dlrm) |
-| Cross          | bit-wise crossing                   | component of the DCN v2 model      | [Case 3](#dcn) |
-| BiLinear       | bilinear                            | component of the FiBiNet model     | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
-| FiBiNet        | SENet & BiLinear                    | the FiBiNet model                  | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
+| Class          | Function                            | Description                        | Example |
+| -------------- | ----------------------------------- | ---------------------------------- | ------- |
+| FM             | second-order crossing               | component of the DeepFM model      | [Case 2](#deepfm) |
+| DotInteraction | second-order inner-product crossing | component of the DLRM model        | [Case 4](#dlrm) |
+| Cross          | bit-wise crossing                   | component of the DCN v2 model      | [Case 3](#dcn) |
+| BiLinear       | bilinear                            | component of the FiBiNet model     | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
+| FiBiNet        | SENet & BiLinear                    | the FiBiNet model                  | [fibinet_on_movielens.config](https://github.com/alibaba/EasyRec/tree/master/examples/configs/fibinet_on_movielens.config) |
+| Attention      | Dot-product attention               | component of the Transformer model |         |
 
 ## 3. Feature importance learning components
 

diff --git a/docs/source/component/component.md b/docs/source/component/component.md
@@ -79,6 +79,33 @@
 | senet    | SENet    |     | protobuf message |
 | mlp      | MLP      |     | protobuf message |
 
+- Attention
+
+Dot-product attention layer, a.k.a. Luong-style attention.
+
+The calculation follows the steps:
+
+1. Calculate attention scores using query and key with shape (batch_size, Tq, Tv).
+1. Use scores to calculate a softmax distribution with shape (batch_size, Tq, Tv).
+1. Use the softmax distribution to create a linear combination of value with shape (batch_size, Tq, dim).
+
+| Parameter               | Type   | Default | Description |
+| ----------------------- | ------ | ----- | -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| use_scale               | bool   | False | If True, will create a scalar variable to scale the attention scores.                                                                                                                                                                  |
+| score_mode              | string | dot   | Function to use to compute attention scores, one of {"dot", "concat"}. "dot" refers to the dot product between the query and key vectors. "concat" refers to the hyperbolic tangent of the concatenation of the query and key vectors. |
+| dropout                 | float  | 0.0   | Float between 0 and 1. Fraction of the units to drop for the attention scores.                                                                                                                                                         |
+| seed                    | int    | None  | A Python integer to use as the random seed in case of dropout.                                                                                                                                                                         |
+| return_attention_scores | bool   | False | If True, returns the attention scores (after masking and softmax) as an additional output argument.                                                                                                                                    |
+| use_causal_mask         | bool   | False | Set to True for decoder self-attention. Adds a mask such that position i cannot attend to positions j > i. This prevents the flow of information from the future towards the past.                                                     |
+
+- inputs: List of the following tensors:
+  - query: Query tensor of shape (batch_size, Tq, dim).
+  - value: Value tensor of shape (batch_size, Tv, dim).
+  - key: Optional key tensor of shape (batch_size, Tv, dim). If not given, will use value for both key and value, which is the most common case.
+- output:
+  - Attention outputs of shape (batch_size, Tq, dim).
+  - (Optional) Attention scores after masking and softmax with shape (batch_size, Tq, Tv).
+
 ## 3. Feature importance learning components
 
 - SENet
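
The three calculation steps listed for the Attention component in the component.md hunk above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not EasyRec's or Keras's implementation: it handles a single example (no batch dimension), covers only the "dot" score mode, and omits `use_scale` and `dropout`. The names `dot_product_attention` and `softmax` are hypothetical helpers introduced here.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def dot_product_attention(query, value, key=None, use_causal_mask=False):
    """Single-example dot-product attention (hypothetical helper).

    query: Tq x dim, value: Tv x dim, key: Tv x dim (defaults to value).
    Returns (outputs of shape Tq x dim, weights of shape Tq x Tv).
    """
    if key is None:
        key = value  # the common case described in the doc
    dim = len(value[0])
    outputs, all_weights = [], []
    for i, q in enumerate(query):
        # Step 1: attention scores between this query row and every key row.
        scores = [sum(a * b for a, b in zip(q, k)) for k in key]
        if use_causal_mask:
            # Position i must not attend to positions j > i.
            scores = [s if j <= i else float("-inf")
                      for j, s in enumerate(scores)]
        # Step 2: softmax distribution over the Tv positions.
        weights = softmax(scores)
        # Step 3: linear combination of the value rows.
        out = [sum(w * v[d] for w, v in zip(weights, value))
               for d in range(dim)]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


query = [[1.0, 0.0], [0.0, 1.0]]
value = [[1.0, 2.0], [3.0, 4.0]]
outputs, weights = dot_product_attention(query, value, use_causal_mask=True)
```

With the causal mask enabled, the first query position can only see the first value row, so its output equals that row.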

diff --git a/docs/source/feature/data.md b/docs/source/feature/data.md
@@ -2,15 +2,15 @@
 
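 EasyRec, the recommendation algorithm package of Alibaba Cloud PAI, connects seamlessly to MaxCompute tables; it can also read large files on OSS, HDFS files in an E-MapReduce environment, and CSV files in a local environment.
 
-To identify the fields in the input data, set the field names, field types, and default values so that EasyRec can read the data. Set the label field as the training target. To accommodate multi-objective models, multiple label fields can be set.
+To identify the fields in the input data, set the field names, field types, and default values so that EasyRec can read the data. Set the label field as the training target. To support multi-objective models, multiple label fields may be set.
 
 There are also parameters such as prefetch_size, which must be set for reading data in TensorFlow.
 
 ## A minimal data config
 
 This configuration has only three fields: the user ID (uid), the item ID (item_id), and the label field (click).
 
-OdpsInputV2 means that a MaxCompute table is read as the input data.
+OdpsInputV2 means that a MaxCompute table is read as the input data. When training on a local machine, use the CSVInput type instead.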
 
 ```protobuf
 data_config {
@@ -160,7 +160,7 @@ def remap_lbl(labels):
 ### prefetch_size
 
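 - data prefetch, measured in batches; the default is 32
-- Setting prefetch size can speed up data loading and prevent a data bottleneck
+- Setting prefetch size can speed up data loading and prevent a data bottleneck. When the batch size is small, however, this value can be reduced accordingly.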
 
 ### shard && file_shard
 

diff --git a/docs/source/feature/feature.rst b/docs/source/feature/feature.rst
@@ -3,7 +3,7 @@
 
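 The previous section described the input data, which includes MaxCompute tables, CSV files, HDFS files, OSS files, and so on; each column of a table or file corresponds to one feature.
 
-The data can contain one or more label fields, while the features are rich; supported types include IdFeature, RawFeature, TagFeature, SequenceFeature, ComboFeature.
+The data can contain one or more label fields; multi-objective models require multiple label fields. The features are rich; supported types include IdFeature, RawFeature, TagFeature, SequenceFeature, and ComboFeature.
 
 Fields shared by all feature types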
 ----------------------------------------------------------------
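@@ -71,12 +71,12 @@ IdFeature: discrete-valued / ID-type features
 
    .. math::
 
-        embedding\_dim=8+x^{0.25}
-  - where x is the number of distinct values the feature takes
+        embedding\_dim=8+n^{0.25}
+  - where n is the number of unique values of the feature (for example, if the gender feature takes the values male and female, then n=2)
 
 -  hash\_bucket\_size: the size of the hash bucket. Suitable for category_id, user_id, etc.
 
--  For large-scale features such as user\_id, where hash collisions have little impact,
+-  For large-scale features such as user\_id, where hash collisions have little impact, hashing can compress the number of ids when user behavior logs are not rich enough,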
 
    .. math::
 
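@@ -91,7 +91,8 @@ IdFeature: discrete-valued / ID-type features
 
 
 -  num\_buckets: number of buckets,
-   num\_buckets can be used only when the input is of integer type
+   num\_buckets can be used only when the input is of integer type.
+   However, with fg features, do not transform integer features via num\_buckets; use hash\_bucket\_size instead.
 
 -  vocab\_list:
    specifies a vocabulary; suited to features with a small, enumerable set of values, such as day of the week, month, or zodiac sign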
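
A quick worked instance of the embedding-dimension rule of thumb from the IdFeature hunk above, embedding_dim = 8 + n^0.25. The helper name `suggested_embedding_dim` and the choice to round to the nearest integer are assumptions made for illustration; the documentation gives only the formula.

```python
def suggested_embedding_dim(n_unique):
    """Rule of thumb from the doc: 8 + n ** 0.25, where n is the number
    of unique values of the feature. Rounding is an assumption here."""
    if n_unique < 1:
        raise ValueError("n_unique must be at least 1")
    return round(8 + n_unique ** 0.25)


# gender takes two values (n=2); a large id feature might have n=10000.
gender_dim = suggested_embedding_dim(2)
big_id_dim = suggested_embedding_dim(10000)
```

Because of the fourth root, the suggested dimension grows very slowly with n: going from 2 to 10000 unique values roughly doubles it.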

diff --git a/docs/source/feature/pai_rec_callback_conf.md b/docs/source/feature/pai_rec_callback_conf.md
@@ -1,5 +1,9 @@
 # PAI-REC full event-tracking configuration
 
+## Callback service documentation for the PAI-Rec engine
+
+- [Documentation](http://pai-vision-data-hz.oss-cn-zhangjiakou.aliyuncs.com/pairec/docs/pairec/html/intro/callback_api.html)
+
 ## Template
 
 ```json

diff --git a/docs/source/feature/rtp_fg.md b/docs/source/feature/rtp_fg.md
@@ -2,7 +2,7 @@
 
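 - RTP FG: RealTime Predict Feature Generation, which handles the feature engineering needed for real-time prediction. Feature engineering also accounts for a fairly large share of the time spent in the recommendation pipeline.
 
-- RTP FG can efficiently generate complex cross features such as match feature and lookup feature, and guarantees offline/online consistency by using the same C++ code.
+- RTP FG can efficiently generate complex cross features such as match feature and lookup feature. Offline training and online prediction use the same C++ code, which guarantees offline/online consistency.
 
 - The generated features can be fed into EasyRec for training, and an EasyRec config file (pipeline.config) can be generated from the RTP FG config (fg.json).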
 

diff --git a/docs/source/feature/rtp_native.md b/docs/source/feature/rtp_native.md
@@ -1,6 +1,6 @@
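 # RTP deployment
 
-This document describes how to deploy an EasyRec model to RTP.
+This document describes how to deploy an EasyRec model to RTP (Real Time Prediction, a real-time scoring service).
 
 - RTP currently supports deploying models only in checkpoint form, so the EasyRec model must be exported as a checkpoint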
 

diff --git a/docs/source/intro.md b/docs/source/intro.md
@@ -63,4 +63,5 @@ EasyRec implements state of the art machine learning models used in common recom
 
 ### Contact
 
+- DingDing Group: 32260796. (EasyRec usage general discussion.)
 - DingDing Group: 37930014162, click [this url](https://qr.dingtalk.com/action/joingroup?code=v1,k1,oHNqtNObbu+xUClHh77gCuKdGGH8AYoQ8AjKU23zTg4=&_dt_no_comment=1&origin=11) or scan QrCode to join![new_group.jpg](../images/qrcode/new_group.jpg)
diff --git a/docs/source/models/aitm.md b/docs/source/models/aitm.md
@@ -0,0 +1,118 @@
+# AITM
+
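+### Introduction
+
+In recommendation scenarios, a user's conversion path often has several intermediate steps (impression -> click -> conversion). AITM is a multi-task model framework that makes full use of the samples at every stage of the path to improve the model's conversion-rate estimates for the later stages.
+
+![AITM](../../images/models/aitm.jpg)
+
+1. (a) Expert-Bottom pattern, e.g. [MMoE](mmoe.md)
+1. (b) Probability-Transfer pattern, e.g. [ESMM](esmm.md)
+1. (c) Adaptive Information Transfer Multi-task (AITM) framework.
+
+Two key characteristics:
+
+1. It uses an attention mechanism to fuse the feature representations corresponding to multiple targets;
+1. It introduces an auxiliary loss function for behavior calibration.
+
+### Configuration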
+
+```protobuf
+model_config {
+  model_name: "AITM"
+  model_class: "MultiTaskModel"
+  feature_groups {
+    group_name: "all"
+    feature_names: "user_id"
+    feature_names: "cms_segid"
+    ...
+    feature_names: "tag_brand_list"
+    wide_deep: DEEP
+  }
+  backbone {
+    blocks {
+      name: "mlp"
+      inputs {
+        feature_group_name: "all"
+      }
+      keras_layer {
+        class_name: 'MLP'
+        mlp {
+          hidden_units: [512, 256]
+        }
+      }
+    }
+  }
+  model_params {
+    task_towers {
+      tower_name: "ctr"
+      label_name: "clk"
+      loss_type: CLASSIFICATION
+      metrics_set: {
+        auc {}
+      }
+      dnn {
+        hidden_units: [256, 128]
+      }
+      use_ait_module: true
+      weight: 1.0
+    }
+    task_towers {
+      tower_name: "cvr"
+      label_name: "buy"
+      losses {
+        loss_type: CLASSIFICATION
+      }
+      losses {
+        loss_type: ORDER_CALIBRATE_LOSS
+      }
+      metrics_set: {
+        auc {}
+      }
+      dnn {
+        hidden_units: [256, 128]
+      }
+      relation_tower_names: ["ctr"]
+      use_ait_module: true
+      ait_project_dim: 128
+      weight: 1.0
+    }
+    l2_regularization: 1e-6
+  }
+  embedding_regularization: 5e-6
+}
+```
+
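+- model_name: any user-defined string; it serves only as a comment
+
+- model_class: 'MultiTaskModel'; does not need to be changed. Every multi-objective ranking model built from components uses this name
+
+- feature_groups: configures a group of features.
+
+- backbone: the backbone network built from components; see the [reference documentation](../component/backbone.md)
+
+  - blocks: a directed acyclic graph (DAG) made up of multiple `component blocks`; the framework executes the code associated with each `component block` in the topological order of the DAG, building a subgraph of the TF Graph
+  - name/inputs: each `block` has a unique name (name), one or more inputs (inputs), and an output
+  - keras_layer: loads the custom or built-in keras layer specified by `class_name` and executes its code; see the [reference documentation](../component/backbone.md#keraslayer)
+  - mlp: parameters of the MLP model; see the [reference documentation](../component/component.md#id1)
+
+- model_params: AITM-related parameters
+
+  - task_towers: configure one task_towers entry per task
+    - tower_name
+    - dnn: parameter configuration of the deep part
+      - hidden_units: the number of channels, i.e. the number of neurons, in each dnn layer
+    - use_ait_module: if true, the `AITM` model is used; otherwise the [DBMTL](dbmtl.md) model is used
+    - ait_project_dim: the dimension of the representation vector for each tower; usually set to the dimension of the last hidden layer
+    - The default is a binary classification task: num_class defaults to 1, weight defaults to 1.0, loss_type defaults to CLASSIFICATION, and metrics_set is auc
+    - loss_type: ORDER_CALIBRATE_LOSS is an auxiliary loss function that calibrates the predictions using the dependency between targets; see the original paper
+    - Note: label_fields must align one-to-one with task_towers.
+  - embedding_regularization: applies regularization to the embedding part to prevent overfitting
+
+### Example config
+
+- [AITM_demo.config](https://github.com/alibaba/EasyRec/blob/master/samples/model_config/aitm_on_taobao.config)
+
+### Reference paper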
+
+[AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
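
The aitm.md text above describes ORDER_CALIBRATE_LOSS only as an auxiliary loss that calibrates predictions using the dependency between targets, and defers to the paper. As I understand the paper, its calibration term penalizes samples where the later-stage probability exceeds the earlier-stage one, averaging max(p_later - p_earlier, 0) over the batch. The sketch below shows that idea for the ctr/cvr pair in the sample config. Whether EasyRec's ORDER_CALIBRATE_LOSS computes exactly this is an assumption, and `order_calibrate_penalty` is a hypothetical name.

```python
def order_calibrate_penalty(p_earlier, p_later):
    """Mean hinge penalty on order violations (illustrative sketch).

    p_earlier: predicted probabilities of the earlier step (e.g. ctr).
    p_later:   predicted probabilities of the later step (e.g. cvr).
    A sample contributes only when the later step is predicted as more
    likely than the earlier step it depends on.
    """
    if len(p_earlier) != len(p_later) or not p_earlier:
        raise ValueError("inputs must be non-empty and of equal length")
    violations = [max(later - earlier, 0.0)
                  for earlier, later in zip(p_earlier, p_later)]
    return sum(violations) / len(violations)


# First sample respects the order (cvr <= ctr); the second violates it.
p_ctr = [0.5, 0.25]
p_cvr = [0.25, 0.75]
penalty = order_calibrate_penalty(p_ctr, p_cvr)
```

In the sample config this term would be added to the cvr tower's classification loss, which is why that tower lists two `losses` entries.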
diff --git a/docs/source/models/loss.md b/docs/source/models/loss.md
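@@ -19,6 +19,7 @@ EasyRec supports two ways to configure loss functions: 1) use a single loss function; 2
 | PAIRWISE_LOGISTIC_LOSS                     | pair-level logistic loss; supports custom pair grouping |
 | JRC_LOSS                                   | binary classification + listwise ranking loss |
 | F1_REWEIGHTED_LOSS                         | a loss function that can adjust the relative weight of recall and precision in binary classification; effective against positive/negative sample imbalance |
+| ORDER_CALIBRATE_LOSS                       | an auxiliary loss function that calibrates predictions using the dependency between targets; see the [AITM](aitm.md) model |
 
 - Note: SOFTMAX_CROSS_ENTROPY_WITH_NEGATIVE_MINING
   - Supports parameter configuration, which upgrades it to [support vector guided softmax loss](https://128.84.21.199/abs/1812.11317),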
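@@ -71,9 +72,9 @@ EasyRec supports two ways to configure loss functions: 1) use a single loss function; 2
 
   - f1_beta_square: values greater than 1 make the model focus more on recall; values less than 1 make it focus more on precision
   - The F1 score, also called the balanced F score, is defined as the harmonic mean of precision and recall.
-    - ![f1 score](../images/other/f1_score.svg)
+    - ![f1 score](../../images/other/f1_score.svg)
   - More generally, the F_beta score is defined as:
-    - ![f_beta score](../images/other/f_beta_score.svg)
+    - ![f_beta score](../../images/other/f_beta_score.svg)
   - f1_beta_square is the square of the beta coefficient in the formula above.
 
 - Parameter configuration of PAIRWISE_FOCAL_LOSS
@@ -159,3 +160,4 @@ EasyRec supports two ways to configure loss functions: 1) use a single loss function; 2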
 
 - 《 Multi-Task Learning Using Uncertainty to Weigh Losses for Scene Geometry and Semantics 》
 - 《 [Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning](https://arxiv.org/abs/2111.10603) 》
+- [AITM: Modeling the Sequential Dependence among Audience Multi-step Conversions with Multi-task Learning in Targeted Display Advertising](https://arxiv.org/pdf/2105.08489.pdf)
diff --git a/docs/source/models/multi_target.rst b/docs/source/models/multi_target.rst
@@ -7,5 +7,6 @@
    esmm
    mmoe
    dbmtl
+   aitm
    ple
    simple_multi_task