Commit

initial commit
chengmengli06 committed Aug 23, 2021
1 parent 76fae0e commit 75b1599
Showing 297 changed files with 33,521 additions and 1,931 deletions.
50 changes: 50 additions & 0 deletions .gitattributes
Original file line number Diff line number Diff line change
@@ -1 +1,51 @@
data/** filter=lfs diff=lfs merge=lfs -text
data filter=lfs diff=lfs merge=lfs -text
data/test/inference/lookup_export/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_multi/variables/variables.index filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_export/assets/pipeline.config filter=lfs diff=lfs merge=lfs -text
data/test/latest_ckpt_test/model.ckpt-500.meta filter=lfs diff=lfs merge=lfs -text
data/test/tb_data/taobao_test_data filter=lfs diff=lfs merge=lfs -text
data/test/test.csv filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_placeholder_rename_export/assets/pipeline.config filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_export/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/tb_data_with_time/taobao_test_data_with_time filter=lfs diff=lfs merge=lfs -text
data/test/latest_ckpt_test/model.ckpt-500.index filter=lfs diff=lfs merge=lfs -text
data/test/lookup_data.csv filter=lfs diff=lfs merge=lfs -text
data/test/criteo_sample.tfrecord filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_valid.csv filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_train_feature.txt filter=lfs diff=lfs merge=lfs -text
data/test/tb_data/taobao_train_data filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_single/variables/variables.index filter=lfs diff=lfs merge=lfs -text
data/test/inference/lookup_data_test80.csv filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_export/variables/variables.index filter=lfs diff=lfs merge=lfs -text
data/test/export/data.csv filter=lfs diff=lfs merge=lfs -text
data/test/embed_data.csv filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_fg_pred.out filter=lfs diff=lfs merge=lfs -text
data/test/dwd_avazu_ctr_deepmodel_10w.csv filter=lfs diff=lfs merge=lfs -text
data/test/tb_data/taobao_train_data_kd filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_single/saved_model.pb filter=lfs diff=lfs merge=lfs -text
data/test/inference/lookup_export/variables/variables.index filter=lfs diff=lfs merge=lfs -text
data/test/tag_kv_data.csv filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_train_input.txt filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_test_bucketize_feature.txt filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_single/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_multi/assets/pipeline.config filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_multi/variables/variables.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/inference/lookup_export/assets/pipeline.config filter=lfs diff=lfs merge=lfs -text
data/test/inference/lookup_export/saved_model.pb filter=lfs diff=lfs merge=lfs -text
data/test/inference/taobao_infer_data.txt filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_placeholder_rename_export/saved_model.pb filter=lfs diff=lfs merge=lfs -text
data/test/tb_data/taobao_test_data_kd filter=lfs diff=lfs merge=lfs -text
data/test/hpo_test/eval_val/events.out.tfevents.1597889819.j63d04245.sqa.eu95 filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_placeholder_rename_export/variables/variables.index filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_single/assets/pipeline.config filter=lfs diff=lfs merge=lfs -text
data/test/inference/fg_export_multi/saved_model.pb filter=lfs diff=lfs merge=lfs -text
data/test/inference/tb_multitower_export/saved_model.pb filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_test_input.txt filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_test_feature.txt filter=lfs diff=lfs merge=lfs -text
data/test/test_with_quote.csv filter=lfs diff=lfs merge=lfs -text
data/test/tb_data_with_time/taobao_train_data_with_time filter=lfs diff=lfs merge=lfs -text
data/test/latest_ckpt_test/model.ckpt-500.data-00000-of-00001 filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_valid_feature.txt filter=lfs diff=lfs merge=lfs -text
data/test/rtp/taobao_train_bucketize_feature.txt filter=lfs diff=lfs merge=lfs -text
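The entries above all share the shape `<pattern> filter=lfs diff=lfs merge=lfs -text`, which is how Git LFS tracking rules appear in `.gitattributes`. As a minimal sketch for checking which patterns are routed through LFS (the helper name `lfs_patterns` is hypothetical, and quoted patterns containing spaces are not handled):

```python
def lfs_patterns(gitattributes_text):
    """Return the path patterns whose attribute list contains filter=lfs."""
    patterns = []
    for line in gitattributes_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        pattern, *attrs = line.split()
        if "filter=lfs" in attrs:
            patterns.append(pattern)
    return patterns


# Two lines taken from the diff above plus one non-LFS line for contrast.
sample = """\
data/test/test.csv filter=lfs diff=lfs merge=lfs -text
data/test/lookup_data.csv filter=lfs diff=lfs merge=lfs -text
README.md text
"""
print(lfs_patterns(sample))
```

Running this against the real `.gitattributes` should list every `data/...` entry added in this commit.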
29 changes: 24 additions & 5 deletions README.md
@@ -12,12 +12,12 @@ EasyRec aims to be an easy-to-use deep learning recommendation framework for industry,

### Run everywhere

- [MaxCompute](https://help.aliyun.com/product/27797.html) / [DataScience](https://help.aliyun.com/document_detail/170836.html) / [DLC](https://www.alibabacloud.com/help/zh/doc-detail/165137.htm?spm=a2c63.p38356.b99.79.4c0734a4bVav8D) / Local
- Local / [MaxCompute](https://help.aliyun.com/product/27797.html) / [DataScience](https://help.aliyun.com/document_detail/170836.html) / [DLC](https://www.alibabacloud.com/help/zh/doc-detail/165137.htm?spm=a2c63.p38356.b99.79.4c0734a4bVav8D)
- TF1.12-1.15 / TF2.x / PAI-TF

### Diversified input data

- MaxCompute Table
- [MaxCompute Table](https://help.aliyun.com/document_detail/27819.html?spm=a2c4g.11186623.6.554.91d517bazK7nTF)
- HDFS files
- [OSS files](https://help.aliyun.com/product/31815.html?spm=5176.7933691.1309819.8.5bb52a66ZQOobj)
- Kafka Streams
@@ -32,7 +32,7 @@ EasyRec aims to be an easy-to-use deep learning recommendation framework for industry,
### It is smart

- EarlyStop / Best Checkpoint Saver
- Hyper Parameter Search / AutoFeatureCross
- [Hyper Parameter Search](docs/source/automl/hpo_pai.md) / [AutoFeatureCross](docs/source/automl/auto_cross_emr.md)
- In development: NAS, Knowledge Distillation, MultiModal

### Large scale and easy deployment
@@ -44,10 +44,29 @@ EasyRec aims to be an easy-to-use deep learning recommendation framework for industry,

### A variety of models

- DeepFM / MultiTower / Deep Interest Network / DSSM / MMoE / ESMM
- [DeepFM](docs/source/models/deepfm.md) / [MultiTower](docs/source/models/multi_tower.md) / [Deep Interest Network](docs/source/models/din.md) / [DSSM](docs/source/models/dssm.md) / [MMoE](docs/source/models/mmoe.md) / [ESMM](docs/source/models/esmm.md)
- More models in development

### Easy to customize

- Easy to implement customized models
- Easy to implement [customized models](docs/source/models/user_define.md)
- No need to care about data pipelines

### Get Started

- Download
```
git clone https://github.com/AlibabaPAI/EasyRec.git
wget https://easyrec.oss-cn-beijing.aliyuncs.com/data/easyrec_data_20210818.tar.gz
```

- [EasyRec Framework](https://easyrec.oss-cn-beijing.aliyuncs.com/docs/EasyRec.pptx)

- [Run](docs/source/quick_start/local_tutorial.md)

- [PAI-DSW DEMO](https://dsw-dev.data.aliyun.com/#/?fileUrl=http://easyrec.oss-cn-beijing.aliyuncs.com/dsw/easy_rec_demo.ipynb&fileName=EasyRec_DeepFM.ipynb)
(Remember to select the Python 3 kernel)

- [Develop](docs/source/develop.md)

- [Doc](https://easyrec.readthedocs.io/en/latest/)
Binary file added docs/images/faq/field_type2.png
Binary file added docs/images/models/autoint.png
Binary file added docs/images/models/dcn.png
Binary file added docs/images/models/dssm_neg_sampler.png
Binary file added docs/images/models/fm.png
Binary file added docs/images/models/mind.png
Binary file added docs/images/models/rocket_launching.png
Binary file added docs/images/models/simple_multi_task.png
Binary file added docs/images/models/wide_and_deep.png
Binary file added docs/images/other/Role0.png
Binary file added docs/images/other/Role1.png
Binary file added docs/images/other/Role2.png
Binary file added docs/images/other/Role3.png
Binary file added docs/images/other/log1.png
Binary file added docs/images/other/log2.png
Binary file added docs/images/other/log3.png
Binary file added docs/images/other/log4.png
Binary file added docs/images/other/log5.png
Binary file added docs/images/other/log6.png
Binary file added docs/images/quick_start/image.png
10 changes: 10 additions & 0 deletions docs/source/api/easy_rec.python.core.rst
@@ -8,3 +8,13 @@ easy\_rec.python.core.learning\_schedules
:members:
:undoc-members:
:show-inheritance:

.. automodule:: easy_rec.python.core.metrics
:members:
:undoc-members:
:show-inheritance:

.. automodule:: easy_rec.python.core.sampler
:members:
:undoc-members:
:show-inheritance:
16 changes: 16 additions & 0 deletions docs/source/api/easy_rec.python.layers.rst
@@ -48,3 +48,19 @@ easy\_rec.python.layers.seq\_input\_layer
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.layers.multihead\_attention\_layer
---------------------------------------------------

.. automodule:: easy_rec.python.layers.multihead_attention
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.layers.mmoe
------------------------------------------------

.. automodule:: easy_rec.python.layers.mmoe
:members:
:undoc-members:
:show-inheritance:
40 changes: 40 additions & 0 deletions docs/source/api/easy_rec.python.model.rst
@@ -25,6 +25,14 @@ easy\_rec.python.model.fm
:undoc-members:
:show-inheritance:

easy\_rec.python.model.wide\_and\_deep
--------------------------------------

.. automodule:: easy_rec.python.model.wide_and_deep
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.model.deepfm
------------------------------------

@@ -41,6 +49,30 @@ easy\_rec.python.model.multi\_tower
:undoc-members:
:show-inheritance:

easy\_rec.python.model.dcn
------------------------------------------

.. automodule:: easy_rec.python.model.dcn
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.model.autoint
------------------------------------------

.. automodule:: easy_rec.python.model.autoint
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.model.dbmtl
------------------------------------------

.. automodule:: easy_rec.python.model.dbmtl
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.model.multi\_tower\_bst
-----------------------------------------------

@@ -65,6 +97,14 @@ easy\_rec.python.model.dssm
:undoc-members:
:show-inheritance:

easy\_rec.python.model.mind
----------------------------------

.. automodule:: easy_rec.python.model.mind
:members:
:undoc-members:
:show-inheritance:

easy\_rec.python.model.multi\_task\_model
------------------------------------------------

20 changes: 8 additions & 12 deletions docs/source/automl/auto_cross_emr.md
@@ -5,8 +5,8 @@
The input is usually a CSV file. As shown below, columns are separated by commas (,).

- Sample data (small dataset):
- train: [ctr\_train.csv](https://yuguang-test.oss-cn-beijing.aliyuncs.com/fe/data/ctr_train.csv)
- test: [ctr\_test.csv](https://yuguang-test.oss-cn-beijing.aliyuncs.com/fe/data/ctr_test.csv)
- train: [ctr_train.csv](https://easyrec.oss-cn-beijing.aliyuncs.com/data/autocross/ctr_train.csv)
- test: [ctr_test.csv](https://easyrec.oss-cn-beijing.aliyuncs.com/data/autocross/ctr_test.csv)
- Data example:

```
@@ -23,18 +23,15 @@ hadoop fs -put ctr_test.csv hdfs:///user/fe/data/

### AutoCross

For AutoCross usage, see [AutoCross EMR](https://yuque.antfin-inc.com/pai/automl/cicak6)

- AutoCross yaml config file: [ctr\_autocross.yaml](https://yuguang-test.oss-cn-beijing.aliyuncs.com/fe/configs/ctr_autocross.yaml) [config file explanation](https://yuque.antfin-inc.com/pai/automl/cicak6)
- alink environment config file, save as [alink.env](https://yuguang-test.oss-cn-beijing.aliyuncs.com/fe/configs/alink.env)
- AutoCross yaml config file: [ctr_autocross.yaml](https://easyrec.oss-cn-beijing.aliyuncs.com/data/autocross/ctr_autocross.yaml)
- alink environment config file, save as [alink.env](https://easyrec.oss-cn-beijing.aliyuncs.com/data/autocross/alink.env)

```bash
userId=default
alinkServerEndpoint=http://localhost:9301
hadoopHome=/usr/lib/hadoop-current
hadoopUserName=hadoop
token=ZSHTIeEkwrtZJJsN1ZZmCJJmr5jaj1wO

```
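
The environment file above is a plain list of `key=value` lines. How `pai-automl-fe` reads it is not shown in this diff, so the following is only a sketch, under that assumption, of a sanity check one might run before submitting (the function name `parse_env` is hypothetical):

```python
def parse_env(text):
    """Parse key=value lines into a dict, ignoring blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if not sep:
            raise ValueError("expected key=value, got: %r" % line)
        settings[key.strip()] = value.strip()
    return settings


# Keys copied from the alink.env example; the token line is left out on purpose.
sample = """\
userId=default
alinkServerEndpoint=http://localhost:9301
hadoopHome=/usr/lib/hadoop-current
hadoopUserName=hadoop
"""
env = parse_env(sample)
print(env["alinkServerEndpoint"])
```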

- Submit the job with pai-automl-fe
@@ -43,10 +40,10 @@ token=ZSHTIeEkwrtZJJsN1ZZmCJJmr5jaj1wO
pai-automl-fe run -e configs/alink.env --config configs/ctr_autocross.yaml --mode emr
```

### Integrating with easy\_rec
### Integrating with EasyRec

For Easy\_rec usage, see [EMR Tutorial](https://yuque.antfin.com/pai/arch/zucdp3)
The following describes the easy\_rec configuration for consuming the data produced by AutoCross ([ctr\_deepmodel\_ac.config](https://yuguang-test.oss-cn-beijing.aliyuncs.com/fe/configs/ctr_deepmodel_ac.config))
For EasyRec usage, see [EMR Train](../train.md)
The following describes the easy_rec configuration for consuming the data produced by AutoCross ([ctr_deepmodel_ac.config](https://easyrec.oss-cn-beijing.aliyuncs.com/data/autocross/ctr_deepmodel_ac.config))

#### Data-related settings

@@ -210,7 +207,6 @@ model_config:{
feature_names: "cross_2"
wide_deep:DEEP
}
```

Submit the training job with el\_submit; see [EMR Tutorial](https://yuque.antfin.com/pai/arch/zucdp3)
Submit the training job with el_submit; see [EMR Train](../train.md)
34 changes: 17 additions & 17 deletions docs/source/automl/hpo_emr.md
@@ -5,7 +5,7 @@
- Download and install the automl package

```bash
wget http://easy-rec.oss-cn-hangzhou.aliyuncs.com/releases/pai_automl-0.0.1rc1-py3-none-any.whl
wget http://easyrec.oss-cn-beijing.aliyuncs.com/releases/pai_automl-0.0.1rc1-py3-none-any.whl
pip install pai_automl-0.0.1rc1-py3-none-any.whl
```

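@@ -17,19 +17,19 @@ python -m easy_rec.python.hpo.emr_hpo --hyperparams hyperparams.json --config_p

### Parameter description

- \--config\_path easy-rec training config file
- \--exp\_dir tuning experiment directory
- \--debug keep the local temporary directory
- \--metric\_name metric to tune, default auc; for other available metrics see the [reference](https://yuque.antfin.com/pai/arch/moxgm5)
- \--max\_parallel number of experiments that can run in parallel at the same time, default 4
- \--total\_trial\_num total number of experiments to run, default 6
- \--el\_submit\_params el\_submit parameters that specify PS/Worker resources, including -t x -m x \[-pn x -pc x -pm x\] -wn x -wc x -wm x -wg x; default value:
- --config_path easyrec training config file
- --exp_dir tuning experiment directory
- --debug keep the local temporary directory
- --metric_name metric to tune, default auc; for other available metrics see the [reference](../eval.md)
- --max_parallel number of experiments that can run in parallel at the same time, default 4
- --total_trial_num total number of experiments to run, default 6
- --el_submit_params el_submit parameters that specify PS/Worker resources, including -t x -m x \[-pn x -pc x -pm x\] -wn x -wc x -wm x -wg x; default value: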

```bash
-t standalone -m local -wn 1 -wc 6 -wm 20000 -wg 1
```

- \--hyperparams hyperparameter search space configuration
- --hyperparams hyperparameter search space configuration
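
The default resource string quoted above is a flat sequence of `-flag value` pairs. As a rough sketch (the helper names are hypothetical and this is not part of `easy_rec`), such a string can be split, adjusted, and reassembled before being passed on; the meaning of each flag is defined by `el_submit` and is not interpreted here:

```python
import shlex

# Default value quoted from the parameter description above.
DEFAULT_PARAMS = "-t standalone -m local -wn 1 -wc 6 -wm 20000 -wg 1"


def params_to_dict(params):
    """Split a '-flag value ...' string into an ordered {flag: value} dict."""
    tokens = shlex.split(params)
    if len(tokens) % 2 != 0:
        raise ValueError("expected '-flag value' pairs: %r" % params)
    pairs = {}
    for flag, value in zip(tokens[0::2], tokens[1::2]):
        if not flag.startswith("-"):
            raise ValueError("not a flag: %r" % flag)
        pairs[flag] = value
    return pairs


def dict_to_params(pairs):
    """Join {flag: value} back into a single command-line fragment."""
    return " ".join("%s %s" % (flag, value) for flag, value in pairs.items())


pairs = params_to_dict(DEFAULT_PARAMS)
pairs["-wn"] = "2"  # change one value, keep the rest of the defaults
print(dict_to_params(pairs))
```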

#### hyperparams settings

@@ -43,11 +43,11 @@ python -m easy_rec.python.hpo.emr_hpo --hyperparams hyperparams.json --config_p
]
```

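- name: the parameter name in the easy\_rec pipeline\_config; note that the full path must be used
- name: the parameter name in the easy_rec pipeline_config; note that the full path must be used

feature\_configs\[**input\_names\[0\]=field\_name1**\].embedding\_dim
feature_configs\[**input_names\[0\]=field_name1**\].embedding_dim

- Because feature\_configs is an array, a selector is needed to pick a subset of features by **attribute value**:
- Because feature_configs is an array, a selector is needed to pick a subset of features by **attribute value**: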

![image.png](../../images/automl/pai_field.png)
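
To make the selector syntax concrete, here is a small sketch that applies a path such as `feature_configs[input_names[0]=field_name1].embedding_dim` to a plain Python dict. This is an illustration only: the real tool operates on the EasyRec pipeline config, and the function `set_by_path`, the regular expression, and the sample dict are all assumptions made for this example.

```python
import re

# array[field[index]=value].attr, e.g.
# feature_configs[input_names[0]=field_name1].embedding_dim
PATH_RE = re.compile(r"^(\w+)\[(\w+)\[(\d+)\]=([^\]]+)\]\.(\w+)$")


def set_by_path(config, path, new_value):
    """Set `attr` on every array item whose field[index] equals the selector value."""
    match = PATH_RE.match(path)
    if match is None:
        raise ValueError("unsupported path: %r" % path)
    array, field, index, wanted, attr = match.groups()
    updated = 0
    for item in config[array]:
        values = item.get(field, [])
        if len(values) > int(index) and values[int(index)] == wanted:
            item[attr] = new_value
            updated += 1
    return updated


# Hypothetical stand-in for the feature_configs section of a pipeline config.
config = {
    "feature_configs": [
        {"input_names": ["field_name1"], "embedding_dim": 16},
        {"input_names": ["field_name2"], "embedding_dim": 16},
    ]
}
n = set_by_path(
    config, "feature_configs[input_names[0]=field_name1].embedding_dim", 32
)
print(n, config["feature_configs"])
```

Only the entry whose first input name matches the selector is changed; the other feature keeps its original `embedding_dim`.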

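@@ -66,9 +66,9 @@ python -m easy_rec.python.hpo.emr_hpo --hyperparams hyperparams.json --config_p

- Linked parameter settings

Some parameter values are linked. For example, for the deepfm algorithm, all embedding\_dim values must be the same
Some parameter values are linked. For example, for the deepfm algorithm, all embedding_dim values must be the same

- name can list several parameter names to tune, separated by ";": feature\_configs\[input\_names\[0\]=field1\].embedding\_dim;feature\_configs\[input\_names\[0\]=field20\].embedding\_dim
- name can list several parameter names to tune, separated by ";": feature_configs\[input_names\[0\]=field1\].embedding_dim;feature_configs\[input_names\[0\]=field20\].embedding_dim
- If name contains several parameter names, candidates must also contain several values, separated by ";", e.g. "32;32"
- candidates: candidate values
- type: type of the candidate values; supports Categorical, Integer, Real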
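
The linked-parameter rule above can be sketched as follows. The entry layout (keys `name`, `type`, `candidates`, with candidates given as a list of strings) is inferred from the field descriptions and may differ from the actual `hyperparams.json` schema; `expand_linked` is a hypothetical helper, not an EasyRec function.

```python
import json


def expand_linked(entry):
    """Turn one linked entry into a list of {param_name: value} trial settings."""
    names = entry["name"].split(";")
    trials = []
    for candidate in entry["candidates"]:
        values = str(candidate).split(";")
        if len(values) != len(names):
            raise ValueError(
                "candidate %r has %d values for %d names"
                % (candidate, len(values), len(names))
            )
        trials.append(dict(zip(names, values)))
    return trials


# Two linked embedding_dim parameters that must always take the same value.
entry = {
    "name": "feature_configs[input_names[0]=field1].embedding_dim;"
            "feature_configs[input_names[0]=field20].embedding_dim",
    "type": "Categorical",
    "candidates": ["16;16", "32;32"],
}
for trial in expand_linked(entry):
    print(json.dumps(trial))
```

Each candidate string yields one trial in which both parameters move together, which is the constraint deepfm needs.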
@@ -99,13 +99,13 @@ python -m easy_rec.python.hpo.emr_hpo --hyperparams hyperparams.json --config_p
- Log information

![image.png](../../images/automl/emr_log.png)
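Five experiments were run in total; the results show that a smaller embedding\_dim works better
Five experiments were run in total; the results show that a smaller embedding_dim works better

- Experiment directory (exp\_dir): hdfs:///user/easy\_rec\_test/experiment/hpo\_test\_v8
- Experiment directory (exp_dir): hdfs:///user/easy_rec_test/experiment/hpo_test_v8

![image.png](../../images/automl/emr_exp.png)

- If --debug is set, the local temporary directory is kept: /tmp/emr\_easy\_rec\_hpo\_1600519258
- If --debug is set, the local temporary directory is kept: /tmp/emr_easy_rec_hpo_1600519258

rewrite\_\[0-4\].json defines the parameters of each experiment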
![image.png](../../images/automl/emr_json.png)