
Multi task #112

Closed
wants to merge 2 commits into from

Conversation

fangkuann

Support multi-task learning using estimator.multi_head

@googlebot

Thanks for your pull request. It looks like this may be your first contribution to a Google open source project (if not, look below for help). Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

📝 Please visit https://cla.developers.google.com/ to sign.

Once you've signed (or fixed any issues), please reply here with @googlebot I signed it! and we'll verify it.


What to do if you already signed the CLA

Individual signers
Corporate signers

ℹ️ Googlers: Go here for more info.

@fangkuann
Author

@googlebot I signed it!

@googlebot

We found a Contributor License Agreement for you (the sender of this pull request), but were unable to find agreements for all the commit author(s) or Co-authors. If you authored these, maybe you used a different email address in the git commits than was used to sign the CLA (login here to double check)? If these were authored by someone else, then they will need to sign a CLA as well, and confirm that they're okay with these being contributed to Google.
In order to pass this check, please resolve this problem and then comment @googlebot I fixed it.. If the bot doesn't comment, it means it doesn't think anything has changed.

ℹ️ Googlers: Go here for more info.

@fangkuann
Author

@googlebot I fixed it.


@googlebot

CLAs look good, thanks!

ℹ️ Googlers: Go here for more info.

@@ -396,6 +402,94 @@ def _compute_logits_impl(self, context_features, example_features, labels,
return logits


class _MultiTaskGroupwiseRankingModel(_GroupwiseRankingModel):
Contributor

@xuanhuiwang Sep 15, 2019


This is a very nice proposal. Indeed, we had a proposal internally and have been working on some implementation. Your PR is a very good reference for us to build a polished version.

Since this touches the core code, we would like this to be as concise as possible. We will also have multiple small changes to achieve the goal of multi-task learning.

Stay tuned.

@xuanhuiwang
Contributor

@fangkuann, I've updated GitHub with multi-task support in TF-Ranking. Please see the latest few commits here: https://github.com/tensorflow/ranking/commits/master.

I also included a simple demo of how to use the multi-head in this commit: 1d91c5f. You may be able to adapt it to your use case.

The main design is to have logits and labels as dicts that map head names to Tensors.
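
A minimal sketch of that dict-based design, for illustration only (the head names "click" and "upvote" are hypothetical, and create_multi_ranking_head is an assumption based on the commits linked above; check the demo commit for the exact API):

    import tensorflow_ranking as tfr

    # One single-task ranking head per objective; each computes its own loss.
    click_head = tfr.head.create_ranking_head(
        loss_fn=tfr.losses.make_loss_fn(tfr.losses.RankingLossKey.SOFTMAX_LOSS),
        name="click")
    upvote_head = tfr.head.create_ranking_head(
        loss_fn=tfr.losses.make_loss_fn(tfr.losses.RankingLossKey.SOFTMAX_LOSS),
        name="upvote")

    # Combine the heads; training optimizes the combined head losses.
    head = tfr.head.create_multi_ranking_head([click_head, upvote_head])

    # Inside model_fn, logits and labels are dicts keyed by head name:
    #   logits = {"click": click_logits, "upvote": upvote_logits}
    #   labels = {"click": click_labels, "upvote": upvote_labels}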

Please let us know if you find anything wrong.

@xuanhuiwang
Contributor

Closing this PR. Let us know if you have other questions.

@fangkuann
Author

@xuanhuiwang
Thanks for this great support!
I will study the code and try this feature in my project.

@xuanhuiwang
Contributor

@fangkuann, let us know how your project goes and we are happy to provide more help.

@fangkuann
Author

fangkuann commented Sep 24, 2019

@fangkuann, let us know how your project goes and we are happy to provide more help.

@xuanhuiwang

I work as a search engineer focused mainly on optimizing search results for our product, the largest question-and-answer community in China (similar to Quora).
Recently, we have been working to switch our ranking model from a traditional GBDT to an NN-based model.
The TF-Ranking library has greatly simplified our work. We used it to train a DNN model, and after a period of parameter tuning, the NN baseline beat the GBDT model and has been running well in production ever since.

Currently, we are exploring several directions to further improve model performance:

  • multi-task learning
    The baseline model was trained on user-engagement signals such as clicks. We want to improve user-satisfaction metrics (e.g., upvotes, bookmarks, reading duration) while maintaining engagement performance.
    We experimented with a shared-bottom multi-task structure but did not achieve substantial improvement. We plan to try the MMoE model in the next stage.

  • unbiased learning to rank
    TF-Ranking supports a weight_feature_name parameter to assign each instance a sample weight (see the sketch after this list). We use a Position-Biased Click Model to obtain a weight for each position and trained an unbiased model with these sample weights.
    However, this approach seriously degrades model performance in offline evaluation. We conjecture this is mainly because user clicks at low positions contain much more noise, and increasing those samples' weights can cause the model to miss the correct parts of the training set.
    We plan to try another unbiased training method that factorizes user clicks into a biased part and an unbiased part, as in this paper: https://dl.acm.org/citation.cfm?id=3298689.3347033.
    For unbiased learning, I am unsure how to properly evaluate model performance offline, since the collected user-click data is itself biased. Is it appropriate to evaluate an unbiased ranker on a biased dataset?

  • model ensemble
    For tabular datasets (mainly dense features, very few embedding features), GBDT works quite well and an NN may not beat it. Many papers try to ensemble GBDT and NN.
    We tried using AdaNet (see the issue "Does adanet support GBDT as subnetwork?" adanet#121), but ran into problems integrating AdaNet with TF-Ranking, so this direction is suspended for now.

@fangkuann deleted the multi_task branch on September 24, 2019
@xuanhuiwang
Contributor

@fangkuann, thanks a lot for your feedback. We are very glad to hear that TF-Ranking is used in your production system!

Re: multi-task. Thanks for sharing the results. MMOE sounds good for the next step.
Re: unbiased learning-to-rank. Did you compare the models using weighted metrics such as weighted MRR or weighted DCG? These are better metrics than unweighted ones in this setting.
Re: model ensemble. We haven't tried anything with AdaNet, and so far we have no plans for it.
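
A minimal sketch of such weighted metrics with TF-Ranking's metric factory (the weight feature name is hypothetical and should match the one used in the loss):

    import tensorflow_ranking as tfr

    # Weighted eval metrics: the same per-example weights used in the loss
    # are applied when aggregating MRR and DCG over the eval set.
    eval_metric_fns = {
        "metric/weighted_mrr": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.MRR,
            weights_feature_name="propensity_weight"),
        "metric/weighted_dcg": tfr.metrics.make_ranking_metric_fn(
            tfr.metrics.RankingMetricKey.DCG,
            weights_feature_name="propensity_weight"),
    }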

@fangkuann
Author

Thanks for your advice!
With weighted DCG metrics, we now see much better offline performance. We will continue using this excellent library in our future development.
