This repository has been archived by the owner on Sep 2, 2024. It is now read-only.

inconsistent implementation of update for task specific weights with the description in paper #12

Open
bsaint opened this issue Jun 18, 2019 · 3 comments


bsaint commented Jun 18, 2019

Hi, thanks for releasing this awesome code! I am currently working on reproducing the Cityscapes results from the paper. According to the MTL update equations in the paper, the weights of the task-specific subnetworks should be updated with the original learning rate, and then the shared weights should be updated using the MGDA algorithm. However, I couldn't find the corresponding implementation in the code: there, both the shared weights and the task-specific weights are updated in the same way, by multiplying each task's loss by a weight factor determined by MGDA. Am I missing something here, or is this an implementation trick?


milos-popovic commented Jul 23, 2019

Hi @ozansener, I'm also trying to reproduce and use the method, and the above confuses me a bit as well. Algorithm 2, line 2 shows that the task-specific params are updated without any scaling factor. Line 4 would then be replaced by the solver using your approximation, with the alphas calculated from the gradients of Lt with respect to Z. Then, in line 5, only the shared parameters are updated with the alpha-weighted sum of losses.
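To make this reading of Algorithm 2 concrete, here is a minimal two-task sketch in plain Python. It is only an illustration under stated assumptions, not the repository's code: the names `min_norm_alpha` and `paper_style_step` are hypothetical, the alphas are computed from gradients with respect to the shared parameters rather than with respect to Z, and the two-task closed form stands in for the general solver.

```python
def dot(u, v):
    """Inner product of two equal-length lists of floats."""
    return sum(x * y for x, y in zip(u, v))


def min_norm_alpha(g1, g2):
    """Two-task closed form: argmin over a in [0, 1] of ||a*g1 + (1-a)*g2||^2."""
    diff = [x - y for x, y in zip(g1, g2)]
    denom = dot(diff, diff)
    if denom == 0.0:
        return 0.5  # identical gradients: every weight gives the same combination
    alpha = dot([y - x for x, y in zip(g1, g2)], g2) / denom
    return min(1.0, max(0.0, alpha))  # clip to the [0, 1] interval


def paper_style_step(shared, task_params, shared_grads, task_grads, lr):
    """One update as described above (hypothetical helper, two tasks only).

    shared:       list of floats, shared parameters
    task_params:  [params_task1, params_task2], task-specific parameters
    shared_grads: [grad of L1, grad of L2] with respect to the shared parameters
    task_grads:   [grad of L1 wrt params_task1, grad of L2 wrt params_task2]
    """
    # Line 2: task-specific parameters take a plain step, no alpha involved.
    new_task = [
        [p - lr * g for p, g in zip(params, grads)]
        for params, grads in zip(task_params, task_grads)
    ]
    # Line 4: solve for the task weights (assumption: closed form, shared-param grads).
    a = min_norm_alpha(shared_grads[0], shared_grads[1])
    alphas = [a, 1.0 - a]
    # Line 5: only the shared parameters use the alpha-weighted gradient.
    new_shared = [
        p - lr * (alphas[0] * g1 + alphas[1] * g2)
        for p, g1, g2 in zip(shared, shared_grads[0], shared_grads[1])
    ]
    return new_shared, new_task, alphas
```

With orthogonal unit gradients on the shared parameters, this gives equal weights for the two tasks, while each task-specific parameter still moves by the full `lr` times its own gradient.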

However, your implementation uses only one optimizer on both shared and task specific parameters: https://github.com/intel-isl/MultiObjectiveOptimization/blob/1c6d0d503ccf33cc83d5b6c356ca2fc2bf255606/multi_task/train_multi_task.py#L57-L65

and updates all of them with alpha-weighted gradients:
https://github.com/intel-isl/MultiObjectiveOptimization/blob/1c6d0d503ccf33cc83d5b6c356ca2fc2bf255606/multi_task/train_multi_task.py#L174-L185

These two approaches don't seem to be equivalent. Is this an unintended change in the implementation, or is there something @bsaint and I are missing?
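A tiny numeric sketch of why the two differ for the task-specific parameters, assuming plain SGD without momentum (the helper `sgd_step` and the numbers are made up for illustration, and nothing here is taken from the repository):

```python
def sgd_step(params, grads, lr, scale=1.0):
    """Plain SGD step; `scale` multiplies the gradient before the update."""
    return [p - lr * scale * g for p, g in zip(params, grads)]


# Hypothetical values for one task head.
head = [1.0]   # a task-specific parameter
grad = [2.0]   # gradient of that task's loss with respect to it
lr = 0.1
alpha = 0.25   # task weight returned by the solver for this step

# Reading of the paper: the task head takes an unscaled step.
paper_head = sgd_step(head, grad, lr)

# Single optimizer on the alpha-weighted loss: the head's gradient is scaled by alpha.
code_head = sgd_step(head, grad, lr, scale=alpha)

print(paper_head, code_head)
```

Under this assumption, the alpha-weighted variant behaves like the unscaled update with an effective per-task learning rate of `alpha * lr`, which changes from step to step as the alphas change.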

Collaborator

ozansener commented Jul 23, 2019

@bsaint and @milos-popovic Thanks for raising the issue. You are right. There is a discrepancy between the paper and the code. We used this code to get all the results so please use the codebase. I will run some experiments and will update the paper if necessary.

ozansener self-assigned this Jul 23, 2019
@liyangliu

Hi, @bsaint, @milos-popovic,
Have you figured out the discrepancy and reproduced the results on MultiMNIST, CelebA or CityScapes? Thank you.
