One question about the gif of DPD Architecture #1
Issue author: Thank you for open-sourcing your remarkable work, and congratulations on the best paper award! I've recently been reading your paper and trying to run your code. I noticed that in the GIF of the DPD architecture, step 4 highlights the output of D2P together with the CRINGE loss, which confuses me a little: did you compute the CRINGE loss between the outputs of D2P and P2D? As far as I know, the CRINGE loss is computed contrastively between the incorrectly predicted (negative) tokens and positive tokens sampled from the model's top-k predictions. Or do you mean that step 4 uses the CRINGE loss to discourage D2P from predicting a false protoform? Either way, I'm still unsure about the exact meaning of this step. Hoping for your response.
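For reference, the contrastive term described above can be written per position as follows (notation mine, reflecting the general form of the CRINGE loss rather than this paper's exact equations):

$$
\mathcal{L}_{\text{CRINGE}}(t) = -\log \frac{\exp\!\left(s^{+}_{t}\right)}{\exp\!\left(s^{+}_{t}\right) + \exp\!\left(s^{-}_{t}\right)}
$$

where $s^{-}_{t}$ is the model's score for the incorrectly predicted (negative) token at position $t$, and $s^{+}_{t}$ is the score of a positive token sampled from the model's top-k predictions at the same position.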
Maintainer: Hi! Thank you for your message, and sorry about the late reply. Step 4 does highlight the D2P output together with the CRINGE loss. The intention is that the CRINGE loss is applied when D2P produces an incorrect prediction, not that it is a loss between the outputs of D2P and P2D. I highlighted the D2P output tokens to mark them as a piece of information needed for step 4. Apologies for any confusion this may have caused. Step 4 can be understood as follows: [attached explanation not preserved]
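Since the point of confusion is where the loss attaches, here is a minimal, self-contained PyTorch sketch of a CRINGE-style penalty applied to an incorrect prediction, in the spirit of the reply above. This is my own illustration, not code from this repository: the function name, the choice of `k`, and the sampling details are assumptions, and the published CRINGE loss combines this term with the usual likelihood loss on positive data.

```python
import torch
import torch.nn.functional as F

def cringe_loss(logits, negative_tokens, k=5):
    """Per-token contrastive penalty in the spirit of the CRINGE loss (sketch).

    logits:          (seq_len, vocab_size) model scores at each output position
    negative_tokens: (seq_len,) token ids of the incorrect sequence
    At each position, a positive token is sampled from the model's own top-k
    predictions and contrasted against the negative token, pushing the
    negative token's score down.
    """
    topk_scores, topk_ids = logits.topk(k, dim=-1)             # (seq_len, k)
    # Exclude the negative token itself from the positive candidates.
    masked = topk_scores.masked_fill(
        topk_ids.eq(negative_tokens.unsqueeze(-1)), float("-inf")
    )
    pos_idx = torch.multinomial(F.softmax(masked, dim=-1), 1)  # (seq_len, 1)
    s_pos = topk_scores.gather(-1, pos_idx).squeeze(-1)        # (seq_len,)
    s_neg = logits.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # -log( exp(s_pos) / (exp(s_pos) + exp(s_neg)) ), averaged over positions.
    pair = torch.stack([s_pos, s_neg], dim=-1)                 # (seq_len, 2)
    return F.cross_entropy(pair, pair.new_zeros(pair.size(0), dtype=torch.long))

# Toy, runnable example with random values (shapes only, not real data):
torch.manual_seed(0)
logits = torch.randn(7, 100)              # seq_len=7, vocab=100
bad_pred = torch.randint(0, 100, (7,))    # an "incorrect" predicted protoform
print(cringe_loss(logits, bad_pred))
```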
Issue author: That's a good explanation, thank you. So it's an indirect way to discourage D2P from generating unexpected output, isn't it? One more intriguing thing I discussed with my workmates recently: why do you employ such an indirect approach instead of optimizing the D2P network directly, e.g., applying the CRINGE loss to the D2P output?
Maintainer: I thought of it more as discouraging P2D from generating certain output, but I agree that, indirectly, it may also discourage D2P from generating certain output. One difference is that directly discouraging D2P from generating certain output is only possible when the gold protoform is available. However, if P2D learns on labeled data to better discern correct from incorrect protoforms generated by D2P, it has a better chance of, in some sense, discouraging D2P from generating incorrect output even when the gold protoform is unavailable.
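To make that distinction concrete, here is a small, runnable sketch of the two feedback paths described above. Everything in it (names, return strings, the branching itself) is my own illustration of the reasoning, not the repository's actual training loop:

```python
def feedback_path(pred_protoform, gold_protoform):
    """Which training signal is available for one cognate set (sketch)."""
    if gold_protoform is not None:
        # Labeled data: D2P's output can be compared to the gold protoform,
        # so a direct penalty (e.g. CRINGE on the D2P output) is well defined.
        if pred_protoform != gold_protoform:
            return "direct penalty on D2P"
        return "ordinary supervised loss"
    # Unlabeled data: there is no gold protoform to compare against. P2D,
    # having learned on labeled data to separate correct from incorrect
    # protoforms, can still provide an indirect signal on D2P's output.
    return "indirect signal via P2D"

print(feedback_path(("*p", "*a"), ("*p", "*o")))  # direct penalty on D2P
print(feedback_path(("*p", "*a"), None))          # indirect signal via P2D
```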