
One question about the gif of DPD Architecture #1

Open
XqZeppelinhead0702 opened this issue Sep 16, 2024 · 3 comments

Comments

@XqZeppelinhead0702

Thank you for open-sourcing your remarkable work, and congratulations on the best paper award! I've recently been reading your paper and trying to run your code. However, I noticed that in step 4 of the gif of DPD you highlight the output of D2P together with the CRINGE Loss, which confuses me a little: did you compute the CRINGE Loss between the outputs of D2P and P2D? As far as I know, the CRINGE Loss is calculated contrastively between the incorrectly predicted negative tokens and positive tokens sampled from the model's top-k predictions. Perhaps step 4 of the gif is meant to express that the CRINGE Loss discourages D2P from predicting a false protoform? In any case, I'm still unsure about the exact meaning of this step. Hoping for your response.
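For readers unfamiliar with the loss being discussed: a minimal, simplified sketch of the token-level contrastive idea described above (a negative token contrasted against a positive token sampled from the model's top-k predictions). This is an illustration only, not the repository's implementation; the function name and the plain-list representation of logits are assumptions made for the sketch.

```python
import math
import random

def cringe_token_loss(logits, negative_token, k=2, rng=None):
    """Simplified one-token CRINGE-style contrastive loss (illustrative sketch).

    logits: list of float scores over the vocabulary
    negative_token: index of the incorrectly predicted (negative) token
    k: size of the top-k pool from which a positive token is sampled

    A positive token is sampled from the model's top-k predictions
    (excluding the negative token itself), and the loss pushes the
    positive logit above the negative logit via a two-way softmax.
    """
    rng = rng or random.Random(0)
    # rank vocabulary indices by logit, highest first
    ranked = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
    top_k = [i for i in ranked if i != negative_token][:k]
    pos = rng.choice(top_k)  # sample a positive token from the top-k pool
    s_pos, s_neg = logits[pos], logits[negative_token]
    # binary cross-entropy over the contrastive pair: -log p(pos | {pos, neg})
    return -math.log(math.exp(s_pos) / (math.exp(s_pos) + math.exp(s_neg)))
```

For example, with `logits=[2.0, 1.0, 0.5]`, `negative_token=2`, and `k=1`, the positive is the highest-scoring token (index 0), and the loss reduces to `log(1 + exp(-(2.0 - 0.5)))`.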

@chaosarium
Member

Hi! Thank you for your message, and sorry about the late reply.

Step 4 indeed highlights the D2P output together with the CRINGE Loss. The intention is that the CRINGE loss is applied when D2P produces an incorrect prediction, rather than as a loss between the outputs of D2P and P2D.

I highlighted the D2P output tokens to mark that they are a piece of information needed for step 4. Apologies for any confusion this may have caused.

Step 4 can be understood as follows:

protoform_prediction <- D2P(reflexes)
if a protoform label (call it gold_protoform) is available:
    if protoform_prediction != gold_protoform:
        apply CRINGE Loss on P2D's output, treating correctly predicted reflexes as negative examples
    else:
        apply normal cross-entropy loss on P2D's output
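The pseudocode above can be sketched as a small Python function. This is only an illustration of the branching logic; the function names `d2p_predict`, `p2d_loss_ce`, and `p2d_loss_cringe` are hypothetical stand-ins, not names from the DPD codebase.

```python
def step4_loss(reflexes, gold_protoform, d2p_predict, p2d_loss_ce, p2d_loss_cringe):
    """Loss selection for step 4 (sketch; all callables are hypothetical).

    d2p_predict(reflexes)      -> D2P's predicted protoform
    p2d_loss_ce(protoform)     -> cross-entropy loss of P2D reconstructing the reflexes
    p2d_loss_cringe(protoform) -> CRINGE loss on P2D's output, treating correctly
                                  predicted reflexes as negative examples
    """
    protoform_prediction = d2p_predict(reflexes)
    if gold_protoform is None:
        return None  # no label available: step 4's loss does not apply in this sketch
    if protoform_prediction != gold_protoform:
        # D2P was wrong: discourage P2D from correctly reconstructing
        # the reflexes from the incorrect protoform
        return p2d_loss_cringe(protoform_prediction)
    # D2P was right: train P2D normally with cross-entropy
    return p2d_loss_ce(protoform_prediction)
```

A quick usage example: passing a `d2p_predict` stub that returns the gold protoform selects the cross-entropy branch, while a stub that returns anything else selects the CRINGE branch.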

@XqZeppelinhead0702
Author


That's a good explanation, thank you. So it's an indirect way to discourage D2P from generating unexpected output, isn't it? One more intriguing thing I discussed with my workmates recently: why do you employ such an indirect approach instead of optimizing the D2P network directly, e.g. by applying the CRINGE loss to D2P's output?

@chaosarium
Member

I thought of it more like discouraging P2D from generating certain output, but I agree that, indirectly, it might help discourage D2P from generating certain output.

I think one difference is that directly discouraging D2P from generating certain output is only possible when the gold protoform is available. However, if P2D learns to better discern correct vs. incorrect protoforms generated by D2P on labeled data, it may have a better chance of, in some sense, discouraging D2P from generating incorrect output even when the gold protoform is unavailable.
