One question about the gif of DPD Architecture #1
Issue author: Thank you for open-sourcing your remarkable work, and congratulations on the best paper award! I've recently been reading your paper and trying to run your code. I noticed that in the GIF of the DPD architecture, step 4 highlights the output of D2P together with the CRINGE loss, which confuses me a little: did you compute the CRINGE loss between the outputs of D2P and P2D? As far as I know, the CRINGE loss is computed contrastively between the incorrectly predicted (negative) tokens and positive tokens sampled from the model's top-k predictions. Or do you mean that step 4 uses the CRINGE loss to discourage D2P from predicting a false protoform? Either way, I'm still unsure about the exact meaning of this step. Hoping for your response.
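For reference, the contrastive term described above can be written per position as follows (notation mine, reflecting the general form of the CRINGE loss rather than this paper's exact equations):

$$
\mathcal{L}_{\text{CRINGE}}(t) = -\log \frac{\exp\!\left(s^{+}_{t}\right)}{\exp\!\left(s^{+}_{t}\right) + \exp\!\left(s^{-}_{t}\right)}
$$

where $s^{-}_{t}$ is the model's score for the incorrectly predicted (negative) token at position $t$, and $s^{+}_{t}$ is the score of a positive token sampled from the model's top-k predictions at the same position.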
Maintainer: Hi! Thank you for your message, and sorry about the late reply. Step 4 does highlight the D2P output together with the CRINGE loss. The intention is that the CRINGE loss is applied when D2P produces an incorrect prediction, not that it is a loss between the outputs of D2P and P2D. I highlighted the D2P output tokens to mark them as a piece of information needed for step 4. Apologies for any confusion this may have caused. Step 4 can be understood as follows: [attached explanation not preserved]
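Since the point of confusion is where the loss attaches, here is a minimal, self-contained PyTorch sketch of a CRINGE-style penalty applied to an incorrect prediction, in the spirit of the reply above. This is my own illustration, not code from this repository: the function name, the choice of `k`, and the sampling details are assumptions, and the published CRINGE loss combines this term with the usual likelihood loss on positive data.

```python
import torch
import torch.nn.functional as F

def cringe_loss(logits, negative_tokens, k=5):
    """Per-token contrastive penalty in the spirit of the CRINGE loss (sketch).

    logits:          (seq_len, vocab_size) model scores at each output position
    negative_tokens: (seq_len,) token ids of the incorrect sequence
    At each position, a positive token is sampled from the model's own top-k
    predictions and contrasted against the negative token, pushing the
    negative token's score down.
    """
    topk_scores, topk_ids = logits.topk(k, dim=-1)             # (seq_len, k)
    # Exclude the negative token itself from the positive candidates.
    masked = topk_scores.masked_fill(
        topk_ids.eq(negative_tokens.unsqueeze(-1)), float("-inf")
    )
    pos_idx = torch.multinomial(F.softmax(masked, dim=-1), 1)  # (seq_len, 1)
    s_pos = topk_scores.gather(-1, pos_idx).squeeze(-1)        # (seq_len,)
    s_neg = logits.gather(-1, negative_tokens.unsqueeze(-1)).squeeze(-1)
    # -log( exp(s_pos) / (exp(s_pos) + exp(s_neg)) ), averaged over positions.
    pair = torch.stack([s_pos, s_neg], dim=-1)                 # (seq_len, 2)
    return F.cross_entropy(pair, pair.new_zeros(pair.size(0), dtype=torch.long))

# Toy, runnable example with random values (shapes only, not real data):
torch.manual_seed(0)
logits = torch.randn(7, 100)              # seq_len=7, vocab=100
bad_pred = torch.randint(0, 100, (7,))    # an "incorrect" predicted protoform
print(cringe_loss(logits, bad_pred))
```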
Issue author: That's a good explanation, thank you. So it's an indirect way to discourage D2P from generating unexpected output, isn't it? One more intriguing thing I discussed with my workmates recently: why do you employ such an indirect approach instead of optimizing the D2P network directly, e.g., applying the CRINGE loss to the D2P output?
Maintainer: I thought of it more as discouraging P2D from generating certain output, but I agree that, indirectly, it may also discourage D2P from generating certain output. One difference is that directly discouraging D2P from generating certain output is only possible when the gold protoform is available. However, if P2D learns on labeled data to better discern correct from incorrect protoforms generated by D2P, it has a better chance of, in some sense, discouraging D2P from generating incorrect output even when the gold protoform is unavailable.
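To make that distinction concrete, here is a small, runnable sketch of the two feedback paths described above. Everything in it (names, return strings, the branching itself) is my own illustration of the reasoning, not the repository's actual training loop:

```python
def feedback_path(pred_protoform, gold_protoform):
    """Which training signal is available for one cognate set (sketch)."""
    if gold_protoform is not None:
        # Labeled data: D2P's output can be compared to the gold protoform,
        # so a direct penalty (e.g. CRINGE on the D2P output) is well defined.
        if pred_protoform != gold_protoform:
            return "direct penalty on D2P"
        return "ordinary supervised loss"
    # Unlabeled data: there is no gold protoform to compare against. P2D,
    # having learned on labeled data to separate correct from incorrect
    # protoforms, can still provide an indirect signal on D2P's output.
    return "indirect signal via P2D"

print(feedback_path(("*p", "*a"), ("*p", "*o")))  # direct penalty on D2P
print(feedback_path(("*p", "*a"), None))          # indirect signal via P2D
```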