
Modifying DPLossFastGradientClipping to add support for generative tasks with ghost clipping #716

Open · wants to merge 1 commit into main

Conversation

EnayatUllah
Contributor

Summary:

  1. Generative NLP tasks output predictions of shape (B, T, C), i.e., (batch_size, sequence_length, vocab_size). To compute the cross-entropy loss, the predictions are typically reshaped to (B×T, C) and the targets to (B×T). This breaks Ghost Clipping's per-sample loss computation, which treats B×T as the batch size: the current implementation produces a loss_per_sample of shape B×T while coeff has shape B, causing a shape-mismatch error. This diff fixes the error by collapsing loss_per_sample to shape B, i.e., averaging/summing the loss over the sequence_length dimension (see the sketch after this list).
  2. ignore_index support is also added, since generative tasks sometimes need to exclude specific dummy tokens (e.g., padding) from the loss.
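
For illustration, here is a minimal sketch of the shape mismatch and the fix in plain PyTorch. The tensor names, sizes, and the `PAD` value are hypothetical, and this is a sketch of the idea rather than the Opacus implementation itself:

```python
import torch
import torch.nn.functional as F

B, T, C = 4, 16, 32000  # batch_size, sequence_length, vocab_size (hypothetical)
PAD = -100              # dummy/pad token id to ignore (hypothetical choice)

logits = torch.randn(B, T, C)
targets = torch.randint(0, C, (B, T))
targets[:, -2:] = PAD   # pretend the tail of each sequence is padding

# Standard generative-task reshape: per-token loss over the flattened
# batch, shape (B*T,). ignore_index zeroes the loss at dummy positions.
loss_per_token = F.cross_entropy(
    logits.reshape(B * T, C),
    targets.reshape(B * T),
    reduction="none",
    ignore_index=PAD,
)

# The fix: collapse back to one loss per sample, shape (B,), by reducing
# over the sequence dimension. Averaging only over non-ignored tokens
# keeps the scale comparable to the usual reduced loss.
mask = (targets != PAD).reshape(B, T)
loss_per_sample = loss_per_token.reshape(B, T).sum(dim=1) / mask.sum(dim=1).clamp(min=1)

assert loss_per_sample.shape == (B,)  # now matches coeff's shape of (B,)
```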

Differential Revision: D68047256

facebook-github-bot added the CLA Signed label on Jan 16, 2025.
@facebook-github-bot
Contributor

This pull request was exported from Phabricator. Differential Revision: D68047256
